Easy stream parsing using Groovy, CVS example

You’ve tried every combination of options, but that damn command still won’t give you what you want?

I faced this last week at work. I had to get a list of my commits to CVS. I tried a bunch of options and also searched for a solution, but nothing really worked well. One example of an approach is shown here: “how to search cvs comment history”.

Problem
The root problem is that the output of many tools is not easily reusable. In this situation (and, I’m sure, with more modern tools like Subversion, Git, or Mercurial) the output resembles the following (I took out work-related info):

=============================================================================
RCS file: /cvs/A...
Working file: Java So..
head: 1.1
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
description:
----------------------------
revision 1.1
date: 2011/03/  
filename: Produc...tsA
branches:  1....;
file Produ...
----------------------------
revision 1.1.4.1
date: 201....
filename: ProductsA....;
AS.....
----------------------------
revision 1.1.2.1
date: 2011/0
filename: ProductsA....;
ExampleNightMare - ....
=============================================================================

RCS file: /cvs/Am...
Working file: Java S..
head: 1.1
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
description:
----------------------------
revision 1.1
date: 2011/03/  
filename: Pro...

This output goes on for thousands of lines! Sure, if you use a tool often and have dug into its idioms, or have a guru nearby, you could probably get what you want, but… (off topic, but why don’t man pages and other docs give examples for every option?)

Options
There is no need to take out the dragon book and start writing a parser (is ‘parser’ the correct term in this context?), or even to create a DSL. If you’re very familiar with real scripting languages like Python or Perl, or even pure shell utilities, this is easy. If you’re not, or you’re on Windows (and don’t use PowerShell), or you just want another approach, Groovy is easy to use.

Algorithm
The usual pattern, I would imagine, is to read the input and trigger on a start phrase that marks a block of interest; the data is then captured when the line that contains it shows up later in the input stream. In my situation, depicted above, I did the opposite: I captured the data I needed first (the working file name), but only printed it when a later trigger phrase, the commit comment, appeared.

Sure you could generalize or find some tool that does this, but you’d probably spend more time learning the tool or creating a reusable system that only you need or understand.
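
For comparison, here is a minimal sketch of the “usual” pattern described above: a start phrase opens a block, and lines are captured until an end marker closes it. The name collectBlocks and both marker parameters are my own illustrative assumptions; nothing here is specific to CVS.

```groovy
// Sketch of the "usual" pattern: a start phrase opens a block of interest,
// and lines are captured until an end marker closes it.
// collectBlocks, startPhrase and endMarker are illustrative names.
def collectBlocks(Reader input, String startPhrase, String endMarker) {
    def blocks = []
    def current = null                // null means we are outside a block

    input.eachLine { line ->
        if (current == null) {
            if (line.startsWith(startPhrase)) {
                current = [line]      // trigger seen: start capturing
            }
        } else if (line.startsWith(endMarker)) {
            blocks << current         // end marker: keep what was captured
            current = null
        } else {
            current << line
        }
    }
    if (current != null) {
        blocks << current             // input ended inside a block
    }
    return blocks
}
```

With log output like the sample above, it might be called as collectBlocks(new InputStreamReader(System.in), "revision ", "----"), which would yield one list of lines per revision entry. The last entry of each file would also pick up the closing line of equals signs, so a real script would need to treat that as an end marker too.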

Example

// file: ParseCvsLog_1.groovy
// Author: jbetancourt

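def inside = false   // true after a "Working file:" line, until that file is reported
def workingFile

new BufferedReader(new InputStreamReader(System.in)).eachLine { s ->

	if (s.startsWith("Working file:")) {
		inside = true
		// keep only the path that follows the label
		workingFile = s.substring("Working file:".length()).trim()
	}

	// the commit comment is the trigger: it indicates this file is one I want
	def found = s ==~ /.*ExampleNightMare.*/
	if (inside && found) {
		println(workingFile)   // send to next pipe
		inside = false         // report each file only once
	}

}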

Probably not a good example of idiomatic Groovy code, but it is easy to follow. A Groovy expert could probably do it in one line (I don’t like those smarty one-liners; one week later, you don’t know what you did).
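
For the curious, a more compact variant might look like the sketch below. It is not a one-liner, and it is a different technique from the script above: instead of streaming line by line, it reads the whole log into a string and splits it into per-file chunks. It assumes that file entries are separated by lines made up only of '=' characters, as in the sample output; filesMentioning is a hypothetical name.

```groovy
// Assumption: each file entry in the log is delimited by a line of '=' characters.
// filesMentioning is a hypothetical helper name, not part of any library.
def filesMentioning(String logText, String phrase) {
    logText.split(/(?m)^=+$/).toList()                 // one chunk per file entry
        .findAll { chunk -> chunk.contains(phrase) }   // keep entries that mention the phrase
        .collect { chunk ->
            // first "Working file:" line in the chunk, or null if there is none
            chunk.readLines().find { it.startsWith('Working file:') }
        }
        .findAll { it != null }
        .collect { it.substring('Working file:'.length()).trim() }
}
```

Called as println filesMentioning(System.in.text, 'ExampleNightMare').join('\n'), it lists each matching file once. The trade-off is that the entire input is held in memory, which is probably fine for thousands of lines but is no longer stream parsing in the strict sense.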

Usage
This is used as (all one line):

cvs inscrutable bunch of gibberish that doesn't answer question | groovy ParseCvsLog_1.groovy > myChanges.txt
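
If the trigger phrase changes from one search to the next, it could be passed in rather than hard-coded. The sketch below is one hedged way to do that; matchingFiles and the file name ParseCvsLog_2.groovy mentioned afterwards are hypothetical, and the logic is the same remember-then-print idea as the script above.

```groovy
// Hypothetical variant: the trigger phrase is a parameter instead of a literal.
def matchingFiles(Reader input, String phrase) {
    def files = new LinkedHashSet()   // keeps first-seen order, drops repeats
    def workingFile = null

    input.eachLine { line ->
        if (line.startsWith('Working file:')) {
            // remember the most recent file name
            workingFile = line.substring('Working file:'.length()).trim()
        } else if (workingFile != null && line.contains(phrase)) {
            files << workingFile      // trigger phrase seen for this file
        }
    }
    return files
}
```

Saved in a script (say ParseCvsLog_2.groovy) with a final line such as matchingFiles(new InputStreamReader(System.in), args[0]).each { println it }, it would be used the same way as before, with the phrase as the first argument after the script name.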

Conclusion
Nothing new in this post, of course. The value of any scripting approach is that it is infinitely adaptable. And, when the scripting language is easy to use, the results could even be reusable. Perl, Python, and Ruby, for example, have great facilities for sharing snippets and modular code solutions. Groovy and other JVM-based languages like Scala are beginning to add this capability to Java environments.

Updates

  • 20110323T1906-5: Cleaned up the sample code a little; don’t want to give the wrong impression.
  • 20110402T1702-5: While looking through the book “Groovy in Action”, I noticed that section 13.5.3, Inspecting version control, deals with this subject.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

4 thoughts on “Easy stream parsing using Groovy, CVS example”

  1. More and more I also do my shell scripting stuff (like this) in Groovy and love it. Don’t forget if you want lots of useful classes and utilities (such as Apache Commons stuff) you can @Grab them. Great stuff.

    1. Yes, that @Grab support is excellent. The only problem I had is that, when I tried it, it would not work in the Groovy Eclipse plugin. I’ll have to try again one day.
