Category Archives: cvs

Remove CVS folders from Git version control

Open a bash shell and run:
$ find . -type d -name CVS -exec git rm -r --cached '{}' \;

Is there an easier or more direct method just using Git commands?

Note the --cached option (two dashes). It ensures that the CVS folders are removed only from the index, not from the work area.
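
On the question of a Git-only route: one possibility is to let git rm match the paths itself instead of going through find. This is a minimal sketch, assuming Git's pathspec globbing (where * also matches across directory separators). The function name untrack_cvs_dirs is made up for illustration, and it is worth trying on a backup clone first.

```shell
# Hypothetical helper: drop every tracked file under any CVS directory from
# the index while leaving the work area untouched.
untrack_cvs_dirs() {
    # 'CVS/*' covers a top-level CVS folder, '*/CVS/*' covers nested ones.
    # --ignore-unmatch keeps git from failing when a pattern matches nothing.
    git rm -r --cached --quiet --ignore-unmatch -- 'CVS/*' '*/CVS/*'
}
```

Run it from the top of the work area. Afterwards git status should show the same deletions as the find version.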

Note:
If you’re on Windows, the msysGit or Cygwin installs include a Bash shell. Git, though usable on Windoze, still has a lot of *nixisms.

But before doing the above, make a backup or a backup clone of the project. Maybe even run: git fsck

Check that everything was done correctly. Run git status; it should print a bunch of lines like:
# deleted: some/path/to/CVS/cvs track file

and a reminder to add the .gitignore and .cvsignore files that you finally created, right?

Then, to make sure the folders and their content are still in the work area, run:
$ find . -type d -name CVS -print

In a Windows shell this would be: dir /S CVS
but I didn’t test that.

Update ignores: the .gitignore file should contain the line: CVS/
Make sure there is no trailing space at the end of the line.

I did a: git add -n . | grep CVS
to make sure nothing was going to be added.
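
The ignore step can be scripted so that the trailing-space problem cannot creep in. This is a rough sketch rather than a finished tool: the helper name ensure_cvs_ignored is hypothetical, and it only assumes standard grep, tail, and printf.

```shell
# Hypothetical helper: make sure FILE (default: .gitignore) contains a line
# that is exactly "CVS/", with no trailing whitespace.
ensure_cvs_ignored() {
    ignore_file="${1:-.gitignore}"
    touch "$ignore_file"
    # -x matches whole lines and -F takes the pattern literally, so a line
    # such as "CVS/ " (trailing space) does not count as already present.
    if ! grep -qxF 'CVS/' "$ignore_file"; then
        # If the file does not end with a newline, add one first so the new
        # entry does not get glued onto the last existing line.
        if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
            printf '\n' >> "$ignore_file"
        fi
        printf 'CVS/\n' >> "$ignore_file"
    fi
}
```

Calling it twice is harmless, and passing a path such as .git/info/exclude writes the entry there instead.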

Alternatively, put the CVS/ line in the .git/info/exclude file instead of .gitignore.

Don’t forget to modify the .cvsignore file to ignore the .git folder.

Now do
git commit

But wait …

WARNING:
CVS keeps its file tracking and version information in CVS subfolders (Subversion uses .svn). There is one in each subdirectory of your working files.

It is very important that only CVS manages these. So if you use Git (or Mercurial, btw) on the same files, set up valid ignore files.

Just the existence of these subfolders is a good reason to abandon these systems. They make the use of external tools, even search, even more complicated. You have to use filters for everything. To be fair, good tools already have these filters in place.

Why would someone want to do all this? Well, you might forget to create a .gitignore file before putting all your officially CVS-managed project files into Git. Of course, no one forgets to create the ignore file first.

Why would you ever use Git in parallel with CVS? Well, a centralized repository discourages checking in code. You have to have perfect code before you commit anything. Don’t even think about branching off a branch (twigs).

An aspect of a centralized VCS is:

When you check new code in, everybody else gets it.

Since all new code that you write has bugs, you have a choice.

You can check in buggy code and drive everyone else crazy, or
You can avoid checking it in until it’s fully debugged.

Subversion always gives you this horrible dilemma. Either the repository is full of bugs because it includes new code that was just written, or new code that was just written is not in the repository.

As Subversion users, we are so used to this dilemma that it’s hard to imagine it not existing.

— from HgInit

So, if you can’t use the VCS to check in your work, what do you do? Make manual backups? If you say use the IDE to manage it, you can’t. For example, an IDE (in my experience) cannot do “undo” and “redo” across separate source files, so the ‘undo’ can only get you so far. For small changes that is not really an issue. But if you’re working on a subsystem or module, or on a small team, it can be.

An alternative is to just add a DVCS to the Integrated Development Environment (IDE) to do true local versioning (maybe embedded Git?). That way I can just indicate to the IDE that I want to mark my current work as a milestone, for example, and it will know what to do. Now I can proceed and do all the fancy little versioning stuff, like branch, log, etc. It is just local.

I don’t even need to know about the arcana of the embedded VCS, it is just the IDE doing its thing. Further, higher-order functions like tasks, bug tracking, could also relate to the IDE’s repo, not only the centralized repository.

When it is finally debugged, brilliant code, as usual, I just push to the ‘real’ version control system, which could be the same DVCS or a centralized one like SVN or CVS, and then commit as usual.

I bet some IDE already does this. Oh well.

Updates
5Sep11: Took a look at the IntelliJ site. Their IDE does offer directory history and the ability to set version labels. Nice.

System
Git: git version 1.7.3.1.msysgit.0
OS: Windows 7 64bit Professional
PC: AMD quad-core, 8GB ram.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Hudson/Jenkins CI Server, can’t edit a job?

I was looking at a possible use of a Continuous Integration Server to quickly set up a build process.

Downloaded the Jenkins war file, put it into Tomcat, and defined a simple Job that invoked an Ant file to echo “Hello World!”. Cool, that was easy. But then I wanted to expand that job to do more. Could not find a modify or edit capability. Huh? What’s up with that?

I searched and found very little. There was even some mention of using sed to edit the Job configuration XML, yeeech! Edit a tree-based data structure using a text tool?

Anyway, not impressed. Of course, this was a quick tryout. Or maybe Linux people are so perfect they never have to edit their work. 🙂

Is Jenkins/Hudson just a pretty face on *nix utilities?

I looked at a few other CI Servers. So far Pulse and Team City look interesting, but they are not free.

Updates
Mar 2, 2012: Used one of the latest Jenkins versions. Much, much better! Though I’m having issues getting Active Directory authentication going. Can log in ok, but then it uses the wrong user “name” that our PCs must use. You know how LDAP has all this distinguished this or that.

Links

  1. Jenkins
  2. Active Directory Plugin

RoboCopy — ignore timestamp?

Apparently there is no way to make RoboCopy ignore file timestamps. Yes, there are ways to tweak the invocation to ignore two-second time differences and all that. But, come on, rsync has the “-I” switch:

-I, --ignore-times
    Normally rsync will skip any files that are already the same size and have the same modification timestamp. This option turns off this "quick check" behavior, causing all files to be updated. 

Besides, is the file timestamp really part of the file? Nope, so tools should have the option of using it or not. Sure, not using timestamps may make certain things take longer, but if the data is really important, you’ll want to examine each byte of a file to see if you really have the “latest”.
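
To make the point concrete: a content-based check needs nothing from the timestamp. Here is a rough sketch, assuming POSIX cmp and find and file names without embedded newlines. The function name list_changed_files is made up, and it only reports files; it does not copy anything.

```shell
# Hypothetical helper: list files under SRC whose bytes differ from, or are
# missing in, DEST. Modification times are never consulted.
list_changed_files() {
    src="$1"
    dest="$2"
    # The subshell keeps the cd local; paths come out relative to SRC.
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        # cmp -s prints nothing and returns non-zero when the files differ
        # or when the destination file does not exist.
        if ! cmp -s "$src/$rel" "$dest/$rel" 2>/dev/null; then
            printf '%s\n' "${rel#./}"
        fi
    done
}
```

rsync users get a comparable content-based decision from its -c (--checksum) option, at the cost of reading every file on both sides.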

What instigated the above rant: dealing with deployment from a CVS repo.


Easy stream parsing using Groovy, CVS example

You use every combination of options but that damn command won’t give you what you want?

I faced this last week at work. I had to get a list of my commits to CVS. I tried a bunch of stuff and also searched for a solution. None really worked well. An example of an approach is shown here: “how to search cvs comment history“.

Problem
The root problem is that the output of many tools is not always easily reusable. In this situation (and, I’m sure, in more modern tools like Subversion, Git, or Mercurial) the output resembles the following (I took out work-related info):

=============================================================================
RCS file: /cvs/A...
Working file: Java So..
head: 1.1
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
description:
----------------------------
revision 1.1
date: 2011/03/  
filename: Produc...tsA
branches:  1....;
file Produ...
----------------------------
revision 1.1.4.1
date: 201....
filename: ProductsA....;
AS.....
----------------------------
revision 1.1.2.1
date: 2011/0
filename: ProductsA....;
ExampleNightMare - ....
=============================================================================

RCS file: /cvs/Am...
Working file: Java S..
head: 1.1
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
description:
----------------------------
revision 1.1
date: 2011/03/  
filename: Pro...

This output goes on for thousands of lines! Sure, if you use a tool often and have dug into its idioms, or have a guru nearby, you could probably get what you want, but …. (Off topic, but why don’t man pages and other docs give examples for every option?)

Options
There is no need to take out the dragon book and start writing a parser (is ‘parser’ the correct term in this context?), or even create a DSL. If you’re very familiar with real scripting languages like Python or Perl, or even pure shell utilities, this is easy. If you’re not, or you’re on Windows (and don’t use PowerShell), or you just want another approach, Groovy is easy to use.

Algorithm
The usual pattern, I would imagine, is to read the input and trigger on a start phrase that marks a block of interest; the data is then captured when the line containing it is subsequently detected in the input stream. However, in my situation, depicted above, I did the opposite: I captured the data I needed first, but only printed it when I hit a subsequent trigger phrase in the commit comment.

Sure you could generalize or find some tool that does this, but you’d probably spend more time learning the tool or creating a reusable system that only you need or understand.

Example

// file: ParseCvsLog_1.groovy
// Author: jbetancourt

def inside = false   // true while in a file block that has not been reported yet
def workingFile

System.in.eachLine { s ->

	if (s.startsWith("Working file:")) {
		inside = true
		// keep only the file name, without the label or surrounding spaces
		workingFile = s.substring("Working file:".length()).trim() // got what I want?
	}

	// a commit comment with the trigger phrase indicates that it is.
	if (inside && s.contains("ExampleNightMare")) {
		println(workingFile)   // send to next pipe
		inside = false         // report each file only once
	}

}

Probably not a good example of idiomatic Groovy code, but easy to follow. A Groovy expert could probably do it on one line (I don’t like those smarty one-liners; one week later, you don’t know what you did.).

Usage
This is used as (all one line):

cvs inscrutable bunch of gibberish that doesn't answer question | groovy ParseCvsLog_1.groovy > myChanges.txt
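
For comparison, the same remember-then-emit pattern also fits in a few lines of pure shell utilities. This is a sketch using awk, assuming the log layout shown above and the same ExampleNightMare trigger phrase. The function name list_files_with_comment is hypothetical.

```shell
# Hypothetical helper: read a cvs log on stdin and print the working file of
# every block whose commit comment contains the phrase given as $1.
list_files_with_comment() {
    awk -v phrase="$1" '
        # Remember the most recent file name; nothing is printed yet.
        /^Working file:/ {
            sub(/^Working file:[ \t]*/, "")
            file = $0
            pending = 1
            next
        }
        # The trigger phrase shows up later, in a commit comment of that block.
        pending && index($0, phrase) {
            print file
            pending = 0   # report each file only once
        }
    '
}
```

It slots into the same pipeline in place of the groovy ParseCvsLog_1.groovy step, with the phrase passed as the only argument.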

Conclusion
Nothing new in this post, of course. The value of any scripting approach is that it is infinitely adaptable. And, when the scripting language is easy to use, the results could even be reusable. Perl, Python, and Ruby, for example, have great facilities for sharing of snippets and modular code solutions. Groovy and other JVM based languages like Scala are beginning to add this capability to Java environments.

Updates

  • 20110323T1906-5: Cleaned up the sample code a little; don’t want to give the wrong impression.
  • 20110402T1702-5: While looking thru the book “Groovy In Action” noticed that section 13.5.3 Inspecting version control, deals with this subject.
