Tag Archives: Version control

Using Git with BitTorrent Sync

The various cloud sync services provide a good way to back up a Git repository, or to host a remote for it, in ‘single developer’ situations. The main advantages are that the cost is minimal and no ‘server’ is required. In this post we use something quite different: the new BitTorrent Sync (BTSync) system.

Use Cases
1. Single developer on multiple devices: PC, laptop, mobile. See also Single Developer Git Workflow.
2. Easy, fast, and lightweight code sharing setup. See also Sneakernet with Git.
3. Ad Hoc Version Control: How to do Ad Hoc Version Control, Ad Hoc Version Control With Git

BTSync is a serverless folder syncing system. Instead of storing files on a remote server, it syncs folders directly between your own devices over a private peer-to-peer (P2P) protocol. Note that it is not necessarily a replacement for a server, a backup system, or even other services such as Dropbox. It is more a welcome addition that covers limits the others may have, such as file size limitations, speed, and privacy.

Using this type of service is very easy. I took the easy way out and “forked” a very well written blog post by Sergei Shvetsov, Using Git with Dropbox, that did the same thing with Dropbox. In this post, however, I use BTSync and I am running on Windows (most blogs show examples on *nix). Of course, experienced Git users may approach this very differently. The following uses a console shell UI.

1. Create a local repo
2. Create a “bare” repo that lives in the Synced folders
3. Add the bare repo as the origin of the local repo.

Now on another system, the synced folder, which contains the bare repo, is available as if it had been created locally. During development we work in the local repo and only occasionally touch the ‘bare’ (origin) repo in the synced folder, so the synced folder is not constantly transferring data over the network to the other synced locations (there can be many).
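The three steps above can be sketched as a small Python helper that only builds the Git command lines and does not run them. The function name `bare_remote_commands` and its parameters are made up for this illustration, and the remote defaults to `origin` as in step 3 (the walkthrough further down names it `dropbox` instead).

```python
from pathlib import PureWindowsPath


def bare_remote_commands(project_dir, synced_dir, remote="origin"):
    """Return the git commands for the three steps as argument lists.

    Nothing is executed here; each list could be handed to subprocess.run().
    """
    project = PureWindowsPath(project_dir)
    # The bare repo lives in the synced folder and is named after the project.
    bare = PureWindowsPath(synced_dir) / (project.name + ".git")
    return [
        # 1. Create the local (working) repo.
        ["git", "init", str(project)],
        # 2. Create the bare repo inside the synced folder.
        ["git", "init", "--bare", str(bare)],
        # 3. Register the bare repo as a remote of the local repo.
        ["git", "-C", str(project), "remote", "add", remote, str(bare)],
    ]
```

Printing each list joined with spaces gives roughly the commands typed by hand in the walkthrough; a first `git push -u <remote> master` would still be needed afterwards.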

This approach is diagrammed below. (On a mobile device this ASCII diagram, which uses a <PRE> tag, looks horrible.) Two systems with the two Git repos:

+-----------------+                     +----------------+
|     Local       |                     |     Local      |
|     Repo        |                     |     Repo       |
+-----------------+                     +----------------+
      ^ +                                    +   ^
      | |                                    |   |
      | | push/pull                          |   |  push/pull
      | |                                    |   |
      | |                                    +   |
      + v                                    v   +
+-----------------+         BT Sync     +----------------+
|      Bare       | <-----------------+ |      Bare      |
|      repo       | +------------------>|      repo      |
+-----------------+                     +----------------+

Of course, whenever possible, direct access to remote repos from a clone, or access via a server, is preferred. For private use with up to 5 users, Bitbucket offers free code repositories.

Below is a walkthrough of this process using a Windows cmd shell.

Create a local repository

C:\temp\git-on-BTSync>git init new-project
Initialized empty Git repository in C:/temp/git-on-BTSync/new-project/.git/

C:\temp\git-on-BTSync>cd new-project
C:\temp\git-on-BTSync\new-project>echo "" > README.txt

C:\temp\git-on-BTSync\new-project>git add .
C:\temp\git-on-BTSync\new-project>git commit -m "Initial Commit"
[master (root-commit) dcb3f2b] Initial Commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.txt

Create a new ‘bare’ repo inside of local BTSynced folder

C:\temp\git-on-BTSync\new-project>mkdir C:\Users\jbetancourt\BTSync\git
C:\temp\git-on-BTSync\new-project>git init --bare C:\Users\jbetancourt\BTSync\git\new-project.git
Initialized empty Git repository in C:/Users/jbetancourt/BTSync/git/new-project.git/

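Add this new bare repo as the upstream remote of the local repo (the remote is named “dropbox” here, presumably a carry-over from the Dropbox post this walkthrough was forked from)

C:\temp\git-on-BTSync\new-project>git remote add dropbox C:\Users\jbetancourt\BTSync\git\new-project.git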

Push local changes to the bare repo

C:\temp\git-on-BTSync\new-project>git push -u dropbox master
Counting objects: 3, done.
Writing objects: 100% (3/3), 223 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To C:\Users\jbetancourt\BTSync\git\new-project.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from dropbox.

Use another system
On another system that is running BTSync, like a laptop, the bare repository folder is already synced. Now we can clone the repo.

cd \temp
mkdir remote-workspace
cd remote-workspace
git clone -o dropbox \Users\jbetancourt\BTSync\git\new-project.git
Cloning into 'new-project' ...
cd new-project
dir
11/29/2013  10:48 AM      5 README.txt

Now make some changes
On the laptop we modify a file and commit it.

echo "Hello world" > README.txt
C:\temp\remote-workspace\new-project>type README.txt
"Hello world"
C:\temp\remote-workspace\new-project>git add README.txt

C:\temp\remote-workspace\new-project>git commit -m "changed readme"
[master 452b02e] changed readme
1 file changed, 1 insertion(+), 1 deletion(-)

Now push the changes to the local bare repo

C:\temp\remote-workspace\new-project>git push
Counting objects: 5, done.
Writing objects: 100% (3/3), 260 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To \Users\jbetancourt\BTSync\git\new-project.git
   dcb3f2b..452b02e  master -> master

Back to the original repo on the PC
We pull the changes that were automatically synced via BTSync into the bare repo.

type README.txt
C:\temp\git-on-BTSync\new-project>git pull
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From C:\Users\jbetancourt\BTSync\git\new-project
   dcb3f2b..452b02e  master     -> dropbox/master
Updating dcb3f2b..452b02e
 README.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

C:\temp\git-on-BTSync\new-project>type README.txt
"Hello world"

That was easy!

Issues with syncing git repos with BTSync?

  • BitTorrent Sync has since been updated, and the process of syncing folders is now either easier or harder, depending on your viewpoint. There are also some limitations in the free product, such as a ten-folder limit.
  • The machine holding the source folder must be on for syncing to happen, of course. That is not true of a server-based sync solution like Dropbox. It is only required while the local and remote folders are actually syncing, though. Many BTSync articles and blog posts give the wrong impression that this is a continuous requirement. In fact, as soon as the BTSync app shows that the sync is complete, you can shut down the source machine.
  • Damage is also synced: if one of the synced repos gets damaged, that damage is reproduced in all the other synced copies. This can be prevented by using BTSync’s read-only share feature, though that would introduce some limitations or other complexities.
  • Files ignored by the repository are synced.
  • There was a discussion on whether the .git folder should be synced. I am not sure I follow the rationale.
  • I don’t know if there are any issues with using BitTorrent Sync for long-term work with a Git repo. People have complained of such issues with Dropbox. See the link for Mercurial use on Dropbox below. In the comments of that blog post, robhamilton posts: “… found that it would break the Mercurial repo. Mercurial locks files and creates temp journal files which get sync’d by the dropbox daemon. My advice is to stop dropbox, perform your push/commit, then restart dropbox. Pulls and clones are readonly.” Is this an issue with Git? I don’t think so, since we are using the bare repo approach.


    Mobile Git

  • Dec 24, 2013: I did not investigate the mobile Git use with BTSync as shown above. BTSync has a mobile app that allows the sync to mobile devices. On that device a mobile Git client can access the synced bare repo to clone into a mobile local repo. There are now a few mobile Git clients, for example, SGit.
  • June 26, 2015: BitTorrent Sync has an API: BitTorrent Gives Developers A Cloud-Free Alternative.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Git does subdirectory globs

Finally, more tools are starting to use this convention. I first saw its use in the Ant build framework. I once even suggested it to some Linux nerd as a way to improve his utility and got an angry rebuff. Oh well.

From the Git v1.8.2 release notes:

* The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory. E.g. “foo/**/bar” matches “bar” in “foo” itself or in a subdirectory of “foo”.
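To make the “0 or more levels” wording concrete, here is a rough Python sketch that translates a pattern containing `**/`, `*`, and `?` into a regular expression. It is not Git's matcher, only an illustration of the idea; the function names are invented, and negation, character classes, a trailing `/**`, and the other .gitignore rules are deliberately left out.

```python
import re


def double_star_to_regex(pattern):
    """Translate a slash-separated pattern using '**/', '*' and '?' into a regex.

    Simplified on purpose: this is not how Git itself implements matching.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")   # zero or more whole directory levels
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")         # any characters within a single level
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")          # exactly one character within a level
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern, path):
    """True when the whole path matches the pattern."""
    return double_star_to_regex(pattern).match(path) is not None
```

With this sketch, `matches("foo/**/bar", "foo/bar")` and `matches("foo/**/bar", "foo/a/b/bar")` are both true, which mirrors the example in the release note.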



Git status not showing changes or new files?

In a git repository created using the --separate-git-dir option, executing “git status” shows no changes. This had been working correctly for weeks.

I created a new file in the top level folder, and it doesn’t show as untracked. I tried “git status -u -v”. Nothing. I removed the .gitignore file. Nothing. Strange.

The separate directory is on a shared drive in Windows, mapped as drive “k:”.
“git fsck” shows no issues.

Work around
Anyway, here is what I did: I deleted the .git file and then re-initialized:

git init --separate-git-dir K:/where/it/is/at.git

Now it works.

OS: Windows 7, 64bit.
Git: 1.7.9.msysgit.0
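With --separate-git-dir, the working folder holds a small .git text file that starts with `gitdir:` and names the real repository. One cheap first check, sketched below in Python, is whether that pointer still resolves to a directory the machine can see. This is only a diagnostic idea, not an explanation of the failure above (the post does not establish the cause), and the function names are invented for the example.

```python
from pathlib import Path


def read_gitdir_pointer(worktree):
    """Return the path named in a '.git' pointer file, or None if there is none."""
    dot_git = Path(worktree) / ".git"
    if not dot_git.is_file():
        return None  # an ordinary repository keeps '.git' as a directory
    text = dot_git.read_text(encoding="utf-8").strip()
    prefix = "gitdir:"
    if not text.startswith(prefix):
        return None
    target = Path(text[len(prefix):].strip())
    if not target.is_absolute():
        target = Path(worktree) / target  # relative pointers are relative to the work tree
    return target


def pointer_is_reachable(worktree):
    """True when the pointer file exists and the directory it names is visible."""
    target = read_gitdir_pointer(worktree)
    return target is not None and target.is_dir()
```

If `pointer_is_reachable` returns False for a work tree on a mapped drive, the drive mapping would be the first thing to look at.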


Easy stream parsing using Groovy, CVS example

You use every combination of options but that damn command won’t give you what you want?

I faced this last week at work. I had to get a list of my commits to CVS. I tried a bunch of stuff and also searched for a solution. None really worked well. An example of an approach is shown here: “how to search cvs comment history“.

The root problem is that the output of many tools is not always easily reusable. In this situation (and, I’m sure, with more modern tools like Subversion, Git, or Mercurial) the output resembles the following (I took out work-related info):

RCS file: /cvs/A...
Working file: Java So..
head: 1.1
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
revision 1.1
date: 2011/03/  
filename: Produc...tsA
branches:  1....;
file Produ...
date: 201....
filename: ProductsA....;
date: 2011/0
filename: ProductsA....;
ExampleNightMare - ....

RCS file: /cvs/Am...
Working file: Java S..
head: 1.1
locks: strict
access list:
keyword substitution: kv
total revisions: 4;     selected revisions: 3
revision 1.1
date: 2011/03/  
filename: Pro...

This output goes on for thousands of lines! Sure, if you use a tool often and have dug into its idioms, or have a guru nearby, you could probably get what you want, but … (off topic, but why don’t man pages and other docs give examples for every option?).

There is no need to take out the dragon book and start writing a parser (is ‘parser’ the correct term in this context?), or even create a DSL. If you’re very familiar with real scripting languages like Python or Perl, or even pure shell utilities, this is easy. If you’re not, or you are on Windows (and don’t use PowerShell), or you just want another approach, Groovy is easy to use.

The usual pattern, I would imagine, is to read the input and trigger on a start phrase that marks a block of interest, then capture the data when the line containing it is subsequently detected in the input stream. In my situation, depicted above, I did the opposite: I captured the data I needed first, but only printed it when I later saw a trigger phrase, the commit comment.

Sure you could generalize or find some tool that does this, but you’d probably spend more time learning the tool or creating a reusable system that only you need or understand.


// file: ParseCvsLog_1.groovy
// Author: jbetancourt

def inside = false   // true once a "Working file:" line has been seen
def workingFile      // the most recently captured working file name

new BufferedReader(new InputStreamReader(System.in)).eachLine(){ s ->
    if(s.startsWith("Working file:")){
        inside = true
        workingFile = s.split("Working file:")[1] // got what I want?
    }

    // this indicates that it is.
    def found = s ==~ /.*ExampleNightMare.*/
    if(inside && found){
        println(workingFile)   // send to next pipe
        inside = false
    }
}
Probably not a good example of idiomatic Groovy code, but easy to follow. A Groovy expert could probably do it in one line (I don’t like those smarty one-liners; one week later, you don’t know what you did).

This is used as (all one line):

cvs inscrutable bunch of gibberish that doesn't answer question | groovy ParseCvsLog_1.groovy > myChanges.txt
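For comparison, the same remember-then-emit pattern can be sketched in Python. This is an illustrative equivalent rather than a translation the author published; the function name and the `marker` parameter are invented, and `ExampleNightMare` is simply the trigger text used in the Groovy script.

```python
def working_files_with_marker(lines, marker="ExampleNightMare"):
    """Yield each 'Working file:' value whose log block later contains marker."""
    working_file = None
    for line in lines:
        if line.startswith("Working file:"):
            # Remember the file name; it is only reported if the marker shows up later.
            working_file = line[len("Working file:"):].strip()
        elif working_file is not None and marker in line:
            yield working_file
            working_file = None  # report each file at most once per block
```

To use it in a pipe, loop over `working_files_with_marker(sys.stdin)` and print each name, after importing `sys`.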

Nothing new in this post, of course. The value of any scripting approach is that it is infinitely adaptable. And, when the scripting language is easy to use, the results could even be reusable. Perl, Python, and Ruby, for example, have great facilities for sharing of snippets and modular code solutions. Groovy and other JVM based languages like Scala are beginning to add this capability to Java environments.


  • 20110323T1906-5: Cleaned up the sample code a little; don’t want to give the wrong impression.
  • 20110402T1702-5: While looking thru the book “Groovy In Action” noticed that section 13.5.3 Inspecting version control, deals with this subject.


Why a Repository for Java Dev?

Excellent article on why a Repository Manager is crucial for the software development process. It makes the case that not using a Repository Manager is the cause of many anti-patterns. It could be that the Repository is the next essential, after the VCS, in development best practices.

The article uses Maven as a case in point, and it is also selling a product (nothing wrong with that), so perhaps one should be a little wary. However, there are other dependency managers available, like Ivy, which are used by other build systems like Gradle.

I have seen places that do not use a Software Configuration Management (SCM) Version Control System (VCS). And then there are places that use a VCS incorrectly; as this article points out, the VCS becomes an ad hoc file store for everything. I remember one place where our partner gave us access to their SCM to download a project’s source, and we got everything! They had application executables, utilities, documents, binaries, and other things like their Office apps and other tool chains, none of which had anything to do with the project we wanted the source of. Someone must have accidentally imported their whole PC into CVS, yikes.

Enter the Repository, which, I believe, first became “popular” with the introduction of Maven. When I tried to introduce an internal Repository at a former company I got pushback: “It’s very easy to just put one’s jars and dependent binaries into version control” or “Who needs that Repository play toy stuff!” Oh well. In that situation, it was probably the best decision; there is initial complexity in adopting any tool that aims to reduce complexity.

Is just using Maven or Gradle with an internal Repository as a proxy to external ones enough Repository Management (which Sonatype calls stage one: Proxying Remote Repositories), or does one have to use a full-blown Repository Manager subsystem? Why does using the internal Repository as the destination for one’s own output require a Repository Manager (which Sonatype calls ‘stage two’)? The Maven site has this to say:

Aside from the benefits of mediating access to remote repositories, a repository manager also provides something essential to full adoption of Maven. Unless you expect every member of your organization to download and build every single internal project, you will want to provide a mechanism for developers and departments to share both SNAPSHOT and releases for internal project artifacts. A Maven repository manager provides your organization with such a deployment target. Once you install a Maven repository manager, you can start using Maven to deploy snapshots and releases to a custom repository managed by the repository manager. Over time, this central deployment point for internal projects becomes the fabric for collaboration between different development teams. — Repository Management with Maven Repository Managers.

If you don’t think this is important, you probably have not been on a project that had disasters like “xxx.jar was sent to a customer and we don’t know what version it is or who made it.” You know, putting version numbers in the names of binary files would defeat the purpose of using a VCS, no?

On a side note: Why did the Java development community develop its own repository system when there were plenty out there such as the application-level package systems used by the Linux community?


Maven Repository Managers for the Enterprise

Why Do I Need a Repository Manager?: link

Maven Repository Manager Feature Matrix: link




Gradle:  http://www.gradle.org/

Ivy:  http://ant.apache.org/ivy/

Manage dependencies with Ivy

Maven:  http://maven.apache.org/

Ant:  http://ant.apache.org/

Continuous Integration:  http://en.wikipedia.org/wiki/Continuous_integration


How to do Ad Hoc Version Control

By Josef Betancourt,  Created 12/27/09    Posted 28 Mar 2010


You will modify a group of files to accomplish something.  For example, you’re trying to get a server or network operational again after some security requirements.   During this process you also need to keep track of changes so that you can back out of dead ends or combine different approaches.  Or you just simply need to temporarily keep revisions of a group of files for a task based project.


Version Control, Revision Control (RCS), Configuration Management (CM), Version Control System (VCS), Distributed Version Control System (DVCS), Mercurial, Git, Agile, Subversion

Lightweight Solution

A lightweight Ad-Hoc Version Control (AHVC) approach may be desirable.  Note that even when there are other solutions in place, a lightweight approach may still be desirable.  What are the requirements of a lightweight and workable solution?

  • Automated:  Through human error a file or setting may not get versioned, or may even be lost.  Thus, all changes must be tracked automatically.
  • Small:  A large sprawling system that could not even fit on a thumb drive is too big.
  • Multiplatform:  It should be able to run on the major operating systems.
  • Non-intrusive:   Use of this system should not change the target systems in any way.  Ideally should run from a thumb drive or CD.  And, if there is a change, backing it out should be foolproof.
  • Simple:  Anything that requires training or complexity will not be used or adopted.  This reduces collaborative adoption and improvements in tools and process.
  • Fast:   Should be fast and optimized for local use.
  • Distributed:   Since issues can span “boxes”, it should be able to work with network resources.  This reduces the applicability of GUI-based solutions.
  • Scripting:  Should be easy to optimize patterns of use by creating high-level scripts.
  • Small load:  Of course, we don’t want to grab excessive CPU and memory resources from the target system.
  • Non-Admin:  Even in support situations, full admin access may not be available.
  • Transactional:  Especially in server configuration, changes should be consistent.  Failure to save or revert even one file could be disastrous.
  • Agile:  Not about tools but results.


At home when I create a folder to work on some files, like documents or programming projects, I will usually create a version control system right in the folder.  This has saved me a few times.  I also tried to do this at a prior professional assignment and was partially successful (will be discussed later).

I used a Distributed Version Control System (DVCS).  Since it does not require a centralized server or complicated setup, a DVCS meets most of the lightweight requirements.  Though a VCS is usually used for collaborative management of changing source files, it may be ideal here.  One popular use case is managing one’s /etc folder in Linux with a VCS.

It may seem contradictory that a Distributed VCS is great for local ad hoc use, but that is just a misconception about what a DVCS is.


A good example of a DVCS is Mercurial:

“(n) a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.” – http://mercurial.selenic.com/

Another is Git:
“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.”

Note:  I use Mercurial as the suggested system simply because I started with it.  Git and others are just as applicable.

To create a repository in a folder, one simply executes three commands: init, add, and commit.  The “init” creates a subfolder that serves as the folder history or true repository.  The “add” is recursive, adding all the files to version control, and the “commit” makes these changes permanent.  Of course, one can ‘add’ a subset of files and create directives for files to skip and so forth.



In a command shell

hg init
hg add
hg commit -m "initial commit of project"



The terminology may be a little confusing.  What happened is that now the GIZMO folder has a Mercurial repository which consists of a new .hg folder, and the other existing files and folders comprise the working directory (see Mercurial docs for a more accurate description).  There are no other changes!

That’s all it takes to create a repository.  No puzzling about storage, unique names, hierarchy, and all the details that go with central servers.  The Mercurial docs show how to do the other required tasks, like going back to a previous changeset or retrieving file versions and so forth.  Here is how to view the list of files in a particular changeset:

c:\Users\jbetancourt\…cts\adhocVersioning>hg -v log -r 0

changeset:   0:f29a0b0ad03c
user:        Josef Betancourt <josef.betancourt>
date:        Sat Jun 21 10:53:11 2008 -0400
files:       AdhocVersioning.doc
description: first commit

And, here is a log output using the optional graph log extension (http://mercurial.selenic.com/wiki/GraphlogExtension)

c:\Users\jbetancourt\...\adhocVersioning>hg glog -l 2
@  changeset:   9:25f4c55e4860
|  tag:         tip
|  user:        Josef <josef.betancourt>
|  date:        Fri Mar 26 22:43:56 2010 -0400
|  summary:     removed repo.bat
o  changeset:   8:43a33533c992
|  user:        Josef <josef.betancourt>
|  date:        Thu Mar 25 22:08:35 2010 -0400
|  summary:     removed old files

For the lone individual using ad hoc versioning, a sample workflow is given at Learning Mercurial in Workflows.

Ad Hoc Sharing

A DVCS, true to its name, shines in how it allows distributed sharing of these local repositories.  Thus, when a team is working on a technical issue (ad hoc), it is very easy to share each other’s work. Mercurial includes an embedded web server that can be used for this.

Mercurial’s hg serve command is wonderfully suited to small, tight-knit, and fast-paced group environments.  It also provides a great way to get a feel for using Mercurial commands over a network.

This is illustrated with the coffee shop scenario, see manual.

A sprint or a hacking session in a coffee shop are the perfect places to use the hg serve command, since hg serve does not require any fancy server infrastructure … Then simply tell the person next to you that you’re running a server, send the URL to them in an instant message, and you immediately have a quick-turnaround way to work together. They can type your URL into their web browser and quickly review your changes; or they can pull a bugfix from you and verify it; or they can clone a branch containing a new feature and try it out.

Of course, this would not scale and is for “on-site” use between task focused group members.

A great workflow image by Leon Bambridge for team sharing.

Another simple scenario is taking a few documents from one location to another with a flash drive (in lieu of using a cloud storage service). Instead of doing a copy or cp, one can simply create a DVCS repository in the work directory, then clone it onto the flash drive. Then at home one pulls from the flash drive into the DVCS repository at home. When finished editing the files, one pushes to the flash repo and does the reverse process at the work site. Not only are you not missing any files, you are also keeping track of prior versions. Note that, for security reasons, not everyone has unfettered web access, nor should they.

Revisiting the flash drive scenario above: if you plan to use a flash drive for transport multiple times and the group of files is large, the hg “bundle/unbundle” commands are a good tool; see Communicating Changes on the Mercurial site.

Every connection must be secure and every file must be encrypted, especially if on flash drives. The security policies of the employer come first. Even if only for your own personal ad-hoc use, you should be careful with exposing your data.


  • Easy to use.  The commands needed to perform normal tracking of changes are few and simple.  The conceptual model is also simple, especially if one is not fixated on the use of a centralized Version Control System.
  • Some file changes may be dependent on or result in other file changes.  In a DVCS, commits or check-ins create a “changeset” in the local repository.  This naturally keeps track of related changes.
  • You may need to work on different operating systems.  Mercurial runs on many systems including Windows.
  • You don’t want to change the existing system, low intrusion.  Mercurial can be deployed to a single folder, and the repositories it creates do not pollute the target folders.  For example, in the Subversion VCS, “.svn” folders are created in each subfolder in the target.  That is not a drawback, but it complicates things down the line, such as when using file utilities and filters.


Unfortunately, the use of a DVCS is not perfect and has its own complexities.  For Mercurial, in the context of this post, these are handling binary files and versioning non-nested folders; and, probably for any VCS, the semantic gap between the project, task-based view and the versioning mindset.

1. Binary Files

Mercurial is really for tracking non-binary files; that is where the advantages of versioning are realized.  Diffs and merges are not normally applied to binary files. Further, binary files impact performance and storage once they reach a certain size.  Yet, for ad hoc use, binary files will have to be easily tracked.  Binary files could be images, libraries, jars, zips, documents, or data.

Large binaries are a problem with all VCS systems.  One author discussed a technique that lets Git handle them, in lieu of his continued use of Unison.  He said to use Git’s “--shared” option:  git clone --shared /mnt/fileserver/stuff.git stuff

Note that Mercurial extensions exist to handle binary files.  One of these is the BigFiles extension.  In essence, BigFiles and other similar approaches, handle large binaries using versioned references to the actual binaries which are stored elsewhere.
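The “versioned reference” idea can be illustrated with a few lines of Python. This is not how BigFiles is implemented; it is a minimal sketch under assumed conventions: the binary is copied into a store directory under its SHA-256 digest, and a small `.ref` text file (a suffix made up for this example) is what would be committed in its place.

```python
import hashlib
import shutil
from pathlib import Path


def store_binary(binary_path, store_dir):
    """Copy a binary into store_dir under its SHA-256 name and return the digest."""
    binary_path = Path(binary_path)
    digest = hashlib.sha256(binary_path.read_bytes()).hexdigest()
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(binary_path, target)
    return digest


def write_pointer(binary_path, store_dir):
    """Write '<name>.ref' next to the binary; only this small text file is versioned."""
    binary_path = Path(binary_path)
    digest = store_binary(binary_path, store_dir)
    pointer = binary_path.with_name(binary_path.name + ".ref")
    pointer.write_text(digest + "\n", encoding="utf-8")
    return pointer


def fetch_binary(pointer_path, store_dir, dest):
    """Restore the binary that a pointer file refers to."""
    digest = Path(pointer_path).read_text(encoding="utf-8").strip()
    shutil.copyfile(Path(store_dir) / digest, dest)
```

The repository history then only carries short text files, while each revision of the binary sits in the store, which could itself live on a shared drive or a synced folder.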

Update Oct 29, 2011: Looks like Mercurial 2.0 will have a built-in extension for handling binary files, LargeFiles extension.

Another issue is that binary files may not be diffable within the DVCS tool set.  In a DVCS one can set an external merge agent.  If one is not available, using the app that created the binary to diff and merge is cumbersome.  For example, a Word doc is in effect binary (even though internally it could be all XML).  Thus, a diff would not reveal a usable view.  One must ‘checkout’ particular revisions and then use Word to do a diff, or just manually eyeball it.  The same goes for a zip, jar, image, etc.

Update 02-02-2012: Some tools allow direct use of external tools to diff “binary” files. I think TortoiseSVN does this, allowing Microsoft Word, for example, to diff.

2. Non-nested target folders.

A scenario may involve the manipulation of folders that are not nested. For example, a business system employs two servers and changes must be made to both for a certain service to work, further, physically moving these folders or creating links is not possible or allowed. Mercurial, at this time, works on a single folder tree, and AFAIK there is no way to use symlinks or junctions to create a network folder graph, at least with my testing.  The ForestExtension or subrepositories experimental feature in Mercurial 1.3 do not qualify since they only enable the handling of a folder tree as multiple repositories.

Sure, each folder tree in the graph can be managed, but if a particular change affects files in each tree, there is no easy way to transactionally version them into one changeset, though there are ways to share history between repositories (as in the ShareExtension).

A possible solution is to allow the use of indirect folders.  In Mercurial, work files and the actual repository, the .hg folder, are colocated.  Instead the repository can point to the target folders (containing the work files) to be versioned.  In this way multiple non-nested folders can be managed.  Note that this is not a retreat to the centralized VCS since the repository is still local and distributed using DVCS operations.   Below, the user has created a new Mercurial repository in folder “project”.  This creates the actual repo subdirectory “.hg”, and the indirect actual folders to be versioned are pointed to in a “repos” directive file or using actual symlinks.



repos ----> src_folder1
      ----> src_folder2
      ----> src_folderN

Whether this is useful, possible, or already planned for is unknown.
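To show what such an indirect setup might track, here is a purely hypothetical Python sketch. It assumes the “repos” directive file lists one folder per line, takes a content snapshot across all listed folders, and reports what changed between two snapshots as a single set. None of these names or file formats come from Mercurial; they are invented to illustrate the idea only.

```python
import hashlib
from pathlib import Path


def read_repos_file(repos_file):
    """Return the folders listed one per line in the hypothetical 'repos' file."""
    folders = []
    for line in Path(repos_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            folders.append(Path(line))
    return folders


def snapshot(folders):
    """Map 'folder-index/relative/path' to a SHA-256 digest for every file."""
    manifest = {}
    for index, folder in enumerate(folders):
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file():
                key = f"{index}/{path.relative_to(folder).as_posix()}"
                manifest[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(before, after):
    """Return the keys added, removed, or modified between two snapshots."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))
```

The point is that one snapshot spans all the non-nested folders, so a change touching several of them shows up as one unit rather than as unrelated commits in separate repositories.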

I mentioned this “limitation” on the Mercurial mailing list and was told that this is not a use case for a DVCS. There are many good reasons why all (?) VCS are focused on the single folder tree.

Update, 2011-08-31 T 11:37
Just learned that Git does have an interesting capability:

It is also possible to have a working tree where .git is a plain ASCII file containing gitdir: <path>, i.e. the path to the real git repository

Though this doesn’t fulfill the non-nested project folders scenario, it does help Git be more applicable in an ad-hoc solution. For example, the repo could be located in a different storage location when the target folder is in a constrained unit.

3. Non-admin install

Updated 25 Aug 2010: In the requirements, non-admin install of the VCS was mentioned. This is where Mercurial fails, somewhat. The default install using the binary, at least on Windows, requires admin privileges. I got around this by first installing on another Windows system, then copying the install target folder to the PC I need to work on. This worked even when I installed on a Windows 7 Pro, and then copied to a Windows XP Pro. No problems yet. The Fossil DVCS does not have this problem.

4. Ignore Files

This is, perhaps, a minor issue. Mercurial, as most VCSs do, allows one to filter the files that are versioned in the repo.
In Mercurial one creates an .hgignore file and, within it, uses glob or regular expression syntax to specify the files to ignore. Well, this can be tricky. See the newsgroup discussion that was started by this post. IMHO, having another syntax declaration that allows specification of directories and files explicitly is the way to go. How do other systems do this? Ant patternsets seem to be pretty usable.

5. Semantic Gap

There is a semantic gap when working on a maintenance problem and switching to the versioning viewpoint.   When versioning systems are routinely used, as in Software Development, this is not an issue, just part of the Software Development Lifecycle or best practice (amazing that some shops don’t use version control).   But, when one uses VC only occasionally as a last resort it’s another story.  QA, Support, and Project Managers, may not be comfortable with repositories, branches, tags, labels, pull, push, and so forth.

When I first tried to use Mercurial for ad hoc service professionally, it quickly lost some of its advantages as the task (fixing a system) reached panic levels (usually the case with customer support and deployment schedules), and simply creating and looking at commit messages failed to follow the workflow. Manually tracking which tag or branch related to which event of system testing was cumbersome. Further use would have eventually revealed the patterns of use that would have worked better, but that was a one-time experiment.

A partial solution, other than just getting more expert with the DVCS and better work patterns, is to implement a higher level Domain Specific Language (DSL) that hides the low level DVCS command line and repository centric view.  This could even have a GUI counterpart.  This is not the same as a GUI interface to the DVCS such as TortoiseHg or the Eclipse HG plugin.  What should that DSL  be and is it even possible or useful?
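As a thought experiment, the sketch below shows what a tiny task-level vocabulary over Mercurial might look like. The TaskSession class, its method names, and the step-label scheme are all hypothetical. The sketch only builds hg command lines and, by default, prints them rather than running anything. I am not claiming this is the right DSL, only that the mapping is small.

```python
import subprocess


class TaskSession:
    """Hypothetical task-level vocabulary layered over the hg CLI."""

    def __init__(self, task, repo="."):
        self.task = task    # e.g. a ticket or service-call id
        self.repo = repo    # path of the Mercurial repository
        self.step = 0

    def _hg(self, *args):
        return ["hg", "--repository", self.repo, *args]

    def begin(self):
        # "Start working on the task" becomes a named branch.
        return [self._hg("branch", self.task)]

    def checkpoint(self, note):
        # "Remember this state" becomes a commit plus a local tag.
        self.step += 1
        label = f"{self.task}-step{self.step}"
        return [
            self._hg("commit", "--addremove", "--message",
                     f"{self.task}: {note}"),
            self._hg("tag", "--local", label),
        ]

    def back_to(self, step):
        # "Go back to step N" becomes a clean update to that tag.
        return [self._hg("update", "--clean", f"{self.task}-step{step}")]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


session = TaskSession("ticket-42", repo="/srv/app")
run(session.begin())
run(session.checkpoint("raise connection timeout"))
run(session.back_to(1))
```

The point is that QA or Support staff would only see begin, checkpoint, and back_to, while branches, commits, and tags stay hidden underneath.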

Workflow Updates

June 26, 2011: git-flow is an example of providing high-level operations to enable a specific workflow or model. Perhaps such an approach would be applicable to these AHVC requirements.

Sept 17, 2011: Mercurial Flow-Extension implements the git-flow branching model.


Naive Approach

The usual approach is to just make copies of the affected folder or files that you will be changing. You can use suffixes to distinguish them, such as gizmo.1.conf. It’s very common to see (even in production!) files or folders with people’s initials, gizmo.jb.19.conf.

This gets out of hand very quickly, especially if you are multitasking or working as part of a team and may forget after a good lunch what file “gizmo.24.conf” solved. This problem is compounded when you need to change multiple files, so for example, gizmo.jb.19.conf may depend on changes to “widget.22.conf”. This also gets very chaotic when the files to change and track are in different folder trees or even storage systems. Most importantly, this will not withstand the “throw clay at the wall and see what sticks” school of real world maintenance.

One method I’ve seen and used myself is to just clone each folder tree to be modified.  This gives an easy way to back out any changes.  This, alas, is also error prone, resource intensive, and may not be possible on large file sets or very constrained systems.

Client-Server VCS

A traditional client-server VCS like Subversion can, of course, be used for Ad Hoc Versioning. With Subversion one can run svnserve, its lightweight server, in daemon mode. Then create a task based repository:

svnadmin create /var/svn/adHoc

And, import your tree of files:

svn import c:\Users\XXX\Documents\projects\adhocVersioning file:///var/svn/adHoc/project

Plus, Subversion supports offline use.  I think.  Have not used Subversion in a while.

Another effective Subversion feature is local repositories accessed through the file:// protocol, as in the import command above; these need no running server at all.

Management consoles

Many systems are managed by various forms of management consoles, graphical user interfaces. These are client or web based and may be part of the system or a third-party solution. This is a big advantage from a hands-on administrative point of view. However, from an automation and scripting viewpoint this is not optimal. Thus, there is hopefully an API or tool based method of accessing the actual configuration files or databases of the system. So in this sense, these systems are within the scope of this discussion.

This is not always the case.  One application server comes to mind that was (is?) so complex that there was no way to script it.  Thus, no way to automate the build and release process and versioning of the system.  Consequently, there was also no way to automate the various QA tests that were always panic driven and manually applied.

Managed Approach

The correct or more tractable method is to use a managed approach. This is a software configuration and distribution system that is usually under the control of the IT staff, for example Microsoft’s Systems Management Server (SMS) or System Center Essentials (SCE) for SMB. Non-Microsoft solutions are of course available, such as those in IBM’s Tivoli product lineup.

Why is this not always the best approach?  There may be situations where a subset of a managed resource must be modified.  For example, you are a field service engineer and must travel or remotely connect to a client’s system to handle a service call.   This process may also entail making changes to other hosted apps and system configurations, such as network configurations.  Trying to get the IT department to collaborate or change the configuration or schedule of the managed solution may not be possible or timely.  In fact, this would be discouraged (rightly so) since it can wreak havoc on a production system.  Thus, even changing some resource may entail admin of multiple systems, not just a push of a few files and run of some batch files.  It could require interactive set and test.  Picture the Mars Rovers and how the OS reset problem was finally solved.

Closely related to the managed approach is to use a centralized version control system (VCS) or backup support. Fortunately many operating systems have versioning capabilities built in or readily available. For example, on the Windows platform one can make System Restore points or use the supported backup subsystems (as found in Windows 7 Professional). Many *nixes also have built-in versioning support in the form of installable Version Control Systems or differential backup. In high-end virtualized systems there are facilities for backup or making snapshots and even transport of live systems.

While these work, there is a certain amount of complexity involved. Also there are issues using the same approach on multiple operating systems.  Another important drawback is that one cannot always modify the target system and, for example, install a VCS, even temporarily.  The common factor in these approaches is that there is a central “server” and associated repository for revision storage.  This is fine when available but not very conducive to ad hoc use.

Versioning File System

A versioning file system (VFS) could be of some use. As far as I know there are no popular file systems that support versioning (as used here). Digital Equipment’s VAX/VMS had single-file versioning, and OpenVMS still does. Microsoft’s Windows was supposed to get this with the future WinFS, but that is no longer in the plan(?), though Windows 7 and current servers can allow access to previous file versions as part of the system protection feature. Plan 9 has a snapshot feature. ZFS has advanced stuff too and I would not be surprised if one can set a folder to be ‘versioned’.

However, a VFS would not help in task based versioning since as discussed previously, there may be a need to change multiple subsystems and track these changes as “change sets”.  Thus, a VFS is not a Revision Control System.

Of course, using a scripted solution (discussed next) in conjunction with a file change notification system (inotify), one could cobble together a folder based VCS.  However, this is outside of our lightweight requirements.
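Just to show how little is needed for the cobbled-together version, here is a Python sketch of the change detection part. It polls by comparing content hashes instead of using inotify, which is my substitution to keep it portable; the function names manifest and diff_manifests are hypothetical.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict:
    """Map each file path (relative to root) to a SHA-256 content digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_manifests(old: dict, new: dict) -> dict:
    """Report what was added, removed, or changed between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys()
                          if old[k] != new[k]),
    }
```

Calling manifest before and after a task and feeding both results to diff_manifests gives a crude “change set” for a folder. Storing the old file contents, which a real VCS does for you, is the part that is still missing.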

Scripted Solution

Of course, it should be possible, especially in *nix systems, to use the utilities available and construct a tool chain for a lightweight versioning support.  The rich scripting and excellent tools like rsync make this doable.  Some languages such as Perl or Python are ideal for gluing this together.

Yet, this is not optimal since these same tools will not work on all operating systems or require duplication. For various reasons, for example, it is not always possible to install Cygwin on Windows and make use of the excellent *nix utilities and scripting approach. Likewise, it is not possible to use the outstanding Windows PowerShell in Linux. This is only a problem of course if we are referring to empowering staff to work seamlessly on different OS or resources. Having the same tools and workflow is valuable.

Another thing about this alternative is that a custom solution will eventually become or have functions of a version control system like Git, so why not just start with one?


One approach made possible by the aforementioned scripted solution is to create a snapshot system. The DVCS gives us fine grained control of file revisions. But do we really need to diff and find out that a command in one batch file used the ‘-R’ option, or do we just want to get the file with the desired option? We would know which file we want using task based snapshots. Before a task is begun, we initiate a snapshot. This is analogous to the OS type of restore points, except we do this for a specific target set.
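A minimal Python sketch of such task based snapshots follows. The names take_snapshot and restore_snapshot, the folder naming scheme, and the rule that the same target list must be passed to both calls are assumptions I made for the example; a real tool would record the target set inside the snapshot.

```python
import shutil
import time
from pathlib import Path


def take_snapshot(targets, store: Path, task: str) -> Path:
    """Copy every target (file or folder) into a new snapshot folder."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = store / f"{stamp}_{task}"
    # Raises FileExistsError if the same task is snapshotted twice
    # within one second; good enough for a sketch.
    dest.mkdir(parents=True)
    for i, target in enumerate(targets):
        target = Path(target)
        slot = dest / f"{i}_{target.name}"   # index keeps names unique
        if target.is_dir():
            shutil.copytree(target, slot)
        else:
            shutil.copy2(target, slot)
    return dest


def restore_snapshot(snapshot: Path, targets) -> None:
    """Put every target back as it was; targets must be in the same order."""
    for i, target in enumerate(targets):
        target = Path(target)
        slot = snapshot / f"{i}_{target.name}"
        if slot.is_dir():
            # Drop the current tree so files added later disappear too.
            shutil.rmtree(target, ignore_errors=True)
            shutil.copytree(slot, target)
        else:
            shutil.copy2(slot, target)
```

Note that the targets can live in different folder trees, which is exactly the situation where suffixed copies like gizmo.jb.19.conf fall apart.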

NoSQL Database

Finally, there have been alternatives to the Relational Database Management System (DBMS) for many years. Most recently, this is the NoSQL group of projects such as CouchDB. CouchDB claims that it is: “Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.” Those features sound like something an ad hoc version control system should have. Yet CouchDB, and perhaps all of these projects, are document centric. Still, worth pondering.


Presented were a few thoughts on an approach to ad hoc versioning. A DVCS was proposed as a lightweight solution and some issues were mentioned. Alternatives were looked at. More research is required to evaluate the proposal and determine best practices for the discussed scenarios.


7/15/10:  Changed “maintain” to “accomplish” in Scenario as per feedback from K. Grover.

7/23/10:  Forgot that I visited Ben Tsai’s blog where he discusses using Mercurial within an existing VCS such as Subversion, which I’ve also done, but not really the topic I discussed.

Further Reading

“HgInit: Ground up Mercurial”, http://hginit.com/01.html

Setting up for a Team

“Easy Automated Snapshot-Style Backups with Linux and Rsync” http://www.mikerubel.org/computers/rsync_snapshots/

“Intro to Distributed Version Control (Illustrated)”

Version Control, infrastructures.org

“The Risks of Distributed Version Control”,

Subversion Re-education

“Subverting your homedir, or keeping your life in svn”
http://kitenet.net/~joey/svnhome/ (He now uses Git)
http://microseeds.com/blog/?p=95

Home directory version control notification

“Managing your web site with Mercurial”, Tim Post, http://echoreply.us/tuto/mercurial_site_management.html


Mercurial by example

Mercurial (hg) with Dropbox

Mercurial for Git users

Versioning File System

Agile Operations in the Enterprise
Michael Nygard, http://www.infoq.com/articles/agile-operations



Microsoft System Center Essentials

A utility that keeps track of changes to the etc configuration folder:

Version Control for Multiple Agile Teams


DVCS vs Subversion smackdown, round 3

“Using Mercurial as ad-hoc local version control”, Ben Tsai

Tracking /etc etc


For a more detailed exposition, see the Mercurial tutorial:

The Hg manpage is available at:  http://www.selenic.com/mercurial/hg.1.html

There’s also a very useful FAQ that explains the terminology:

There’s also a good README:  http://www.selenic.com/mercurial/README

HG behind the scenes:


Mercurial Basic workflows

Mercurial BigFiles Extension

Mercurial LargeFiles Extension

Mercurial Subrepos: A past example revisited with a new technique

Mercurial(hg) Cheatsheet for Xen  http://xen.org/files/hg-cheatsheet.txt

A Guide to Branching in Mercurial


Nested Repositories


Tracking 3rd-party sources



Git as an alternative to unison

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.