Acorns

Marcel's blog

Tracking CVS with Git using cvs2git

At work, we use CVS to manage code, but I want something better: Git.  The git-cvsimport tool can do efficient incremental updates from CVS into Git, which is just what you need if you want to work in Git while your team's primary VCS is CVS.  But git-cvsimport is based on cvsps, and cvsps is a dead project.  Worse still, cvsps segfaults on my employer's repository.  Enter cvs2git.

Based on cvs2svn, cvs2git is basically the same program with a special config file that tells cvs2svn to output Git's fast-import data format.  It can import an entire CVS repository or just a CVS submodule.  What matters most to me is that it can import my employer's repository without crashing.  But it always imports the entire history; there is no support for incremental imports.  So use git-cvsimport instead if it can grok your repository; it'll be easier.

If my team were ready to make the cut-over to Git right away, then I'd be golden: import once and ditch CVS.  But unfortunately we're waiting for a few technical prerequisites and then a good slot in our release schedule.  Meanwhile I want to be working in Git to get comfortable with it and because, well, CVS isn't as fast or flexible.

When I say "working in Git" I don't just mean doing one import, trying the basic edit-commit workflow, and then throwing away my changes.  I've done that as part of my evaluation of different VCSes.  I want to do real work that spans many days, incorporate concurrent changes from other developers, and give good commit messages time to pay off as last month's work fades from my memory.  I want to build a string of 40 ad-hoc commits and then squash them into coherent nuggets before getting peer review and publishing them.  And I want to encounter situations where my feature branches develop dependencies part way through and where my feature branches split.  And I want to work on projects that touch 30 files in interdependent ways.  In short, I want to find myself in the situations that drove me to look for something better than CVS, and where Git is said to excel.

To do this on a team of several developers, I need a way to import into Git incrementally on a regular basis, and to export to CVS when I'm ready to publish my work to the team. The export is fairly straightforward using git-cvsexportcommit, so I'll focus on how I use cvs2git, a full-history import tool, to get incremental imports.

  1. Do a preliminary import
  2. Decide whether any branches need to be rewritten, map usernames to full names and emails, check text encoding, and settle any other import options
  3. Script the import, and schedule with atd or cron or run on-demand
  4. Do the initial real import and clone to a working repository
  5. Do some work: branch, edit, commit, whatever
  6. Remove the last import and import again.  This takes four hours for me, so I usually schedule it at the end of the workday, after most of my co-workers have made their last commit for the day.
  7. Incorporate the changes from the latest import into my working repository
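The post doesn't show the import script itself. What follows is a minimal sketch of steps 3 and 6, assuming direct filesystem access to the CVS repository (cvs2git reads the ,v files, not a pserver) and a bare Git repository as the import target. The function name `reimport` and both paths are hypothetical. It uses cvs2git's `--blobfile`, `--dumpfile` and `--username` options and feeds both output files to `git fast-import`, following the cvs2git documentation. A real setup would more likely point cvs2git at the options file from step 2 (via `--options`).

```shell
# Hypothetical sketch of steps 3 and 6: rebuild the import from scratch.
# Usage: reimport /path/to/cvsroot/module /path/to/import.git
reimport() (
    cvsroot=$1                  # CVS repository (or module) on local disk
    target=$2                   # bare Git repository that clones fetch from
    tmp=$(mktemp -d) || exit
    trap 'rm -rf "$tmp"' EXIT   # clean up intermediate files on any exit

    # cvs2git writes two fast-import streams: file contents (blobs) and
    # the commits that refer to them.
    cvs2git --blobfile="$tmp/git-blob.dat" \
            --dumpfile="$tmp/git-dump.dat" \
            --username=cvs2git \
            "$cvsroot" || exit

    # Only after a successful conversion: remove the last import, load the new one.
    rm -rf "$target"
    git init -q --bare "$target" || exit
    cat "$tmp/git-blob.dat" "$tmp/git-dump.dat" |
        git --git-dir="$target" fast-import --quiet
)
```

To get the end-of-day schedule from step 6, you could add a crontab entry such as `0 18 * * 1-5 $HOME/bin/reimport.sh`, where the path is a hypothetical wrapper that calls `reimport`. Because the old repository is deleted only after cvs2git succeeds, a failed conversion leaves the previous import in place.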

In theory a git-pull would "just work" to incorporate changes from the latest import, but for me the commit IDs turn out different every time I import.  Maybe it's something about how cvs2svn makes fixup commits at the start of each branch.  Or maybe it's the RCS keywords ("$Id$", for example) that we have in some files.  Or maybe it's because people commonly add files to a branch retroactively.  Whatever the reason, commit IDs diverge very far back in history, even for imports only a couple of days apart.  So instead of pulling, I use git rebase --onto.

Since I typically only need the latest changes on CVS trunk, not on every branch, I'll just show how to get those.

git fetch . +refs/remotes/origin/master:refs/remotes/origin/master.old
git fetch origin +master:refs/remotes/origin/master
git checkout my-feature-branch
git rebase --onto origin/master origin/master.old

These commands run from the working repository, so origin refers to the location of the last import.
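Because these four commands run after every import, it can help to wrap them in a shell function. This is a minimal sketch, not the author's own tooling. The name `update_from_import` is hypothetical, and the sketch assumes it runs inside the working repository and that the feature branch currently sits on top of the previous import.

```shell
# Hypothetical wrapper around the four commands above.
# Usage (from the working repository): update_from_import my-feature-branch
update_from_import() (
    branch=$1
    # Remember where origin/master pointed before this import.
    git fetch -q . +refs/remotes/origin/master:refs/remotes/origin/master.old || exit
    # Force-update origin/master; the re-import rewrote its history.
    git fetch -q origin +master:refs/remotes/origin/master || exit
    git checkout -q "$branch" || exit
    # Replay only the commits made on top of the old import.
    git rebase -q --onto origin/master origin/master.old
)
```

Each step stops the function on failure. So if the fetch from `origin` fails, no rebase is attempted against a stale or half-updated ref.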

This process almost always goes off without a hitch, unless I'm collaborating with someone, in which case the last step might put me through the usual rebase conflict resolution.  As is typical of git-rebase, it also automatically skips the patches I've already applied to CVS, even though the imported versions of those commits have different timestamps and IDs.

Since I sometimes go many days without committing back to CVS, I periodically export my patches as backups.  I could push back to the origin repository, which happens to be on a different server, but I overwrite it with each subsequent import.  Archiving patch series is much less resource-intensive than archiving repositories, especially when your Git history is 0.5 GB.  I use git-format-patch, with an editor shortcut that writes all the patches from master to the current branch tip into a directory named after the current branch.  Usually my active branches are fairly independent, so this works well.
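The editor shortcut isn't shown in the post, but a rough shell equivalent might look like the following. `backup_patches` and the `PATCH_BACKUP_DIR` variable, which defaults to a hypothetical `~/patch-backups`, are illustrative names. The sketch assumes the current branch is based on `master`.

```shell
# Hypothetical backup helper: write master..HEAD as numbered patch files
# into a directory named after the current branch.
backup_patches() (
    branch=$(git symbolic-ref --short HEAD) || exit   # fails on detached HEAD
    dir="${PATCH_BACKUP_DIR:-$HOME/patch-backups}/$branch"
    # Start from an empty directory so a rewritten series leaves no stale patches.
    rm -rf "$dir" && mkdir -p "$dir" || exit
    git format-patch -q -o "$dir" master..HEAD
)
```

Clearing the directory first matters after an interactive rebase. Otherwise the squashed series would be mixed with leftover numbered patches from the longer, older one.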

When I need to pull in a change immediately rather than wait for a four-hour import, I convert CVS diffs into patches and git-apply them, using a command-line filter that turns cvs diff -u output into a patch-compatible format.
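The filter itself isn't reproduced in the post. As a rough sketch of the idea, this awk-based function (hypothetical name `cvsdiff2git`) assumes that each file's section of `cvs diff -u` output starts with an `Index: PATH` line. It drops the CVS-specific header lines and rewrites each `---`/`+++` pair to `a/PATH` and `b/PATH`, which `git apply` strips with its default `-p1`. Only the header pair right after `Index:` is rewritten, so removed lines that happen to start with `--` stay intact. Added or removed files (`/dev/null` headers) and paths with spaces are not handled carefully.

```shell
# Hypothetical filter: rewrite "cvs diff -u" output so git apply accepts it.
# Assumes every file section starts with "Index: PATH"; a /dev/null side of
# the header (added or removed file) is left unchanged.
cvsdiff2git() {
    awk '
        # Remember the path and start skipping the CVS-specific header.
        /^Index: / { path = substr($0, 8); hdr = 1; next }
        # Drop the separator, RCS and "diff -u -r..." lines of the header.
        hdr && /^(=+|RCS file:|retrieving revision|diff )/ { next }
        # Rewrite the old-file line to a/PATH.
        hdr && /^--- / { if ($2 != "/dev/null") $0 = "--- a/" path; print; next }
        # Rewrite the new-file line to b/PATH; hunks follow, so leave header mode.
        hdr && /^[+][+][+] / { if ($2 != "/dev/null") $0 = "+++ b/" path; hdr = 0; print; next }
        # Hunk lines (and anything else) pass through untouched.
        { print }
    '
}
```

For example, run `cvs -q diff -u -r1.12 -r1.13 src/foo.c | cvsdiff2git | git apply` from the top of the CVS checkout so the paths line up with the Git working tree. The revisions and file name here are placeholders.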

I love using Git for my daily work, and frequently make use of features that I didn't expect, like git-rebase --interactive.  I've come to appreciate Git's preference to work with a whole project snapshot at a time rather than a single file at a time.  But most of all, I like never waiting for CVS and not having to hack around its slowness.

For a simpler way to track a project in CVS that doesn't give you the benefit of importing the full commit history but makes incremental updates much faster, check out this approach.

  1. Michael Haggerty says:

    This is a cool idea for simulating incremental imports using cvs2svn/cvs2git. I just added a link to this blog entry from the cvs2git documentation.

  2. Isidor Zeuner says:

    I found your blog entry because I was also experiencing segmentation fault problems with CVSps. I wonder if you tried tracking down the problem. The other approaches for tracking CVS were not applicable for my purposes, so I was forced to look into the problem, but it turned out to be quite possible to fix.
