I’ve not been programming for all that long, but in that time I’ve used two different version control systems. In my internship I used Subversion (SVN), and in my personal life and open-source stuff I’ve used Git. I find the relationship between the various major VCSes to be quite interesting, and wanted to devote a little time to talking about them.
For the sake of fairness (and because I’m a Python hacker), I should point out that the vast majority of the things I say about Git apply to Mercurial. I have not used Mercurial very much at all, but my understanding is that the general usage model for Mercurial is more-or-less identical to Git. Certainly Mercurial and Git are both what is known as distributed version control systems.
Woah, woah. Distributed?
Oh, yeah, I should probably go back a few steps. So, the reason I want to compare Git and SVN is because the very different usage patterns the two tools create is fundamentally based upon a single ideological difference. This difference can be summarised as follows: in Git, no repository is special; in Subversion, the central repository is special.
Traditional version control systems like Subversion use a system that is a bit like a client-server model. You set up a central Subversion repository on a server somewhere, and check your initial code into it. If you want a copy of the codebase, you check out from that repository. At that point, you have a copy of the code on your machine, but you cannot treat your local repo as identical to the central repo.
Git (and Mercurial) differ in a single core idea: each repository is on equal footing with every other repository. I can find any repository of a codebase, clone it, and then have a completely functional repository on my machine. Any code check-in is local, into the repository on my machine; I can create and merge branches without affecting any other repository; and other people can clone the repository on my machine and merge into any other repository. This lack of central authority means that VCSes like Git are called distributed VCSes.
Ok. Does this difference matter?
For me, yes. I have a habit (potentially an unfortunate one) of using VCS as an indication of how ‘real’ my code is. This means that I tend to want to check my half-finished code into souce control, including code that isn’t passing tests, or even code that fails to build.
Subversion requires me to fight this tendency. Anything I check into source control must run properly, because anyone who wants a copy of the code will check out my changes. At my internship the build machines would check out a copy of the Subversion repository to run the build, so I would have had the added ignominy of breaking the build. This never happened to me, thankfully, but the embarrassment would be…significant.
Git does not force this behaviour on me. If I feel I’ve made progress towards a problem I can commit those changes into my local Git repository. This means that if I make another change that ruins everything, I won’t lose those changes. Once again, this has never happened to me, but it is the kind of nightmare that ferments inside my twisted brain.
Another advantage of the local nature of Git is that the repository history becomes flexible. As long as a change has not been pushed out to any other copy of the repository, I can edit that change to my heart’s content. This means that I can not only make half-baked commits into my Git repo, but when I have solved the problem I can edit the previous commits I made. The number of times I have done this is nearly beyond counting, and my Git log reads much more cleanly than it would otherwise.
Editing the repository history must be done with caution, of course.
git bisect
is a great tool, and it becomes significantly less useful if your
edited history makes certain commits fail tests. I hear that history editing
is a lot less common in the Mercurial community, but I can’t really speak from
experience.
Once I’ve become comfortable with my changes and edited the history so the whole thing looks like I’m a sane developer instead of a crazy person who has no idea what he’s doing, I can push the changes to some more public location. Once I’ve done this I need to resist the temptation to edit those commits, as if you do you end up making future merges hellish.
Oh yeah, merging.
Speaking of merging, anyone tried to do a merge in Subversion? Yeah, it’s not fun. I’ve worked with some very, very intelligent developers on Subversion repositories, and each one of them was apprehensive each time they had to perform a merge.
This difficulty with merging causes an unwillingness to branch in Subversion as well. Of course, institutional requirements mean that damn it, you will make feature branches, whether you like it or not; but in my experience these branches were used for large scale changes (e.g. new releases), not for individual features.
Conversely, with Git I’ll create branches for features large and small,
because doing so has essentially no cost associated with it. This has the
added advantage of ensuring that I always have a copy of my code that will
definitely, absolutely work: the master
branch.
So Subversion sucks. Why should it die?
Because it’s my belief that Subversion actively makes it difficult for developers to develop. Your VCS should encourage you to version your code, and to isolate your development. Subversion’s single repository nature discourages the checking in of your code until you’re absolutely confident of it, which causes SVN histories to be filled with enomous commits.
This reluctance to check in code means that the diffs between commits are
enormous, and then knowing which commit introduced a bug becomes almost
entirely unhelpful to the debugging process. Having debugged in just such a
circumstance, I can tell you that if I’d had git bisect
back then my life
would have been significantly cheerier.
Similarly, the difficulty merging that Subversion introduces causes developers to be reluctant to create feature branches for anything less than a really significant change. This causes all the developers in a group to commit their changes to the master branch. This wildly clutters up the history of the repository, but more importantly, can put the repository in a state where the default checkout fails to build.
Finally, you can’t commit when you’re unable to connect to the central repository! This is so helpful it’s almost impossible to describe to someone who hasn’t used it. Commits become fast and plentiful, and it becomes incredibly easy to work away from your central repository.
I would optimistically like to predict the eventual death of Subversion. Much like CVS is (almost) dead, Subversion will become increasingly less popular for new projects. This will push Subversion first into being predominantly used in industry (a state that is almost true already), and then into being used only in legacy projects (why hello there COBOL!).
Subversion has been a highly useful tool for developers that was a significant step forward from previous VCSes. However, the new wave of distributed VSEes promises to make everyone’s life easier, and so I for one will be happy to take Subversion out the back and shoot it in the head.