Publishing Modules in Mercurial
I've been doing a bit of work for a client, trying to build Puppet recipes to build and manage Rails installs (Rails, Mongrel, and Nginx). It's not exactly going smoothly, because I'm not terribly knowledgeable about any of these tools, so I'm having to teach myself each tool and then teach myself how to manage it.
I'm still struggling with which version control system to use, and I've been basically behaving randomly about it. I've got a couple of modules in bzr, but for some reason I chose Mercurial for these. Given how small each module is -- they're maybe 5-7 files at most -- this isn't a very good way to figure out which SCM to use, but, well, at least I've got experience with Mercurial's stupid server tool.
This is where things got silly. Mercurial at least has a server-side "tool", but it's just a CGI, and, well, it's a crappy CGI. The docs are atrocious -- they explain how to set up your web server, and how to write the configuration file, but they recommend (apparently because SuSE has ridiculous configuration defaults) that you put your configuration file in the same directory as your CGI files. I'm not sure this is so much a security problem as it is just downright silly.
So, I modified the CGI to look in the same directory as my hg repositories. This isn't much better, but at least no one's going to accidentally attempt to execute my configuration file.
It's also silly that the CGI seems to want contact information and a description, but none of the initialization examples mention this information.
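For what it's worth, the contact and description that the web interface shows come from the `[web]` section of each repository's `.hg/hgrc`. Here's a minimal sketch of filling them in from a script after initializing a repository. The function name `set_web_metadata` and the sample values are made up for illustration, and it assumes the hgrc is simple enough for Python's `configparser` to read (no `%include` lines, and you don't mind losing comments when the file is rewritten).

```python
import configparser
import os


def set_web_metadata(repo_path, contact, description):
    """Set web.contact and web.description in <repo_path>/.hg/hgrc.

    Hypothetical helper: assumes repo_path is already an initialized
    repository, so the .hg directory exists.
    """
    hgrc_path = os.path.join(repo_path, ".hg", "hgrc")

    # interpolation=None so a literal '%' in a value is left alone.
    config = configparser.ConfigParser(interpolation=None)
    config.read(hgrc_path)  # a missing file is skipped without error

    if not config.has_section("web"):
        config.add_section("web")
    config.set("web", "contact", contact)
    config.set("web", "description", description)

    # Rewrites the whole file; comments in an existing hgrc are dropped.
    with open(hgrc_path, "w") as handle:
        config.write(handle)
    return hgrc_path
```

Calling something like `set_web_metadata("/srv/hg/nginx", "Jane Admin <jane@example.com>", "Puppet module for Nginx")` (path and values invented) leaves a `[web]` section with those two keys in the repository's hgrc. Editing the file by hand works just as well; the point is only that the settings live per repository, not in the CGI's own configuration file.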
I wish one of these SCMs would just hurry up and win. Mark Shuttleworth has me kind of convinced on the importance of renaming; I recently ran into really annoying versioning problems because someone moved a bunch of directories in a Subversion repository of mine, and because I had those subdirectories checked out (they were application configuration directories) I had to import each one manually, check it out again, and move it into place. I'm frankly not sure if bzr would have been much better, but I like to think it would.
I will say that I consider http-based serving to be a critical feature in an SCM. I don't want to have to give my developers real account access (which ssh requires), and I like how SVN makes it easy to control a given user's access to each repository (I have a 115-line SVN auth file). This would probably be a tie-breaker for me.
In the meantime, I need to give git a spin. My new employee, Michael O'Brien, has been talking it up, and it certainly seems to get a lot more respect than the other tools (although that might just be because Linus wrote it and it's written in C instead of Python).
Expect to at least see more complaining in this space, I guess.
Tue, 17 Jul 2007 | Tags: tools, scm, hg, mercurial
Distributed Version Control
The question of distributed version control came up on the mailing list again when I said I wanted to switch Puppet's main development to one of the available tools, and Jason Kohles was concerned that it adds too much complexity without adding much functionality.
Because this comes up often, I figured it made more sense to turn my response into a blog post rather than a fleeting list message.
There are two primary problems I hope to solve by switching to a distributed SCM:
Pre-Branching
Probably the most important problem that a distributed SCM would solve in Puppet's current development environment is branching: it automatically pre-branches on every checkout, so you can just start developing and not have to worry about whether the work you're doing deserves a branch or not. I do a lot of work that should be committed in multiple steps but can't be because it would leave trunk in an inconsistent state. Yes, I should create a branch in that case, but I often don't realize that a branch is needed until I'm half done with the development, and Subversion just isn't set up to deal well with mid-development branching.
With a distributed SCM, every checkout is a branch, so I can make as many commits as I want and merge them in the end. This is great because it means that branching doesn't need to be planned out.
Lower Barrier of Development
Distributed SCMs also really encourage more development. With a centralized system, even if you have commit access you're going to tend not to experiment much because you will always be concerned about leaving the system in an unstable state. Of course, you could always make a branch, but if you just want to experiment it often doesn't seem worth it. Or, even worse, you'll begin an experiment, find out halfway through that it's worthwhile, and then not be in a position to make a branch because you've done so many moves and copies.
Puppet development shouldn't require my permission just because the work spans more than one commit. If you think you can do great development, then you should be able to do it and just send me a diff. Even if you don't want to contribute the source back, or just not yet, it should be easy to develop over multiple commits.
I've even had this problem with people who have commit access -- they're working on a project that's big enough to span multiple commits but they forgot to branch initially or underestimated the amount of work involved. As a result, they are unwilling to commit until all of the work is done, which means not only that they're susceptible to data loss (one collaborator almost lost all of his work because of a hard drive failure) but that I can't easily get a mid-stream idea of what they're doing.
Other Reasons
Even if there weren't any other developers, I think I'd still want a distributed SCM. I seem to fly a lot these days (so much that American Airlines put me in their Gold club or whatever it is), and I find that I can usually get a lot of development work done on the plane. However, more than once I've been doing work that should span multiple commits, or I've managed to destroy my checkout because I was doing things that Subversion can't revert from a working copy, like moving directories around. Every time I'm on a plane, I really want a dSCM.
Further Reading
Mark Shuttleworth has had a few posts on this topic recently, and they're worth looking through, especially the post on merging.
Conclusion
For me, the real question comes down to this: what is the unit of development? Is it a commit, or does it often span multiple commits? I think it often spans multiple commits, so we need a tool that makes that easy.
Jason's original concern was over ease of use, but for trivial cases dSCMs are as easy to use as centralized tools -- you just check out, make your changes, and email a diff. The big benefit of the dSCM is that there is so much more room for complexity, even if you don't need it.