Puppet: System Administration Automated

Git, one month on


I've been using Git for about a month now. Overall, everyone has been right about it -- it's got some heinous usability problems, but man is it kick ass to have distributed version control.

For instance, I've taken a few trips since I switched to Git, and I've committed on an airplane at least twice now. This seems like a small thing, in that I could always wait to commit, but I'm often surprisingly productive on planes, and there are plenty of things you can't actually recover from in SVN without the full repository (e.g., moving directories around).

The cool things about Git don't all require its distributed aspect -- for instance, its branching is far superior to SVN's (if you could say SVN even has branching). I found myself three commits into some work last week that really should have been a separate branch. With Git, this was really easy to fix -- I branched from the current state, then rewound the current branch to remove the commits I didn't want in it.

I was in a branch named indirection, and I decided it made sense to make a new branch named configurations.

Using the git reset man page, this is what I did:

$ git branch configurations
$ git reset --hard HEAD~3
$ git checkout configurations

This left me in the new branch I wanted and left the indirection branch in the state it was in before I made the big changes.
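To try the same sequence without touching a real repository, here is a minimal sketch that builds a throwaway repository and replays the three commands. The file names, commit messages, and the `commit` helper are made up for illustration; only the branch names come from the post.

```shell
#!/bin/sh
# Throwaway demo: move the last three commits of "indirection" onto a new
# branch named "configurations". File names and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

commit() {  # hypothetical helper: create one file and commit it
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit base
git checkout -q -b indirection
commit indirection-work
# Three commits that really belong on their own branch.
commit config-1
commit config-2
commit config-3

git branch configurations         # new branch points at the current commit
git reset -q --hard HEAD~3        # rewind indirection by three commits
git checkout -q configurations    # carry on working in the new branch

indirection_tip=$(git log -1 --format=%s indirection)
configurations_tip=$(git log -1 --format=%s configurations)
echo "indirection tip:    $indirection_tip"
echo "configurations tip: $configurations_tip"
```

One caveat: git reset --hard also throws away any uncommitted changes in the working tree, so commit or stash first.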

It's clearly not all peaches and cream, though. As I mentioned, there are definite usability issues. It's not so much that you can't figure it out but that it's just seldom what you expect. It doesn't help that the majority of the examples are from Linus's life, and his life is far more complicated than most, in terms of managing repositories.

The mechanism for pulling, fetching, and pushing branches is especially counterintuitive.
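One way to see the moving parts is a throwaway setup with an "upstream" repository and a clone of it. The sketch below (repository and file names are made up) shows that fetch only updates the remote-tracking branch, and that the local branch moves once you merge; by default, pull is a fetch followed by a merge.

```shell
#!/bin/sh
# Throwaway demo of fetch vs. merge; repository and file names are made up.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# An "upstream" repository with one commit, and a clone of it.
git init -q upstream
cd upstream
echo one > file.txt
git add file.txt
git commit -q -m "first"
cd ..
git clone -q upstream clone

# Upstream moves ahead by one commit.
cd upstream
echo two >> file.txt
git commit -q -a -m "second"
cd ..

cd clone
branch=$(git symbolic-ref --short HEAD)    # whatever the default branch is called

git fetch -q origin                         # updates origin/$branch only
after_fetch=$(git log -1 --format=%s)       # the local branch has not moved
tracking=$(git log -1 --format=%s "origin/$branch")

git merge -q "origin/$branch"               # fast-forward the local branch
after_merge=$(git log -1 --format=%s)

echo "local after fetch: $after_fetch"
echo "origin/$branch: $tracking"
echo "local after merge: $after_merge"
```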

Overall, though, I'm very happy with it.

Sun, 23 Sep 2007


Linus on Git responding to KDE


Linus Torvalds posted a lengthy response to someone from the KDE community about using Git with KDE, and it's definitely worth a read:

Practically speaking, you'd generally have one or a few central repositories, yes. But no, it really doesn't have to be a single one. And I'm not just talking about mirroring (which is really easy with a distributed setup), I'm literally talking about things like some people wanting to use the "stable" tree, and not my tree at all, or the vendor trees.

And they are obviously connected, but it doesn't have to be a totally central notion at all.

Think of the git trees as people: some people are more "central" than others, but in the end, the kernel is actually fairly unusual (at least for a big project) in having just one person that is so much in the "center" that everybody knows about him.

Mon, 27 Aug 2007


Giving Git a run-out


Something apparently snapped while I was at OSCON, and I collapsed my distributed source control management quandary down to Git. I think in the end it doesn't matter all that much, since the tools are so similar in basic functionality, and I think I mostly got tired of sitting on the fence, looking over but not being willing to commit to a specific dSCM.

Once I decided I'd go ahead with Git, my main priority was to get to the point where I could do my development on Puppet in it, which is especially important since it's the only real way for me to figure out if it will work for me, not that I really know what "work for me" means.

There are two crucial steps to testing an SCM for me: getting Puppet's code into it, with as much history as possible, and making it available to others.

Getting the code in was moderately easy, but made harder by the fact that I first made my Subversion repository when SVN was just starting to get popular, so I started without the typical branches/tags/trunk directory set. Here's the command I used in the end:

git svnimport -A ~/puppet-users -i -v http://reductivelabs.com/svn/puppet/ > /tmp/git.out

I tried git-svn, but it never got past revision 567 or so (which is when I switched to the popular directory structure). In addition, I was never able to actually get a working copy of the repository up to that point.

The puppet-users file contains a mapping from svn-style user names to email addresses:

luke = Luke Kanies <luke@domain.com>
lutter = David Lutterkort <dlutter@domain.com>
mpalmer = Matthew Palmer <mpalmer@domain.org>

I redirect output to a file, because it produces a bunch of output (I've got about 2800 revisions) and I don't actually care about any of it, and in addition, because I use iTerm, it takes a whole freaking cpu to scroll a terminal.

This basically worked, except that it started at revision 600 (close enough to the point where I changed the directory structure in the repository).

To make the repository shareable, I first just exported it via http, which was pretty easy, but then I was told I need to use git-server for performance reasons. I built a Puppet module to set it all up, although the server doesn't work as well as I'd like (I really like SVN's auth file, which allows me to control who has access to the 32 repositories I maintain).

I'm getting some gritching from the Australians, and it's not like it's perfect, but at least I know I want something like that.

At the least, this has been a great experiment, and I figure we'll spend a week or so messing around with it. I'm not sure I can afford the time to experiment with all of the competitors; Matt's really pushing on darcs, but... I dunno, it seems niche, and at this point, I'm niche enough for all of us.

Tue, 07 Aug 2007


git: Display all 140 possibilities? (y or n)


I guess this is what people meant when they said git was "Unixy":

luke@phage(0) $ git
Display all 140 possibilities? (y or n)
git                     git-get-tar-commit-id   git-rebase
git-add                 git-grep                git-receive-pack
git-add--interactive    git-gui                 git-reflog
git-am                  git-hash-object         git-relink
git-annotate            git-http-fetch          git-remote
git-apply               git-http-push           git-repack
git-applymbox           git-imap-send           git-repo-config
git-applypatch          git-index-pack          git-request-pull
git-archimport          git-init                git-rerere
git-archive             git-init-db             git-reset
git-bisect              git-instaweb            git-rev-list
git-blame               git-local-fetch         git-rev-parse
git-branch              git-log                 git-revert
git-bundle              git-lost-found          git-rm
git-cat-file            git-ls-files            git-runstatus
git-check-ref-format    git-ls-remote           git-send-email
git-checkout            git-ls-tree             git-send-pack
git-checkout-index      git-mailinfo            git-sh-setup
git-cherry              git-mailsplit           git-shell
git-cherry-pick         git-merge               git-shortlog
git-citool              git-merge-base          git-show
git-clean               git-merge-file          git-show-branch
git-clone               git-merge-index         git-show-index
luke@phage(0) $ git

I think I'm going to be sick.

Tue, 17 Jul 2007


Publishing Modules in Mercurial


I've been doing a bit of work for a client, trying to build Puppet recipes to build and manage Rails installs (Rails, Mongrel, and Nginx). It's not exactly going smoothly, because I'm not terribly knowledgeable about any of these tools so I'm having to teach myself the tool and then teach myself how to manage it.

I'm still struggling with which version control system to use, and I've been basically behaving randomly about it. I've got a couple of modules in bzr, but for some reason I chose Mercurial for these. Given how small each module is -- they're maybe 5-7 files at most -- this isn't a very good way to figure out which SCM to use, but, well, at least I've got experience with Mercurial's stupid server tool.

This is where things got silly. Mercurial at least has a server-side "tool", but it's just a CGI, and, well, it's a crappy CGI. The docs are atrocious -- they explain how to set up your web server, and how to write the configuration file, but they recommend (apparently because SuSE has ridiculous configuration defaults) that you put your configuration file in the same directory as your CGI files. I'm not sure this is so much a security problem as it is just downright silly.

So, I modified the CGI to look in the same directory as my hg repositories. This isn't much better, but at least no one's going to accidentally attempt to execute my configuration file.

It's also silly that the CGI seems to want contact information and a description, but none of the initialization examples mention this information.

I wish one of these SCMs would just hurry up and win. Mark Shuttleworth has me kind of convinced on the importance of renaming; I recently ran into really annoying versioning problems because someone moved a bunch of directories in a Subversion repository of mine, and because I had those subdirectories checked out (they were application configuration directories) I had to import manually, check it out again, and move it into place. I'm frankly not sure if bzr would have been much better, but I like to think it would.

I will say that I consider http-based serving to be a critical feature in an SCM. I don't want to have to give my developers real account access (which ssh requires), and I like how SVN makes it easy to control a given user's access to each repository (I have a 115-line SVN auth file). This would probably be a tie-breaker for me.

In the meantime, I need to give git a spin. My new employee, Michael O'Brien, has been talking it up, and it certainly seems to get a lot more respect than the other tools (although that might just be because Linus wrote it and it's written in C instead of Python).

Expect to at least see more complaining in this space, I guess.

Tue, 17 Jul 2007


Distributed Version Control


The question of distributed version control came up on the mailing list again when I said I wanted to switch Puppet's main development to one of the available tools, and Jason Kohles was concerned that it adds too much complexity without adding much functionality.

Because this comes up often, I figured it made more sense to turn my response into a blog post rather than a fleeting list message.

There are two primary problems I hope to solve by switching to a distributed SCM:

Pre-Branching

Probably the most important problem that a distributed SCM would solve in Puppet's current development environment is branching: it automatically pre-branches on every checkout, so you can just start developing and not have to worry about whether the work you're doing deserves a branch or not. I do a lot of work that should be committed in multiple steps but can't be because it would leave trunk in an inconsistent state. Yes, I should create a branch in that case, but I often don't realize that a branch is needed until I'm half done with the development, and Subversion just isn't set up to deal well with mid-development branching.

With a distributed SCM, every checkout is a branch, so I can make as many commits as I want and merge them in the end. This is great because it means that branching doesn't need to be planned out.

Lower Barrier of Development

Distributed SCMs also really encourage more development. With a centralized system, even if you have commit access you're going to tend not to experiment much because you will always be concerned about leaving the system in an unstable state. Of course, you could always make a branch, but if you just want to experiment it often doesn't seem worth it. Or, even worse, you'll begin an experiment, find out halfway through that it's worthwhile, and then not be in a position to make a branch because you've done so many moves and copies.

Puppet development shouldn't require my permission just because the work spans more than one commit. If you think you can do great development, then you should be able to do it and just send me a diff. Even if you don't want to contribute the source back, or just not yet, it should be easy to develop over multiple commits.

I've even had this problem with people who have commit access -- they're working on a project that's big enough to span multiple commits but they forgot to branch initially or underestimated the amount of work involved. As a result, they are unwilling to commit until all of the work is done, which means not only that they're susceptible to data loss (one collaborator almost lost all of his work because of a hard drive failure) but that I can't easily get a mid-stream idea of what they're doing.

Other Reasons

Even if there weren't any other developers, I think I'd still want a distributed SCM. I seem to fly a lot these days (so much that American Airlines put me in their Gold club or whatever it is), and I find that I can usually get a lot of development work done on the plane. However, more than once I've been doing work that should span multiple commits, or I've managed to destroy my checkout because I was doing things that Subversion can't revert from a working copy, like moving directories around. Every time I'm on a plane, I really want a dSCM.

Further Reading

Mark Shuttleworth has had a few posts on this topic recently, and they're worth looking through, especially the post on merging.

Conclusion

For me, the real question comes down to what is the unit of development? Is it a commit, or does it often span multiple commits? I think it often spans multiple commits, so we need a tool that makes that easy.

Jason's original concern was over ease of use, but for trivial cases dSCMs are as easy to use as centralized tools -- you just check out, make your changes, and email a diff. The big benefit of the dSCM is that there is so much more room for complexity, even if you don't need it.
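As a rough illustration of that trivial case, here is a sketch using Git as the example tool. That choice is an assumption, since this post doesn't settle on a particular dSCM, and the repository names, file contents, and commit messages are made up. It clones a repository, commits a change locally, and writes the commit out as a patch file that could be attached to an email.

```shell
#!/bin/sh
# Throwaway demo: commit locally in a clone, then turn that commit into a
# patch file that could be mailed. All names here are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the project's published repository.
git init -q upstream
cd upstream
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd ..

# The contributor's side: check out, make a change, commit it locally.
git clone -q upstream contributor
cd contributor
echo "a small fix" >> README
git commit -q -a -m "Fix the README"

# Write the most recent commit out as a mail-ready patch file.
patch_file=$(git format-patch -1 -o "$work/outgoing")
echo "patch written to: $patch_file"
```

The contributor never needs commit access to the upstream repository for any of this, and a larger piece of work is just more local commits before the patches get generated.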

Mon, 09 Jul 2007