Puppet: System Administration Automated

Caching and REST


(by Luke)

One of the things I was supposed to write about last week was how I'm rethinking some of Puppet's internal caching. This rethinking is a direct result of listening to ThoughtWorks' IT Matters Podcast on REST (I've only listened to part 1 so far). I actually listened to the episode three times: it's only about 20 minutes long and I listened to it on a 60-minute bike ride, which worked out well because it was so windy that day that I didn't hear the whole thing during any single listening.

I'll hopefully write later about how this podcast made me rethink how environments are used in fileserving, but for now, I'm going to focus on caching.

Indirection

For a couple of months now, Puppet has had an Indirector module that is basically useful for connecting classes with collections of instances of those classes. The only reason you'd really even bother to use it is if you had multiple collections, and needed to interact with different collections at different times, but you wanted those differences to be transparent.

For instance, when retrieving node information, you just call this code:

Puppet::Node.find("mynode")

Somewhere else, you'll have configured which collection (the word I'm currently using is terminus) this uses, and the Indirector just delegates the find call to the right collection. For nodes, you might be using the exec collection, which calls an external script, turns the resulting YAML into a Node instance, and returns it (or returns nil if nothing was found).
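As a rough illustration of that delegation, here is a minimal Ruby sketch, assuming a terminus is simply any object that responds to `find(key)`. The classes `Indirection`, `ExecTerminus`, and this `Node` are hypothetical stand-ins written for this example, not Puppet's actual Indirector code; the "external script" is faked with a block so the example is self-contained.

```ruby
# Hypothetical sketch: a class-level find that delegates to whichever
# terminus has been configured. Not Puppet's actual implementation.

# Stand-in for the exec terminus: a block plays the external node script.
class ExecTerminus
  def initialize(&script)
    @script = script
  end

  def find(key)
    @script.call(key) # returns a node, or nil if nothing was found
  end
end

# Holds the configured terminus and forwards find calls to it.
class Indirection
  attr_accessor :terminus

  def find(key)
    raise "no terminus configured" unless terminus
    terminus.find(key)
  end
end

class Node
  attr_reader :name, :classes

  def initialize(name, classes)
    @name = name
    @classes = classes
  end

  # One indirection per class, configured elsewhere.
  def self.indirection
    @indirection ||= Indirection.new
  end

  # Callers never see which terminus answers the request.
  def self.find(key)
    indirection.find(key)
  end
end

# Configuration happens in one place...
Node.indirection.terminus = ExecTerminus.new do |name|
  name == "mynode" ? Node.new(name, ["webserver"]) : nil
end

# ...and the calling code stays the same regardless of the terminus.
node = Node.find("mynode")
puts node.classes.inspect # => ["webserver"]
```

Swapping in a different terminus only changes the configuration line, which is the transparency the paragraph above describes.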

I think the Indirector is pretty cool, and it's certainly simplified a lot of my modeling of interacting with different sources of information. Those who are familiar with REST, at least as it's usually done in the Ruby world, will recognize find as one of the methods usually used for REST interfaces -- it maps to the HTTP verb GET. One of the primary design goals of the Indirector was to facilitate REST interfaces, so the methods we're indirecting are, not coincidentally, exactly the methods you'd implement for REST support.
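To make the REST connection concrete, here is a small sketch of how indirected methods might map onto HTTP requests. The method names `find`, `search`, `save`, and `destroy`, the choice of PUT for `save`, and the URL layout are all assumptions for illustration, not Puppet's actual wire format.

```ruby
# Hypothetical mapping from indirected methods to HTTP verbs.
# Method names, verbs, and paths are illustrative assumptions.
REST_VERBS = {
  find:    "GET",    # fetch one instance by key
  search:  "GET",    # fetch a collection of instances
  save:    "PUT",    # store an instance under its key
  destroy: "DELETE"  # remove an instance
}.freeze

# Build the verb and path a REST terminus might use for a given call.
def rest_request(indirection, method, key = nil)
  verb = REST_VERBS.fetch(method) { raise ArgumentError, "unknown method #{method}" }
  path = method == :search ? "/#{indirection}s" : "/#{indirection}/#{key}"
  [verb, path]
end

p rest_request(:node, :find, "mynode") # => ["GET", "/node/mynode"]
p rest_request(:node, :search)         # => ["GET", "/nodes"]
```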

Caching

One of the later additions to the indirection code was support for cache collections. That is, you might have a canonical collection, and then a cache collection for speed or proximity purposes. Following our Node example above, if you were using the exec collection, you'd probably want to have the results cached in the yaml collection, so they were inexpensive to retrieve.
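A minimal sketch of a canonical collection with a cache collection in front of it might look like the following. It assumes hash-backed stand-ins for the yaml and exec collections, and all class names are hypothetical. It deliberately has no invalidation at all, since deciding when the cache is dirty is the open question.

```ruby
# Hypothetical read-through cache: check the cache collection first and
# fall back to the canonical collection on a miss.

# Stand-in for the yaml cache collection.
class HashTerminus
  def initialize
    @data = {}
  end

  def find(key)
    @data[key]
  end

  def save(key, value)
    @data[key] = value
  end
end

# Stand-in for the exec collection; counts how often it is hit.
class CountingSource
  attr_reader :calls

  def initialize(&generator)
    @generator = generator
    @calls = 0
  end

  def find(key)
    @calls += 1
    @generator.call(key)
  end
end

class CachingIndirection
  def initialize(source, cache)
    @source = source
    @cache = cache
  end

  # No invalidation: anything in the cache is trusted forever.
  def find(key)
    cached = @cache.find(key)
    return cached if cached

    fresh = @source.find(key)
    @cache.save(key, fresh) if fresh
    fresh
  end
end

source = CountingSource.new { |name| { name: name, classes: ["base"] } }
nodes = CachingIndirection.new(source, HashTerminus.new)
nodes.find("mynode")
nodes.find("mynode")
puts source.calls # => 1
```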

The critical question with any caching system is how to know when the cache is dirty. How do you know if you should use the cached node information or go back to the source?

I expect there are just about as many answers to this question as there are caching implementations. I had never implemented a caching system before, and I probably misinterpreted my discussions with Rick Bradley, because I ended up choosing a not-very-good system. The current cache invalidation mechanism is based on relative versions: if the version of the cached object is older than the version of the object in the other collection, then your cache is dirty.

What is a version? Well, normally it's just the timestamp of when the instance was created. This might work okay for some systems, but in general, the timestamp ends up being pretty useless. Look at our Node example -- the timestamp from the exec collection is always later, because we retrieve the cached version, then generate a new node using the exec collection, and compare. Duh. The answer's always the same.

Even worse, in most situations the cache doesn't save you any work, because you're pulling fresh data from the original source anyway. If we have to re-execute the external node script to get the latest node version, we haven't saved any effort at all; we've just added a bunch of useless work, which is stupid.
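The flaw is easy to see in a small simulation. This sketch assumes a version is the creation timestamp and uses a hypothetical integer clock in place of real time so the result is deterministic. Because the fresh node has to be generated just to learn its version, the source runs on every lookup and the cached copy never wins.

```ruby
# Hypothetical sketch of version-based invalidation where a version is the
# creation timestamp. A simulated integer clock replaces real time.
CachedNode = Struct.new(:name, :version)

class VersionedCache
  attr_reader :source_calls

  def initialize
    @cache = {}
    @source_calls = 0
    @clock = 0 # advances every time a node is created
  end

  # Stand-in for running the external node script.
  def generate(name)
    @source_calls += 1
    @clock += 1
    CachedNode.new(name, @clock)
  end

  def find(name)
    fresh = generate(name) # needed just to learn the source's version
    cached = @cache[name]
    # The cache counts as clean only if it is at least as new as the source.
    if cached && cached.version >= fresh.version
      cached # unreachable: fresh is always created after cached
    else
      @cache[name] = fresh
    end
  end
end

nodes = VersionedCache.new
3.times { nodes.find("mynode") }
puts nodes.source_calls # => 3
```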

Puppet 0.24.4 "fixed" this problem by saying that the cached node's version was the timestamp of the node's Facts cache. If the facts are updated, then the cache needs to be updated. This seems to mostly work, but it feels like a hack for something that should be easy.
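A sketch of that workaround, under the assumption that "the facts were updated" means their save time is later than the time the node was cached: the node script only runs when the facts are newer. Times are plain integers and a Hash stands in for the Facts cache; the class name is hypothetical.

```ruby
# Hypothetical sketch of facts-based invalidation: a cached node is dirty
# only if the host's facts were saved after the node was cached.
class FactsVersionedCache
  attr_reader :source_calls

  def initialize(facts_saved_at)
    @facts_saved_at = facts_saved_at # name => time facts were last saved
    @cache = {}                      # name => [node, time node was cached]
    @source_calls = 0
  end

  def find(name, now)
    node, cached_at = @cache[name]
    # Clean if a node was cached no earlier than the latest facts.
    return node if node && cached_at >= @facts_saved_at.fetch(name)

    @source_calls += 1 # stand-in for running the external node script
    node = { name: name }
    @cache[name] = [node, now]
    node
  end
end

facts = { "mynode" => 100 }
nodes = FactsVersionedCache.new(facts)
nodes.find("mynode", 110) # nothing cached: script runs
nodes.find("mynode", 120) # facts unchanged: cached node used
facts["mynode"] = 130     # host uploads new facts
nodes.find("mynode", 140) # facts newer than cache: script runs again
puts nodes.source_calls   # => 2
```

This avoids hitting the source on every lookup, but it ties node freshness to an unrelated object's timestamp, which is why it feels like a hack.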

TTL

So, on to the podcast. It was a good podcast in general, and they focused a good bit on caching. At first I found this pretty strange -- why is caching an important design criterion? As they talked, though, I realized that a generalized, simple caching model is useful a lot more places than I would expect, including in Puppet.

There didn't seem to be any disagreement over the best way to handle knowing when a cache is dirty -- they apparently just use time-to-live (TTL) or expiration headers. I think it was the second time listening through that I realized that the vast majority of my caching problems could be fixed with this.

Puppet has a natural TTL for most of its information -- every host runs every half an hour, so if you set a TTL of half an hour (or whatever your run interval is), then you'll get fresh data once a run, and cached data the rest of the time. In the above Node scenario, the exec collection would set the TTL of the node (so that your external node app could pick its own TTL), or Puppet would fall back to a default TTL equal to the run interval. Then, when Puppet goes to check whether its cache is dirty, it could just compare the cached object's expiration time (when it was cached plus its TTL) against the current time -- no need to hit both collections, and no arbitrary definition of "version".
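Here is a minimal sketch of TTL-based caching, assuming each cached entry stores an expiration time and the source may optionally return its own TTL. The 1800-second default corresponds to a half-hour run interval; the class and struct names are hypothetical.

```ruby
# Hypothetical TTL cache: a lookup only compares the entry's expiration
# time with the current time, so the source runs at most once per TTL.
DEFAULT_TTL = 30 * 60 # seconds; assumes a half-hour run interval

Entry = Struct.new(:value, :expires_at)

class TTLCache
  attr_reader :source_calls

  # source: callable returning [value, ttl]; a nil ttl means "use the default"
  def initialize(source, default_ttl: DEFAULT_TTL)
    @source = source
    @default_ttl = default_ttl
    @entries = {}
    @source_calls = 0
  end

  def find(key, now)
    entry = @entries[key]
    return entry.value if entry && now < entry.expires_at # still fresh

    @source_calls += 1
    value, ttl = @source.call(key)
    @entries[key] = Entry.new(value, now + (ttl || @default_ttl))
    value
  end
end

# This source returns no TTL of its own, so the default applies.
exec_script = ->(name) { [{ name: name }, nil] }
cache = TTLCache.new(exec_script)
cache.find("mynode", 0)    # miss: runs the script
cache.find("mynode", 600)  # hit: ten minutes later
cache.find("mynode", 1800) # expired: runs the script again
puts cache.source_calls    # => 2
```

Letting the source return a TTL is what allows an external node app to pick its own TTL, while the default keeps the common case at one fresh lookup per run.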

This actually makes even more sense for the problem I'm currently trying to solve. I'm trying to remodel the SSL certificate signing process, and it's gotten pretty messy. With TTLs, though, you just set the TTL of the certificate to its own internal TTL, and you use the local system as the cache and the CA server as the ultimate source. If there is a local cert and it's still valid, use it; if there's a local cert but we're past its TTL, discard it and get a fresh cert; if there's no cert, get one from the server and cache it locally.
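The three cases above can be sketched as a single lookup. This assumes certificates are modeled as a simple struct with an expiration time rather than real X.509 objects, with a Hash standing in for local storage and a lambda for the CA server request; all names are hypothetical.

```ruby
# Hypothetical sketch: the local system is the cache, the CA is the source.
Cert = Struct.new(:subject, :not_after)

class CertCache
  def initialize(ca)
    @ca = ca    # callable (subject, now) -> Cert; stand-in for the CA server
    @local = {} # subject -> Cert; stand-in for certificates on local disk
  end

  def certificate(subject, now)
    cert = @local[subject]
    return cert if cert && now < cert.not_after # local and still valid: use it

    @local.delete(subject)                   # past its TTL (or absent): discard
    @local[subject] = @ca.call(subject, now) # fetch a fresh cert and cache it
  end
end

issued = 0
ca = lambda do |subject, now|
  issued += 1
  Cert.new(subject, now + 1000) # each cert is valid for 1000 time units
end
certs = CertCache.new(ca)
certs.certificate("agent1", 0)    # no local cert: fetched from the CA
certs.certificate("agent1", 500)  # local cert still valid: reused
certs.certificate("agent1", 1000) # past not_after: discarded and refetched
puts issued # => 2
```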

Next Steps

I don't have the whole thing figured out mentally yet, but I'm pretty close. At the least, the next step is to replace the current broken version-based cache with TTL-based caching. The two things I most need to resolve are:

Obviously, these two things are linked -- the user needs a complete configuration path from the command line or configuration file to the bit that actually sets the TTL.

For now, fortunately, I don't need to worry about it, because I can just stick with the run interval as the TTL for essentially everything I'm doing. As things get more interesting, though, we're going to want to configure these values, because....

TTL Can Help Provide Change Control

One of my primary goals in moving the catalog compiling process to REST is to enable a decoupling between compiling and applying. In other words, I want people to be able to apply a configuration without recompiling.

Imagine a configuration TTL of a week -- every host recompiles its configuration during some specific maintenance window, like Sunday morning between 2 and 6 am. They still apply their configurations every half an hour, but that's normally just validating that nothing has drifted.
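A quick simulation shows how a catalog TTL decouples compiling from applying. It assumes half-hour runs and a one-week catalog TTL, counted in minutes, and ignores the specific Sunday maintenance window; the function and constant names are hypothetical.

```ruby
# Hypothetical simulation: apply every run, recompile only when the cached
# catalog's TTL has expired. Times are in minutes.
RUN_INTERVAL = 30           # half-hour runs
CATALOG_TTL  = 7 * 24 * 60  # one week

def simulate(minutes, catalog_ttl)
  compiles = 0
  applies = 0
  expires_at = nil
  (0...minutes).step(RUN_INTERVAL) do |now|
    if expires_at.nil? || now >= expires_at
      compiles += 1 # fetch a freshly compiled catalog
      expires_at = now + catalog_ttl
    end
    applies += 1 # apply the cached catalog, correcting any drift
  end
  [compiles, applies]
end

p simulate(CATALOG_TTL, CATALOG_TTL)  # => [1, 336]
p simulate(CATALOG_TTL, RUN_INTERVAL) # => [336, 336]
```

With a one-week TTL the host applies 336 times but compiles once; setting the catalog TTL equal to the run interval gives the usual recompile-every-run behavior.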

Obviously, this wouldn't be used by most shops -- most people would still want all hosts to recompile every time. But for those shops that are highly worried about change control, or those who want to do rolling upgrades, where they upgrade 10% of a pool of servers at a time, this would help a lot. You take your pool of servers, trigger a recompile on 10%, and once you're confident they're working, you trigger a recompile on another 10%, and so on.
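The rolling upgrade might be orchestrated along these lines. This is a sketch under the assumption that triggering a recompile and checking a host's health are available as callables; `rolling_recompile` and both lambdas are hypothetical stand-ins, not real Puppet commands.

```ruby
# Hypothetical rolling recompile: trigger recompiles in ~10% batches and
# stop as soon as a batch fails its health check.
def rolling_recompile(hosts, trigger_recompile, healthy, percent: 10)
  # Round the batch size up (integer math) so every host lands in a batch.
  batch_size = [(hosts.size * percent + 99) / 100, 1].max
  done = []
  hosts.each_slice(batch_size) do |batch|
    batch.each { |host| trigger_recompile.call(host) }
    return [:halted, done, batch] unless batch.all? { |host| healthy.call(host) }
    done.concat(batch)
  end
  [:complete, done, []]
end

pool = (1..20).map { |i| "web#{i}" }
recompiled = []
status, done, failed = rolling_recompile(
  pool,
  ->(host) { recompiled << host },
  ->(host) { host != "web7" } # pretend web7 breaks after recompiling
)
p status    # => :halted
p done.size # => 6
p failed    # => ["web7", "web8"]
```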

Once you can do that with Puppet, it'll feel almost enterprisey. :)


Wed, 02 Apr 2008


Photo Included in "Schmap Scotland Second Edition"


I'm way behind on posting, and you can expect to see a flurry of posts in the coming days. Or at least, a relative flurry.

In the meantime, here's a quickie. I traveled to Edinburgh, Scotland in April of 2005 for a multi-day workshop on Configuration Management, organized by Paul Anderson, the author of LCFG. This is an important workshop in Puppet's history, because I had just that month decided to go full time on Puppet, in large part because of my frustration with meetings like this -- lots of talk, lots of agreement that things needed to change, but mostly inaction (along with a lot of people saying my whole Resource-based approach was silly and uninteresting).

The primary upsides of the workshop for me were that I was able to spend a good number of hours picking the brain of Andrew Hume, who is wicked smart and not afraid to tell you you're being a dumbass, and I was also able to borrow a bike from another attendee, Kevin Campbell (I can't seem to find a good page for him online), and go for a relatively wide-ranging tour around Edinburgh.

So, strangely (and completely unrelated to Puppet), I got a request to have one of my photos from that trip included in some kind of annual photo collection from Scotland. Naturally I said yes. It's not my favorite photo in that set by any means, but hey, it's my first published photo so I'm proud.


Mon, 31 Mar 2008


Self-description


I seem to find myself having to describe myself a bit too much these days, and I can't seem to reuse any of the biographies. I just recently wrote a bio for ;login: magazine, because I wrote a version-control-for-sysadmins article for them:

Luke Kanies runs Reductive Labs (http://reductivelabs.com), a startup producing OSS software for centralized, automated server administration. He has been a Unix sysadmin for nine years and has published multiple articles on Unix tools and best practices.

Then I described myself in an email about looking for a partner:

I am the founder of Reductive Labs (http://reductivelabs.com), an open-source software company dedicated to revolutionizing the task of server administration. The main product I'm working on right now is Puppet (http://reductivelabs.com/projects/puppet), which is basically a library capable of modeling all of the configurable elements on servers (users, packages, files, cron jobs, etc.) and a domain-specific language built for specifying an entire network's configuration in one specification. It has been developed based on years of experience using older-generation tools like cfengine and ISconf, along with years of participation in the sysadmin research community (which is disappointingly small).

Once Puppet is mature and it is possible to get the servers to configure themselves as desired, my focus will shift to building feedback loops into the network, including intra-server, inter-server, and human-server feedback loops. Once the tools know enough to configure the system, they can use that same knowledge to make the systems themselves more resilient, and also provide enough context to logs, metrics, and other system-generated data to make most administration tasks significantly easier.

And finally, a bio for a presentation I'm doing at AUUG:

Luke Kanies is the developer of Puppet, a next-generation configuration management system. He has been a sysadmin for nine years and has published multiple articles on Unix tools and best practices. He founded Reductive Labs as a software company focused on open-source system administration tools, because our tools have not been keeping up with our problems.

Update: At their request, I ended up modifying the bio for AUUG:

Luke Kanies is the founder of Reductive Labs, an open-source software company focused on building the next generation of system administration tools. He has been a sysadmin for nine years and has published multiple articles on Unix tools and best practices. His current focus is Puppet, a next-generation server automation framework developed from the perspective that computers should be managed as a network instead of individually.

Puppet lets you centrally manage every important aspect of your system using a cross-platform specification language that manages all the separate elements normally aggregated in different files, like users, cron jobs, and hosts, along with obviously discrete elements like packages, services, and files. Puppet's simple declarative specification language provides powerful classing abilities for drawing out the similarities between hosts while allowing them to be as specific as necessary, and it handles dependency and prerequisite relationships between objects clearly and explicitly.


Fri, 21 Oct 2005