Puppet: System Administration Automated

Jay and I converge on testing


Those four people who have been reading this blog for a while know I've been struggling to think and program like Jay Fields. In particular, he seems to have presented a few rules in the past that don't like to be used together:

Now, let's do a simple combinatorial exercise, and put these three rules together:

It's pretty clear that, like the old saw about programming ("All programs can be reduced to one line of code with a bug in it"), Jay is pointing us toward tests that can largely only be one line of code. Yeah, I know sometimes setup methods don't involve any mocking, but often they do.

And, since your tests can only be one line of code, they can't test very much, which means that all of your methods need to also be one line of code, else they aren't testable. (Yes, I'm being a touch extreme here, but that is where the arrow is pointed, anyway.)

You can see how this would kinda drive me bonkers. Some local dev friends have been trying to help me see the error of my ways, mostly so my code would stop looking like such crap; I've learned a helluva lot in the last 8 months or so, and most of it has actually made my code look more like Jay would recommend. I will say it's blindingly obvious Jay is doing internal development at enterprises, rather than developing as part of a consistent team producing software that is distributed to the wider world.

But one can only go so far, and the three rules above, in combination, are just way too far. I've often kind of sputtered exasperatedly at Jay's posts, especially his announcement of his new testing tool, expectations. Again, I can kind of see where he's going with that, but you've got another thing coming if you think I'm using it, especially given how happy (at least, relative to test/unit) I am with RSpec.

Also, I think it's just stupid having all setup code inline. DRY ("don't repeat yourself") is just as true in your test code as anywhere else, and having a maintainable test code base is, IMO, more important than having your normal code base be maintainable, because tests are kind of unnecessary. If you have good, readable, maintainable tests, then people who contribute will also contribute tests. If your tests are all 50 lines long and have lots of repetition, then 1) you've got 5x the amount of code you should, which is wicked expensive, and 2) you've got so much code no one will look at it. Yay, never getting patches with tests in them. My favorite example of this is Steve Yegge's rant Code's Worst Enemy; he describes his 500k line Java project with no tests, which is a lot of code but much less code than if it had tests. I've experienced in Puppet that test code seems to be much harder to maintain than normal code (although maybe it's just my own crap test code, not normal test code), and having 5x as much test code as normal code would make me just quit writing unit tests entirely.

So, I am absolutely overjoyed to announce that Jay has changed one of his rules: He now recommends stubs over mocks. This is clearly just for setup code and such, but it's a big step. He even goes into using stub_everything, which I find is the only way to build tests that aren't fragile. For instance, say you start with this code:

class MyClass
    def go
        start()
        finish()
    end
end

describe MyClass do
    before do
        @me = MyClass.new
        @me.stubs(:start)
        @me.stubs(:finish)
    end

    it "should start when going" do
        @me.expects(:start)
        @me.go
    end

    it "should finish when going" do
        @me.expects(:finish)
        @me.go
    end
end

Now you find you need a validation method, so you add this test:

it "should validate when going" do
    @me.expects(:validate)
    @me.go
end

Update: Fixed code to actually call @me.go in the validate test.

Oops. Now your single test passes, but your two old tests break, because you were only stubbing start and finish, instead of using stub_everything. Your setup code needs to be modified to take this new call into account (or, if you're Jay, you need to modify every test in your suite; yay). This comes up constantly. If you specifically mock or stub methods during setup, then you are almost guaranteed to have cascading failures when you expand your code.

Anyway, the point is, if I tried to follow Jay's rules, then the above trivial change -- I add one line of code to a very simple method -- would result in me adding a test for that line, plus at least one line of code in every other test in that suite. Instead, if I use stub_everything, then I add my new test and I'm done. (Well, kind of; notice I'm not actually testing the order of the method calls, which is actually pretty tough.)
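The fragility argument above can be sketched without any mocking library. What follows is a hand-rolled stand-in for a "stub everything" double, not Mocha's implementation: NullStub and Runner are hypothetical names, and Runner delegates to a collaborator instead of stubbing methods on itself, which is an assumption made only to keep the sketch self-contained.

```ruby
# Hand-rolled stand-in for a "stub everything" double: it accepts any
# message, returns nil, and remembers what it was sent.
class NullStub
  attr_reader :calls

  def initialize
    @calls = []
  end

  def method_missing(name, *args, &block)
    @calls << name
    nil
  end

  def respond_to_missing?(name, include_private = false)
    true
  end
end

# Hypothetical variant of MyClass that hands its work to a collaborator.
class Runner
  def initialize(worker)
    @worker = worker
  end

  def go
    @worker.validate # the newly added call
    @worker.start
    @worker.finish
  end
end

# The "setup code" never names validate, start, or finish, so adding
# validate to go required no change here.
worker = NullStub.new
Runner.new(worker).go

puts worker.calls.include?(:start)
puts worker.calls.include?(:validate)
```

Each check looks for only the one call it cares about, so adding another call to go leaves the existing checks alone.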

My recommendation is to read Jay, since he's clearly thinking and talking about aspects of testing that not many others are, but read him with a skeptical eye, and be willing to say "That's just nuts!" and write your own relatively abstracted test code. And if you're working with people who can't think to look at their setup code when a test fails, then you need to find a different job.

Tue, 06 May 2008


Ruby has a distribution problem


I've been doing a better job of reading development books recently (e.g., Domain Driven Design), and something has really begun to stick out at me. There seems to be a split between those developers who write software that is expected to run in one place and those who write software that is expected to run in many places.

If you, as a developer, know that your software will really only be installed at a single customer (whether that customer is your employer, a consulting client, or whatever), then your life is drastically easier -- you don't usually have to worry about cross-platform issues, and you don't have to worry about different users having different needs, because you only have one user.

Obviously there's no inherent problem with having the simpler life of a developer with only one user, but it seems to me that the Ruby community is, as a group, largely adopting that perspective as the default. This is worrying to me, because I'm building an application that I expect to be installed in thousands of locations (in fact, it's probably already installed in thousands of locations). I'd like to take as much advantage of existing Ruby code as possible, but it's not exactly easy.

For example, rubygems (probably my least favorite ruby software of all time) basically require that you always try to load them, because their design stupidly requires that you know whether a given piece of software is installed via rubygems or some other mechanism. For instance, if you've installed the Facter gem, then this code doesn't work:

ruby -rfacter -e 'Facter.to_hash'

Instead, you have to do this:

ruby -rubygems -rfacter -e 'Facter.to_hash'

The reason is that rubygems installs in a location that Ruby doesn't search by default. The reason for that is that apparently this one guy, somewhere, wanted to have multiple versions of a given package installed at once. Who wants this? Let's just say it's not the guys who are distributing hundreds and thousands of copies of their software.
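One common workaround is to try a plain require first and only fall back to loading rubygems when that fails. A minimal sketch, assuming a helper named require_optional (a hypothetical name, not part of Ruby or RubyGems):

```ruby
# Try to load a library from the normal load path; if that fails,
# load rubygems and try again. Returns true when the library is
# available and false when it cannot be found either way.
def require_optional(lib)
  require lib
  true
rescue LoadError
  begin
    require 'rubygems'
    require lib
    true
  rescue LoadError
    false
  end
end

if require_optional('facter')
  puts "facter is available"
else
  puts "facter is not installed"
end
```

On interpreters that already load RubyGems at startup, the fallback branch is simply never needed.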

The truth is, most Rubyists don't even seem to use gems this way -- they tend to create a vendor subdirectory in their project, and then install their gems there. This is a clear example of how little they expect to have their projects distributed. These gems might be compiled, they might conflict with installed software, they might require installed software -- you have no idea, because it's an entirely separate repository of packages.

This is basically anathema to how I think about management, yet it's the standard, recommended practice in the Rails community, because it makes it easy to "guarantee" behaviour in a given environment. Of course, your guarantee is only good if no one ever tries to run the software anywhere except an exact duplicate of where you run it.

I tried to have a conversation about this at the Ruby Hoedown last year -- my claim was it was difficult to turn a Rails project into a native package, especially with the tendency toward requiring all kinds of random gems. Quite a few people kind of stared blankly at me and said, multiple times, "I just put it in vendor." Since then, this has become my go-to phrase for describing the Ruby way of solving distribution problems: "I just put it in vendor." I keep waiting for someone to try to put their kernel or web browser in vendor: "We only support the Firefox copy in vendor, sorry."

I don't know if other communities are any better at this. From what I can tell, this is basically how the Java community behaves, too. They have pathologically bad distribution systems, and as a language it seems to be most influenced by consulting shops developing huge, worthless software projects for large enterprises, rather than developing distributed applications that will be installed in thousands of locations.

I'd like to think that Puppet would have some counter-effect to this. It's one of the largest and most sophisticated publicly available Ruby projects, it's already installed in at least hundreds and probably thousands of places, and it does a pretty good job of working nearly everywhere. However, I keep getting blank stares when I talk about this with other Rubyists, half the time I'm called a troll for even bringing it up, and when I explain why Puppet exists to most Rubyists, they just say, "I just put it in vendor", or, maybe, "Why not just use Capistrano?" To that I ask, how do you install Capistrano, but you know what they say to that.

I think Rails is a big part of the problem. Rails is clearly created by a company that will never distribute its software, and the Rails philosophy is again almost pathologically opposed to the idea of turning your software into a package. Imagine trying to make a Rails project LSB compliant -- your database.yml file would need to be in /etc, your log directory would need to be in /var, and your actual code would need to be in /usr. There went all of your fancy Rails "convention over configuration", and you're suddenly fighting Rails instead of using it, and everyone you ask for help just tells you to "put it in vendor".

I'm looking at creating a new application that I'm planning on distributing, and one of my big goals is to be able to distribute the core in one package and various additional pieces of functionality as separate packages. I'll need to simultaneously support as many of my customer platforms as I can and provide a consistent operating environment for my packages. The only way to do this is to have supported operating environments with well-defined dependencies, such as you can almost trivially build in Debian or Red Hat.

For those of you who are thinking, "you could just put it in vendor", or "you could at least use gems": no, I couldn't. Take a trivial example: Say I want to use RRD support in my application (which is likely, in this case). There is Ruby support for RRD, but not in Gem form. Even if there were a gem, though, it would require a native RRDTool package, and, of course, Gems can't specify dependencies on native packages, so I'd be telling my customers, "well, install X gems and Y packages".

Instead, if I use native packages (say, those for Debian and Red Hat, to cover most cases), I can define clear dependencies for all cases. I know Debian provides everything I need, and in the rare case it doesn't, I can provide my own apt repository (and the same for yum). Gems, on the other hand, can really only do Ruby stuff. No, I don't actually want to put glibc in vendor, thanks.

I don't see a solution to this, other than getting more Rubyists distributing their software, but I'd really like to see this issue begin to be approached by the community. I feel like a wolf howling in the wilderness at this point, and it often feels like I'm fighting against my community in order to produce software that hundreds or thousands of people will install, as opposed to just use over the web.

Mon, 05 May 2008


Caching and REST


(by Luke)

One of the things I was supposed to write about last week was how I'm rethinking some of Puppet's internal caching. This rethinking is a direct result of listening to ThoughtWorks' IT Matters Podcast on REST (I've only listened to part 1 so far). I actually listened to the episode three times, because it's only about 20 minutes and I listened to it on a 60 minute bike ride, which worked well because it was so windy that day that I didn't hear the whole thing any of those listenings.

I'll hopefully write later about how this podcast made me rethink how environments are used in fileserving, but for now, I'm going to focus on caching.

Indirection

For a couple of months now, Puppet has had an Indirector module that is basically useful for connecting classes with collections of instances of those classes. The only reason you'd really even bother to use it is if you had multiple collections, and needed to interact with different collections at different times, but you wanted those differences to be transparent.

For instance, when retrieving node information, you just call this code:

Puppet::Node.find("mynode")

Somewhere else, you'll have configured which collection (the word I'm currently using is terminus) this uses, and the Indirector just delegates the find call to the right collection. For nodes, you might be using the exec collection, which calls an external script, turns the resulting YAML into a Node instance, and returns it (or returns nil if nothing was found).
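The delegation described here can be illustrated with a toy model. This is only a sketch of the idea, not Puppet's Indirector code: SketchNode, MemoryTerminus, and the terminus accessor are all hypothetical names.

```ruby
# A toy indirection: the model class delegates find to whichever
# terminus (collection) is currently configured.
class SketchNode
  attr_reader :name

  def initialize(name)
    @name = name
  end

  class << self
    # Configured "somewhere else"; callers of find never see it.
    attr_accessor :terminus

    def find(key)
      terminus.find(key)
    end
  end
end

# One possible collection: an in-memory hash of instances.
class MemoryTerminus
  def initialize(nodes)
    @nodes = nodes
  end

  def find(key)
    @nodes[key] # nil when nothing is found
  end
end

SketchNode.terminus = MemoryTerminus.new("mynode" => SketchNode.new("mynode"))

puts SketchNode.find("mynode").name
puts SketchNode.find("othernode").inspect
```

Swapping in a different terminus object changes where the instances come from without touching any code that calls find.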

I think the Indirector is pretty cool, and it's certainly simplified a lot of my modeling of interacting with different sources of information. Those who are familiar with REST, at least how it's usually done in the Ruby world, will recognize the find as one of the methods usually used for REST interfaces -- it's mapped to the HTTP verb get. One of the primary design goals of the Indirector was to facilitate REST interfaces, so the methods we're indirecting are, not coincidentally, exactly the methods you'd implement for REST support.

Caching

One of the later additions to the indirection code was support for cache collections. That is, you might have a canonical collection, and then a cache collection for speed or proximity purposes. Following our Node example above, if you were using the exec collection, you'd probably want to have the results cached in the yaml collection, so they were inexpensive to retrieve.

The critical question with any caching system is how to know when the cache is dirty. How do you know if you should use the cached node information or go back to the source?

I expect there are as many answers to this question as there are caching implementations, just about. I had never implemented a caching solution before, and I probably misinterpreted my discussions with Rick Bradley, because I ended up choosing a not-very-good system. The current cache invalidation mechanism is based on relative versions: If the version of the cached object is older than the version of the object in the other collection, then your cache is dirty.

What is a version? Well, normally it's just the timestamp of when the instance was created. This might work okay for some systems, but in general, the timestamp ends up being pretty useless. Look at our Node example -- the timestamp of the exec collection is always later, because we retrieve the cache version, then generate a new node using the exec collection, and compare. Duh. The answer's always the same.

Even worse, in most situations the cache doesn't save you any work, because you're pulling fresh data from the original source. If we have to re-execute the external node script to get the latest node version, we haven't saved any effort at all, we've just added a bunch of useless work, which is stupid.

Puppet 0.24.4 "fixed" this problem by saying that the cached node's version was the timestamp of the node's Facts cache. If the facts are updated, then the cache needs to be updated. This seems to mostly work, but it feels like a hack for something that should be easy.

TTL

So, on to the podcast. It was a good podcast in general, and they focused a good bit on caching. At first I found this pretty strange -- why is caching an important design criterion? As they talked, though, I realized that a generalized, simple caching model is useful in a lot more places than I would expect, including in Puppet.

There didn't seem to be any disagreement over the best way to handle knowing when a cache is dirty -- they apparently just use time-to-live (TTL) or expiration headers. I think it was the second time listening through that I realized that the vast majority of my caching problems could be fixed with this.

Puppet has a natural TTL for most of its information -- every host runs every half an hour, so if you set a TTL of half an hour (or whatever your run interval is), then you'll get fresh data once a run, and cached data the rest of the time. In the above Node scenario, the exec collection would set the TTL of the node (so that your external node app could pick its own TTL), or Puppet would have a default TTL equal to the run interval. Then, when Puppet goes to check whether its cache was dirty, it could just compare the TTL against the current time -- no need to hit both collections, and no arbitrary definition of "version".
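A minimal sketch of TTL-based caching, assuming a 1800-second TTL to match the half-hour run interval. TTLCache and FakeClock are hypothetical names, and the block passed to find stands in for the expensive source (such as an external node script).

```ruby
# Cache entries carry their own expiration time, so checking for a
# dirty cache is a comparison against the clock; the source is only
# consulted when the entry is missing or expired.
class TTLCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl, clock = Time)
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  def find(key)
    entry = @entries[key]
    return entry.value if entry && @clock.now < entry.expires_at

    value = yield(key) # go back to the source
    @entries[key] = Entry.new(value, @clock.now + @ttl)
    value
  end
end

# A controllable clock so the example does not need to sleep.
class FakeClock
  attr_accessor :now

  def initialize(now)
    @now = now
  end
end

clock = FakeClock.new(Time.at(0))
cache = TTLCache.new(1800, clock)
lookups = 0
source = lambda { |key| lookups += 1; "node data for #{key}" }

cache.find("mynode", &source) # first run: hits the source
clock.now += 600
cache.find("mynode", &source) # within the TTL: served from cache
clock.now += 1800
cache.find("mynode", &source) # past the TTL: hits the source again

puts lookups
```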

This actually makes even more sense with the current problem I'm trying to solve. I'm trying to remodel the SSL certificate signing process, and it's gotten pretty messy. With this, though, you just set the TTL of the certificate to its own internal TTL, and you use the local system as the cache and the CA server as the ultimate source. If there is a local cert and it's still valid, use it; if there's a local cert but we're past its TTL, then discard it and get a fresh cert; if there's no cert, then get one from the server and cache it locally.
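The three cases for certificates can be written down directly. In this sketch LocalCert, current_cert, and the hash used as the local cache are hypothetical, and the block stands in for a request to the CA server.

```ruby
# A stand-in for a certificate that knows its own expiration time.
LocalCert = Struct.new(:name, :not_after)

# Return a usable cert for name, using the local cache when possible.
def current_cert(cache, name, now)
  cert = cache[name]
  return cert if cert && now < cert.not_after # local cert, still valid

  cache.delete(name)        # expired cert (if any) is discarded
  cache[name] = yield(name) # fetch from the CA and cache it locally
end

cache = {}
fetches = 0
from_ca = lambda do |name|
  fetches += 1
  LocalCert.new(name, Time.at(1000 * fetches))
end

current_cert(cache, "myhost", Time.at(0), &from_ca)    # no cert: fetch
current_cert(cache, "myhost", Time.at(500), &from_ca)  # valid: reuse
current_cert(cache, "myhost", Time.at(1500), &from_ca) # expired: refetch

puts fetches
```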

Next Steps

I don't have the whole thing figured out mentally yet, but I'm pretty close. At the least, the next step is to replace the current broken version-based cache with ttl-based caching. The two things I most need to resolve are:

Obviously, these two things are linked -- the user needs a complete configuration path from the command line or configuration file to the bit that actually sets the ttl.

For now, fortunately, I don't need to worry about it, because I can just stick with the run interval as the TTL for essentially everything I'm doing. As things get more interesting, though, we're going to want to configure these values, because....

TTL Can Help Provide Change Control

One of my primary goals in moving the catalog compiling process to REST is to enable a decoupling between compiling and applying. In other words, I want people to be able to apply a configuration without recompiling.

Imagine a configuration TTL of a week -- every host recompiles its configuration during some specific maintenance window, like Sunday morning between 2 and 6 am. They still apply their configurations every half an hour, but that's normally just validating that nothing has drifted.

Obviously, this wouldn't be used by most shops -- most people would still want all hosts to recompile every time. But for those shops that are highly worried about change control, or those who want to do rolling upgrades, where they upgrade 10% of a pool of servers at a time, this would help a lot. You take your pool of servers, trigger a recompile on 10%, and once you're confident they're working, you trigger a recompile on another 10%, and so on.
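As a rough illustration of the rolling approach, here is one way to split a pool into 10% batches. The rolling_batches helper and the host names are hypothetical, and the recompile trigger is left as a placeholder comment because no such interface is described here.

```ruby
# Split a pool of hosts into batches of roughly percent% each,
# rounding the batch size up and never going below one host.
def rolling_batches(hosts, percent = 10)
  size = [(hosts.size * percent + 99) / 100, 1].max
  hosts.each_slice(size).to_a
end

pool = (1..25).map { |i| "web#{i}.example.com" }

rolling_batches(pool).each_with_index do |batch, index|
  # Placeholder: trigger a recompile on this batch, then wait until
  # you're confident it is working before moving on.
  puts "batch #{index + 1}: #{batch.size} hosts"
end
```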

Once you can do that with Puppet, it'll feel almost enterprisey. :)

Wed, 02 Apr 2008


Testing Initialization Code


One of the things I continually struggle with in testing is code that runs during initialization. A lot of times this code is very simple:

def initialize(name, cert, key)
    raise Puppet::Error, "Cannot manage the CRL when :cacrl is set to false" if [false, "false"].include?(Puppet[:cacrl])

    @name = name

    unless read(Puppet[:cacrl])
        generate(cert, key)
        save(key)
    end
end

It's easy to test that first and second line, and it's entirely obvious what that last bit does, but it's fantastically difficult to test, especially if you follow the advice of Jay Fields and try to stick to one expectation per test.

If you do it all in one test, you end up with a relatively long test that covers a single specific version, but it doesn't describe the behaviours very well. You want something like this:

describe "when initializing" do
    it "should fail if :cacrl is set to false"

    it "should set the name"

    it "should read the crl in from disk"

    describe "and no crl exists on disk" do
        it "should generate a new crl"

        it "should save the new crl"
    end
end

The only way to do this, though, is to use stub_everything, and then individually test for each method, which is messy.

Even worse, you now have to stub out these methods every time you want to test an instance of the class in any other way. For instance, (as you might have guessed) I'm remodeling our Certificate Revocation List as a class, and I'm going to need to test the actual revocation, along with storage to disk. Each of these is made more complicated by the code in the initialize method.

Why, then, don't I just leave the code out?

Well, I could easily have it lazy evaluate, only running when someone actually asks for the crl. The problem is that I've consistently found that lazy evaluation causes more problems than it saves. I tend to run into permission problems (because the code doesn't evaluate until puppetmasterd is running as puppet, when it sometimes doesn't have the permissions it needs), and it's just very difficult to really control ordering.

Also, it just feels messy to reorganize easy code to make it more testable. There seems to be a postulate in the testing world that code that's difficult to test is bad code, but I defy anyone to argue that the above code is unclear or "bad code", other than just directly saying it's bad because it's hard to test.

I expect that in this case I'll have a generate_and_save method that, well, generates and saves, or maybe a load_or_create method that does this bit. Yay. Because simple code is hard to test, I end up with less simple code, and I still have to use stub_everything.
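A sketch of what the load_or_create variant might look like. CRLSketch is a hypothetical class: it takes a plain hash as its "disk" instead of the cert and key arguments from the real initializer, purely so the example runs on its own.

```ruby
class CRLSketch
  attr_reader :name, :content

  def initialize(name, store)
    @name = name
    @store = store # stand-in for the file on disk
    load_or_create
  end

  private

  # The one method a test would need to stub to skip initialization.
  def load_or_create
    return if read

    generate
    save
  end

  def read
    @content = @store[@name]
  end

  def generate
    @content = "crl for #{@name}"
  end

  def save
    @store[@name] = @content
  end
end

disk = {}
CRLSketch.new("ca", disk)              # nothing on disk: generate and save
puts disk["ca"]
puts CRLSketch.new("ca", disk).content # second time: read from disk
```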

Tue, 01 Apr 2008


Another Testing Conundrum: Mocks or Real Objects?


So, I have a very simple method for generating a certificate request:

# How to create a certificate request with our system defaults.
def generate(key)
    Puppet.info "Creating a new SSL certificate request for %s" % name

    csr = OpenSSL::X509::Request.new
    csr.version = 0
    csr.subject = OpenSSL::X509::Name.new([["CN", name]])
    csr.public_key = key.public_key
    csr.sign(key, OpenSSL::Digest::MD5.new)

    raise Puppet::Error, "CSR sign verification failed; you need to clean the certificate request for %s on the server" % name unless csr.verify(key.public_key)

    @content = csr
end

Short, readable, clear. Theoretically, this should be easy to test, too, and hopefully the tests should be about as short.

If you follow the strategy of using a mock request, your first pass ends up looking disappointingly like this:

describe "when generating" do
    before do
        @instance = @class.new("myname")

        key = Puppet::SSL::Key.new("myname")
        @key = key.generate

        @request = mock 'request'
        OpenSSL::X509::Request.expects(:new).returns(@request)

        @request.stubs(:version=)
        @request.stubs(:public_key=)
        @request.stubs(:subject=)
        @request.stubs(:sign)
        @request.stubs(:verify).returns(true)
    end

    it "should log that it is creating a new certificate request" do
        Puppet.expects(:info)
        @instance.generate(@key)
    end

    it "should set the subject to [CN, name]" do
        subject = mock 'subject'
        OpenSSL::X509::Name.expects(:new).with([["CN", @instance.name]]).returns(subject)
        @request.expects(:subject=).with(subject)
        @instance.generate(@key)
    end

    it "should set the version to 0" do
        @request.expects(:version=).with(0)
        @instance.generate(@key)
    end
    ...

Obviously, the test isn't done yet. Total line count for a 13 line method? 66 lines. (Admittedly, I'm a bit generous on white space.)

The thing I dislike most about this test is that the setup code has to know exactly how many tests there are -- if I add a new setting to the request, then I need to change my setup code, which has bitten me multiple times and is a real pain. One option for removing that is to skip the mock:

describe "when generating" do
    before do
        @instance = @class.new("myname")

        key = Puppet::SSL::Key.new("myname")
        @key = key.generate

        @request = OpenSSL::X509::Request.new
        OpenSSL::X509::Request.expects(:new).returns(@request)

        @request.stubs(:verify).returns(true)
    end
    ...

I still need to stub the verify method, because of the check that I do on the result, but skipping the use of mocks makes it simpler.

Note that I've been using a real SSL key the whole time -- this slows things down a little, since the ssl keys aren't free to make, but it simplifies my setup code in the same way that the above setup is simpler than the top one.
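Taking the real-object approach to its conclusion, the request can be built with a real key and the result inspected directly. This is a standalone sketch using Ruby's openssl library: build_csr is a hypothetical free function rather than the generate method above, and it signs with SHA256 instead of MD5, which is a substitution made for this example.

```ruby
require 'openssl'

# Build and sign a certificate request for name with the given key,
# mirroring the steps of the generate method shown earlier.
def build_csr(name, key)
  csr = OpenSSL::X509::Request.new
  csr.version = 0
  csr.subject = OpenSSL::X509::Name.new([["CN", name]])
  csr.public_key = key.public_key
  csr.sign(key, OpenSSL::Digest.new("SHA256"))

  unless csr.verify(key.public_key)
    raise "CSR sign verification failed for #{name}"
  end

  csr
end

key = OpenSSL::PKey::RSA.new(2048) # real key; generation is not free
csr = build_csr("myname", key)

# With real objects, each behaviour is a plain assertion on the result.
puts csr.version
puts csr.subject.to_s
puts csr.verify(key.public_key)
```

The cost is speed and the loss of isolation from OpenSSL; the gain is setup code that does not need to change when another attribute is set on the request.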

Of course, a third option is to keep the mock but use stub_everything. This is apparently another indication of code smell, and it seems the difference between this and just using a real object is that I need to write a separate integration test if I use a mock, but with this, I don't really.

I think I'm going to keep this last setup method. It's still 65 lines of code for a 13 line method, but at least it lists each behaviour separately. My previous test included this behaviour:

# It just doesn't make sense to work so hard around mocking all of this crap five times in order to get this test down to one expectation
# per test.
it "should create a new certificate request with the subject set to [CN, name], the version set to 0, the public key set to the provided key's public key, and signed by the provided key" do
    @request = mock 'request'
    OpenSSL::X509::Request.expects(:new).returns(@request)

    subject = mock 'subject'
    OpenSSL::X509::Name.expects(:new).with([["CN", @instance.name]]).returns(subject)
    @request.expects(:version=).with 0

    # For some reason, this is failing, even though the values are correct.
    # It seems to be considering the values different if i use 'with'.
    @request.expects(:public_key=)
    @request.expects(:subject=).with subject

    # Again, this is weirdly failing, even though it's painfully simple.
    @request.expects(:sign)

    @request.stubs(:verify).returns(true)

    @instance.generate(@key).should == @request
end

I'm not fond of this, because it doesn't list each behaviour separately, but it seemed to be appropriate for such a simple method.

Unfortunately, the 'verify' method (assuming I didn't use a mock) was nearly as long as this method, since it has to stub everything.

I'm thoroughly convinced that this generate method is reasonable, but the rules of the game are that it's got code smell because it's difficult to test, which I don't buy at all. So now I've got 5:1 tests to code for a really simple and maintainable method, meaning I'm basically doomed to always spend more time on the tests than the functionality, which is utter crap.

Wed, 12 Mar 2008


I can't test like I'm supposed to


So, I'm toward the end of converting my certificate handling in Puppet to using my nifty new indirection features, which will handle both reading and writing certs to disk and to remote servers (for signing). Yee-haw. The only bit left (I hope) is the certificate authority itself.

In the course of this work, I've created a 'SSL::Host' class that is basically a composite of a key, a certificate request, and a certificate, so it makes sense to treat a CA as a special case of that: Its various files are stored in different places, and it can sign certificates, but it's otherwise the same.

So, I need this class to read and write to different locations (easy, although not yet tested). I also want the CA to initialize itself completely upon creation -- that is, when I call CertificateAuthority.new, I want it to create and write to disk all of the files it needs. I think this is reasonable, because it's not reasonable to require the caller to call a setup method of some kind, nor is it reasonable to do it late-binding, since initialization failures would then show up during a call to sign, which is stupid.

Okay. Seems easy. Here's the kicker: I want to test that the CA always chooses its name as the value of the certname Puppet setting. It's pretty easy to write that basic test:

it "should always set its name to the value of :certname" do
    Puppet.settings.expects(:value).with(:certname).returns("whatever")
    Puppet::SSL::CertificateAuthority.new.name.should == "whatever"
end

Except... all that initialization stuff is happening (by design). So now I'm in the position of using this kind of hack:

Puppet::SSL::CertificateAuthority.any_instance.stubs(:setup_ca)

Which is mostly a hack just because that method is private -- you never need to call it, yet here we need to know it exists just for testing.
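The same trick can be done by hand with a test-only subclass that overrides the private method. This sketch uses hypothetical classes (CertificateAuthoritySketch, QuietCertificateAuthority) and passes settings in as a hash; it has the same drawback, since the test still has to know that setup_ca exists.

```ruby
class CertificateAuthoritySketch
  attr_reader :name

  def initialize(settings)
    @name = settings.fetch(:certname)
    setup_ca
  end

  private

  # Stand-in for the real initialization, which would write files.
  def setup_ca
    raise NotImplementedError, "the real version would create files on disk"
  end
end

# Test-only subclass: skip the initialization side effects.
class QuietCertificateAuthority < CertificateAuthoritySketch
  private

  def setup_ca
    # Intentionally empty.
  end
end

ca = QuietCertificateAuthority.new({ :certname => "whatever" })
puts ca.name
```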

The only other option is literally about 15 lines of stubs, since the CA uses a bunch of other settings, which now all need to be stubbed because of my initial expectation, plus needing to stub out any file reading or writing.

I keep thinking that I'm just crazy and that at some point I'll see the light and be able to write single-expectation tests with no setup code like Jay Fields recommends, but in reality, I think Jay is crazy. I can't fight that feeling, and I less and less want to.

Wed, 12 Mar 2008


On Ruby Interviews James Turnbull and Reviews the Puppet Book


Pat of On Ruby has posted a brief review of James Turnbull's Puppet book, Pulling Strings with Puppet:

This book is filled with helpful code samples and pointers to external resources that look very useful. It's well written and easy to understand. As good a tool as Puppet looks to be, this looks like an equally good book to get you going. If you're doing configuration management for anything more than a box or two, run, don't walk, and pick up your copy of Pulling Strings with Puppet.

Pat also posted an interview with James:

Who gets the credit (Or is it blame?) for the title of your book, 'Pulling Strings with Puppet'?

That'd be my editor and the marketing guys at Apress. Do you know how excited marketing people are when a product allows amusing alliteration and puns? :) But I like it -- it's both kitsch and catchy.

Looks like the book is really helping with the visibility of Puppet, which is great, and people seem to even like the book so far. :)

Wed, 13 Feb 2008


Interview with me at On Ruby


Pat Eyler has posted his interview with me at On Ruby:

What are your future plans for Puppet?

I'm pushing toward a 1.0 this year, hopefully, as soon as I can get the critical APIs stable. I'm also hoping to add a lot of interesting functionality around making each host's resource catalog more useful outside of Puppet -- e.g., you could have all of your resource relationships set up in it, modify /etc/ssh/sshd_config, and then tell Puppet to figure out what services need to restart because of that change.

As we move toward a more database-backed catalog, vs. the current YAML-dumped version, we'll get a lot more functionality out of it yet, and I can't really even see most of that functionality right now.

Sat, 09 Feb 2008


A bit more DTrace


(This should have been posted a while ago, but I guess I had a problem and it's been sitting uncommitted for a while.)

After pulling apart the skip method in the lexer, so that the various parts are in separate methods, I get this as my count:

Puppet::Parser::Lexer                    munge_token              56778        358     20335592
Class                                    new                      28242        889     25132822
Puppet::Parser::Parser                   ast                      25881       1147     29695496
Fixnum                                   <                       1817071         16     30723097
StringScanner                            check                   1829886         26     48732560
String                                   length                  3757782         20     78611361
Puppet::Parser::Lexer::TokenList         each                     56778       6618    375813485
Puppet::Parser::Lexer                    find_token               56778       6714    381227038
Hash                                     each                     84949       4563    387630769
Puppet::Parser::Parser                   import                       9   45754308    411788774
Puppet::Parser::Parser                   _reduce_132                  9   45755009    411795083
Object                                   catch                    56018       8086    452970031
Puppet::Parser::Lexer                    scan                       173    2751816    476064309
Racc::Parser                             _racc_yyparse_c            173    2751907    476080064
Object                                   __send__                   173    2751984    476093248
Racc::Parser                             yyparse                    173    2752322    476151712
Puppet::Parser::Parser                   parse                      173    2752742    476224530
Array                                    collect                    331    1446548    478807659
Array                                    each                     26303      18476    485983221

The interesting one there is the Lexer.find_token method -- I just created that, and it looks like it's taking about 38/48 (roughly 80%) of the total parse time, which is a helluva lot.

This method is responsible for picking the token to return. The complicated part is that it has to return the longest match, which is currently done by trying each token in turn (skipping those that don't match) and keeping the longest. This is expensive, because every token definition is tried for every token returned, so the work scales at O(N^2), which is bad.
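For the curious, the try-everything-and-keep-the-longest approach looks roughly like the sketch below. This is not the actual Puppet lexer code: the TOKENS table, its token names and patterns, and the lex driver are all made up for illustration. The only real API used is StringScanner from Ruby's standard library, whose check method matches at the current position without advancing. It does line up with the profile above, though: one pass over the token list, plus a check and a length call per token definition, for every token returned.

```ruby
require 'strscan'

# Hypothetical token table; names and patterns are illustrative only,
# not Puppet's real token definitions.
TOKENS = {
  :EQUALS  => /=/,
  :FARROW  => /=>/,
  :ISEQUAL => /==/,
  :NAME    => /[a-z]\w*/,
  :NUMBER  => /\d+/
}

# Try every pattern at the current scan position and keep the longest match.
# Returns [name, text], or [nil, nil] when nothing matches.
def find_token(scanner)
  best_name = nil
  best_text = nil
  TOKENS.each do |name, pattern|
    text = scanner.check(pattern)   # match here without consuming input
    next if text.nil?               # skip tokens that don't match
    if best_text.nil? || text.length > best_text.length
      best_name = name
      best_text = text
    end
  end
  [best_name, best_text]
end

# Minimal driver: skip whitespace, pick the longest token, advance past it.
def lex(string)
  scanner = StringScanner.new(string)
  result = []
  until scanner.eos?
    scanner.skip(/\s+/)
    break if scanner.eos?
    name, text = find_token(scanner)
    raise "no token matches at position #{scanner.pos}" if name.nil?
    scanner.pos += text.bytesize    # pos is a byte offset
    result << [name, text]
  end
  result
end
```

With this table, lex("foo => 42") returns [[:NAME, "foo"], [:FARROW, "=>"], [:NUMBER, "42"]]. Note that "=>" wins over "=" only because every pattern was tried, which is exactly the cost being measured.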

Mon, 28 Jan 2008


RubyConf Video is Posted


Looks like the video for my RubyConf presentation, Essential Incompleteness in Program Modeling (which I subtitled How to apply hand-wavy math to software design), has finally been posted. The slides are also available in PDF.

It's a strange talk, that's for sure. For one, it doesn't really have any code in it. And when I say "not really", I mean "none". I only even mention Ruby when talking about Puppet. For two, I kinda submitted the talk on a lark -- my Ruby submissions seem to be largely uninteresting to the Ruby community, so I figured I'd submit this as practice but that it would be denied. When it was actually accepted, I had to go write the darn thing. Third, I did relatively poorly in the presentation. I think the content was actually pretty good, but I did a poor job of organizing the slides and of presenting it. That isn't to say I think it's a bad presentation, just that it could have been much better.

It'd be interesting to be given the opportunity to give the talk in a different environment, somewhere I was just thinking about the talk and not worrying about its appropriateness.

Either way, the talk was an earnest attempt at providing what I think is a cool and useful way of looking at software design, and it's worth a look.

Sat, 29 Dec 2007

