Puppet: System Administration Automated

Implementing Normalization With a Touch of Closurehood


This is a post from Luke's old blog; it is saved here statically for historical purposes, as of October 2008

Normalization seemed easy at first. Just make sure a given configuration doesn't mention a given element more than once. That simplicity turned out to be an illusion, though, and Puppet has artifacts from multiple bad attempts at providing normalization.

First let me apologize for how rambling this post is. It's obvious that I barely know what I'm talking about here and that this topic is right at the edge of my competence. I think this is the last big design issue to resolve before a full release, though, and I'm cleaning all of this up right now, so I figured it was important to at least try to lay out where the problems are and what I'm using to make decisions.

What's In a Name?

The first set of ugliness is more about a lack of elegance than it is about a lack of functionality. For many elements you need to configure, the "name" of the element is essentially isomorphic with the element itself. No one will look at you funny if you say "the file /etc/passwd" rather than "the file named /etc/passwd", nor will people think you're crazy for calling the process that happens to have the name apache "the apache process".

This realization led me to design Puppet around using the tuple of object name and type (e.g., file and /etc/passwd or service and apache) as unique identifiers for elements. This tuple is how Puppet uniquely refers to each element, and configuration normalization is actually normalization of these tuples.
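As a rough illustration of what normalizing on these tuples means, here is a minimal Python sketch. It is not Puppet's actual code; the dictionary layout and the `find_duplicates` name are assumptions made for the example. It simply reports every (type, name) tuple that is declared more than once.

```python
from collections import Counter


def find_duplicates(elements):
    """Return the (type, name) tuples that appear more than once."""
    counts = Counter((e["type"], e["name"]) for e in elements)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical parsed configuration: the second file entry repeats a tuple.
config = [
    {"type": "file", "name": "/etc/passwd", "owner": "root"},
    {"type": "service", "name": "apache", "running": True},
    {"type": "file", "name": "/etc/passwd", "mode": "0644"},
]

duplicates = find_duplicates(config)
```

Here `duplicates` contains only the tuple for the file /etc/passwd, since the service shares neither its type nor its name with anything else.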

This led down some stupid paths at first. For instance, I initially required every specified element, including components and classes, to have explicit names, because otherwise I couldn't verify that they were normalized. But then I realized that I didn't want to normalize the high-level structures, only the lowest level elements, so I had to redo the language a bit to make those names optional.

The next problem was that some objects don't actually have explicit names. Cron jobs, for instance, not only don't have names, they don't even necessarily have a unique field that could function as a name. In cases like this (and cron jobs are the only case like this so far), I require a name but it's an arbitrary name. In the case of cron jobs, I store the name as a comment before the job. It actually works relatively well, because you can name the cron job for why it exists, but it's still arbitrary and thus kind of silly.
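The comment trick can be sketched in a few lines of Python. The `# name: ` marker and both function names are hypothetical, chosen only for this example, and are not necessarily the format Puppet writes to a crontab.

```python
NAME_PREFIX = "# name: "  # hypothetical marker, assumed for this sketch


def render_crontab(jobs):
    """Write each job preceded by a comment line holding its arbitrary name."""
    lines = []
    for name, entry in jobs.items():
        lines.append(NAME_PREFIX + name)
        lines.append(entry)
    return "\n".join(lines) + "\n"


def parse_crontab(text):
    """Recover the name -> entry mapping from a rendered crontab."""
    jobs = {}
    pending = None
    for line in text.splitlines():
        if line.startswith(NAME_PREFIX):
            pending = line[len(NAME_PREFIX):]
        elif line.strip() and not line.startswith("#") and pending is not None:
            jobs[pending] = line
            pending = None
    return jobs


jobs = {"rotate-logs": "0 3 * * * /usr/sbin/logrotate /etc/logrotate.conf"}
round_trip = parse_crontab(render_crontab(jobs))
```

The name survives a round trip through the file, which is all the (type, name) tuple needs, even though nothing in the job itself supplies it.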

Finally, you have the problem where elements are considered the same thing even though they have different names. You might have a single class that manages the syslog daemon on all of your Unix platforms, but the service name on the different platforms varies considerably. As a human you consider all of these services to be a single element, but Puppet can't determine that. I have some ideas for how to fix this, but none of them seem good enough to be worth implementing just yet.

Of course, there are also elements with more than one valid name. Given a file with multiple hard links, which link is "the" link? How do you decide?

Opposing Needs

The real ugliness comes in at the intersection of recursion, specificity, and security policy (or any policy, really). All of our closure-like behaviour has been based on this name/type tuple, and all of it can effectively be checked at parse time, on the central server. There are factors that further affect closurehood but which generally need to be handled at run time, on each individual host.

Recursion and Specificity

Recursion is necessarily a run-time action. The manifests are parsed on the central server, which does not know which files actually exist. When you do a recursive operation on /etc, Puppet does not know which exact files are managed until the configuration is applied to a specific host.

It's true that the server-side language could be extended to realize that files do recursion, but that either precludes the development of other recursion types or requires that every recursive type gets hard-coded into the language, which I do not think is acceptable. Even then, it could be defeated by something as simple as symlinks or hard links. Without examining the local system state, it can be impossible to know whether two statements actually conflict.
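To make the link problem concrete, here is a small Python sketch of the kind of check that only the client can perform. The `same_file` helper is a made-up name for this example; it relies on the standard `os.path.samefile`, which compares device and inode numbers after following symlinks.

```python
import os


def same_file(path_a, path_b):
    """True if both paths exist and resolve to the same file on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except FileNotFoundError:
        # A path that does not exist cannot conflict with anything yet.
        return False
```

Two statements naming different paths would look unrelated to the parser, yet a check like this on the host could show that they manage the same file.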

Recursive operations often lay out a base configuration which can get overridden by more specific statements, such as in the following snippet:

    file { "/etc/": recurse => true, owner => root }
    file { "/etc/apache/auth": recurse => true, owner => apache }

We do not want this to be considered a conflict, of course. To solve this problem I've built specificity into Puppet, where objects are either explicit or implicit. Explicit objects automatically override implicit objects, but conflicts between objects of the same level of specificity are considered errors.

Because of the nature of recursion, this is inherently a run-time operation.
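The override rule can be sketched as follows. This is an illustrative Python model, not Puppet's implementation: the `explicit` flag, the `resolve` function, and `ConflictError` are all names assumed for the example. It also makes one design choice the text does not spell out, namely that implicit declarations are ignored entirely once an explicit one exists for the same element.

```python
class ConflictError(Exception):
    """Raised when two declarations of equal specificity collide."""


def resolve(declarations):
    """Pick one declaration per (type, name); explicit beats implicit."""
    grouped = {}
    for decl in declarations:
        grouped.setdefault((decl["type"], decl["name"]), []).append(decl)

    chosen = {}
    for key, decls in grouped.items():
        explicit = [d for d in decls if d["explicit"]]
        # Only the most specific level present competes for the element.
        candidates = explicit or decls
        if len(candidates) > 1:
            raise ConflictError(f"conflicting declarations for {key}")
        chosen[key] = candidates[0]
    return chosen


# Hypothetical run-time view: recursion on /etc generated an implicit entry
# for a path that was also declared explicitly.
declarations = [
    {"type": "file", "name": "/etc/apache/auth", "owner": "root", "explicit": False},
    {"type": "file", "name": "/etc/apache/auth", "owner": "apache", "explicit": True},
]
result = resolve(declarations)
```

In this model the explicit declaration wins and the file ends up owned by apache, while two explicit declarations for the same path would raise `ConflictError`.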

Security Policies

Most organizations have a security policy maintained by a completely separate group from the administration group. This policy defines a base-line that provides the minimum level of security, such as forbidding certain programs from running or requiring that there be no group-writable files in /etc.

The thing is, though, the security policy is generally responsible for providing the minimum acceptable parameters, not for actually determining final state. Sure, sometimes it provides for only a single acceptable parameter, such as a service not running, but it generally provides minimums or maximums or ranges, not individual values. So, we need to support these security overlays plus the actual configurations without considering them to be conflicts.

Of course, nothing is quite this simple. We also need to support allowing overrides when necessary -- I've been at plenty of companies that had to break their own security policies in order to make a certain old service work correctly.

It seems like this could be a parse-time operation, because you just have to determine whether provided values fit the baseline. The problem is that you cannot actually do this without some sort of value typing, and for some aspects you actually do need to be on the managed host itself.

If the security policy says that no files in /etc can be world- or group-writable, then you either need to have a "file mode" type that can handle masking, or you need to do the testing on the client which inherently understands file modes.
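A "file mode" type that handles masking could look roughly like this in Python. The function names are invented for the sketch; the bit constants come from the standard `stat` module.

```python
import stat

# Bits the hypothetical policy forbids: group-writable and world-writable.
FORBIDDEN_BITS = stat.S_IWGRP | stat.S_IWOTH


def violates_policy(mode):
    """True if the mode grants write access to group or other."""
    return bool(mode & FORBIDDEN_BITS)


def clamp_to_policy(mode):
    """Return the mode with the forbidden write bits cleared."""
    return mode & ~FORBIDDEN_BITS
```

With this, a requested mode of 0664 is flagged as a violation while 0644 passes, which is a range check on the value rather than a comparison against a single required value.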

Other statements get interpreted differently depending on the system. If the security policy refers to a file's owner by name but a different statement refers to the owner by number, the client itself must determine whether those different labels refer to the same user or a different user.
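A sketch of that client-side comparison, in Python, might look like the following. The `same_owner` function and the `passwd` mapping are assumptions for the example; on a real Unix host the mapping could be built from the standard `pwd` module.

```python
def same_owner(a, b, passwd):
    """Compare two owner labels, each given as a user name or a numeric uid.

    `passwd` maps user names to uids as seen on the local host.
    """
    def to_uid(owner):
        if isinstance(owner, int):
            return owner
        if owner.isdigit():
            return int(owner)
        return passwd[owner]  # KeyError if the host has no such user

    return to_uid(a) == to_uid(b)


# Hypothetical local user database for one host.
passwd = {"root": 0, "apache": 48}
```

The same pair of labels can match on one host and differ on another, because the name-to-number mapping belongs to the host and not to the manifest.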

Parse time vs. Run time

So, some aspects of normalization and closure-seeking happen at parse time on the server, and some aspects unfortunately must happen on the client. Just tonight I went through and clarified (and wrote up unit tests for!) the server-side tests. This code is now much cleaner, and I think it does a good job of requiring the desired behaviour.

There is still much to do. I already have overrides based on specificity working in the library, but I don't have any mechanisms for a security policy overlay. At this point, I am going to go with less functionality, rather than bad functionality, meaning that I am not planning on supporting any kind of overlay for now. The only run-time normalization operations will be overriding based on specificity.

The Current Solution

My current plans are to rely as much as possible on server-side checking. There actually are some override operations supported at parse-time based on server class hierarchies, but because they're parse-time they can be easily fooled by not having enough information. The following snippet will do what you would intuitively expect, though, and is not considered a conflict:

    class base {
        file { "/etc/sudoers": owner => root, group => root }
    }

    class sub inherits base {
        file { "/etc/sudoers": group => wheel }
    }

It just overrides the group for servers in the sub class.
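One way to picture this parse-time override is as a merge of parameters down the class hierarchy. The Python below is only a model under assumed names (`evaluate`, the `classes` dictionary layout); it is not how the Puppet parser is written.

```python
def evaluate(class_name, classes):
    """Collect a class's elements, letting it override inherited parameters."""
    cls = classes[class_name]
    elements = {}
    parent = cls.get("inherits")
    if parent is not None:
        elements = evaluate(parent, classes)
    for key, params in cls["elements"].items():
        merged = dict(elements.get(key, {}))  # copy so the parent is untouched
        merged.update(params)
        elements[key] = merged
    return elements


# Hypothetical parsed form of the two classes above.
classes = {
    "base": {
        "inherits": None,
        "elements": {("file", "/etc/sudoers"): {"owner": "root", "group": "root"}},
    },
    "sub": {
        "inherits": "base",
        "elements": {("file", "/etc/sudoers"): {"group": "wheel"}},
    },
}

sub_config = evaluate("sub", classes)
```

Because the second declaration arrives through inheritance, it is treated as a refinement of the same element rather than a duplicate; a host in `sub` gets group wheel and keeps every other parameter from `base`.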

For run-time operations, I plan on continuing to support overrides based on specificity, where implicitly configured elements are replaced by explicit elements when there is a conflict, but until I come up with a decent long-term solution, there will be no other allowed conflicts. I do not think this will be a significant reduction in functionality, and I think it's important to stay simple while Puppet is young and we have not yet characterized exactly how we want it to behave.

Updated slightly on 11/4/05


Thu, 27 Apr 2006