Implementing Normalization With a Touch of Closurehood
This is a post from Luke's old blog; it is saved here statically for historical purposes, as of October 2008.
Normalization seemed easy at first: just make sure a given configuration doesn't mention any element more than once. That simplicity turned out to be an illusion, though, and Puppet has artifacts from multiple bad attempts at providing it.
First, let me apologize for how rambling this post is. It's obvious that I barely know what I'm talking about here and that this topic is right at the edge of my competence. I think this is the last big design issue to resolve before a full release, though, and I'm cleaning all of this up right now, so I figured it was important to at least try to lay out where the problems are and what I'm using to make decisions.
What's In a Name?
The first set of ugliness is more about a lack of elegance than it is about a lack of functionality. For many elements you need to configure, the "name" of the element is essentially isomorphic with the element itself. No one will look at you funny if you say "the file /etc/passwd" rather than "the file named /etc/passwd", nor will people think you're crazy for calling the process that happens to have the name apache "the apache process".
This realization led me to design Puppet around using the tuple of element type and name (e.g., file and /etc/passwd, or service and apache) as the unique identifier for each element. Configuration normalization is therefore really normalization of these tuples: no type/name pair may appear twice.
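To make the tuple idea concrete, here is a minimal Python sketch of this kind of normalization, assuming each element is represented as a hypothetical (type, name, attributes) triple. The names `normalize` and `DuplicateResourceError` are illustrative only, not Puppet's actual code.

```python
from typing import Dict, List, Tuple

# An element is identified only by its (type, name) tuple; everything else
# the configuration says about it lives in the attributes dict.
ResourceKey = Tuple[str, str]


class DuplicateResourceError(Exception):
    """Raised when a configuration mentions the same (type, name) twice."""


def normalize(resources: List[Tuple[str, str, dict]]) -> Dict[ResourceKey, dict]:
    """Index elements by (type, name), rejecting any repeated tuple."""
    catalog: Dict[ResourceKey, dict] = {}
    for rtype, name, attrs in resources:
        key = (rtype, name)
        if key in catalog:
            raise DuplicateResourceError(f"{rtype}[{name}] is declared more than once")
        catalog[key] = attrs
    return catalog
```

Note that the same name under two different types (a file named apache and a service named apache) is not a conflict, because only the full tuple has to be unique.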
This led down some stupid paths at first. For instance, I initially required every specified element, including components and classes, to have an explicit name, because otherwise I couldn't verify that they were normalized. But then I realized that I only wanted to normalize the lowest-level elements, not the high-level structures, so I had to redo the language a bit to make those names optional.
The next problem was that some objects don't actually have explicit names. Cron jobs, for instance, not only lack names but don't necessarily have any unique field that could function as one. In cases like this (and cron jobs are the only such case so far), I require a name anyway, even though it's arbitrary. For cron jobs, I store the name as a comment before the job. This works relatively well, because you can name the cron job for why it exists, but the name is still arbitrary and thus kind of silly.
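A small Python sketch of storing an arbitrary cron job name as a comment on the line before the job. The `NAME_PREFIX` marker and both helper functions are hypothetical; the sketch only shows that a name can round-trip through the crontab text while the job line itself stays untouched.

```python
from typing import Dict, List, Optional

# Hypothetical marker; the exact comment format is an assumption.
NAME_PREFIX = "# Puppet Name: "


def render_crontab(jobs: Dict[str, str]) -> str:
    """Write each job line preceded by a comment carrying its arbitrary name."""
    lines: List[str] = []
    for name, entry in jobs.items():
        lines.append(NAME_PREFIX + name)
        lines.append(entry)
    return "\n".join(lines) + "\n"


def parse_crontab(text: str) -> Dict[str, str]:
    """Recover named jobs; unnamed jobs are ignored as unmanaged."""
    jobs: Dict[str, str] = {}
    pending_name: Optional[str] = None
    for line in text.splitlines():
        if line.startswith(NAME_PREFIX):
            pending_name = line[len(NAME_PREFIX):]
            continue
        if pending_name is not None and line.strip() and not line.startswith("#"):
            jobs[pending_name] = line
        # A name applies only to the line directly after it.
        pending_name = None
    return jobs
```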
Finally, there is the problem of elements that are conceptually the same thing even though they have different names. You might have a single class that manages the syslog daemon on all of your Unix platforms, but the service name varies considerably from platform to platform. As a human, you consider all of these services to be a single element, but Puppet can't determine that. I have some ideas for how to fix this, but none of them seem good enough to be worth implementing just yet.
Of course, there are also elements with more than one valid name. Given a file with multiple hard links, which link is "the" link? How do you decide?
Opposing Needs
The real ugliness comes in at the intersection of recursion, specificity, and security policy (or any policy, really). All of our closure-like behaviour so far has been based on this name/type tuple, and all of it can be done at parse time on the central server. Other factors also affect closurehood, however, and these generally need to be evaluated at run time, on each individual server.
Recursion and Specificity
Recursion is necessarily a run-time action. The manifests are parsed on the central server, which does not know which files actually exist. When you do a recursive operation on /etc, Puppet does not know which exact files are managed until the configuration is applied to a specific host.
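The following Python sketch illustrates why recursion has to wait for the client: expanding a recursive file statement into individual implicit entries requires walking the host's real filesystem. `expand_recursive` is a hypothetical helper, and copying the same attributes onto every path is a simplifying assumption.

```python
import os
from typing import Dict


def expand_recursive(root: str, attrs: dict) -> Dict[str, dict]:
    """Expand one recursive file statement into per-path implicit entries.

    This can only run on the managed host, because only the host knows
    which files actually exist under `root`. Symlinked directories are
    neither descended into nor recorded (os.walk's default behaviour).
    """
    expanded: Dict[str, dict] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Record the directory itself, then every regular entry inside it.
        for entry in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            expanded[os.path.normpath(entry)] = dict(attrs)
    return expanded
```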
It's true that the server-side language could be extended to understand that files recurse, but that would either preclude the development of other recursive types or require every recursive type to be hard-coded into the language, which I do not think is acceptable. Even then, it could be defeated by something as simple as a symlink or a hard link: without examining the local system state, it can be impossible to know whether two statements actually conflict.
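As a sketch of the kind of local check this implies, the Python function below groups paths by (device, inode), which is how a Unix host can tell that a hard link or a symlink names the same file as another path. `find_aliased_paths` is a hypothetical name; the technique relies only on the standard `os.stat` call, which follows symlinks by default.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def find_aliased_paths(paths: List[str]) -> List[List[str]]:
    """Group paths that refer to the same underlying file on this host.

    os.stat follows symlinks, and hard links share a (device, inode) pair,
    so statements about different paths in the same group really describe
    one file and may conflict.
    """
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    # Only groups with more than one path are potential hidden conflicts.
    return [group for group in groups.values() if len(group) > 1]
```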
Recursive operations often lay out a base configuration which can get overridden by more specific statements, such as in the following snippet:
file { "/etc/": recurse => true, owner => root }
file { "/etc/apache/auth": recurse => true, owner => apache }
We do not want this to be considered a conflict, of course. To solve this problem I've built specificity into Puppet, where objects are either explicit or implicit. Explicit objects automatically override implicit objects, but conflicts between objects at the same level of specificity are considered errors.
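The specificity rule can be sketched in Python as a merge over (path, level, attributes) statements. This is a minimal illustration, not Puppet's implementation: `resolve`, `ConflictError`, and the level labels are hypothetical, and identical statements at the same level are treated as harmless. It also assumes that expanding a recursive parent does not descend into a directory that is itself explicitly managed; otherwise both recursions in the snippet above would produce implicit entries for the files under /etc/apache/auth, which this rule alone would flag as a same-level conflict.

```python
from typing import Dict, List, Tuple

# Hypothetical specificity levels: named statements are explicit,
# entries generated by recursion are implicit.
EXPLICIT = "explicit"
IMPLICIT = "implicit"


class ConflictError(Exception):
    """Two statements at the same specificity level disagree about a path."""


def resolve(statements: List[Tuple[str, str, dict]]) -> Dict[str, dict]:
    """Merge (path, level, attrs) statements so explicit beats implicit."""
    winners: Dict[str, Tuple[str, dict]] = {}
    for path, level, attrs in statements:
        if path not in winners:
            winners[path] = (level, attrs)
            continue
        current_level, current_attrs = winners[path]
        if level == current_level:
            # Same level: identical statements are harmless, different ones are errors.
            if attrs != current_attrs:
                raise ConflictError(f"conflicting {level} statements for {path}")
        elif level == EXPLICIT:
            # An explicit statement replaces an earlier implicit one.
            winners[path] = (level, attrs)
        # Otherwise an implicit statement loses to the explicit one already kept.
    return {path: attrs for path, (_level, attrs) in winners.items()}
```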
Because of the nature of recursion, this is inherently a run-time operation.
Security Policies
Most organizations have a security policy maintained by a group completely separate from the administration group. This policy defines a baseline that provides the minimum level of security, such as forbidding certain programs from running or requiring that there be no group-writable files in /etc.
The thing is, though, the security policy is generally responsible for providing the minimum acceptable parameters, not for actually determining final state. Sure, sometimes it allows only a single acceptable value, such as a service not running, but it generally provides minimums, maximums, or ranges, not individual values. So, we need to support these security overlays alongside the actual configurations without treating them as conflicts.
Of course, nothing is quite this simple. We also need to allow overrides when necessary; I've been at plenty of companies that had to break their own security policies in order to make a certain old service work correctly.
It seems like this could be a parse-time operation, because you just have to determine whether the provided values fit the baseline. The problem is that you cannot actually do this without some sort of value typing, and some aspects really do need to be checked locally, on the managed server itself.
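One way to picture the value typing this requires is a Python sketch in which each policy rule is a typed predicate rather than a single required value. The `Constraint` class, the `violations` function, and the `ensure`/`timeout` attribute names used below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    """A policy requirement on one attribute, with a description for errors."""
    description: str
    check: Callable[[Any], bool]


def violations(policy: Dict[str, Constraint], attrs: Dict[str, Any]) -> List[str]:
    """Return a message for each configured value that falls outside the policy.

    Attributes the configuration does not set are not checked here.
    """
    return [
        f"{attr}={attrs[attr]!r} violates: {c.description}"
        for attr, c in policy.items()
        if attr in attrs and not c.check(attrs[attr])
    ]
```

A policy can then mix a single allowed value with a range, for example `{"ensure": Constraint("must be stopped", lambda v: v == "stopped"), "timeout": Constraint("must be between 30 and 300", lambda v: 30 <= v <= 300)}`; any configuration whose values satisfy every predicate is accepted, so the overlay and the configuration do not conflict.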
If the security policy says that no files in /etc can be world- or group-writable, then you either need a "file mode" type that can handle masking, or you need to do the testing on the client, which inherently understands file modes.
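For the file-mode case, a Python sketch of the masking approach might look like this. It uses the standard `stat` module's permission constants; `FORBIDDEN_BITS` and the three helpers are hypothetical names. Masking lets a configuration choose 0644 or 0600 freely while the policy only forbids the write bits for group and others.

```python
import os
import stat

# Group- and world-writable bits, which the example policy forbids.
FORBIDDEN_BITS = stat.S_IWGRP | stat.S_IWOTH


def mode_complies(mode: int) -> bool:
    """True if a permission mode has none of the forbidden write bits."""
    return (mode & FORBIDDEN_BITS) == 0


def enforce(mode: int) -> int:
    """Strip the forbidden bits, keeping every other permission intact."""
    return mode & ~FORBIDDEN_BITS


def file_complies(path: str) -> bool:
    """Client-side check against the file's actual permission bits."""
    return mode_complies(stat.S_IMODE(os.stat(path).st_mode))
```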
Other statements get interpreted differently depending on the system. If the security policy refers to a file's owner by name but a different statement refers to the owner by number, the client itself must determine whether those different labels refer to the same user or a different user.
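Resolving owners has to happen on the client because only the client's user database knows the mapping from names to numbers. Below is a Unix-only Python sketch using the standard `pwd` module; `resolve_uid` and `same_owner` are hypothetical helpers, and treating an all-digit string as a uid is an assumption.

```python
import pwd
from typing import Union

Owner = Union[str, int]


def resolve_uid(owner: Owner) -> int:
    """Map an owner given by name or number to a numeric uid on this host."""
    if isinstance(owner, int):
        return owner
    # Assumption: an all-digit string is a uid, not a (rare) numeric username.
    if owner.isdigit():
        return int(owner)
    # Looks the name up in this host's user database; raises KeyError if unknown.
    return pwd.getpwnam(owner).pw_uid


def same_owner(a: Owner, b: Owner) -> bool:
    """True if two owner labels refer to the same user on this host."""
    return resolve_uid(a) == resolve_uid(b)
```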
Parse Time vs. Run Time
So, some aspects of normalization and closure-seeking happen at parse time on the server, and some unfortunately must happen on the client. Just tonight I went through and clarified (and wrote unit tests for!) the server-side checks. This code is now much cleaner, and I think it does a good job of enforcing the desired behaviour.
There is still much to do. I already have overrides based on specificity working in the library, but I don't have any mechanism for a security policy overlay. At this point, I would rather have less functionality than bad functionality, so I am not planning to support any kind of overlay for now. The only run-time normalization operation will be overriding based on specificity.
The Current Solution
My current plan is to rely as much as possible on server-side checking. There actually are some override operations supported at parse time based on server class hierarchies, but because they happen at parse time, they can easily be fooled by a lack of information. The following snippet will do what you would intuitively expect, though, and is not considered a conflict:
class base {
    file { "/etc/sudoers": owner => root, group => root }
}
class sub inherits base {
    file { "/etc/sudoers": group => wheel }
}
It just overrides the group for servers in the sub class.
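The attribute-level effect of that override can be modelled in Python as a merge in which the subclass's settings win and attributes it does not mention are inherited unchanged. This is a hypothetical sketch of the semantics described here (`inherit` and `Catalog` are illustrative names), not how Puppet's parser actually implements class inheritance.

```python
from typing import Dict, Tuple

# Elements keyed by (type, name), each mapping to its attributes.
Catalog = Dict[Tuple[str, str], Dict[str, str]]


def inherit(base: Catalog, sub: Catalog) -> Catalog:
    """Apply a subclass's statements on top of its parent's.

    Attributes the subclass sets replace the parent's; attributes it does
    not mention are inherited. The parent catalog is left unmodified.
    """
    merged: Catalog = {key: dict(attrs) for key, attrs in base.items()}
    for key, attrs in sub.items():
        merged.setdefault(key, {}).update(attrs)
    return merged
```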
For run-time operations, I plan to continue supporting overrides based on specificity, where implicitly configured elements are replaced by explicit elements when there is a conflict, but until I come up with a decent long-term solution, no other conflicts will be allowed. I do not think this will be a significant reduction in functionality, and I think it's important to stay simple while Puppet is young and we have not yet characterized exactly how we want it to behave.
Updated slightly on 11/4/05