Canonical Names vs. Colloquial Names
Based on my experience with cfengine, one of my initial design requirements for Puppet was that it support two names for every object; I have been calling these names the canonical name and the colloquial name.
For instance, take the ssh daemon. Or is that the sshd daemon? Or
the openssh daemon? Unfortunately, it depends on your operating system,
and potentially even on your environment (e.g., you could create a custom
init script that called it whatever you wanted), which is exactly the
point. We humans have a single name that we use to refer to all of these
daemons -- personally, I always think of it as "the ssh daemon", or maybe just
"sshd" -- but the computers have their own names.
The first name, the one that humans apply consistently throughout a site, I have been calling the canonical name ("canonical" because it's true everywhere). The second name, the one that varies from platform to platform, I have been calling the colloquial name ("colloquial" because it's based on the local dialect, so to speak (pun intended)).
However, both of those names are, um, way too long to use as method names.
Puppet always requires a colloquial name, and it uses that as the default for
the canonical name -- that is, if you do not provide a canonical name, then
the colloquial name is used. However, the difference between the two is, um,
a touch embarrassing -- the canonical name is retrieved by calling the
name method on a Puppet type, and the colloquial name is retrieved by
getting the name attribute of a Puppet type (e.g.,
name = obj[:name]).
Yes, this is hideous, and no, it's not reasonable for me to expect anyone else to understand this. But it's worse than that.
I've been planning on moving Puppet types from using the hash-style attribute
retrieval methods ([] and []=) to using standard getter and setter
methods (e.g., name and name=). The problem is, of course, that you
would then get name clobbering -- there would be no way to distinguish the two
names.
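To make the clash concrete, here is a minimal sketch in plain Ruby. SketchType is a hypothetical stand-in, not Puppet's actual class, and it assumes that the name method returns the canonical name while the :name attribute holds the colloquial one.

```ruby
# Hypothetical stand-in for a Puppet type; not Puppet's real implementation.
class SketchType
  def initialize(colloquial, canonical = nil)
    @params = { :name => colloquial }
    # The colloquial name is the default for the canonical name.
    @canonical = canonical || colloquial
  end

  # Hash-style attribute retrieval: obj[:name] is the colloquial name.
  def [](param)
    @params[param]
  end

  def []=(param, value)
    @params[param] = value
  end

  # The name method returns the canonical name.
  def name
    @canonical
  end
end

sshd = SketchType.new("openssh", "sshd")
sshd.name    # => "sshd"    (canonical)
sshd[:name]  # => "openssh" (colloquial)
```

If obj[:name] were turned into a plain name getter, both lookups would compete for the same method name, which is the clobbering described above.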
I actually wrote up all of the code to make this change (two different ways, even) at the beginning of 2006, but this naming problem stymied me, so I left it alone (I did the work mostly on a lark, while I was travelling -- I was hoping for performance improvements, but I found them elsewhere).
Now, as I continue thinking about providers and abstraction and modeling and
those other things that keep me up at night and put the rest of you to sleep,
this problem is getting more pronounced. As I mentioned, I'm working
on potentially adding a layer above the types to handle the @is and
@should values (or, as I've been describing it, handling the three C's --
collect, compare, commit), and changing the types so that direct method calls
work. In that case, it makes much more sense to get rid of this hash syntax.
(Yes, it must be said, Puppet was initially inspired just a bit too much by
hashes.)
So, I think I'm going to start calling the canonical name the "title", and
continue using "name" for the colloquial name. The language will continue
preferring the title over the name (you probably didn't know that it did
that), and in the majority of cases this won't matter to you. But I will have
to modify the Transportable classes to use title instead of name, and
I'll have to make a few other internal changes. Once this is done, though,
and it should be pretty straightforward, I can move to
file.uid = 0 instead of file[:uid] = 0, which I think makes more
sense; certainly it will be better for users of Puppet's library interface.
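A rough sketch of where this could end up, in plain Ruby. SketchFile and its param helper are hypothetical names made up for illustration; the real change inside Puppet may look quite different.

```ruby
# Hypothetical sketch; class and helper names are illustrative only.
class SketchFile
  attr_reader :title

  # Declare a getter and a setter for each parameter, backed by one hash.
  def self.param(*names)
    names.each do |param|
      define_method(param) { @params[param] }
      define_method("#{param}=") { |value| @params[param] = value }
    end
  end

  param :name, :uid

  def initialize(name, title = nil)
    @params = { :name => name }
    # The title (canonical name) defaults to the name (colloquial name).
    @title = title || name
  end
end

file = SketchFile.new("/etc/passwd", "passwd")
file.uid = 0   # instead of file[:uid] = 0
file.title     # => "passwd"
file.name      # => "/etc/passwd"
```

With the canonical name living under title, the name getter is free to return the colloquial name and nothing collides.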
Wed, 16 Aug 2006 | Tags: puppet, design, naming
Implementations become Providers
After discussions, I've changed implementations to providers. This makes it
easy to talk about them:
package { ssh:
    provider => rpm,
    ensure => installed
}
And the class names and instance variables all become pretty easy, too.
I've been making significant progress in the work, but as I get closer to completion, I see more and more parts of Puppet that are now code and should instead be data. For instance, unless I change things some, all of my User states will look something like this:
newstate(:gid) do
    def retrieve
        @is = provider.gid
    end
    def sync
        provider.gid = self.should
        return :user_changed
    end
end
That's basically silly. Sure, I can make a base class for User states, so
that I don't have to repeat myself within that one class, but that's still all
basically data disguised as code. I think I've got a way to trim
it down to the basics -- the state will assume that the retrieve method is named
after the state, so it knows to call provider.gid in this case, and the
sync method will be basically the same.
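Here is one way that convention might look, as a plain Ruby sketch. SketchState and SketchProvider are hypothetical names, and passing the event to the constructor is an assumption made only to keep the example self-contained.

```ruby
# Hypothetical sketch; SketchState and SketchProvider are not Puppet classes.
class SketchState
  attr_reader :name, :is
  attr_accessor :should

  def initialize(name, provider, event)
    @name = name
    @provider = provider
    @event = event
  end

  # Call the provider getter named after the state, e.g. provider.gid.
  def retrieve
    @is = @provider.send(@name)
  end

  # Call the matching setter, e.g. provider.gid = should, and return the event.
  def sync
    @provider.send("#{@name}=", @should)
    @event
  end
end

# Stand-in provider that just remembers values in memory.
SketchProvider = Struct.new(:uid, :gid)

gid = SketchState.new(:gid, SketchProvider.new(500, 100), :user_changed)
gid.retrieve    # reads provider.gid
gid.should = 0
gid.sync        # writes provider.gid = 0 and returns :user_changed
```

The same class then serves uid, gid, and any other state whose provider methods follow the naming convention.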
Of course, User is a bit weird in that basically all of the states have essentially infinite valid values. Packages, on the other hand, have a fixed list of valid values, basically:
newstate(:ensure) do
    newvalue(:installed) do
        provider.install
        return :package_installed
    end
    newvalue(:absent) do ... end
    newvalue(:latest) do ... end
end
I should be able to change that code to something like this:
newstate(:ensure) do
    newvalue(:installed, :event => :package_installed, :method => :install)
    newvalue(:absent, :event => :package_removed, :method => :uninstall)
    newvalue(:latest, :event => :package_upgraded, :method => :upgrade)
end
Notice that I have the exact same information here, but it's now all presented as data.
I don't know that this is the best long-term option, and it's not quite this rosy since some methods need values and some don't, but this is what I'm searching for, anyway.
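For the curious, a data-driven newvalue could be backed by something like the following plain Ruby sketch. SketchEnsure and SketchPackageProvider are hypothetical, and the sketch ignores the wrinkle that some methods need values.

```ruby
# Hypothetical sketch of values declared as data; not Puppet's actual API.
class SketchEnsure
  @values = {}

  class << self
    attr_reader :values

    # Record which provider method to call and which event to return.
    def newvalue(name, options)
      @values[name] = options
    end
  end

  newvalue(:installed, :event => :package_installed, :method => :install)
  newvalue(:absent,    :event => :package_removed,   :method => :uninstall)
  newvalue(:latest,    :event => :package_upgraded,  :method => :upgrade)

  def initialize(provider)
    @provider = provider
  end

  # Look the value up in the table, call the provider, return the event.
  def sync(should)
    spec = self.class.values.fetch(should)
    @provider.send(spec[:method])
    spec[:event]
  end
end

# Stand-in provider that records which methods were called.
class SketchPackageProvider
  attr_reader :calls

  def initialize
    @calls = []
  end

  [:install, :uninstall, :upgrade].each do |meth|
    define_method(meth) { @calls << meth }
  end
end

state = SketchEnsure.new(SketchPackageProvider.new)
state.sync(:latest)   # calls provider.upgrade and returns :package_upgraded
```

Because the table is plain data, an unknown value fails at lookup time instead of needing its own code path.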
Thu, 10 Aug 2006 | Tags: puppet, providers, naming
Renaming Configuration Management
I started a thread on the seldom-used LOPSA Configuration Management mailing list today.
For those of you not on the list, it's a thread worth looking into.
Mon, 07 Aug 2006 | Tags: naming, config-mgmt, sysadmin
Implementing Normalization With a Touch of Closurehood
Normalization seemed easy at first. Just make sure a given configuration doesn't mention a given element more than once. This simplicity is entirely absent, though, and Puppet has artifacts from multiple bad attempts at providing it.
First let me apologize for how rambling this post is. It's obvious that I barely know what I'm talking about here and that this topic is right at the edge of my competence. I think this is the last big design issue to resolve before a full release, though, and I'm cleaning all of this up right now, so I figured it was important to at least try to lay out where the problems are and what I'm using to make decisions.
What's In a Name?
The first set of ugliness is more about a lack of elegance than it is about a
lack of functionality. For many elements you need to configure, the "name" of
the element is essentially
isomorphic with the element
itself. No one will look at you funny if you say "the file /etc/passwd"
rather than "the file named /etc/passwd", nor will people think you're crazy
for calling the process that happens to have the name apache "the apache
process".
This realization led me to design Puppet around using the tuple of object name
and type (e.g., file and /etc/passwd or service and apache) as unique
identifiers for elements. This tuple is how Puppet uniquely refers to each
element, and configuration normalization is actually normalization of these
tuples.
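As a minimal sketch of what normalizing on these tuples means, assuming a hypothetical SketchConfiguration class (nothing here is Puppet's real code):

```ruby
# Hypothetical sketch of duplicate detection keyed on the [type, name] tuple.
class DuplicateElementError < StandardError; end

class SketchConfiguration
  def initialize
    @elements = {}
  end

  # Register an element, refusing a second mention of the same tuple.
  def add(type, name, params = {})
    key = [type, name]
    if @elements.key?(key)
      raise DuplicateElementError, "#{type}[#{name}] is already defined"
    end
    @elements[key] = params
  end

  def size
    @elements.size
  end
end

config = SketchConfiguration.new
config.add("file", "/etc/passwd", :owner => "root")
config.add("service", "apache")
# A second config.add("file", "/etc/passwd") would raise DuplicateElementError.
```

Two elements with the same name but different types are distinct tuples, so they do not collide.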
This led down some stupid paths at first. For instance, I initially required every specified element, including components and classes, to have explicit names, because otherwise I couldn't verify that they were normalized. But then I realized that I didn't want to normalize the high-level structures, only the lowest level elements, so I had to redo the language a bit to make those names optional.
The next problem was that some objects don't actually have explicit names. Cron jobs, for instance, not only don't have names, they don't even necessarily have a unique field that could function as a name. In cases like this (and cron jobs are the only case like this so far), I require a name but it's an arbitrary name. In the case of cron jobs, I store the name as a comment before the job. It actually works relatively well, because you can name the cron job for why it exists, but it's still arbitrary and thus kind of silly.
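The comment trick can be sketched in a few lines of plain Ruby. The "# sketch-name: " marker and both helper methods are hypothetical, chosen only for illustration; this is not the format Puppet actually writes.

```ruby
# Hypothetical marker; the real comment format may differ.
MARKER = "# sketch-name: "

# Render one job with its arbitrary name stored in a comment above it.
def render_job(name, schedule, command)
  "#{MARKER}#{name}\n#{schedule} #{command}\n"
end

# Read jobs back, pairing each name comment with the line that follows it.
def parse_jobs(text)
  jobs = {}
  pending = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?(MARKER)
      pending = line[MARKER.length..-1]
    elsif pending && !line.strip.empty?
      jobs[pending] = line
      pending = nil
    end
  end
  jobs
end

crontab = render_job("rotate-logs", "0 3 * * *", "/usr/sbin/logrotate /etc/logrotate.conf")
parse_jobs(crontab)   # maps "rotate-logs" to the job line
```

Jobs without a name comment are simply skipped by parse_jobs, which mirrors the idea that only named jobs can be managed.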
Finally, you have the problem where elements are considered the same thing
even though they have different names. You might have a single class that
manages the syslog daemon on all of your Unix platforms, but the service
name on the different platforms varies considerably. As a human you consider
all of these services to be a single element, but Puppet can't determine that.
I have some ideas for how to fix this, but none of them seem good enough to be
worth implementing just yet.
Of course, there are also elements with more than one valid name. Given a file with multiple hard links, which link is "the" link? How do you decide?
Opposing Needs
The real ugliness comes in at the intersection of recursion, specificity, and security policy (or any policy, really). All of our closure-like behaviour has been based on this name/type tuple, and all of it can effectively be handled at parse-time on the central server. There are factors that further affect closurehood but which generally need to be handled at run-time, on each individual server.
Recursion and Specificity
Recursion is necessarily a run-time action. The manifests
are parsed on the central server, which does not know which files actually
exist. When you do a recursive operation on /etc, Puppet does not know
which exact files are managed until the configuration is applied to a specific
host.
It's true that the server-side language could be extended to realize that files do recursion, but that either precludes the development of other recursion types or requires that every recursive type gets hard-coded into the language, which I do not think is acceptable. Even then, it could be defeated by something as simple as symlinks or hard links. Without examining the local system state, it can be impossible to know whether two statements actually conflict.
Recursive operations often lay out a base configuration which can get overridden by more specific statements, such as in the following snippet:
file { "/etc/": recurse => true, owner => root }
file { "/etc/apache/auth": recurse => true, owner => apache }
We do not want this to be considered a conflict, of course. To solve this
problem I've built specificity into Puppet, where objects are either
explicit or implicit. Explicit objects automatically override implicit
objects, but conflicts between objects of the same level of specificity are
considered errors.
Because of the nature of recursion, this is inherently a run-time operation.
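The override rule itself is small. Here is a sketch in plain Ruby, where each setting is a hash with a :value and an :explicit flag; the resolve method and SpecificityConflict are hypothetical names, and treating two identical values at the same level as harmless is my assumption.

```ruby
# Hypothetical sketch of the explicit-over-implicit rule; not Puppet's code.
class SpecificityConflict < StandardError; end

# A setting is a hash: { :value => ..., :explicit => true or false }.
def resolve(existing, incoming)
  return incoming if existing.nil?

  # Different levels of specificity: the explicit setting wins.
  if existing[:explicit] != incoming[:explicit]
    return existing[:explicit] ? existing : incoming
  end

  # Same level: identical values are harmless, anything else is an error.
  return existing if existing[:value] == incoming[:value]
  raise SpecificityConflict,
        "conflicting values #{existing[:value].inspect} and #{incoming[:value].inspect}"
end

# Owner of /etc/apache/auth: implied by recursing /etc, stated explicitly later.
implied = { :value => "root",   :explicit => false }
stated  = { :value => "apache", :explicit => true }
resolve(implied, stated)[:value]   # => "apache"
```

The order of arrival does not matter; only the level of specificity and the values do.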
Security Policies
Most organizations have a security policy maintained by a completely separate
group from the administration group. This policy defines a base-line that
provides the minimum level of security, such as forbidding certain programs
from running or requiring that there be no group-writable files in /etc.
The thing is, though, the security policy is generally responsible for providing the minimum acceptable parameters, not for actually determining final state. Sure, sometimes it provides for only a single acceptable parameter, such as a service not running, but it generally provides minimums or maximums or ranges, not individual values. So, we need to support these security overlays plus the actual configurations without considering them to be conflicts.
Of course, nothing is quite this simple. We also need to support allowing overrides when necessary -- I've been at plenty of companies that had to break their own security policies in order to make a certain old service work correctly.
It seems like this could be a parse-time operation, because you just have to determine whether provided values fit the baseline. The problem is that you cannot actually do this without some sort of value typing, and you actually do need to be local to the server for some aspects.
If the security policy says that no files in /etc can be world- or
group-writable, then you either need to have a "file mode" type that can
handle masking, or you need to do the testing on the client, which inherently
understands file modes.
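The masking in question is just bit arithmetic on the mode. A minimal Ruby sketch, with hypothetical method names, of what a "file mode" type would need to do:

```ruby
# Group-write (0020) plus world-write (0002) permission bits.
FORBIDDEN_WRITE_BITS = 0o022

# True if the mode grants group or world write access.
def violates_policy?(mode)
  (mode & FORBIDDEN_WRITE_BITS) != 0
end

# Clear the forbidden bits while leaving every other bit alone.
def enforce_baseline(mode)
  mode & ~FORBIDDEN_WRITE_BITS
end

violates_policy?(0o664)             # => true  (group-writable)
violates_policy?(0o644)             # => false
enforce_baseline(0o666).to_s(8)     # => "644"
```

A plain string or integer comparison cannot express "at most these bits", which is why the check needs either a typed value on the server or the client's own understanding of modes.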
Other statements get interpreted differently depending on the system. If the security policy refers to a file's owner by name but a different statement refers to the owner by number, the client itself must determine whether those different labels refer to the same user or a different user.
Parse time vs. Run time
So, some aspects of normalization and closure-seeking happen at parse time on the server, and some aspects unfortunately must happen on the client. Just tonight I went through and clarified (and wrote up unit tests for!) the server-side tests. This code is now much cleaner, and I think it does a good job of requiring the desired behaviour.
There is still much to do. I already have overrides based on specificity working in the library, but I don't have any mechanisms for a security policy overlay. At this point, I am going to go with less functionality, rather than bad functionality, meaning that I am not planning on supporting any kind of overlay for now. The only run time normalization operations will be overriding based on specificity.
The Current Solution
My current plans are to rely as much as possible on server-side checking. There actually are some override operations supported at parse-time based on server class hierarchies, but because they're parse-time they can be easily fooled by not having enough information. The following snippet will do what you would intuitively expect, though, and is not considered a conflict:
class base {
    file { "/etc/sudoers": owner => root, group => root }
}
class sub inherits base {
    file { "/etc/sudoers": group => wheel }
}
It just overrides the group for servers in the sub class.
For run-time operations, I plan on continuing to support overrides based on specificity, where implicitly configured elements are replaced by explicit elements when there is a conflict, but until I come up with a decent long-term solution, there will be no other allowed conflicts. I do not think this will be a significant reduction in functionality, and I think it's important to stay simple while Puppet is young and we have not yet characterized exactly how we want it to behave.
Updated slightly on 11/4/05