Puppet: System Administration Automated

Ruby has a distribution problem


I've been doing a better job of reading development books recently (e.g., Domain Driven Design), and something has really begun to stick out at me. There seems to be a split between those developers who write software that is expected to run in one place and those who write software that is expected to run in many places.

If you, as a developer, know that your software will really only be installed at a single customer (whether that customer is your employer, a consulting client, or whatever), then your life is drastically easier -- you don't usually have to worry about cross-platform issues, and you don't have to worry about different users having different needs, because you only have one user.

Obviously there's no inherent problem with having the simpler life of a developer with only one user, but it seems to me that the Ruby community is, as a group, largely adopting that perspective as the default. This is worrying to me, because I'm building an application that I expect to be installed in thousands of locations (in fact, it's probably already installed in thousands of locations). I'd like to take as much advantage of existing Ruby code as possible, but it's not exactly easy.

For example, rubygems (probably my least favorite ruby software of all time) basically require that you always try to load them, because their design stupidly requires that you know whether a given piece of software is installed via rubygems or some other mechanism. For instance, if you've installed the Facter gem, then this code doesn't work:

ruby -rfacter -e 'Facter.to_hash'

Instead, you have to do this:

ruby -rubygems -rfacter -e 'Facter.to_hash'

The reason is that rubygems installs in a location that Ruby doesn't search by default. The reason for that is that apparently this one guy, somewhere, wanted to have multiple versions of a given package installed at once. Who wants this? Let's just say it's not the guys who are distributing hundreds and thousands of copies of their software.
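The workaround most libraries end up with can be sketched as follows. This is a minimal sketch, not anything Facter or Puppet is confirmed to ship: the helper name `require_with_gems_fallback` is made up for illustration. It tries a plain `require` first and only loads RubyGems if that fails, so the same code works whether the library came from a native package or from a gem.

```ruby
# Hypothetical helper (the name is an assumption, not part of any library).
# Try a plain require first; only pull in RubyGems if the library is not
# on Ruby's default load path.
def require_with_gems_fallback(library)
  require library
  true
rescue LoadError
  begin
    require 'rubygems'   # makes gem-installed libraries visible to require
    require library
    true
  rescue LoadError
    false                # not installed by either mechanism
  end
end

# Usage: callers no longer need to know how the library was installed.
unless require_with_gems_fallback('facter')
  warn 'facter is not installed as a native package or as a gem'
end
```

The caller still pays for the extra lookup, but at least the knowledge of "was this a gem?" is confined to one place.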

The truth is, most Rubyists don't even seem to use gems this way -- they tend to create a vendor subdirectory in their project, and then install their gems there. This is a clear example of how little they expect to have their projects distributed. These gems might be compiled, they might conflict with installed software, they might require installed software -- you have no idea, because it's an entirely separate repository of packages.

This is basically anathema to how I think about management, yet it's the standard, recommended practice in the Rails community, because it makes it easy to "guarantee" behaviour in a given environment. Of course, your guarantee is only good if no one ever tries to run the software anywhere except an exact duplicate of where you run it.

I tried to have a conversation about this at the Ruby Hoedown last year -- my claim was it was difficult to turn a Rails project into a native package, especially with the tendency toward requiring all kinds of random gems. Quite a few people kind of stared blankly at me and said, multiple times, "I just put it in vendor." Since then, this has become my go-to phrase for describing the Ruby way of solving distribution problems: "I just put it in vendor." I keep waiting for someone to try to put their kernel or web browser in vendor: "We only support the Firefox copy in vendor, sorry."

I don't know if other communities are any better at this. From what I can tell, this is basically how the Java community behaves, too. They have pathologically bad distribution systems, and as a language it seems to be most influenced by consulting shops developing huge, worthless software projects for large enterprises, rather than developing distributed applications that will be installed in thousands of locations.

I'd like to think that Puppet would have some counter-effect to this. It's one of the largest and most sophisticated publicly available Ruby projects, it's already installed in at least hundreds and probably thousands of places, and it does a pretty good job of working nearly everywhere. However, I keep getting blank stares when I talk about this with other Rubyists, half the time I'm called a troll for even bringing it up, and when I explain why Puppet exists to most Rubyists, they just say, "I just put it in vendor", or, maybe, "Why not just use Capistrano?" To that I ask, how do you install Capistrano, but you know what they say to that.

I think Rails is a big part of the problem. Rails is clearly created by a company that will never distribute its software, and the Rails philosophy is again almost pathologically opposed to the idea of turning your software into a package. Imagine trying to make a Rails project LSB compliant -- your database.yml file would need to be in /etc, your log directory would need to be in /var, and your actual code would need to be in /usr. There went all of your fancy Rails "convention over configuration", and you're suddenly fighting Rails instead of using it, and everyone you ask for help just tells you to "put it in vendor".

I'm looking at creating a new application that I'm planning on distributing, and one of my big goals is to be able to distribute the core in one package and various additional pieces of functionality as separate packages. I'll need to simultaneously support as many of my customer platforms as I can and provide a consistent operating environment for my packages. The only way to do this is to have supported operating environments with well-defined dependencies, such as you can almost trivially build in Debian or Red Hat.

For those of you who are thinking, "you could just put it in vendor", or "you could at least use gems": no, I couldn't. Take a trivial example: Say I want to use RRD support in my application (which is likely, in this case). There is Ruby support for RRD, but not in Gem form. Even if there were a gem, though, it would require a native RRDTool package, and, of course, Gems can't specify dependencies on native packages, so I'd be telling my customers, "well, install X gems and Y packages".
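To make the gap concrete, here is a sketch of what a gem specification can and cannot express. The gem name `myapp-rrd`, the version, and the author are made up for illustration. As far as I know, the `requirements` field is informational text shown to the user, not something RubyGems resolves or installs, whereas `add_dependency` only ever points at other gems.

```ruby
require 'rubygems'

# Hypothetical gem; only the 'facter' dependency name comes from this post.
spec = Gem::Specification.new do |s|
  s.name    = 'myapp-rrd'
  s.version = '0.1.0'
  s.summary = 'Illustrative spec showing gem vs. native dependencies'
  s.authors = ['Example Author']
  s.files   = []

  # A dependency on another gem: RubyGems can resolve and install this.
  s.add_dependency 'facter'

  # A native dependency: just a free-form note for the person installing.
  s.requirements << 'RRDtool (native package, install via apt or yum)'
end

puts spec.dependencies.map { |d| d.name }.inspect
puts spec.requirements.inspect
```

A native package's control or spec file would list both kinds of dependency in the same field and let apt or yum resolve them together.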

Instead, if I use native packages (say, those for Debian and Red Hat, to cover most cases), I can define clear dependencies for all cases. I know Debian provides everything I need, and in the rare case it doesn't, I can provide my own apt repository (and the same for yum). Gems, on the other hand, can really only do Ruby stuff. No, I don't actually want to put glibc in vendor, thanks.

I don't see a solution to this, other than getting more Rubyists distributing their software, but I'd really like to see this issue begin to be approached by the community. I feel like a wolf howling in the wilderness at this point, and it often feels like I'm fighting against my community in order to produce software that hundreds or thousands of people will install, as opposed to just use over the web.


Mon, 05 May 2008


Posted by Russ Allbery at Tue May 6 07:28:09 2008
I have to say, this is one of the things that turns me off of a lot of the newer and more edge languages.  If you think Ruby is bad, try to talk to a Common Lisp or Squeak developer about components and independently-distributed packaged modules, or untangle the build system for Standard ML.  It's possible, but compared to CPAN, it's like the dark ages.

To me, this is one of the really big things that defines Perl as a mature language community: Perl people don't do this nonsense.  There's no "vendor," and you're looked at like you're out of your mind if you distribute copies of other people's modules with yours instead of declaring versioned dependencies in your Makefile.PL.  Python is... mostly there, although they never quite got the CPAN religion in quite the same way and their module handling always struck me as a bad copy without as much discipline.

Java, as you mention, is a complete disaster.  I shudder to think how many independent copies there are of some libraries like log4j on some Java application servers.

You didn't mention one of the biggest problems with this mindset: security.  As soon as there's a security vulnerability in one of those things that everyone just puts in vendor, you are completely screwed.  Good luck finding the four million copies of the vulnerable code and figuring out just what broke in each application when you upgrade it.

Posted by Tom Stuart at Tue May 6 08:58:15 2008
You may or may not be interested in the new gem management features in Rails 2.1: http://ryandaigle.com/articles/2008/4/1/what-s-new-in-edge-rails-gem-dependencies.

Posted by Holger Schauer at Tue May 6 09:13:42 2008
I found your post highly interesting, because I'm always scared away by any "language community solution" such as CPAN, gems or asdf-install (@Russ: that's (one of) the Common Lisp "solution"). I highly value the availability of distribution packages and the possibility to specify dependencies.

However, there is a problem inherent in any approach to the problem that is not language-community based: As soon as you cross the architecture line, you're soon facing a situation in which no assumptions can be made. There's simply no way, for instance, you could install a deb-package on a Windows or Mac system. You might say, hey, who cares, my platform is just Linux. But actually that's a problem reduction similar to the "distribute only to one customer". So, having cross-platform distribution is ultimately probably best solved by language-specific solutions.

It's only unfortunate that so many so-called "solutions" quickly turn out to move you from one problem ("how do I distribute on multiple platforms at all") to another ("how do I distribute while retaining my sane mind").

Posted by Jan Friedrich at Tue May 6 10:49:14 2008
ruby -rubygems -rfacter -e 'Facter.to_hash'

Will also not work because the -r option is hard coded into the ruby interpreter and doesn't require installed gems. :)

Posted by Hongli Lai at Tue May 6 12:04:12 2008
Luke, I don't really understand what you're expecting from the Ruby/Rails community. You're dealing with cross-platform distribution issues. That's hard by its very nature. I was a developer in the Autopackage project (www.autopackage.org), a software packaging system that works across multiple Linux distributions. I'm currently the developer of Passenger (www.modrails.com). We've run into many of the issues that you mention here.

If I understand it correctly, you're claiming that vendoring stuff is bad because:
- You cannot vendor everything (glibc, web browser, etc).
- As someone else has mentioned, vendoring stuff creates potential security issues. This is not unlike static linking in C/C++ projects.
On the other hand, vendoring stuff does have benefits:
- Guaranteed compatibility. Suppose you rely on GTK 2.2. On a fateful day, GTK 2.2.5 was released, but accidentally introduced a regression, and now your application fails left and right. Uh oh. If you vendored GTK then that wouldn't have happened. This is not a theoretical possibility: GTK 2.2.15 or something actually broke AbiWord. (I'm not saying that vendoring GTK is a good idea, as GTK is quite large. I'm just pointing out a benefit of vendoring.)
- Less installation hassle. Not all platforms have good packaging systems. As far as I know, Debian-based distros are the only ones. RedHat-based YUM repositories tend to be quite small compared to Debian's APT repositories. MacOS X and Windows don't have native package management systems at all. If your app is cross-platform, then vendoring stuff is a lot easier for both the developer and the end user.

Vendoring stuff is not only common in the Java world, but also in the Windows and MacOS X world. Windows apps tend to bundle all their dependencies (with the exception of obvious stuff, such as Internet Explorer). How many games have you seen that bundle DirectX? Actually I'd say that not vendoring stuff is only common in Linux, and languages that have strong ties to Linux, such as Perl. Here's where Debian's package management system and huge package repositories really shine.

There's almost no common ground in the world of package management. We at Autopackage had to invent our own dependency resolution mechanism (similar to APT) because there's no lowest common denominator, even amongst Linux distros. RubyGems is probably created for the same reason: not all Ruby-supported platforms have (decent) package management, so they just wrote their own. Autopackage experimented with native package management integration (i.e. being able to use the system's native package manager to resolve dependencies) but that proved to be much, much harder than initially expected, and up until today that feature still isn't finished. So I don't think you can reasonably expect the Ruby community to do something about this. It's not a pure Ruby problem: it's a general software distribution problem.

As for being FHS-compliant: I can only say "don't bother" if your application is cross-platform. It seems that only hardcore Linux users care about that. Outside the Linux world, FHS is being criticized by pretty much everyone. Windows and OS X users complain that application files in Linux are scattered everywhere, instead of being self-contained. Scattering files isn't a problem in Linux because of package management, but it is a problem in all other platforms that don't have decent package management. I've found that being FHS-compliant is more trouble than it's worth.

So how do you solve this problem? I don't think it's possible to come up with a general silver bullet solution. So that leaves the following choices to you, the developer:
- Create a native package for every platform that you support, i.e. .deb for Ubuntu/Debian, .rpm for RedHat, another .rpm for Mandriva and SuSE because their package names are different, a .exe for Windows, .dmg for MacOS X, .tgz for Slackware, .??? for Solaris, etc.
- Write a cross-platform installer. Passenger/mod_rails chose this option because it's a lot easier than the first one. Passenger depends on Ruby packages (Rails, fastthread, Rake, etc.) as well as native packages (GCC, Apache, APR). It checks whether all dependencies are available, and if not, it tells the user how to install those dependencies. We've put platform autodetection and Linux distro autodetection code in the installer. So on Debian/Ubuntu it would tell you to run "apt-get install apache2-prefork-dev" while on Fedora/RHEL/CentOS it would tell you to run "yum install apache-devel". We've found that this approach works extremely well.
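The second option can be sketched in a few lines of Ruby. This is not Passenger's actual installer code: the function names and the lookup table are assumptions made for illustration, and only the two install commands are taken from the comment itself. Distro detection here relies on the conventional marker files `/etc/debian_version` and `/etc/redhat-release`.

```ruby
# Hypothetical hint table; the two commands are the ones quoted above.
INSTALL_HINTS = {
  :debian => 'apt-get install apache2-prefork-dev',
  :redhat => 'yum install apache-devel'
}

# Guess the distro family from well-known marker files. The existence
# check is injectable so the logic can be exercised on any machine.
def detect_distro(exists = File.method(:exist?))
  return :debian if exists.call('/etc/debian_version')
  return :redhat if exists.call('/etc/redhat-release')
  :unknown
end

# Tell the user what to run instead of trying to install it for them.
def install_hint(distro)
  INSTALL_HINTS.fetch(distro,
    "install the Apache development headers with your platform's package manager")
end

puts "Missing Apache headers. Please run: #{install_hint(detect_distro)}"
```

The design choice is that the installer only reports what is missing and defers the actual installation to the native package manager.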

Finally, we at Autopackage fully recognized the pros and cons of vendoring/static linking. Autopackage recommends the following: dynamically depend on stuff that is common, but vendor/static link stuff that is uncommon. We believe that this is a good trade off between the pros and cons. Passenger follows this recommendation as well: we vendor the Boost C++ library. Few people have Boost installed, and when they have it installed it often isn't the version that Passenger requires. Installing Boost is a huge, huge pain on MacOS X. In this case, the benefits that vendoring Boost gives us outweigh the cons by far.

On the other hand, Apache is fairly common, and easy to install on most platforms. Rake, fastthread, etc. are also easy to install because of RubyGems. That's why we dynamically depend on those things instead of vendoring them.

So it all boils down to making the right choices and correctly balancing the pros and cons of vendoring. There's no silver bullet.

Hongli Lai
- phusion.nl

Posted by Aaron Trevena at Tue May 6 12:09:38 2008
Holger,
CPAN provides "availability of distribution packages and the possibility to specify dependencies".

Russ,
Neither PHP nor Python are even close to CPAN, not even ye olde CPAN of 1998 when I first started using it.

Posted by markus at Tue May 6 12:39:46 2008
I love Ruby.

I don't use gems. I am using my own way to install stuff via ruby, most of the time using setup.rb, even on Windows. I really have no idea why anyone would need gems. But then again, if I use perl, I don't use cpan either. I search once for the URL, put this url in a yaml file, and then have ruby compile it (download and install it).

The vendor problem is another issue from this. Ruby simply isn't as widespread as e.g. PHP, simply because PHP conquered the www. (It got conquered despite having python here, so both ruby and python should be ashamed that php was such a success on the www).

Last but not least, we need versioned AppDirs instead of the FHS on Linux. AppDirs are a great solution but right now the Linux Distributions embrace the old and crappy Unix way to clutter your filesystem, making upgrades much harder than it should be.

Posted by matt mcknight at Tue May 6 14:38:39 2008
It pains me to see you describe including libraries as a Rails problem when it is such a broader problem. Have you ever dealt with a Java project that is dependent on .Net code with something like J-Integra? I'll take Ruby's problems, thanks.

I think Hongli's comment is great advice.

Posted by Gunnar Wolf at Tue May 6 17:08:44 2008
The problem you describe is, as pointed out indirectly by other people answering, prevalent in developer communities who are not really into Free Software. Yes, cross-platform includes that.
Maybe the main «selling point» would be, as Russ points out, security and maintainability of all the libraries, not going project by project to check for updatedness... I try to keep the plugins installed inside my Rails projects checked out as 'externals', so SVN is aware of their respective origins, but every day more projects are moved into git. And anyway, updating a given library still requires entering who-knows-how-many individual directories and issuing the updates in each.
Anyway, back to my point: We Free Software people know that distributions do a great work on packaging software, and we can (mostly) just depend on their packages. Maybe the problem would be most aptly solved if dependency handling was a bit different - Maybe application authors should keep «vendoring», as you call it, most of their libraries to ease the installation for less fortunate users (i.e. those who do not have a strong package management system - Windows, MacOS users), but not defaulting to libraries found in vendor/ if they can be found elsewhere in the system? Or maybe distributing a «vendor/-populator», which you'd run once you install everything you can via your favorite package-management system, and vendor/ would only get what's not available in there? It does not sound hard to code, anyway. However, it is quite antithetic to The Rails Way and all of its culture... So I don't predict a high adoption rate for it.
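The "do not default to vendor/" idea in this comment comes down to load path ordering. A minimal sketch, assuming vendored libraries live in `vendor/<name>/lib` (the layout and the helper name `add_vendor_fallback` are assumptions for illustration, not a Rails API): appending those directories to the end of the load path means a copy installed by the system package manager is found first, and vendor/ is only used when nothing else provides the library.

```ruby
# Hypothetical helper: append (not prepend) vendored library directories,
# so that system-installed copies take precedence over vendor/.
def add_vendor_fallback(app_root, load_path = $LOAD_PATH)
  Dir.glob(File.join(app_root, 'vendor', '*', 'lib')).sort.each do |dir|
    load_path.push(dir) unless load_path.include?(dir)
  end
  load_path
end

# Usage: call once at boot, before requiring any third-party library.
add_vendor_fallback(Dir.pwd)
```

Prepending the same directories with `unshift` would give the usual "vendor wins" behaviour, so the difference between the two policies is a single method call.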

Posted by Gunnar Wolf at Tue May 6 17:12:31 2008
As a side rant (and that's why I'm making a separate comment ;-) ), vendoring does not only go with the side libraries in Rails developments. I must confess I have been vendoring quite a bit lately - Rails is a fast-changing beast. And that's not always good. If I developed a project under Rails 1.1.6, it was not a problem when I decided to run 1.2.3, as it was mostly compatible. But then 2.0 came out, aggressively deprecating stuff all over the place and breaking my projects. So, I updated some projects to the 2.0 style - But that's not always feasible. Rails developers suggest that you freeze Rails to the version it is known to work with. So it's not like some minor, side libraries - Your whole core framework gets vendored. And, again, security-wise that's as close as you can get to a mass-crime.

Posted by Luis Lavena at Tue May 6 17:47:32 2008
The Rails way is somehow confused with the Ruby way of doing things.

I have found no general issue using RubyGems to build applications (not web applications) that depend on specific gems being installed.

Also I have managed to pack some of my Rails applications as gems and distribute them in my own gem repository, along with all the gems needed for them, to easily deploy on an intranet.

So I guess it is a matter of adaptation. Take as an example Eggs (Python) and the way distutils generates independent packages. With distutils there is no way to keep track of different versions unless you include version numbers as part of the packaging...

Also, you can take a look at what Radiant did, following my suggestion of ship it as gem...

Just my two cents :-)

Posted by Matthijs Langenberg at Tue May 6 18:48:36 2008
Thanks for the comment Hongli Lai, it reminds me that the world isn't perfect and never will be. Just use the things that will fit the best at that moment.

Posted by Stephen Waits at Tue May 6 18:56:30 2008
Hi.. you must be new.

Posted by Phil at Tue May 6 19:00:08 2008
While I agree with much of what you've written regarding dependencies, I really love the fact that gems allow you to have multiple versions of a gem installed at once. As much as I would love to be able to fix the software we use that relies on Hpricot 0.5, it's simply far more practical to say "you use this version, the rest of us will use the latest."

Posted by martin f. krafft at Tue May 6 19:14:53 2008
I wish Puppet were written in some other language than Ruby.

Posted by Wen Hosting Sri Lanka at Tue May 6 19:19:36 2008
Quite interesting post.

Posted by Yann Ramin at Tue May 6 20:20:41 2008
I'd agree with most of the points, except for multiple versions.

Sometimes the API changes between releases, and you really DO want to have multiple versions of a particular package installed (assuming both are "maintained for security purposes"). This is usually a tricky case if you don't design for it from the start. But forcing you to add them to the path manually/via a switch is not a good solution.

Java at its core is messy and has no package system. If you add a tool such as Maven2, it can provide per-product package resolution, which is very useful. But the normal Java approach of lumping any and all JARs of various versions together in one product is a nightmare (especially when JARs are embedded in each other!).

Posted by Russ Allbery at Tue May 6 22:58:49 2008
Note that when I say CPAN, I don't mean the CPAN module, which is a neat solution for a problem that you really shouldn't have.  I cringe whenever I see anyone use that, since again it's bypassing the local packaging system.

The right way to handle this is to create native packages and comply with the OS conventions for how to lay out the contents of those packages.  Everything else, no matter how well-intentioned, is fundamentally flawed and is going to lose in the long run.  I feel a little like I did when I was arguing with AIX administrators 10 years ago that AIX was a dead platform that just hadn't stopped breathing yet.  Once you've seriously used a package-based management system like Debian's or even Red Hat's, there's just no way that you'd want to use anything else because nothing else scales in the same way.

So, from my perspective, the goal of the language distribution method is to make it simple and easy to package the results using whatever native packaging system is in play.  And that is the beauty of CPAN, not the CPAN module and its auto-install thingie.  The beauty of CPAN is that it's a source code repository with hundreds of separately-installable source packages that all use an extremely standardized build infrastructure that supports all the options required to install them in appropriate locations for native packages.  It's much simpler and less sexy than the nifty management features like gems, but the key point to realize is that gems is much less nifty and sexy than the native management features of the local OS packages and much less widely used and cannot handle anything that isn't Ruby.

The goal is to let the OS manage packages; that's what it's good at and what it was designed to do.  The language extensions need to deliver source and a standard build system that ties as seamlessly as possible into that process.

It's telling that packaging a Perl module for Debian, except for the human metadata like descriptions and copyright information, can be almost completely automated.

Posted by Fadzlan at Wed May 7 00:14:02 2008
For Java, there's Maven. Of course, it's not the default, but it's there for anybody who actually cares about dependency management. And you don't actually install libraries on the computers, as the libraries always come with the deployment. For the case of different versions of libraries in your project and the application server, then there's a different classloader. Of course, it's not perfect. If your libraries have dependencies that conflict with each other, you are on your own.

Or does it not solve your current concern?

I agree with Hongli Lai in the sense that there is no silver bullet in this. You have to pick what works best depending upon the situation. Vendoring can be an easy solution when there is just one customer with no intention of future upgrades. Heck, sometimes people just dump the libraries in /lib and check them into cvs. Not a good practice, but in most cases the libraries almost never change during the duration of the project. Of course, different situations will warrant different solutions. Pick what is good for you.

I also don't agree that a centralized solution is useless. There are pros and cons of implementing either type of solution. And just because we use Java/Ruby/.Net/etc., it won't change the client's requirements and their accepted compromises between centralized and distributed. And yes, cost is also a major factor.

Posted by Eric Hodel at Wed May 7 02:40:59 2008
As of Ruby 1.9, RubyGems is active by default, so `ruby -rfacter` will work provided you have the gem installed.  For ruby 1.8, you can set the RUBYOPT environment variable to -rubygems to get this behavior.

As a long-time rubyist and as a RubyGems maintainer, putting gems in vendor is not the solution I would have chosen to make Rails behave correctly.  I'm not sure why they chose that route.  (Their separate plugin system only serves to further confuse matters.)

I don't believe "most Rubyists" take the vendor path.  It might be true for most of the people who write Rails apps, but plain Rubyists more-often use RubyGems' dependencies instead.

I'd like to know a good way to specify native package dependencies in RubyGems, but I'm not qualified to come up with one that would be extensible enough to work with apt, yum, the various BSD ports trees, MacPorts, etc.

Posted by Jesse at Wed May 7 03:32:58 2008
It's nice to see something nice being said about the CPAN way of doing things, but there is some merit to being able to take a known good set of dependencies for a given project, track them, version them, and package them up for distribution.

We build RT, which, at least for many years, was probably the Perl equivalent of Puppet in terms of "widely deployed packaged application with a bunch of dependencies."

We have tools to automatically install deps from CPAN as users are setting RT up, but that  doesn't protect the user in cases where the well-specified, publicly available package gratuitously and capriciously changes its API without warning.

To this end, we've been working on Shipwright, a packaging, distribution and build system for projects like RT and Puppet.  It lets you track a package, its dependencies, build instructions and so on inside a version control repository and generate 'blessed' snapshots of a package's ecosystem, complete with a single top-level build script.  The binary packages it builds wrap all binaries and scripts to make them as fully relocatable as we can, so your users can just copy them to wherever and things should 'just work.'

Shipwright is still quite young and has a bit of a Perl bent, but it sounds like it might deal with the sorts of problems you're having, just as it's dealing with ours. If it seems like it would be useful to you, we would be THRILLED to have more contributors.

You can find the code at http://code.bestpractical.com/bps-public/Shipwright and the list at http://lists.bestpractical.com/mailman/listinfo/shipwright

Posted by Donavan at Wed May 7 03:58:52 2008
@Russ

On your first comment, you mention security vulnerabilities. The problem with multiple versions and copies of software is not the fact that there's 2000 different copies per se, but that the app developers generally won't release just to get that dependency updated. You'll get a feature release you're ready for, or you'll not get anything.

On your second post, I'd be very curious why you can't turn a Gem into a native package easily. I've created several of them on the openSUSE Build Service. Debian has made life hard on themselves. A deb could be built quite easily from a gem, but the strict FHS compliance places extra barriers to doing it. Multiple versions are a little harder, as you essentially have to embed the major version info into the package name, but it's still doable.

@markus

AppDirs vs. FHS actually boils down more to desktop vs. server. On desktops, keeping things more self-contained helps users with install and config. On servers, however, things are a lot different because there are lots of different hardware combinations that a system can be built from, and FHS makes more sense when you have disparate hardware (SSD with magnetic with RAM with whatever). I don't know if there's a winner here. Even Apple has kinda taken both approaches. I'm very curious as to how they lean with their servers.

Posted by Aaron Blohowiak at Wed May 7 05:14:44 2008
Almost everything our system requires is just a yum update away. We roll up our Rails apps as RPMs, and our dependencies too. At this point, all it takes is a `make` in our app folder. For rubygems, the process has about 4 steps (automating that is on my todo list).

You're right when you say that installing a Red Hat or Debian system is easy... why reinvent the wheel?

-Aaron

Posted by Vidar Hokstad at Wed May 7 13:24:30 2008
Personally I install anything that's intended for production servers as RPMs. No Gem's installed via Gem. Ever. Yes, that means turning assorted Gems into RPMs, but that's luckily not so hard...

Frankly, I wish Ruby had gotten RPM or DPKG or another existing system rather than its own. RPM at least is trivially portable to other POSIX'y platforms (I remember installing RPMs on Solaris back in the late 90's, for example), and I can't imagine it would be too hard to get it running on Windows either.

Barring that, a version of Gem for the major distributions that updated the local package manager's database so that gems could be uninstalled via the normal tools would be another tolerable option if Gem could also be made to support installing packages using the native package manager.

Posted by Donavan at Wed May 7 13:53:12 2008
@Vidar

I've considered that concept for a long time. However, it doesn't really work well in the long run. The reason that RubyGems is good in this problem domain is because it's pure Ruby, and so will run on any system that can build Ruby. RubyGems will run just fine on Slackware, no extras needed. Using a native packaging system limits your ability to interact with the world around you. It's all just plain hard to support.

If you want integration with RPM or DEB, simply run gem install in the SPEC or control file, and boom! Instant native package. Having gem update the RPM database is not really something suited well to RubyGems anyways (heck, the only database that's in RubyGems is the source cache index, and that has had structural changes quite a few times, thus a cache. lol)

Posted by Adam Kennedy at Thu May 8 08:29:05 2008
Speaking as one of the CPAN admins, and someone that works on improving it, I would say that Russ Allbery has got it (mostly) right.

It's important to understand there is a BIG difference between a source repository and a binary repository.

Source packages should be inherently cross-platform. That's going to mean that resources should be split up by type, that dependencies should be able to be determined in code (so you can have different dependencies on different hosts) and everything should be as standardized as possible within those packages.

You also need to remember that you have several VASTLY different audiences.

You have one big user group that are distro customers. These people should NEVER be installing source packages directly, they should be installing distribution-specific packages.

So it needs to be very very easy to compile a source package into a distro package, discarding superfluous test scripts and test-time dependencies and providing libs and bins and docs and whatever else is needed in the locations that the distro dictates.

At the CPAN QA Hackfest in Oslo last month, it was no accident that we had attendees not just from Perl CPAN people, but also people (sponsored in some cases) from several different distributions, and one of the big results from the hackfest was several improvements that got made to make it even easier to build OS packages from CPAN packages.

Source packages do sometimes also suffer from bleed errors on a subset of platforms. Repository packagers also help protect against bleed bugs in the source packages by adding an extra layer of testing (this is in fact the primary role of the Debian Perl group, the packaging itself is almost entirely automated).

No non-developer should need to install or use the source repository toolchain directly.

Developers often need to go beyond the subset of packages available on the OS. They also need access to a more diverse set of packages, they WANT to install broken bleeding edge source packages directly, to test them out and see if they work, to take over maintenance, to get access to some critical new feature that hasn't filtered down to the distro packages yet, etc.

It's THIS audience that you write your source repository clients for, and for this audience it's not a big deal that you can't specify external dependencies properly. This group can, largely, take care of themselves.

And finally you have the "Windowsy" group, which we can call the vendor group. These guys are generally working in an area with no packaging system, and they want ONE single installer for their product, all the dependencies, and ideally for the language itself.

But again, the vendor group should be able to derive their stuff automatically from the source repository packages.

By having this sort of workflow, you can support the needs of all these diverse groups.

THAT's why the CPAN is so powerful, not because it has a superior dependency system, or better tools, but because it can target ALL of the different downstream modalities (some better than others).

This means that to release on Debian, and FreeBSD, and Windows, the author just pushes a release to CPAN and then can happily forget about what happens downstream.

But the developer-mode "install from source repository" is still the canonical case. Binary packages and vendor bundling need to be treated as a specialised case of the source installation, not as a primary target.
