Monthly Archives: December 2009

Linux vs. Solaris packaging: it’s a philosophical thing

I thought this was a post worth making because this was the hangup that kept me, as an eight-year Linux user, from really getting Solaris.

One of the biggest questions I see repeated all across the Internet is, “why can’t Solaris’s package management be more like Linux?” Criticisms abound both of Solaris’s SysV packaging format and the way that Solaris packages have to be installed. Solaris’s opponents claim that the Linux packaging system is far superior, that Solaris’s is stuck in the 20th century, and that Solaris has to adapt or die. OpenSolaris introduced the Image Packaging System (IPS), championed by Ian Murdock, the founder of the Debian project, largely to bring many of Solaris’s detractors back into the fold by providing another way of doing things. But how much difference does it make in the long run for Solaris as a platform?

Many of the questions and doubts about the Solaris packaging model stem from a very Linux-centric way of functioning. What I would like to explain is why the impedance mismatch between Linux and Solaris packaging is not so much a technological divide as it is a philosophical one.

I’m going to start by explaining how FreeBSD does things, because I think it fits neatly right in the middle of the Linux and the Solaris way of managing installed software.

A FreeBSD installation consists of two discrete platforms. The first is the base system, which is a set of system binaries and core services like FTP, NTP, DNS, DHCP and SMTP software. These are considered to be part of the operating system; they are managed by the installer and updated when you update the OS to a new release. The base system is installed under /usr, and other programs not part of the operating system should not be installed there.

The second is the third-party application layer, which consists of binary packages and “ports,” which are instructions for how to build an application from source. You might compare it to Gentoo’s portage system, or maybe to building all of your Red Hat packages from source RPMs. The ports system goes beyond a simple “./configure && make && make install” in that it provides automatic dependency resolution, nice menu-driven interfaces to common compile options, installation registries and pre/post install/uninstall scripts the same way that a binary package manager would. Packages from the third-party ports/packages system are installed under /usr/local, separate from the base system.

The goal of this system is to keep the two layers as orthogonal as possible, meaning that it limits the surface area where they touch. The base system, for example, contains a copy of OpenSSL. But if you build an application in the ports tree, it will pull in its own copy of OpenSSL that will be used by the programs in /usr/local. The idea is that if you keep the two layers as separate as possible, you can upgrade the underlying system trivially without worrying about all of your third-party dependencies breaking on you. You can also keep your third-party programs from breaking your OS upgrade. And unlike in Linux, where you rely mostly on vendor-supplied libraries, it’s still very easy to install very modern software on a not-very-modern version of your OS.

In Linux, the solution to a major operating system upgrade is to back up your important data files, reformat your partitions and create your system from scratch on the new operating system. This is fine for systems of trivial complexity, but becomes very burdensome when you have an enterprise product like an ERP system or a digital collections manager and you would really, really like to just be able to upgrade the OS without everything breaking on you. One of the obnoxious idiosyncrasies of Linux is that when you go to upgrade, your vendor’s new packages may conflict with something in a third-party RPM you’ve installed. Third-party software can actually break your ability to upgrade the base system because everything shares the same hierarchies and you may encounter a lot of unintended conflicts.
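To make the shared-hierarchy problem concrete, here is a minimal sketch in Python. It is not how RPM or any real package manager detects conflicts; the package names and file lists are hypothetical, and the only point is that two packages installing into the same tree can claim the same path, while a package living under its own prefix cannot.

```python
def find_file_conflicts(manifests):
    """Return {path: [packages]} for every path claimed by more than one package."""
    owners = {}
    for package, paths in manifests.items():
        for path in paths:
            owners.setdefault(path, []).append(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical manifests: a vendor package and a third-party package that
# both install under /usr, plus one that keeps to its own prefix.
manifests = {
    "vendor-openssl": ["/usr/lib/libssl.so.1", "/usr/bin/openssl"],
    "thirdparty-openssl": ["/usr/lib/libssl.so.1", "/usr/share/doc/openssl/README"],
    "isolated-openssl": ["/opt/acme/lib/libssl.so.1"],
}

conflicts = find_file_conflicts(manifests)
print(conflicts)
```

The isolated package never shows up in the result, which is the whole argument for separate hierarchies in miniature.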

Solaris’s packaging system has historically been the SysV package, which provides dependency resolution and many of the other amenities of modern packaging systems, but there was never a delivery mechanism for simple Internet- or network-based delivery. Many organizations NFS mount a directory full of packages. In many ways, it’s closer to Slackware’s idea of packages than most modern formats like .rpm or .deb. Blastwave was the first community organization to bring Internet-based package management, complete with automatic dependency resolution, to Solaris, but it did so with its own packages, not by touching the base system.

Solaris takes the FreeBSD approach to a more extreme level, partly out of fragmentation and partly out of necessity. Third-party packaging groups like Blastwave and SunFreeware operate independently of one another. Because of this, rather than a /usr vs. /usr/local separation, each Solaris packaging group basically builds its own platform, isolated in its own directory hierarchy. Blastwave uses /opt/csw, SunFreeware uses /usr/sfw, and the old Cool Stack suite of web stack packages (which is now part of the Glassfish Web Stack) resides in /opt/coolstack.

The consequence of this approach is that if you, as an internal packager producing packages for your organization, want to take a piece of software and make a SysV package out of it, you need to build the platform underneath it first. It’s not as simple as writing an Apache package, because you need to rely on your own complex hierarchy of libraries too. When you’re now maintaining 40 packages instead of the 1 you really wanted to build, it becomes simpler to just rely on rsync from a reference system instead. And if you’re running OpenSolaris in production (and there are lots of perfectly valid reasons to do so), you probably don’t want to rely too heavily on vendor-supplied packages because the distribution is a moving target that changes dramatically every six months.
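The jump from one package to dozens is just a transitive closure over dependencies. The sketch below walks a small, made-up dependency graph; the edges are illustrative only and are not Apache’s real dependency list.

```python
def dependency_closure(package, depends_on):
    """Return every package that must be built to deliver `package`, itself included."""
    needed = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in needed:
            continue  # already visited; also guards against cycles
        needed.add(current)
        stack.extend(depends_on.get(current, []))
    return needed


# Hypothetical dependency graph for an isolated platform.
depends_on = {
    "apache": ["apr", "apr-util", "openssl", "pcre"],
    "apr-util": ["apr", "expat"],
    "openssl": ["zlib"],
}

to_build = dependency_closure("apache", depends_on)
print(sorted(to_build))
```

Even this toy graph turns the one package you wanted into seven, and every one of them is now yours to patch.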

In many environments, the orthogonal-platform approach isn’t a bad thing. You’re probably dealing heavily with change control in the enterprise anyway, and it’s nice to not have to worry quite as much about a Solaris patch bringing down critical system services. Visible Ops teaches us that the most highly-available IT organizations patch far less frequently and rely more on good release management processes and testing updates in a group. Essentially, in a highly change-controlled environment, you’re going to be building your own distribution, whether that involves rsyncing out Solaris binaries or manually creating well-tested update channels in a Red Hat Network Satellite server. And as with FreeBSD, when you need to perform a major OS upgrade on a highly complex system, it dramatically reduces the chances that something is going to break as a result of the vendor’s updates.

In many other scenarios, it is a bad thing. Many server configurations are very simple — LAMP stacks or Mailman servers, for instance — and you don’t need to put the same effort into maintaining them that you would an ERP or CRM system, a single sign-on portal or other important enterprise services. If the system breaks horribly, it can be rebuilt very easily. For the majority of organizations, most systems are like this, and the ability to very quickly bootstrap a system with needed services is still a big draw to the enterprise consumer. And from a security perspective, keeping four different copies of a library on your system, that are all used by different programs, means that there are four times as many security updates to make, and four times as many chances to let something slip through the cracks. Often it means several different configurations to maintain. For this reason, many organizations ignore Blastwave entirely. (Lots of others spurn third-party packages entirely out of security concerns, quite understandably.)
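If you want a rough idea of how many copies of a library you are carrying, something like the following sketch will do. It only looks in each prefix’s lib/ directory, and the prefix list in the example is an assumption about a typical layout, so adjust it to whatever hierarchies your packagers actually use.

```python
from pathlib import Path


def find_library_copies(prefixes, pattern):
    """Map each prefix to the file names in its lib/ directory that match `pattern`."""
    copies = {}
    for prefix in prefixes:
        lib_dir = Path(prefix) / "lib"
        if not lib_dir.is_dir():
            continue  # prefix not present on this system
        matches = sorted(p.name for p in lib_dir.glob(pattern))
        if matches:
            copies[str(prefix)] = matches
    return copies


if __name__ == "__main__":
    # Assumed prefixes; every entry printed is one more copy to keep patched.
    for prefix, libs in find_library_copies(["/usr", "/usr/local", "/opt/csw"], "libssl*").items():
        print(prefix, libs)
```

Each line of output is a separate copy that needs its own security updates.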

Linux attempts to create an all-inclusive platform where all software is on the same playing field, so to speak. Third-party packages rely on system libraries in the same way that the vendor’s packages do, for better or for worse, and everything benefits from (or breaks from) updates to system packages. For minor updates, this is a great thing. For major updates, this prevents the majority of systems with sufficiently complex configurations from ever being able to perform an in-place upgrade. The downside is mitigated a little bit by the fact that the package management system makes it quite a bit easier to get the new system up and running again.

But what makes Linux special among these three approaches is that there’s absolutely nothing keeping you from designing your own isolated platform using your own dependencies, just like you would on BSD or Solaris. BSD and Solaris try to enforce this separation, while Linux gives you enough rope to hang yourself with if you’re so inclined.

There’s perfectly valid reasoning for all of these approaches, and I don’t think it’s a bad thing that administrators are able to pick which platform to use based on the situation. It’s important to remember that Solaris isn’t lagging in the 20th century — it’s just a grizzled war veteran who understands the realities of enterprise IT administration.

On revision control workflows

Chris Siebenmann wrote another really thought-provoking piece on how sysadmins and developers use revision control differently. There are a couple of things that I really agree with, and a couple that I think are pretty telling of systems administration as a profession. I think, in many ways, that the way developers do things is correct, and the way system administrators do things isn’t correct. This isn’t because developers are, in general, smarter or more regimented; that’s an apples-to-oranges comparison that I’m not even going to begin to approach. But there are some limitations in how developers test that make their workflow more oriented towards identifying broad problems before the customer does. This focus on reproducibility and testing is something that sysadmins could really learn from.

Here’s the part that a lot of us take for granted:

Here is a thesis: sysadmins use modern version control systems differently than developers. Specifically, sysadmins generally use VCSes for documentation, while developers use them for development. By this I mean that when sysadmins make a commit, it’s for something that is already in use; for example, you change a file in /etc and then commit in order to document when and why you made the change.

This is very, very true. Revision control systems are best used for change control, not just by administrators, but by developers as well (see “blame” and similar commands in most VCSes). I very much advocate this approach. For minor changes that can result in only minor performance regressions or other trivial breakages, it’s much simpler to design a system where regressions can be rolled back easily, rather than one where every tiny little change requires dozens of administrative hurdles that prevent the administrator from, you know, doing their job. If you have a good way of combining changesets into an easily-displayed view (I use Redmine to aggregate subproject activity), then it’s really easy to see exactly what changed on a system, when, and why.

But I think this part of the post requires a little more scrutiny:

There are a number of important features of modern VCSes that are basically irrelevant if you are only using them for post-facto documentation. One obvious example is cherry-picking only some changes to commit; because all of the changes are already live, committing only some of them means that you are not documenting some active changes.

(There is some point to the ability, but needing to do it generally means that either someone forgot to commit several changes or that there was a crisis in a mixed directory.)

Sysadmins can use VCSes in a more development mode, but I think that it is somewhat foreign and is certainly going to take not insignificant amounts of work. (Consider the problem of testing your changes before you deploy them into the live version of the repository, for example.)

If you’re pushing changes that you haven’t tested into a production environment, then you’re probably doing something wrong. I hope this isn’t construed as an inflammatory statement, because I work in education too, and I understand the realities of that particular environment. This definitely isn’t meant as a knock on Chris, since I’m stuck having to make some of the same hard decisions (and they often leave a bad taste in my mouth). But for many of us with saner environments to manage, I think we can learn from it if we look a little more critically. The great challenge for me over the last two years has been wrangling and getting control over a maddeningly cobbled-together environment that, to use a predecessor’s soul-crushing term, “grew organically.” (The hidden truth in that statement is that crops grown organically have no pesticides.)

Developers, for the most part, are forced to work in separate development/production “environments” out of necessity. In its most basic form, this might mean that the development environment is a working copy while the production copy is the latest stable release on the website. People who write programs generally at least do a cursory test on their own testbeds to make sure something works before pushing it out to a customer. Sometimes, but rarely, it’s impossible to reproduce a particular issue on the development system, and squashing bugs involves a lot of guess-and-check work. For the most part, the developer is able to verify that a change works as intended before putting their change into production (releasing a new version).

There aren’t many developers who solely release nightly builds or development snapshots of projects that are considered production-ready. The ones that do tend not to be very successful. However, this is precisely the mentality many administrators take when managing systems. There are some fundamental differences between the models, of course (a developer can’t force a user to upgrade their broken version, while a hosted service can often be fixed transparently and with minimal interruption), but can’t we do better where it counts?

This takes a kind of diligence not often seen in the realm of systems administration. This is partly because it’s often not required, and partly because it’s really difficult. In many cases, there are also substantial cost issues with licensing software for testing purposes. Most organizations, and the people who support them, can’t afford the man-hours to be constantly setting up clones of complex, interconnected and interdependent systems in order to test simple changes when those systems aren’t directly linked to generating revenue. Even with deployment automation tools like linked clones in VMware ESX, it’s extraordinarily difficult to perform this kind of testing correctly. Much of the time, there’s really very little reward and very little incentive in doing so.

I’m not convinced that this is because of any inherent complexity. I think that this is mostly because we, as smaller-scale system administrators, tend not to deploy our configurations correctly in the first place, and this makes it very difficult for us to create a good test environment programmatically. Large enterprises have it easy: large numbers of homogeneous systems make it easy to push identical or nearly-identical configurations out to a ton of grid computing nodes. For all of the complexities saddling organizations like Google or Goldman Sachs, the simple process of pushing configurations out onto cluster nodes probably isn’t one of them. However, in situations like academic research institutions where you have huge amounts of heterogeneity and you’re forced to produce a huge number of one-off system configurations, things become very tricky.

But we’re pushing into 2010 now, and we can’t complain that we don’t have the tools any longer. Cfengine has been tolerable for a number of years, and better tools like Puppet, Chef and Cfengine 3 are beginning to gain a lot of traction. I think that at this point, it should be very easy to set up repeatable build environments, as long as we have the diligence to keep all of our configurations, or at least everything relevant to infrastructure, managed through a proper configuration management engine. Through proper use of subprojects/submodules, or whatever functionality is provided by your VCS of choice, it should be extraordinarily simple to perform the branching/merging necessary to perform parallel system development in staging/production trees. With virtualized environments as pervasive and ubiquitous as they are, it should be very simple to rebuild a system from the ground up using your configuration management product, and then test whatever you need to test.
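As a rough illustration of the staging/production idea, here is a sketch in plain Python. It is not Puppet, Chef or Cfengine syntax, and every setting name in it is made up; it just shows one shared base configuration with per-environment overrides, and a way to list exactly what differs between the two trees before you promote a change.

```python
def render_config(base, overrides):
    """Return a new dict: the shared base settings with environment overrides applied."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def config_drift(staging, production):
    """Return {key: (staging_value, production_value)} for every setting that differs."""
    keys = sorted(set(staging) | set(production))
    return {
        key: (staging.get(key), production.get(key))
        for key in keys
        if staging.get(key) != production.get(key)
    }


# Hypothetical settings for a web server role.
base = {"package": "httpd", "max_clients": 150, "ssl": True}
staging = render_config(base, {"max_clients": 200, "hostname": "web-staging"})
production = render_config(base, {"hostname": "web-prod"})

drift = config_drift(staging, production)
print(drift)
```

Here the drift is the hostname, which is expected, and max_clients, which is the change still waiting to be tested and merged into production.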

Proper release management has been a big part of the corporate IT culture for decades. The idea isn’t that change is bad; you’ll find in many organizations, like Facebook, that change drives progress forward and provides a lot more competitive advantage than being unnecessarily risk-averse. However, I think that the small guys have a lot to learn from the more optimized IT shops when it comes to understanding that proper testing practices can go a long way in making life easier for your users. That, in the end, is what we need to strive for. While the ability to roll back changes is nice, it’s better to have a well-tested platform that’s consistent among all of the systems that you manage. With a good configuration management system, you can roll back the appropriate changes in parallel among all of your systems automatically.

Linux fails to escape screensaver malware

Screensavers, smiley packs, little animated desktop companions and their ilk have, for a very long time, been a big part of the Windows malware ecosystem, because they’re the kind of thing that specifically appeals to the type of user who doesn’t know any better. For a while, Linux has managed to avoid this, but a screensaver on gnome-look.org has been found to do very bad things:

Malware has been found hidden inside an innocuous ‘waterfall’ screensaver .deb file made available on popular artwork sharing site Gnome-Look.org.

The .deb file installs a script with elevated privileges designed to perform a DDoS attack as well as keep itself updated via downloads.

The dodgy screensaver in question has since been removed from gnome-look and this incident was a very basic, if potentially successful, attempt.

If anything this incident highlights the need to be careful what you download and where you download it from.

Nothing new in the Windows world, of course, but a pleasant reminder that Linux doesn’t intrinsically do anything to prevent users from doing stupid crap.