Read your logs. You’ll be amazed what you find.

Oh, I know. You’re not like those other admins. You always read your logs. You set up Splunk and OSSEC and you have all kinds of alerts and custom event types set up and you read every log message you haven’t explicitly whitelisted from your reports, right?

Even if you’re a compulsive syslog reader, there’s a chance you’re still not reading all the right logs, because there’s a lot you can figure out about tiny little configuration problems with a little bit of network analytics, as I found this week.

I’m in the middle of a major external DNS cleanup. For reasons I’m not going to go into on this blog, my organization has an absolutely huge number of entries in external DNS that have no right to be there. As a result, I’ve turned on DNS query logging to monitor, for a couple of weeks or months, what names are actually getting queried, so we can work out any misconfigurations with our firewall team. While I was scripting some small tools to help me with this, I figured I’d analyze the data in a couple of other ways too.

I found:

  • DMZ servers misconfigured to query Active Directory domain controllers (!) that had ceased to exist years ago
  • Samba servers in the DMZ attempting to connect to internal WINS
  • Hosts with nscd turned off making 30,000 DNS queries per day for the same hostname
  • Internal client machines inexplicably configured to use external DNS
  • Servers trying to resolve every domain name as a subdomain of our organization’s domain

The idea was to keep query logging running for just as long as I needed to do this project, but I’m getting way too much useful information out of it to turn it off.
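
The kind of small tool described above can be sketched roughly as follows. This is not the script I actually used; it is a minimal illustration that assumes BIND-style query log lines such as "client 10.1.2.3#40001: query: www.example.edu IN A +". The names QUERY_RE and summarize, and the sample hostnames, are hypothetical, and other DNS servers (or other BIND versions) log in different formats, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed BIND-style query log line; adjust the pattern for your DNS server.
QUERY_RE = re.compile(
    r"client (?:@\S+ )?(?P<client>[0-9A-Fa-f.:]+)#\d+.*?: query: "
    r"(?P<name>\S+) IN (?P<qtype>\S+)"
)


def summarize(lines):
    """Count queries per name and per (client, name) pair."""
    by_name = Counter()
    by_client_name = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match is None:
            continue  # not a query line, or a format this sketch doesn't know
        # DNS names are case-insensitive; drop any trailing root dot as well.
        name = match.group("name").lower().rstrip(".")
        by_name[name] += 1
        by_client_name[(match.group("client"), name)] += 1
    return by_name, by_client_name


if __name__ == "__main__":
    # Hypothetical sample lines, standing in for a real query log file.
    sample = [
        "24-Nov-2009 10:00:01.123 client 10.1.2.3#40001: query: www.example.edu IN A +",
        "24-Nov-2009 10:00:02.456 client 10.1.2.3#40002: query: www.example.edu IN A +",
        "24-Nov-2009 10:00:03.789 client 10.9.9.9#40003: query: old-dc.example.edu IN SRV +",
    ]
    by_name, by_client_name = summarize(sample)
    for (client, name), count in by_client_name.most_common(5):
        print(client, name, count)
```

Sorting the (client, name) pairs by count is what surfaces things like a single host asking for the same name tens of thousands of times a day, while the per-name totals show which external records are never queried at all.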

Posted in Sysadmin.

Is too much automation harmful to the IT middle class?

Information technology, as a cost center, is heavily geared towards the reduction of costs, and in well-engineered IT shops, improving and streamlining the business’s processes. But is automation sometimes going too far, and reducing our ability to learn? As we automate to new levels, leveraging commodity hardware to build resilient systems on the grid and in the cloud, are we making release engineers into a ruling elite, creating a new operations underclass? And if so, is it really a bad thing, if we improve all the metrics of IT efficiency?

I first considered this question a little over a year ago, when my employer was trying to hire another Linux engineer. We had interviewed someone who had worked in operations for an enterprise storage vendor, working with Linux systems on a day-to-day basis, who could not even begin to explain the Linux boot process. He didn’t understand how it worked because he had never rebooted a system at his job, and possibly had never even seen a Red Hat system boot up. This wasn’t the kind of troubleshooting skill we were looking to hire; in fact, I can’t imagine what marketable skills he managed to pick up in his entire time there, unless “reading logs really carefully” is a top selling point for an engineer these days. Anecdotally, people with this kind of background appear to represent a growing percentage of the IT workforce.

Similarly, in the desktop space, there are a lot of people employed in desktop support positions where they don’t actually learn to troubleshoot anything, because if a desktop starts exhibiting a problem, they simply reimage it. This problem will only be compounded as virtual desktop infrastructure reduces the need for desktop technicians to even support hardware. As things move back onto the network, into elastic compute capacity in DRS clusters, many of the roles of desktop support may be variously reduced to “re-deploy VMware template” and “replace dumb terminal; throw old one away.”

I can’t even imagine this perspective. My career began with a small (now defunct) web hosting company, where I had the opportunity to learn all kinds of heterogeneous systems, applications, and programming languages in order to support features that end-users wanted. If my job had consisted of “read logs, identify problem, push button” then I have no doubt in my mind that it would have taken me five times as long to build the knowledge I need to do what I do now.

The question I ask is this:

Heavy reliance on automation and the provisioning of clean, known-good system states often leads to more reproducible systems, easier problem resolution, and lower margins of error. It allows junior admins to tackle more complex problems than they would ordinarily be able to solve. But is it poisonous to the ecosystem of engineers, raising the barrier to entry to impossible levels?

I’m something of a fan of the movie Idiocracy. In the movie, an average guy who awakes from a cryogenics experiment 500 years in the future becomes the salvation of an incredibly stupid mankind after he discovers that their crops won’t grow because they’re being grown not with water (“like, out the toilet?”) but with a sports drink called Brawndo, The Thirst Mutilator, because “Brawndo’s got electrolytes.” I worry that as the skill divide grows, and people grow too reliant on their tools to manage their infrastructure, we may stop understanding how or why they actually work, and find ourselves members of a cargo cult of systems management.

I know what you’re thinking: this has happened before in every industry in the world. If you want to make a toaster, you don’t need to know how to make your own iron, or generate your own electricity to power it. The problem is that because there are no real material limitations beyond labor, this particular area of IT culture is growing, evolving and adapting much faster than we can adjust. Unlike iron, systems management products aren’t liquid commodities. We can’t drop in ManageEngine or SCCM to replace Kaseya, and we can’t drop in Chef to replace Cfengine. After decades, we still can’t make the Simple Network Management Protocol less than hugely complicated. Because of the lack of interoperability standards, there are substantial re-engineering efforts involved, which often make it less like replacing a product and more like putting your house on stilts while you reconstruct the foundation. It’s very important to make sure that the divide between release engineers and system administrators doesn’t grow too large, because the sysadmins need to actually know what’s going on.

As we deploy our systems management products, improve our server-to-admin ratios and displace our mid-level engineers, and especially as we evaluate sweeping paradigm shifts (or pendulums) like virtual desktop infrastructure, we need to be mindful that we have a responsibility to the people under our employment. We need to be aware that while we need a job done, their careers don’t end at the desk they’re now sitting at, and if we poison the well now, there’s going to be nothing to drink from later.


Linux vs. Solaris packaging: it’s a philosophical thing

I thought this was a post worth making because this was the hangup that kept me, as an eight-year Linux user, from really getting Solaris.

One of the biggest questions I see repeated all across the Internet is, “why can’t Solaris’s package management be more like Linux?” Criticisms abound both of Solaris’s SysV packaging format and the way that Solaris packages have to be installed. Solaris’s opponents claim that the Linux packaging system is far superior, Solaris’s is stuck in the 20th century, and Solaris has to adapt or die. OpenSolaris introduced the Image Packaging System (IPS), designed by Ian Murdock, the founder of the Debian project, largely to bring many of Solaris’s detractors back into the fold by providing another way of doing things. But how much difference does it make in the long run for Solaris as a platform?

Many of the questions and doubts about the Solaris packaging model stem from a very Linux-centric way of functioning. What I would like to explain is why the impedance mismatch between Linux and Solaris packaging is not so much a technological divide as it is a philosophical one.

I’m going to start by explaining how FreeBSD does things, because I think it fits neatly right in the middle of the Linux and the Solaris way of managing installed software.

A FreeBSD installation consists of two discrete platforms. The first is the base system, which is a set of system binaries and core services like FTP, NTP, DNS, DHCP and SMTP software. These are considered to be part of the operating system; they are managed by the installer and updated when you update the OS to a new release. The base system is installed under /usr, and other programs not part of the operating system should not be installed there.

The second is the third-party application layer, which consists of binary packages and “ports,” which are instructions for how to build an application from source. You might compare it to Gentoo’s portage system, or maybe to building all of your Red Hat packages from source RPM. The ports system goes beyond a simple “./configure && make && make install” in that it provides automatic dependency resolution, nice GUI interfaces to common compile options, installation registries and pre/post install/uninstall scripts the same way that a binary package manager would. Packages from the third-party ports/packages system are installed under /usr/local, separate from the base system.

The goal of this system is to keep the two layers as orthogonal as possible, meaning that it limits the surface area where they touch. The base system, for example, contains a copy of OpenSSL. But if you build an application in the ports tree, it will pull in its own copy of OpenSSL that will be used by the programs in /usr/local. The idea is that if you keep the two layers as separate as possible, you can upgrade the underlying system trivially without worrying about all of your third-party dependencies breaking on you. You can also keep your third-party programs from breaking your OS upgrade. And unlike in Linux, where you rely mostly on vendor-supplied libraries, it’s still very easy to install very modern software on a not-very-modern version of your OS.

In Linux, the solution to a major operating system upgrade is to back up your important data files, reformat your partitions and create your system from scratch on the new operating system. This is fine for systems of trivial complexity, but becomes very burdensome when you have an enterprise product like an ERP system or a digital collections manager and you would really, really like to just be able to upgrade the OS without everything breaking on you. One of the obnoxious idiosyncrasies of Linux is that when you go to upgrade, your vendor’s new packages may conflict with something in a third-party RPM you’ve installed. Third-party software can actually break your ability to upgrade the base system because everything shares the same hierarchies and you may encounter a lot of unintended conflicts.

Solaris’s packaging system has historically been the SysV package, which provides dependency resolution and many of the other amenities of modern packaging systems, but there was never a delivery mechanism for simple Internet- or network-based delivery. Many organizations NFS mount a directory full of packages. In many ways, it’s closer to Slackware’s idea of packages than most modern formats like .rpm or .deb. Blastwave was the first community organization to bring Internet-based package management, complete with automatic dependency resolution, to Solaris, but it did so with its own packages, not by touching the base system.

Solaris takes the FreeBSD approach to a more extreme level, partly out of fragmentation and partly out of necessity. Third-party packaging groups like Blastwave and SunFreeware operate independently of one another. Because of this, rather than a /usr vs. /usr/local separation, each Solaris packaging group basically builds its own platform, isolated in its own directory hierarchy. Blastwave uses /opt/csw, SunFreeware uses /usr/sfw, and the old Cool Stack suite of web stack packages (which is now part of the Glassfish Web Stack) resides in /opt/csk.

The consequence of this approach is that if you, as an internal packager producing packages for your organization, want to take a piece of software and make a SysV package out of it, you need to build the platform underneath it first. It’s not as simple as writing an Apache package, because you need to rely on your own complex hierarchy of libraries too. When you’re now maintaining 40 packages instead of the 1 you really wanted to build, it becomes simpler to just rely on rsync from a reference system instead. And if you’re running OpenSolaris in production (and there are lots of perfectly valid reasons to do so), you probably don’t want to rely too heavily on vendor-supplied packages because the distribution is a moving target that changes dramatically every six months.

In many environments, the orthogonal-platform approach isn’t a bad thing. You’re probably dealing heavily with change control in the enterprise anyway, and it’s nice to not have to worry quite as much about a Solaris patch bringing down critical system services. Visible Ops teaches us that the most highly-available IT organizations patch far less frequently and rely more on good release management processes and testing updates in a group. In a highly change-controlled environment, you’re essentially going to be building your own distribution, whether that involves rsyncing out Solaris binaries or manually creating well-tested update channels in a Red Hat Network Satellite server. And as with FreeBSD, when you need to perform a major OS upgrade on a highly complex system, it dramatically reduces the chances that something is going to break as a result of the vendor’s updates.

In many other scenarios, it is a bad thing. Many server configurations are very simple — LAMP stacks or Mailman servers, for instance — and you don’t need to put the same effort into maintaining them that you would an ERP or CRM system, a single sign-on portal or other important enterprise services. If the system breaks horribly, it can be rebuilt very easily. For the majority of organizations, most systems are like this, and the ability to very quickly bootstrap a system with needed services is still a big draw to the enterprise consumer. And from a security perspective, keeping four different copies of a library on your system, that are all used by different programs, means that there are four times as many security updates to make, and four times as many chances to let something slip through the cracks. Often it means several different configurations to maintain. For this reason, many organizations ignore Blastwave entirely. (Lots of others spurn third-party packages entirely out of security concerns, quite understandably.)
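
To get a rough sense of how many copies of a library are in play on a given system, something like the following sketch can help. It is only an illustration: the function name find_duplicate_libs is made up, the directories you pass in (for example /usr/lib, /usr/sfw/lib and /opt/csw/lib) are assumptions about your layout, and it compares file names only, not versions or which programs actually link against which copy.

```python
import os
import re
from collections import defaultdict

# Matches names like libssl.so or libssl.so.0.9.8 and captures "libssl".
LIB_RE = re.compile(r"^(lib[^/]+?)\.so(\.[0-9.]+)?$")


def find_duplicate_libs(prefixes):
    """Map library base name -> sorted list of directories holding a copy,
    keeping only libraries that turn up in more than one directory."""
    found = defaultdict(set)
    for prefix in prefixes:
        # os.walk silently skips prefixes that don't exist on this host.
        for dirpath, _dirnames, filenames in os.walk(prefix):
            for filename in filenames:
                match = LIB_RE.match(filename)
                if match:
                    found[match.group(1)].add(dirpath)
    return {name: sorted(dirs) for name, dirs in found.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Assumed prefixes; substitute whatever hierarchies your packagers use.
    for name, dirs in sorted(find_duplicate_libs(
            ["/usr/lib", "/usr/sfw/lib", "/opt/csw/lib"]).items()):
        print(name, ", ".join(dirs))
```

Every library this reports is one that has to be patched once per hierarchy when a security advisory comes out.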

Linux attempts to create an all-inclusive platform where all software is on the same playing field, so to speak. Third-party packages rely on system libraries in the same way that the vendor’s packages do, for better or for worse, and everything benefits from (or breaks from) updates to system packages. For minor updates, this is a great thing. For major updates, this prevents the majority of systems with sufficiently complex configurations from ever being able to perform an in-place upgrade. The downside is mitigated a little bit by the fact that the package management system makes it quite a bit easier to get the new system up and running again.

But what makes Linux special among these three approaches is that there’s absolutely nothing keeping you from designing your own isolated platform using your own dependencies, just like you would on BSD or Solaris. BSD and Solaris try to enforce this separation, while Linux gives you enough rope to hang yourself with if you’re so inclined.

There’s perfectly valid reasoning for all of these approaches, and I don’t think it’s a bad thing that administrators are able to pick which platform to use based on the situation. It’s important to remember that Solaris isn’t lagging in the 20th century — it’s just a grizzled war veteran who understands the realities of enterprise IT administration.


On revision control workflows

Chris Siebenmann wrote another really thought-provoking piece on how sysadmins and developers use revision control differently. There are a couple of things that I really agree with, and a couple that I think are pretty telling of systems administration as a profession. I think, in many ways, that the way developers do things is correct, and the way system administrators do things isn’t correct. This isn’t because developers are, in general, smarter or more regimented; that’s an apples-to-oranges comparison that I’m not even going to begin to approach. But there are some limitations in how developers test that make their workflow more oriented towards identifying broad problems before the customer does. This focus on reproducibility and testing is something that sysadmins could really learn from.

Here’s the part that a lot of us take for granted:

Here is a thesis: sysadmins use modern version control systems differently than developers. Specifically, sysadmins generally use VCSes for documentation, while developers use them for development. By this I mean that when sysadmins make a commit, it’s for something that is already in use; for example, you change a file in /etc and then commit in order to document when and why you made the change.

This is very, very true. Revision control systems are best used for change control, not just by administrators, but by developers as well (see “blame” and similar commands in most VCSes). I very much advocate this approach. For minor changes that can result in only minor performance regressions or other trivial breakages, it’s much simpler to design a system where regressions can be rolled back easily, rather than one where every tiny little change requires dozens of administrative hurdles that prevent the administrator from, you know, doing their job. If you have a good way of combining changesets into an easily-displayed view (I use Redmine to aggregate subproject activity), then it’s really easy to see exactly what changed on a system, when, and why.

But I think this part of the post requires a little more scrutiny:

There are a number of important features of modern VCSes that are basically irrelevant if you are only using them for post-facto documentation. One obvious example is cherry-picking only some changes to commit; because all of the changes are already live, committing only some of them means that you are not documenting some active changes.

(There is some point to the ability, but needing to do it generally means that either someone forgot to commit several changes or that there was a crisis in a mixed directory.)

Sysadmins can use VCSes in a more development mode, but I think that it is somewhat foreign and is certainly going to take not insignificant amounts of work. (Consider the problem of testing your changes before you deploy them into the live version of the repository, for example.)

If you’re pushing changes that you haven’t tested into a production environment, then you’re probably doing something wrong. I hope this isn’t construed as an inflammatory statement, because I work in education too, and I understand the realities of that particular environment. This definitely isn’t meant as a knock on Chris, since I’m stuck having to make some of the same hard decisions (and they often leave a bad taste in my mouth). But for many of us with saner environments to manage, I think we can learn from it if we look a little more critically. The great challenge for me over the last two years has been wrangling and getting control over a maddeningly cobbled-together environment that, to use a predecessor’s soul-crushing term, “grew organically.” (The hidden truth in that statement is that crops grown organically have no pesticides.)

Developers, for the most part, are forced to work in separate development/production “environments” out of necessity. In its most basic form, this might have the development environment being a working copy while the production copy is the latest stable release on the website. People who write programs generally at least do a cursory test on their own testbeds to make sure something works before pushing it out to a customer. Sometimes, but rarely, it’s impossible to reproduce a particular issue on the development system, and squashing bugs involves a lot of guess-and-check work. For the most part, the developer is able to verify that a change works as intended before putting their change into production (releasing a new version).

There aren’t many developers who solely release nightly builds or development snapshots of projects that are considered production-ready. The ones that do tend not to be very successful. However, this is precisely the mentality many administrators take when managing systems. There are some fundamental differences between the models, of course (a developer can’t force a user to upgrade their broken version, while a hosted service can often be fixed transparently and with minimal interruption), but can’t we do better where it counts?

This takes a kind of diligence not often seen in the realm of systems administration. This is partly because it’s often not required, and partly because it’s really difficult. In many cases, there are also substantial cost issues with licensing software for testing purposes. Most organizations, and the people who support them, can’t afford the man-hours to be constantly setting up clones of complex, interconnected and interdependent systems in order to test simple changes when those systems aren’t directly linked to generating revenue. Even with deployment automation tools like linked clones in VMware ESX, it’s extraordinarily difficult to perform this kind of testing correctly. Much of the time, there’s really very little reward and very little incentive in doing so.

I’m not convinced that this is because of any inherent complexity. I think that this is mostly because we, as smaller-scale system administrators, tend not to deploy our configurations correctly in the first place, and this makes it very difficult for us to create a good test environment programmatically. Large enterprises have it easy: large numbers of homogeneous systems make it easy to push identical or nearly-identical configurations out to a ton of grid computing nodes. For all of the complexities saddling organizations like Google or Goldman Sachs, the simple process of pushing configurations out onto cluster nodes probably isn’t one of them. However, in situations like academic research institutions where you have huge amounts of heterogeneity and you’re forced to produce a huge number of one-off system configurations, things become very tricky.

But we’re pushing into 2010 now, and we can’t complain that we don’t have the tools any longer. Cfengine has been tolerable for a number of years, and better tools like Puppet, Chef and Cfengine 3 are beginning to gain a lot of traction. I think that at this point, it should be very easy to set up repeatable build environments, as long as we have the diligence to keep all of our configurations, or at least everything relevant to infrastructure, managed through a proper configuration management engine. Through proper use of subprojects/submodules, or whatever functionality is provided by your VCS of choice, it should be extraordinarily simple to perform the branching/merging necessary to perform parallel system development in staging/production trees. With virtualized environments as pervasive and ubiquitous as they are, it should be very simple to rebuild a system from the ground up using your configuration management product, and then test whatever you need to test.

Proper release management has been a big part of the corporate IT culture for decades. The idea isn’t that change is bad; you’ll find in many organizations, like Facebook, that change drives progress forward and provides a lot more competitive advantage than being unnecessarily risk-averse. However, I think that the small guys have a lot to learn from the more optimized IT shops when it comes to understanding that proper testing practices can go a long way in making life easier for your users. That, in the end, is what we need to strive for. While the ability to roll back changes is nice, it’s better to have a well-tested platform that’s consistent among all of the systems that you manage. With a good configuration management system, you can roll back the appropriate changes in parallel among all of your systems automatically.


Linux fails to escape screensaver malware

Screensavers, smiley packs, little animated desktop companions and their ilk have, for a very long time, been a big part of the Windows malware ecosystem, because they’re the kind of thing that specifically appeals to the type of user who doesn’t know any better. For a while, Linux has managed to avoid this, but a screensaver on gnome-look.org has been found to do very bad things:

Malware has been found hidden inside an innocuous ‘waterfall’ screensaver .deb file made available on popular artwork sharing site Gnome-Look.org.

The .deb file installs a script with elevated privileges designed to perform a DDoS attack as well as keep itself updated via downloads.

The dodgy screensaver in question has since been removed from gnome-look and this incident was a very basic, if potentially successful, attempt.

If anything this incident highlights the need to be careful what you download and where you download it from.

Nothing new in the Windows world, of course, but a pleasant reminder that Linux doesn’t intrinsically do anything to prevent users from doing stupid crap.


Recording disk statistics with sysstat on RHEL/CentOS

Unlike on Debian-like systems, the default configuration for sysstat’s sa1 collector on RHEL/CentOS does not include disk statistics (like you would get from iostat) in the sa collection output. This is due to a missing flag in the cron.d fragment that calls sa1. The “-A” flag to sa1 defies reasonable assumption about its function, and does not include disk statistics, so we have to specify “-d” manually.

To enable disk statistics collection/trending, edit /etc/cron.d/sysstat and change the following:

*/10 * * * * root /usr/lib64/sa/sa1 1 1

to this:

*/10 * * * * root /usr/lib64/sa/sa1 -d 1 1

(Obviously, replace “lib64” with “lib” as appropriate for i386 systems.)

Either wait for the next sa log rotation (at midnight) for sa1 to begin collecting disk statistics, or delete your current day’s statistics. sa1, for whatever historical reason, does not add new counters to an existing sa log file.
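
If you have more than a handful of hosts to fix, the edit can be scripted. The following is a minimal sketch, assuming the cron fragment looks like the lines shown above; add_disk_flag is a hypothetical helper name, and the function only rewrites text, so reading and writing /etc/cron.d/sysstat (and keeping a backup) is left to whatever deployment tooling you already use.

```python
import re

# Matches an sa1 invocation such as "/usr/lib64/sa/sa1 1 1", capturing the
# path to sa1 and whatever arguments follow it on the same line.
SA1_RE = re.compile(r"(?P<cmd>\S*/sa1)(?!\S)(?P<args>(?:[ \t]+\S+)*)")


def add_disk_flag(cron_text):
    """Return cron_text with "-d" added to every sa1 call that lacks it."""
    def fix(match):
        args = match.group("args").split()
        if "-d" in args:
            return match.group(0)  # already collecting disk statistics
        return " ".join([match.group("cmd"), "-d"] + args)

    fixed_lines = []
    for line in cron_text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            fixed_lines.append(line)  # leave comments untouched
        else:
            fixed_lines.append(SA1_RE.sub(fix, line))
    return "".join(fixed_lines)


if __name__ == "__main__":
    print(add_disk_flag("*/10 * * * * root /usr/lib64/sa/sa1 1 1\n"), end="")
```

Running the function twice gives the same result as running it once, which makes it safe to apply repeatedly from a configuration management run.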


Interesting links for 11/25/2009

With all the busy-ness that this holiday weekend entails, I’m just going to leave you all with a bunch of links:

  • TaoSecurity has a really interesting writeup about the ethics of Shodan, a “computer search engine” which provides some very interesting tools for people trying to secure their systems or launch attacks on arbitrary ones. At the very least, it’s interesting seeing how many Nortel switches, Checkpoint firewalls and other devices people are actually running with telnet open to the Internet at large.
  • WWLTV in New Orleans ran a segment on how eastern European hackers are increasingly targeting American small businesses and stealing online banking credentials with malware. It’s nothing that you haven’t heard before, but it’s nice to see information security starting to get some mainstream attention, and people finally beginning to become aware of the real financial threat posed by bad information security. Hopefully banks will get the hint and start relying on multi-factor authentication for all business accounts.
  • Last In – First Out has a nice post on cargo cult system administration. Matt from Standalone Sysadmin has an amusing anecdote about it from Tom Limoncelli in the comments.
  • Icinga, the fork of Nagios, has finally released a demo of their new web interface. It’s snazzy in a “wow, some neat technology” way, but I don’t really see it being an improvement at all in the “this makes it easier to do my job” way. Ultimately, I’m not sure how to approach the project — the real fundamental problem with Nagios is that it’s, well, Nagios. I’m not sure how to fork it and make it better without utterly destroying compatibility. (This might not be a bad thing.)


Nagios plugin: check_sa.pl

There’s a lot of useful Nagios addons out there. One of them, pnp4nagios, allows you to create graphs of all of your Nagios performance data with zero configuration. This is pretty nice, because your monitoring configurations are kept in one place, rather than having to separately maintain configurations for Nagios and Cacti (or whatever you use).

I’ve always wanted to be able to monitor things like number of open sockets, page faults, context switches, and other performance counters. Some of them are available through SNMP; others aren’t. The ones that are available aren’t all available by device. I wanted a little bit more detail.

The other problem with SNMP queries is that a Nagios check doesn’t query an average — something that spikes for a minute is not the same as a condition that persists for several minutes or hours. I wanted to leverage the built-in accounting in sysstat to pull together something Nagios can actually make a little bit of sense out of.
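
The difference between a point-in-time reading and a windowed average is easy to see with a few made-up samples. This is only an illustration of the idea, not how sysstat or the plugin computes anything internally; the function name window_average and the sample values are hypothetical.

```python
def window_average(samples, now, window_seconds):
    """Average the values of (timestamp, value) samples taken within
    window_seconds before now. Returns None if no sample qualifies."""
    recent = [value for ts, value in samples
              if now - window_seconds <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical context-switch rates, one sample every 10 minutes.
    samples = [(0, 1000.0), (600, 1100.0), (1200, 9000.0), (1800, 1050.0)]
    print("latest sample:", samples[-1][1])
    print("30-minute average:", window_average(samples, now=1800, window_seconds=1800))
```

A check that only looks at the latest sample would miss the spike at the 20-minute mark entirely, and a check that happened to run during the spike would alert on something that did not persist; the average reflects both.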

Anyway, I went ahead and created a Nagios plugin that will parse the output of sadf (which is a frontend to sa/sar performance counters). You can query multiple counters at a shot, specifying separate alert thresholds for each (or none at all, if you just want performance data). You can specify, via shell-style glob patterns, which devices you want to include or exclude, so that you can, for example, exclude all “lo” and “tun*” devices from network statistic monitoring. You can also pick the sampling period, so if you want an average of the last 30 minutes the plugin will produce it.

You can do stuff like this:

./check_sa.pl -i -C %usr -C %soft -C %sys -C %idle -D all
SA OK – All counters within specified thresholds. | %idle[cpu0]=96.84;; %idle[cpu1]=96.31;; %idle[cpu2]=97.23;; %idle[cpu3]=95.8;; %soft[cpu0]=0;; %soft[cpu1]=0.01;; %soft[cpu2]=0;; %soft[cpu3]=0.01;; %sys[cpu0]=0.4;; %sys[cpu1]=0.46;; %sys[cpu2]=0.36;; %sys[cpu3]=0.63;; %usr[cpu0]=2.67;; %usr[cpu1]=3.13;; %usr[cpu2]=2.27;; %usr[cpu3]=3.46;;

Or, if you prefer to summarize:

./check_sa.pl -i -C %usr -C %soft -C %sys -C %idle -d all
SA OK – All counters within specified thresholds. | %idle[all]=96.54;; %soft[all]=0;; %sys[all]=0.46;; %usr[all]=2.89;;

It’s still a tiny bit slow — it takes about 500-600 ms to run on the systems I’ve tested — but this should be good enough to be useful without bogging down Nagios too badly.

The script requires the Text::Glob module to be installed, so it can convert shell-style globs into regular expressions to match against.
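
The device filtering and the output format can be illustrated in a few lines. This is not the plugin’s actual code (which is Perl and uses Text::Glob); it is a Python sketch using the standard fnmatch module, and the function names, the sample values, and the threshold semantics (alert when the highest value is at or above the threshold) are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def select_devices(devices, include=("*",), exclude=()):
    """Keep devices matching any include glob and no exclude glob."""
    return [
        dev for dev in devices
        if any(fnmatchcase(dev, pat) for pat in include)
        and not any(fnmatchcase(dev, pat) for pat in exclude)
    ]


def perfdata(counter, values, warn=None, crit=None):
    """Format per-device values as Nagios performance data,
    i.e. label=value;warn;crit with empty fields for unset thresholds."""
    w = "" if warn is None else str(warn)
    c = "" if crit is None else str(crit)
    return " ".join(
        f"{counter}[{dev}]={value};{w};{c}"
        for dev, value in sorted(values.items())
    )


def status(values, warn=None, crit=None):
    """Return OK, WARNING or CRITICAL based on the highest value seen."""
    worst = max(values.values(), default=0)
    if crit is not None and worst >= crit:
        return "CRITICAL"
    if warn is not None and worst >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical interface list: drop loopback and tunnel devices.
    print(select_devices(["eth0", "eth1", "lo", "tun0"], exclude=("lo", "tun*")))
    values = {"cpu0": 2.67, "cpu1": 3.13}
    print(f"SA {status(values, warn=80, crit=95)} | {perfdata('%usr', values, 80, 95)}")
```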


Fedora 12 allows users to install signed packages…

Update: According to a post on lwn that I can’t find at the moment, they’ve already reverted this decision with a subsequent update. It should be resolved soon.

…without root privileges, without authenticating.

Yeah, you read that right. SANS has the writeup:

A “bug” created back in November against the latest Fedora release (12) indicates that, through the GUI, desktop users of the Fedora system are able to install signed packages without root privileges or root authentication.  Yes, you just read that correctly.  (I’ll give you a second re-read that sentence so I don’t have to retype it.)  Yes, “it’s a feature, not a bug”.

In all my travels I’ve only ran across one company, ever, that has Fedora rolled out as an enterprise operating system on every desktop.  But what kind of security implications does this have?  I obviously don’t have to explain why this is (may be) a bad idea to the readers of the ISC, as we are all security minded people.

Now, the restrictions.  This change does not affect yum on the command line.  This only affects installing things through the GUI.  (Not that helps any, as most users will be running the GUI anyway.)  You can also disable it.

Currently in the bug, there is some debate about if they should revert this feature.  So, this may be just temporary.

I’m sure this shouldn’t affect most people’s real deployments of anything, since Fedora has always been something of a moving target and has, in my experience, been completely unsuitable for widespread deployment in an organization for a wide variety of other reasons. But just because it’s not appropriate for enterprise customers doesn’t mean that desktop users have nothing to worry about.

That’s because this expands the attack surface for intruders enormously. By giving users unauthenticated access to the package manager, you hand a nearly unlimited set of targets to anyone looking for a local privilege escalation on the system. Imagine this: an attacker no longer needs to exploit one specific service that happens to be installed, because once they find a hole in anything at all that can be targeted in its default, out-of-the-box configuration, they can install it and then exploit it.

I don’t fully understand the implications of how this is designed. I may be fundamentally misunderstanding something that’s going on in the back end, and this may not be a Really Bad Thing. But imagine this: someone finds a bug in Firefox, or Flash, or Java. They exploit it to gain the ability to run arbitrary code under the user’s account. They can now silently install Cfengine, Puppet, Bcfg2, or another service that runs as root in the background using PolicyKit. They then attempt to exploit these services, which shouldn’t be running in the first place, and if they succeed, suddenly they have root access to do whatever they want.

Let me slip on my tinfoil hat for a minute: say some minor package maintainer gets through Fedora’s release engineering processes and, under the radar, slips a backdoor into a package that only a handful of people use and nobody really keeps their eyes on. Previously the damage would have been so localized, because hardly anyone had the package installed, as to make the backdoor pretty much useless. Now that package can be slipped into anyone’s system at will through a local unprivileged user exploit.

SELinux mitigates this, absolutely, and unlike in Debian, most important things won’t start by themselves until they’re explicitly enabled by the administrator. But the back door is there even if it’s locked, and it’s only a matter of time until someone finds a real-world way to abuse it in very bad ways. I really wish they would seriously consider reverting this behavior to something a bit less dangerous. This could be a very useful tool in a corporate environment, but the way I understand the situation right now, it’s a very bad default.

Posted in Sysadmin.

44% of security products contain security problems

Slashdot linked to an interesting analysis of an ICSA Labs report, done by Help Net Security, about the underperformance of various network security products. The meat of the analysis focused on how most products fail to achieve certification on the first test, but I found this particular statistic incredibly enlightening:

Rounding out the top three is the startling finding that 44 percent of security products had inherent security problems. Security testing issues range from vulnerabilities that compromise the confidentiality or integrity of the system to random behavior that affects product availability. Even though it can be a demanding process, certification with a trusted, established third party is critical to verifying product quality, states the report. Product categories studied were: anti-virus, network firewall, Web application firewall, network IPS, IPSec VPN, SSL VPNs and custom testing.

The report has some caveats. For example:

Even the technology used to store and access test data has seen substantial change. We certainly cannot make the claim that a single, consistent data collection method was employed across all products throughout the timeframe of this study.

Check out the rest of the report; it’s a good read. I’ve long believed that most high-end security products (beyond typical endpoint stuff) are snake oil and don’t provide any kind of real ROI. This report does nothing to change my opinion, especially in the IPS space, where a remarkably large portion of the sampled products failed to achieve certification.

Posted in Sysadmin.