Information technology, as a cost center, is heavily geared towards reducing costs and, in well-engineered IT shops, improving and streamlining the business’s processes. But is automation sometimes going too far, and reducing our ability to learn? As we automate to new levels, leveraging commodity hardware to build resilient systems on the grid and in the cloud, are we making release engineers into a ruling elite, creating a new operations underclass? And if so, is it really a bad thing, if we improve all the metrics of IT efficiency?
I first considered this question a little over a year ago, when my employer was trying to hire another Linux engineer. We had interviewed someone who had worked in operations for an enterprise storage vendor, working with Linux systems on a day-to-day basis, who could not even begin to explain the Linux boot process. He didn’t understand how it worked because he had never rebooted a system at his job, and possibly had never even seen a Red Hat system boot up. These weren’t the kind of troubleshooting skills we were looking to hire; in fact, I can’t imagine what marketable skills he managed to pick up in his entire time there, unless “reading logs really carefully” is a top selling point for an engineer these days. Anecdotally, candidates like this appear to represent a growing percentage of the IT workforce.
Similarly, in the desktop space, there are a lot of people employed in desktop support positions where they don’t actually learn to troubleshoot anything, because if a desktop starts exhibiting a problem, they simply reimage it. This problem will only be compounded as virtual desktop infrastructure reduces the need for desktop technicians to even support hardware. As things move back onto the network, into elastic compute capacity in DRS clusters, many of the roles of desktop support may be variously reduced to “re-deploy VMware template” and “replace dumb terminal; throw old one away.”
I can’t even imagine this perspective. My career began with a small (now defunct) web hosting company, where I had the opportunity to learn all kinds of heterogeneous systems, applications, and programming languages in order to support features that end-users wanted. If my job had consisted of “read logs, identify problem, push button” then I have no doubt in my mind that it would have taken me five times as long to build the knowledge I need to do what I do now.
The question I ask is this:
Heavy reliance on automation and the provisioning of clean, known-good system states often leads to more reproducible systems, easier problem resolution, and lower margins of error. It allows junior admins to tackle more complex problems than they would ordinarily be able to solve. But is it poisonous to the ecosystem of engineers, raising the barrier to entry to impossible levels?
I’m something of a fan of the movie Idiocracy. In the movie, an average guy who awakes from a cryogenics experiment 500 years in the future becomes the salvation of an incredibly stupid mankind after he discovers that their crops won’t grow because they’re being grown not with water (“like, out the toilet?”) but with a sports drink called Brawndo, The Thirst Mutilator, because “Brawndo’s got electrolytes.” I worry that as the skill divide grows, and people grow too reliant on their tools to manage their infrastructure, we may stop understanding how or why they actually work, and find ourselves members of a cargo cult of systems management.
I know what you’re thinking: this has happened before in every industry in the world. If you want to make a toaster, you don’t need to know how to make your own iron, or generate your own electricity to power it. The problem is that because there are no real material limitations beyond labor, this particular area of IT culture is growing, evolving and adapting much faster than we can adjust. Unlike iron, systems management products aren’t liquid commodities. We can’t drop in ManageEngine or SCCM to replace Kaseya, and we can’t drop in Chef to replace Cfengine. After decades, we still can’t make the Simple Network Management Protocol less than hugely complicated. Because of the lack of interoperability standards, swapping one product for another involves substantial re-engineering effort, which often makes it less like replacing a product and more like putting your house on stilts while you reconstruct the foundation. It’s very important to make sure that the divide between release engineers and system administrators doesn’t grow too large, because the sysadmins need to actually know what’s going on.
As we deploy our systems management products, improve our server-to-admin ratios and displace our mid-level engineers, and especially as we evaluate sweeping paradigm shifts (or pendulums) like virtual desktop infrastructure, we need to be mindful that we have a responsibility to the people in our employ. We need to be aware that while we need a job done, their careers don’t end at the desk they’re now sitting at, and if we poison the well now, there’s going to be nothing to drink from later.