It’s been long enough since I’ve updated this blog that I’m just going to assume everyone knows what’s up with DevOps. It’s a movement I’ve had a love-hate relationship with. I think that it really works well in Web 2.0-style shops where all of the development work and all of the sysadmin work takes place in-house. Unsurprisingly, it also works really poorly when the applications you’re supporting are opaque black boxes that don’t expose how they work under the covers. (Taking the “Dev” out of “DevOps” just leaves you with “Ops,” which puts us right back where we started.)
I was doing some light reading today, and I happened to catch a particular article by Grig Gheorghiu over at Agile Testing comparing and contrasting systems monitoring with unit testing. His thoughts can be summarized with the following quote:
Good developers are test-infected. It doesn’t matter too much whether they write tests before or after writing their code — what matters is that they do write those tests as soon as possible, and that they don’t consider their code ‘done’ until it has a comprehensive suite of tests. And of course test-infected developers are addicted to watching those dots in the output of their favorite test runner.
Good ops engineers are monitoring-infected. They don’t consider their infrastructure build-out ‘done’ until it has a comprehensive suite of monitoring checks, notifications and alerting rules, and also one or more dashboard-type systems that help them visualize the status of the resources in the infrastructure.
This is true. The problem, as I see it, is that for all the communication the last few years have brought about, developers are still leaving it to ops to figure out how to monitor the thing, when the role of ops should strictly be figuring out what went wrong and how to fix it.
Let’s say your company is developing a Big Internet Thing. You have a sizeable, reasonably complex application with a lot of system dependencies. The developers have already put in a ton of work to write all of the unit tests for it. They already have all of the plumbing in place to catch every conceivable minor regression at every step of the application. Why are the sysadmins, the people responsible for rolling this thing into production, being forced to reinvent the wheel? Why can’t the same unit tests the developers are already using be tuned to provide usable metrics for the ops team? Can’t tests just be idempotent, so that they’re safe to run over and over against a live system without leaving anything behind?
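To make the idea a bit more concrete, here is a rough sketch in Python of one check function serving both audiences. Everything in it is hypothetical: `check_round_trip`, `as_monitoring_check`, the `warn_seconds` threshold, and the plain dict standing in for a real backend are names I made up for illustration. The only borrowed piece is the usual Nagios plugin convention of exit codes 0 through 3 (OK, WARNING, CRITICAL, UNKNOWN) with performance data after a pipe.

```python
import time
import unittest


def check_round_trip(store, key="__healthcheck__", value="ping"):
    """Write a scratch key, read it back, then remove it.

    Returns (ok, elapsed_seconds). The scratch key is deleted afterwards,
    so running the check repeatedly leaves the store as it found it.
    """
    start = time.perf_counter()
    store[key] = value
    ok = store.get(key) == value
    store.pop(key, None)
    return ok, time.perf_counter() - start


class RoundTripTest(unittest.TestCase):
    """The developer's view: a plain pass/fail unit test."""

    def test_round_trip(self):
        ok, _elapsed = check_round_trip({})
        self.assertTrue(ok)


def as_monitoring_check(store, warn_seconds=0.5):
    """The ops view: same check, reported as (exit_code, status_line).

    Exit codes follow the Nagios plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    try:
        ok, elapsed = check_round_trip(store)
    except Exception as exc:  # the check itself blew up
        return 3, f"UNKNOWN - check failed to run: {exc}"
    perf = f"time={elapsed:.6f}s"
    if not ok:
        return 2, f"CRITICAL - value did not survive round trip | {perf}"
    if elapsed > warn_seconds:
        return 1, f"WARNING - round trip took {elapsed:.3f}s | {perf}"
    return 0, f"OK - round trip in {elapsed:.3f}s | {perf}"
```

A monitoring wrapper would print the status line and exit with the returned code, while the test runner keeps using `RoundTripTest` unchanged. The design choice that matters is that the assertion logic lives in one function that returns data instead of raising, so the test suite and the monitoring system can each interpret the result in their own way.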
As DevOps matures, I think this integration has to continue tightening. The unnecessary duplication of effort undertaken by sysadmins every day because software developers don’t publish their test suites is probably costing the world countless billions of dollars in lost productivity every year.