DevOps still sort of misses the big picture

It’s been long enough since I’ve updated this blog that I’m just going to assume everyone knows what’s up with DevOps. It’s a movement I’ve had a love-hate relationship with. I think that it really works well in Web 2.0-style shops where all of the development work and all of the sysadmin work takes place in-house. Unsurprisingly, it also works really poorly when the applications you’re supporting are opaque black boxes that don’t expose how they work under the covers. (Taking the “Dev” out of “DevOps” just leaves you with “Ops,” which puts us right back where we started.)

I was doing some light reading today, and I happened to catch a particular article by Grig Gheorghiu over at Agile Testing comparing and contrasting systems monitoring with unit testing. His thoughts can be summarized with the following quote:

Good developers are test-infected. It doesn’t matter too much whether they write tests before or after writing their code — what matters is that they do write those tests as soon as possible, and that they don’t consider their code ‘done’ until it has a comprehensive suite of tests. And of course test-infected developers are addicted to watching those dots in the output of their favorite test runner.

Good ops engineers are monitoring-infected. They don’t consider their infrastructure build-out ‘done’ until it has a comprehensive suite of monitoring checks, notifications and alerting rules, and also one or more dashboard-type systems that help them visualize the status of the resources in the infrastructure.

This is true. What I think is problematic is that, for all the communication between developers and ops that the last few years have brought about, developers are still leaving it to ops to figure out how to monitor the thing, when the role of ops should strictly be to figure out what went wrong and how to fix it.

Let’s say your company is developing a Big Internet Thing. You have a sizeable, reasonably complex application with a lot of system dependencies. The developers have already put in a ton of work to write all of the unit tests for it. They already have all of the plumbing in place to catch every conceivable minor regression at every step of the application. Why are the sysadmins, the people responsible for rolling this thing into production, being forced to reinvent the wheel? Why can’t the same unit tests the developers are already using be tuned to provide usable metrics for the ops team? Can’t the tests just be idempotent, so that they are safe to run over and over against a live system?
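To make that concrete, here is a minimal sketch in Python of what reusing a test as a monitoring check might look like. Everything in it is hypothetical: `fetch_greeting` stands in for a real application call, and `run_check`, `CheckResult`, and `format_metric` are names invented for this illustration, not part of any real test runner or monitoring tool. It assumes the check is read-only, so running it repeatedly against production changes nothing.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    duration_ms: float
    detail: str = ""


def run_check(name: str, check: Callable[[], None]) -> CheckResult:
    """Run a test-style callable and turn its outcome into a metric."""
    start = time.perf_counter()
    try:
        check()
        ok, detail = True, ""
    except AssertionError as exc:
        # A failed assertion is a failed check; keep the message for ops.
        ok, detail = False, str(exc)
    except Exception as exc:
        # Anything else (timeouts, connection errors) also counts as a failure.
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    duration_ms = (time.perf_counter() - start) * 1000.0
    return CheckResult(name, ok, duration_ms, detail)


def format_metric(result: CheckResult) -> str:
    """Render a result as one line a monitoring agent could scrape (made-up format)."""
    return f"{result.name} ok={int(result.ok)} duration_ms={result.duration_ms:.2f}"


# Hypothetical application code: stands in for a call to a real dependency.
def fetch_greeting(user: str) -> str:
    return f"hello, {user}"


def check_greeting_roundtrip() -> None:
    # Read-only assertion, so it is safe to repeat against a live system.
    assert fetch_greeting("probe") == "hello, probe", "unexpected greeting"


def check_broken() -> None:
    # Deliberately failing check, to show what a failure looks like.
    assert 1 + 1 == 3, "arithmetic is broken"


if __name__ == "__main__":
    for name, check in [("greeting", check_greeting_roundtrip),
                        ("broken", check_broken)]:
        print(format_metric(run_check(name, check)))
```

The point of the design is that the assertion is written once by the people who know the application, and the wrapper decides whether a failure breaks the build or raises an alert. The timing comes along for free, which is the kind of metric ops usually has to build from scratch.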

As DevOps matures, I think this integration has to continue tightening. The unnecessary duplication of effort undertaken by sysadmins every day because software developers don’t publish their test suites is probably costing the world countless billions of dollars in lost productivity every year.

Comments

  1. I agree. As a developer, I am tired of learning new monitoring tricks to check the health of systems our code touches. There are just so many alerts and each alert has its own WHY.

    SQL Server by itself has tons of different stress testing and monitoring tools, e.g. OSTRESS, SQLSTRESS, and a CodePlex project that automatically crawls PerfMon and SQL server-side trace logs using heuristics to highlight alerts. All of this stuff is one great big state machine, and everyone has to implement it on their own. That’s why DevOps sucks.

    Virtual Machines are supposed to fix these problems. What, not how. Databases themselves are an example of this concept, but they often require manual/imperative tuning like index configuration. I am shocked that Microsoft’s solution for “SQL Server _Maintenance Plans_” isn’t functional, in that it doesn’t allow higher-order functions for configuring stuff like indexes. Just look at the T-SQL code for Ola Hallengren’s IndexOptimize stored procedure, which we use here at work. He resorts to passing in strings, which he then has to parse, to get the effect of higher-order functions, so that people can configure things like which databases and indexes the tuning rules apply to. This is compounded by the fact that SQL Agent is not an enterprise-grade scheduling solution, so maintenance windows are not easily assigned dynamically.

  2. Side note:

    When I attended OOPSLA last year, a Microsoft researcher who worked on Pex (contract-based testing) for .NET told me that the SQL Server team asked them if they could automatically generate WHOLE databases that could find problems in their Cost-Based Optimizer.

    Now that’s DevOps Integration, at least on the Microsoft side of things. The barrier to the end-user is the cost to create Microsoft-size test environments.



© 2017 @jgoldschrafe
