Mega updates to Metricinga

After a couple of months of not receiving the TLC it deserves, I’ve pushed a major update to Metricinga on GitHub. Here are the highlights: Completely rewritten. I wasn’t really happy with the tight coupling of components in the old version; among other things, it made it really hard to write tests. The new version uses extremely loose coupling between Greenlets, so I can actually get around to writing proper regression tests now. It should also be a lot simpler to support things like writing metrics to multiple backends (StatsD, OpenTSDB, etc.) once that support is implemented; writing to more … Continue Reading →
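To illustrate the kind of loose coupling described above, here is a minimal sketch. It is not Metricinga's actual code: it assumes standard-library threads and `queue.Queue` as stand-ins for gevent greenlets and queues, and the names `parser`, `writer`, `run_pipeline`, and the backend labels are hypothetical. Each stage only knows about the queues it reads from or writes to, which is what makes it easy to test in isolation and to fan out to more than one backend.

```python
import queue
import threading

STOP = object()  # sentinel telling a writer stage to shut down


def parser(raw_lines, out_queues):
    """Parse 'name=value' lines and fan each metric out to every backend queue."""
    for line in raw_lines:
        name, value = line.split("=", 1)
        metric = (name.strip(), float(value))
        for q in out_queues:
            q.put(metric)
    for q in out_queues:
        q.put(STOP)


def writer(in_queue, sink):
    """Drain one queue into one backend (a plain list stands in for a real backend)."""
    while True:
        metric = in_queue.get()
        if metric is STOP:
            break
        sink.append(metric)


def run_pipeline(raw_lines, backend_names):
    """Wire one parser to one writer per backend and run until all input is consumed."""
    queues = {name: queue.Queue() for name in backend_names}
    sinks = {name: [] for name in backend_names}
    workers = [
        threading.Thread(target=writer, args=(queues[name], sinks[name]))
        for name in backend_names
    ]
    workers.append(
        threading.Thread(target=parser, args=(raw_lines, list(queues.values())))
    )
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sinks
```

Because the parser never calls a writer directly, a regression test can feed it a list of lines and inspect the sinks afterwards, and adding another backend is just one more name in `backend_names`.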


MCollective, RabbitMQ, and the Case of the Missing Pings

I like robust management infrastructures. They make me happy. But sometimes, tiny behaviors can send you on a wild goose chase. Being fairly inexperienced with both MCollective and RabbitMQ, though, I ran into an interesting issue with ours off and on over the last couple of weeks. One night, our MCollective installation, which had been working fine for weeks or months, started to exhibit the following behavior from our control node: Issuing an mco ping would return a list of all the nodes in our environment. Issuing another mco ping would cause no nodes at all to turn up. Restarting … Continue Reading →


Koboli, an email interaction gateway for Nagios/Icinga

If you’ve followed my projects previously, you know that while I love Nagios, and its stepbrother Icinga, it’s often a nuisance and the butt of lots of jokes (see: Jordan Sissel’s PuppetConf 2012 talk on Logstash). A big part of my work over the last several months has focused on how to make interacting with it more productive. Nagios is totally happy to blast you with alerts, but doesn’t give you a way to, say, turn them off on some false positive when you’re on vacation in the middle of the mountains, miles away from Internet service reliable enough to … Continue Reading →


How we use JIRA for system administration at CSHL

In my group of systems engineers, we’re all becoming very comfortable users of JIRA. JIRA has been a very popular bug tracking tool for developers for a good number of years, but it has a lot of very powerful features that also make it incredibly useful as a Project Management Emporium for system administrators. It’s obviously very good at bug tracking and decent at supplementing project management, but it’s actually really good at a lot of other things. Here’s a summary of what we use it for: Project/task tracking; Software builds and custom application packages; Change management/maintenance calendar; Incident … Continue Reading →


Default monitoring alerts are awful

I’ve been putting some serious thought recently into how to improve the issue turnaround time of my operations team, and one really sore point that stuck out to me was the notifications coming out of our monitoring system. Like many shops, we’re using Nagios/Icinga, one of the most flexible monitoring packages to ever exist in the world, and yet for a decade we’ve been running with default alerts that give you almost no context. They tell you what, not why. Here’s a boilerplate Nagios notification email: Notification Type: PROBLEM Service: HTTP Host: myserver State: CRITICAL Address: 172.40.10.10 Date/Time: Tue Nov … Continue Reading →


Metricinga: Forward your Nagios/Icinga perfdata to Graphite

For a while, I’ve been using Shawn Sterling’s Graphios. It’s a neat little utility for forwarding performance data from Nagios/Icinga to Graphite. It had a few warts, though, and I wanted to take the opportunity to learn event-based programming using Python/gevent, so I’ve gone ahead and developed Metricinga, my own approach to the same problem. Metricinga supports the following: running as a daemon; directory watches using inotify*; automatic reconnection to Graphite in the event of a send failure; continued parsing of performance data files while the Graphite server is unreachable. *Metricinga actually uses a priority queue for metrics parsing, and … Continue Reading →
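For readers unfamiliar with what "forwarding perfdata to Graphite" involves, here is a rough sketch of the core transformation. It is not taken from Metricinga or Graphios. It assumes the standard Nagios plugin perfdata layout (`'label'=value[UOM];warn;crit;min;max`) and Graphite's plaintext protocol (`path value timestamp`, one metric per line). The function names, the `nagios` prefix, and the path layout are hypothetical choices for illustration.

```python
import re
import socket

# One perfdata item looks like: 'label'=value[UOM];[warn];[crit];[min];[max]
# Labels containing spaces are wrapped in single quotes.
PERFDATA_RE = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[^;\s]*)"
)


def sanitize(part):
    """Replace characters that would break a dot-separated Graphite path."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", part)


def perfdata_to_graphite(host, service, perfdata, timestamp, prefix="nagios"):
    """Turn one perfdata string into Graphite plaintext-protocol lines."""
    lines = []
    for match in PERFDATA_RE.finditer(perfdata):
        label = match.group("quoted") or match.group("bare")
        path = ".".join(sanitize(p) for p in (prefix, host, service, label))
        lines.append(f"{path} {match.group('value')} {int(timestamp)}")
    return lines


def send_to_graphite(lines, server, port=2003):
    """Send lines to a Carbon plaintext listener (2003 is Carbon's default port)."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(payload)
```

Thresholds and units are dropped here and only the value is forwarded. A real forwarder would also need the queuing and reconnection behavior listed above so that metrics are not lost while Graphite is unreachable.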


RPM spec for statsite

As promised in my previous post, here’s the GitHub repo for my statsite RPM: https://github.com/jgoldschrafe/rpm-statsite/ For the time being, this is still based on Armon Dadgar’s current upstream Git source with my daemonizing changes applied as a patch. So far, everything’s working pretty well on my test server, but please notify me of any bugs. Note that the version number is 0, as there has not yet been any numbered official release.


Minor gotchas upgrading from Puppet 2.6 to Puppet 2.7

Puppet is a fairly complicated little product once you start to look under the covers, and by now it’s pretty widely known that for larger environments, moving from 2.6 to 2.7 isn’t a particularly straightforward upgrade. Most of people’s various pain points relate to the deprecation of dynamic scoping in favor of lexical scoping and parameterized classes, but there are some other gotchas that haven’t been as widely publicized. Here are a few. Undefined template variables have changed: Previously, if you attempted to look up a variable from a template, and that variable did not exist, it would return a Ruby nil, … Continue Reading →


New job!

The three of you who have been following this blog for a while have probably noticed that around February of this year, the number of topics I’ve blogged about has dropped pretty significantly. That’s because I left my jack-of-all-trades systems engineer job to take a position as a systems integration lead with Time Inc., a position dealing primarily with the difficult tasks of systems automation and configuration management. While I love the job, and have a great deal of fondness for the people I work with, I do have to say that the amount of new technology I’ve gotten exposure to … Continue Reading →


RHEL/CentOS init scripts for Carbon

As part of the recent set of updates I’m pushing to the holyhandgrenade-testing repo, I pushed some updated Graphite packages which contain three init scripts for Carbon: carbon-aggregator, carbon-cache, and carbon-relay. As before, I’m making a special post to draw search engine attention to these in case they end up being useful for anyone not using my packages. As usual, you can find these scripts on GitHub: carbon-aggregator init script, carbon-cache init script, carbon-relay init script. Note: These are specific to my Graphite packages, which means they specify carbon-{aggregator,cache,relay}.py files in /usr/bin instead of /opt/graphite. If you are using the default /opt/graphite hierarchy, … Continue Reading →