How we use JIRA for system administration at CSHL

In my group of systems engineers, we’re all becoming very comfortable users of JIRA. JIRA has been a popular bug tracking tool for developers for a good number of years, but it has a lot of powerful features that also make it incredibly useful as a project management tool for system administrators. It’s obviously very good at bug tracking and decent at supplementing project management, but it’s actually really good at a lot of other things. Here’s a summary of what we use it for:

  • Project/task tracking
  • Software builds and custom application packages
  • Change management/maintenance calendar
  • Incident management

I’ve never been big on paperwork. If I’m going to go through all the trouble to document everything my team is and will be doing, there had better be a payoff. JIRA is super-simple. Getting friendly with a few documentation processes is an unfortunate reality if you run a hugely heterogeneous environment for many departments, but that doesn’t mean it needs to be a miserable, team-strangling mess of red tape. I’ll comment below on a few ways we try to keep our processes leaner.

Getting under the hood

People who have used JIRA significantly know that it’s a lot more than a bug tracker. Out of the box, it does work really well as a bug tracking system. The real core strength of JIRA, though, lies in its incredibly robust workflow system. It’s so central to the flexibility and power of the product that many of JIRA’s marketing folks prefer to talk about it as a workflow engine. (I swear I saw this explained in much better detail in a video from Atlassian Summit, but I can’t find it now.)

The idea of custom workflows tends to bore most systems people to death, but it’s easier to stomach if you think of it like a finite state machine. Change management is an easy use case for custom workflows (albeit one that lots of people hate). A change ticket is opened in Needs Review state. Once I look it over, I can make it Approved or Denied. Someone can take that Denied request, fix what’s wrong with it, and change it back to Needs Review. When the maintenance is done, it’s Closed. We then have a record of it that shows up in our Icinga alerts and other key places when something goes wrong.
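To make the state machine idea concrete, here is a minimal sketch in Python. The state names come from the description above; the transition table, the `transition` helper, and the rule that only an Approved change can be Closed are my assumptions for illustration, not an export of our actual JIRA workflow.

```python
# Allowed moves between change-ticket states.
# State names are from the post; the table itself is an assumed reading of it.
TRANSITIONS = {
    "Needs Review": {"Approved", "Denied"},
    "Denied": {"Needs Review"},   # fix the request and resubmit it
    "Approved": {"Closed"},       # maintenance has been performed
    "Closed": set(),              # terminal state
}


def transition(current, target):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


# Walk one ticket through a denial, a resubmission, approval, and closure.
state = "Needs Review"
for step in ("Denied", "Needs Review", "Approved", "Closed"):
    state = transition(state, step)
```

A workflow in JIRA is essentially this table plus permissions on who may trigger each transition.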

JIRA also allows you to create a pile of custom fields, and separate them out by the issue types they belong to. This is ideal for doing things like tracking the actual start/end times of your system maintenance versus the windows that you’ve scheduled, so you can report on the accuracy of your estimates. Label types, which are like tags, are also awesome for correlating related issues together.
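As a rough illustration of the estimate-accuracy report, the sketch below compares the scheduled window with the actual one. The field names (`scheduled_start`, `actual_end`, and so on), the timestamp format, and the sample ticket are all hypothetical; in practice the values would come from whatever custom fields you define.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this example


def duration_minutes(start, end):
    """Length of a window in minutes, given two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def estimate_ratio(ticket):
    """Actual duration divided by scheduled duration (1.0 means spot on)."""
    scheduled = duration_minutes(ticket["scheduled_start"], ticket["scheduled_end"])
    actual = duration_minutes(ticket["actual_start"], ticket["actual_end"])
    return actual / scheduled


# Made-up ticket: a two-hour window that actually took three hours.
ticket = {
    "scheduled_start": "2013-05-01 18:00",
    "scheduled_end": "2013-05-01 20:00",
    "actual_start": "2013-05-01 18:10",
    "actual_end": "2013-05-01 21:10",
}
ratio = estimate_ratio(ticket)
```

A ratio consistently above 1.0 suggests the scheduled windows are too optimistic.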

Our configuration

We run a mostly-stock JIRA configuration, with a few tiny enhancements. But one thing about our environment that’s sort of interesting is that we actually only use one JIRA project for all of our internal items. It makes things much simpler than creating and managing a pile of top-level projects.

In particular, we like these plugins from the Atlassian Marketplace:

  • JIRA Wallboards: This nifty little plugin is designed for converting standard JIRA dashboards into something that’s easy to read on a television or giant monitor from across the office. I use it more than the team I manage, but it’s really nice for being able to check on project priorities, due dates, and so forth at a glance.
  • JIRA Calendar Plugin: This plugin is so obviously useful that I have no real understanding of why it doesn’t just ship as part of JIRA. Being able to easily view upcoming due dates on all our internal tasks, as well as dates of upcoming maintenance events, is way too useful to pass up.
  • JIRA Charting Plugin: This is self-explanatory and also probably the least useful thing in this entire post.

Project tracking

If you have a six-month-long project involving tightly structured timelines, and you need to find the best way to parallelize the people doing work on the project and discover which tasks pose the most risk to your timeline, JIRA probably isn’t the best bet: that’s something much better suited to a tool like Microsoft Project. JIRA is really good at coordinating a lot of small projects at the same time, which is where most small IT departments spend a lot of their time. (There are Gantt chart plugins out there as well, but I haven’t found them terribly useful.)

JIRA is about as useful as any other ticketing system for managing dozens of tiny projects at the same time, and keeping tabs on all of them successfully. One thing that is nice about JIRA is that its subtask implementation, while limited (it doesn’t allow subtasks of subtasks), is fairly competent compared to most basic ticketing systems.

I find that this really shines in conjunction with a decent wallboard plugin, which can provide everyone on a project with a slick real-time view of what people are doing on that project.

Application builds

We compile a lot of code. Our biggest first-class service is our high-performance compute cluster, which supports a really substantial number of scientific computing applications. JIRA helps us keep track of what we need to build, for whom, and by when, and lets us easily relate issues on that software. We’re not really doing much special or of interest in this area, though.

Change control and maintenance calendaring

I really hate change control for the same reasons you do. I do it anyway for the same reasons anyone else does. (We keep our change management scope limited so that people can actually, you know, get work done. But if someone reboots the primary AD DNS server in the middle of the business day, there’s hell to be paid.)

When someone is looking to perform maintenance on a crucial system, they open a maintenance request. This contains typical fields: impact, backout plan, projected start and end times, and so forth.

Changes are tagged using a custom Label field called Impacted hosts containing the FQDN of each impacted host. This makes it very easy to programmatically search for all prior maintenance on a host. We have this integrated into our Icinga notification script so that maintenances are automatically flagged as something that should be investigated in connection with the alert. (I should probably post this script, because while it’s nothing groundbreaking from an engineering perspective, I think it’s pretty neat.)
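This is not the script mentioned above, just a minimal sketch of what searching by host could look like. It only builds a URL for JIRA's REST search resource (`/rest/api/2/search`) with a JQL query on the Impacted hosts field. The base URL, the project key `OPS`, and the issue type name `Maintenance` are placeholders I made up; substitute your own.

```python
from urllib.parse import urlencode

JIRA_BASE = "https://jira.example.org"  # hypothetical server


def maintenance_search_url(fqdn, project="OPS", issue_type="Maintenance"):
    """Build a REST search URL for prior maintenance on one host.

    The project key and issue type name are assumptions for illustration.
    """
    jql = (
        f'project = {project} AND issuetype = "{issue_type}" '
        f'AND "Impacted hosts" = "{fqdn}" ORDER BY created DESC'
    )
    query = urlencode({"jql": jql, "fields": "summary,status", "maxResults": 10})
    return f"{JIRA_BASE}/rest/api/2/search?{query}"


url = maintenance_search_url("db01.example.org")
```

Fetching that URL with an authenticated HTTP client should return the matching issues as JSON, which a notification script can then fold into the alert.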

Once a change request is approved, it becomes a maintenance. It exists on the maintenance calendar for the helpdesk and other IT organizations to look at.

Incident database

Like any decent shop supporting dozens of applications, we keep a fairly good incident database. This is a log of things that go wrong on servers, what our diagnostic process was, what the impact of the problem was, and how we fixed it. This is a huge help at bringing new on-call engineers up to speed on the infrastructure.

The incident database makes use of the same Impacted hosts field that we use on the change control/maintenance calendar. This is awesome because we can open up an incident, click on the host’s FQDN, and see all the maintenance work performed on that host, plus every other incident recorded against it, since we started using the system. As with the maintenance database, this is queryable through the JIRA API, and we do exactly that to provide a list of related incidents whenever any Icinga alert goes out via email.
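For a sense of how the related-incident list might be rendered into an alert email, here is a small sketch. It assumes the usual shape of a JIRA REST search response (an `issues` list whose entries carry `key` and `fields.summary`); the base URL, issue keys, and summaries are invented sample data, and the function name is mine.

```python
def related_issue_lines(search_response, base_url="https://jira.example.org"):
    """Turn a search response dict into one plain-text line per issue."""
    lines = []
    for issue in search_response.get("issues", []):
        key = issue["key"]
        summary = issue["fields"]["summary"]
        lines.append(f"{key}: {summary} ({base_url}/browse/{key})")
    return lines


# Invented sample data standing in for a real API response.
sample = {
    "issues": [
        {"key": "OPS-101", "fields": {"summary": "Disk full on /var"}},
        {"key": "OPS-87", "fields": {"summary": "Kernel upgrade"}},
    ]
}
lines = related_issue_lines(sample)
```

The resulting lines can be appended to the body of the notification as-is.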

To do

  • SCM integration: It would take a lot of work out if we could integrate the Git repositories storing our Puppet code into JIRA, and use that to feed the list of maintenances. Since Atlassian only supports Git hosted on GitHub (and doesn’t support GitHub Enterprise accounts at the time of this writing), we’ll end up exposing a read-only copy of the repository through git-svn and pumping data in through the SVN plugin.
  • Better Icinga integration: We already have JIRA maintenances and incidents showing up in our Icinga alerts. But oh, how I would love the holy grail of Icinga creating entries in the incident database by itself. Right now we put them in manually if they’re anything more complex than “the user filled up the disk.”
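We haven't built the automatic incident creation yet, but as a rough sketch, the request body for JIRA's REST issue-creation resource (`POST /rest/api/2/issue`) could be assembled like this. The project key, the `Incident` issue type name, and the custom field ID `customfield_10100` standing in for Impacted hosts are all hypothetical; the real field ID depends on your instance.

```python
import json


def incident_payload(host_fqdn, service, output,
                     project_key="OPS", hosts_field="customfield_10100"):
    """Build an issue-creation body for one alert.

    project_key and hosts_field are placeholders, not real values.
    """
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Incident"},
            "summary": f"{service} problem on {host_fqdn}",
            "description": output,
            hosts_field: [host_fqdn],  # label-type fields take a list of strings
        }
    }


payload = incident_payload("db01.example.org", "disk", "DISK CRITICAL - /var 98% used")
body = json.dumps(payload)
```

The notification script would send `body` with an authenticated POST. Deciding which alerts deserve an incident at all is the harder part.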



© 2019 @jgoldschrafe
