Calling bullshit on “code is not the asset”

Technology discussions are increasingly dominated by annoying platitudes: cookie-cutter maxims that eschew all nuance in favor of cultural memes. Repeated frequently enough, they become indistinguishable from truth.

If you’re not reading Gareth Rushgrove‘s DevOps Weekly newsletter, you really should be. It’s a tremendously useful aggregation of reading materials that, while rarely immediately applicable, provoke deep mental dialogue on the ways that problems can be approached. One such item was Dan North’s Microservices: software that fits in your head, which is an excellent slide deck on microservice architectures and patterns.

But there was this little nugget buried inside:

[Slide from Dan North's deck arguing that the code is not the asset, but the cost.]

What startles me about this slide is that DevOps was, in large part, a direct reaction to this mentality being so pervasive in Information Technology. IT is a cost center, said executives, and we must take every opportunity to minimize the damage that it causes. Like whack-a-mole hammers, we must stamp out creativity wherever we find it and institute strict governance processes to ensure that these costs stay low.

Never mind that the most effective large-scale IT environments are the ones that understand how to leverage their previous investments as a first-class platform to build their innovations. In one well-known example involving physical assets, Amazon created Amazon Web Services as a way to earn revenue from its existing computing capacity, which sat mostly idle outside of the holiday shopping season. AWS has since grown to become the largest web hosting platform in the world.

Somewhere along the line, somebody forgot to consider that code can be an asset too. Technology companies, especially small startups, frequently pivot after discovering that their existing technology can be quickly adapted to fulfill a market need that wasn’t previously anticipated. And there’s no better example of this than one of the most transformative technologies being adopted today: Docker.

Docker, by far the most widely-adopted tool for managing application containers, began as an internal tool for dotCloud, a then little-known PaaS competing with Heroku, AppFog, and other hosts. It became what it is today because, after dotCloud released Docker as a public project and began to speak about it, people recognized the value of the thing itself: a value independent of the specific business problem dotCloud was trying to solve when they wrote it.

As North points out, the costs associated with developing software are quite substantial. However, we must be mindful that the code is not the cost itself; it is undeniably an asset, but one with liquidity and depreciation that must be managed like any other asset. This complexity is extremely difficult to manage and isn’t well-adapted to snappy bullet-point aphorisms.

Lean thinking teaches us to limit work in progress, and kanban teaches us that we tie up our capital whenever we invest in materials that aren’t used to produce a good that will sell quickly. It’s crucial that we distinguish useless raw materials from the machining infrastructure we’ve purchased, customized, and created to streamline the production process. With software, they can both look the same.

XWiki Google Apps authentication with Nginx and Lua

XWiki is a really terrific open-source wiki package that, in my opinion, is the only freely-available package that comes even close to the functionality of Atlassian's Confluence. I recently wanted to integrate XWiki with single sign-on provided by Google Apps, but there are no XWiki plugins that work directly with Google Apps OAuth. Instead, we'll use Nginx with a custom Lua authenticator to handle the OAuth redirects, then pass authentication headers through to XWiki.

In this post, I’ll be going into how I configured this scheme on an Ubuntu 12.04 LTS system. 14.04 should work without significant modifications.

Caveats

If you need to switch between multiple Google Apps accounts, logout does not work using the link in XWiki. There’s probably a trivial workaround that I haven’t bothered to find yet.

Prerequisites

Before beginning, you’ll need these in order to follow along:

  1. A public-facing Ubuntu/Debian server with a functioning XWiki installation. This article will assume this server is running on the default HTTP port 8080.
  2. An XWiki administrator account with a username that matches your Google Apps username (the portion before the @ symbol). If you do not create this, you will be locked out of administration once you enable Google Apps login.
  3. A verified Google Apps domain with at least one user account. I’ll be using mydomain.com in this article’s examples.
  4. A permanent hostname for the XWiki server, to be used for OAuth2 callbacks. I’ll be using xwiki.mydomain.com in examples.
  5. An SSL certificate for the site, issued by a trusted Certification Authority, and installed on the XWiki server. I’ll be using /etc/ssl/certs/xwiki.mydomain.com.pem in examples. I’ll also be assuming that your certificate is a PEM file containing the key, the server certificate, and the certificate chain concatenated into a single file. If this is not how you store your certificates, you’ll need to update your Nginx configuration accordingly.

You should not have Nginx preinstalled on your server. We are going to build our own Nginx with the Lua module installed. (If you have a custom-built Nginx package with a recent Lua module version compiled in already, feel free to use it, of course.)

Create the OAuth credentials

OAuth differs from traditional username/password authentication in that an application using OAuth never sees the username or password that the user provides. Instead, the application redirects to a third-party login server that verifies the user's credentials. Once the credentials are verified, Google's OAuth servers issue a callback to your application confirming that the user is authenticated. To make this work, you need to tell Google's servers a little bit about your XWiki installation.

Create a Google Developer project

Log into the Google Developers Console using your Google Apps account. Once you are logged in, click the Create Project button in the middle of the screen. Name your project whatever you like, then click Create to finish creating the project. The Google Developers Console should now take you inside your newly-created project.

Configure a consent screen

Before you can create an OAuth client ID, you need to configure a consent screen. This is the screen that’s shown to users after they log into Google, asking them to grant certain account privileges to your application.

From the menu on the left side of the screen, click APIs & auth to expand the sub-menu, then click Consent screen. Under Email address, select your email address. Under Product name, enter a product name that will be shown on the consent screen for your applications. All other fields are optional. Once you’ve finished filling in all the fields, click Save to create your consent screen.

Create an OAuth Client ID

From the menu on the left side of the screen, click APIs & auth to expand the sub-menu, then click Credentials. Locate the OAuth heading, then click the Create new Client ID button. The Create Client ID dialog will appear. Enter the following parameters:

  • Application type: Web application
  • Authorized Javascript origins: https://xwiki.mydomain.com
  • Authorized redirect URIs: https://xwiki.mydomain.com/_oauth

The client ID should now appear on the right side of the screen. Note the Client ID and Client secret fields. You’ll need both of these values later to configure authentication in Nginx.

Install Lua and CJSON

Begin by installing the Lua libraries. We’ll be using LuaJIT with Nginx for performance. We also need some security libraries in order to have HTTPS support in Lua.
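Something along these lines should pull in LuaJIT, its development headers, and the OpenSSL headers on Ubuntu 12.04. The package names here are from memory, so double-check them against your release:

```
sudo apt-get update
sudo apt-get install -y luajit libluajit-5.1-dev libssl-dev curl
```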

Next, download and build the Lua CJSON library. The current version is 2.1.0 as of this writing.
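Here's a sketch of the build against LuaJIT, using ~/build as a scratch directory; the download URL and the LuaJIT include path are the pieces most likely to need adjusting on your system:

```
mkdir -p ~/build && cd ~/build
wget https://github.com/mpx/lua-cjson/archive/2.1.0.tar.gz -O lua-cjson-2.1.0.tar.gz
tar xzf lua-cjson-2.1.0.tar.gz
cd lua-cjson-2.1.0

# Build against LuaJIT's headers rather than plain Lua's.
make LUA_INCLUDE_DIR=/usr/include/luajit-2.0
sudo make install LUA_INCLUDE_DIR=/usr/include/luajit-2.0
```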


Build a custom Nginx with Lua scripting

Download lua-nginx-module and extract the sources, so the module can be found by the Nginx configure script:
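lua-nginx-module also depends on the ngx_devel_kit module, so grab both. The versions shown were current as of this writing; newer releases should work too, and the exact URLs may need adjusting:

```
cd ~/build
wget https://github.com/openresty/lua-nginx-module/archive/v0.9.15.tar.gz -O lua-nginx-module-0.9.15.tar.gz
wget https://github.com/simpl/ngx_devel_kit/archive/v0.2.19.tar.gz -O ngx_devel_kit-0.2.19.tar.gz
tar xzf lua-nginx-module-0.9.15.tar.gz
tar xzf ngx_devel_kit-0.2.19.tar.gz
```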

Install some Nginx build dependencies:
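These are the usual suspects for compiling Nginx from source on Ubuntu:

```
sudo apt-get install -y build-essential libpcre3-dev libssl-dev zlib1g-dev
```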

Then download, extract, and configure Nginx:
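Here's a sketch of the build, using the same /opt/nginx-1.7.10 prefix referenced later in this post. The LUAJIT_LIB and LUAJIT_INC paths match Ubuntu's LuaJIT packages on amd64 and may need adjusting:

```
cd ~/build
wget http://nginx.org/download/nginx-1.7.10.tar.gz
tar xzf nginx-1.7.10.tar.gz
cd nginx-1.7.10

# Tell the Lua module where to find LuaJIT.
export LUAJIT_LIB=/usr/lib/x86_64-linux-gnu
export LUAJIT_INC=/usr/include/luajit-2.0

./configure \
  --prefix=/opt/nginx-1.7.10 \
  --with-http_ssl_module \
  --add-module=$HOME/build/ngx_devel_kit-0.2.19 \
  --add-module=$HOME/build/lua-nginx-module-0.9.15
make
sudo make install
```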

Download the authentication module and configure Nginx

Agora Games has kindly published an Nginx Lua script that can be used to support OAuth2 authentication. However, at the time of this publication, it doesn’t support a crucial feature that we need — the ability to set HTTP headers based on OAuth login status. We’re going to pull that from eschwim’s fork.
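Something like this drops the script where the Nginx configuration below expects to find it. I'm assuming the fork lives at github.com/eschwim/nginx-google-oauth and keeps the Lua script at access.lua in the repository root; adjust if the layout differs:

```
sudo apt-get install -y git
sudo mkdir -p /etc/nginx/lua
cd ~/build
git clone https://github.com/eschwim/nginx-google-oauth.git
sudo cp nginx-google-oauth/access.lua /etc/nginx/lua/nginx-google-oauth.lua
```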

With the script in place, we're going to configure Nginx. Create a virtual host in /etc/nginx/nginx.conf with the following configuration:
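Here's a sketch of what mine looked like. The ngo_* variables are how the Lua script is configured; the exact set it honors, and the name of the request header it sets for the authenticated user, depend on the version of the script, so cross-check everything below against the fork's README. Substitute your own client ID, client secret, and hostnames.

```
worker_processes  1;

events {
  worker_connections  1024;
}

http {
  # The Lua script needs a resolver to reach Google's OAuth endpoints.
  resolver 8.8.8.8;

  server {
    listen 443 ssl;
    server_name xwiki.mydomain.com;

    # Combined PEM with key, certificate, and chain (see prerequisites).
    ssl_certificate     /etc/ssl/certs/xwiki.mydomain.com.pem;
    ssl_certificate_key /etc/ssl/certs/xwiki.mydomain.com.pem;

    # Settings consumed by the OAuth Lua script. Variable names are taken
    # from the upstream README and may differ between script versions.
    set $ngo_client_id      "your-client-id.apps.googleusercontent.com";
    set $ngo_client_secret  "your-client-secret";
    set $ngo_token_secret   "a long random string used to sign session cookies";
    set $ngo_domain         "mydomain.com";
    set $ngo_callback_host  "xwiki.mydomain.com";
    set $ngo_callback_uri   "/_oauth";
    set $ngo_secure_cookies "true";

    location / {
      # Enforce Google Apps login before anything reaches XWiki. The forked
      # script also sets a request header (e.g. X-Forwarded-User) identifying
      # the logged-in user, which XWiki consumes via the headers authenticator
      # configured below.
      access_by_lua_file /etc/nginx/lua/nginx-google-oauth.lua;

      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-Proto https;
      proxy_pass http://127.0.0.1:8080;
    }
  }
}
```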

With the configuration in place, start Nginx with sudo /opt/nginx-1.7.10/sbin/nginx -c /etc/nginx/nginx.conf.

(You should, of course, configure Nginx to start with your init system of choice, like Upstart or runit, so Nginx will start automatically when your server reboots. That configuration is beyond the scope of this article.)

Install the XWiki headers authentication module

From your wiki’s administration page, locate Extension Manager from the menu on the left, then click Add Extensions. Search for Headers Authenticator for XWiki. Locate the plugin in the table at the bottom and click Install, then wait for installation to complete.

On my XWiki 6.4.1 installation, this process never completed successfully. It kept downloading the file into a temp directory over and over and wouldn’t stop until I forcibly restarted the XWiki service. I had to download the plugin jar, manually place it into /usr/lib/xwiki/WEB-INF/lib, and restart the service.

Configure XWiki for headers authentication

Now configure XWiki to use the headers that Nginx is feeding to it. Add the following to /etc/xwiki/xwiki.cfg:
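Here's the shape of it. The authenticator class and header-mapping keys below are illustrative placeholders; pull the exact values from the Headers Authenticator extension's documentation, and make sure the header name matches whatever the Nginx Lua script is actually sending.

```
# Use the headers-based authenticator installed from the Extension Manager.
# NOTE: the class name and property keys here are illustrative placeholders;
# copy the exact values from the extension's documentation for your version.
xwiki.authentication.authclass=org.xwiki.contrib.authentication.XWikiHeadersAuthenticator

# Header carrying the authenticated username; must match the header set by
# the Nginx Lua script (shown here as X-Forwarded-User).
xwiki.authentication.headers.auth_field=X-Forwarded-User
xwiki.authentication.headers.id_field=X-Forwarded-User
```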

Finally, restart your XWiki Tomcat container with service tomcat7 restart (or whatever is appropriate for your installation type).

Wrapping up

When you browse to https://xwiki.mydomain.com, you should now see a Google Apps login screen. After providing your login credentials, you should be prompted to provide basic account information to the Google Developer app that you created earlier. Once you authorize the app to use your credentials, you should see your account logged into XWiki automatically.

sensu-run: test Sensu checks with token substitution/interpolation

When I’m configuring Sensu checks, especially things that make direct use of variables in my Sensu configuration, I’ve gotten annoyed by the fact that testing them is more difficult than it needs to be. I’ve hacked up a very quick and dirty tool called sensu-run for testing arbitrary commands and standalone checks. Give it a try and see how it works!

Permanently setting FQDN in Google Compute Engine

Unlike Amazon’s EC2, Google Compute Engine allows you to choose the names for your instances, and takes meaningful actions with those names — like setting the hostname on the system for you. Unfortunately, this only affects the short hostname, not the fully-qualified domain name (FQDN) of the host, which can complicate some infrastructures. To set the FQDN at instance launch, we’ll need some startup script magic.

This script snippet checks for the domain or fqdn custom attribute on your instance and applies it to the host after the system receives a DHCP response. It’s based on Google’s own set-hostname hook included with the Google Startup Scripts package. Of course, you’ll need to bake this into your base GCE system image using Packer or another similar tool.

Place the following into /etc/dhcp/dhclient-exit-hooks.d/zzz-set-fqdn:
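Here's a sketch of the hook. It assumes the custom metadata attribute is named either fqdn (a complete name) or domain (appended to the instance's short name), and that curl is present on the image; dhclient-script sources these hooks with $reason set, so there's no shebang.

```
# /etc/dhcp/dhclient-exit-hooks.d/zzz-set-fqdn
# Sourced by dhclient-script after DHCP events; loosely based on Google's
# set-hostname hook. Reads the "fqdn" or "domain" custom metadata attribute
# and makes `hostname -f` return the right thing.

set_fqdn() {
  local metadata="http://metadata.google.internal/computeMetadata/v1/instance/attributes"
  local shortname fqdn domain

  shortname="$(hostname -s)"
  fqdn="$(curl -sf -H 'Metadata-Flavor: Google' "${metadata}/fqdn" || true)"
  domain="$(curl -sf -H 'Metadata-Flavor: Google' "${metadata}/domain" || true)"

  if [ -z "${fqdn}" ] && [ -n "${domain}" ]; then
    fqdn="${shortname}.${domain}"
  fi
  [ -z "${fqdn}" ] && return 0

  # Debian convention: short hostname in the kernel, FQDN first in /etc/hosts.
  hostname "${shortname}"
  sed -i '/^127\.0\.1\.1 /d' /etc/hosts
  echo "127.0.1.1 ${fqdn} ${shortname}" >> /etc/hosts
}

case "${reason}" in
  BOUND|RENEW|REBIND|REBOOT)
    set_fqdn
    ;;
esac
```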


Using Google Compute Engine service accounts with Fog

Google Compute Engine has a great little feature, similar to EC2’s instance IAM roles, where you can create an instance-specific service account at instance creation. This account has the privileges you specify, and the auth token is accessible automagically through the instance metadata.

Unfortunately, Fog doesn’t support this very well. It expects you to pass in an email address and a key to access the Google Compute Engine APIs, neither of which you have yet. However, you can construct the client yourself, using a Google::APIClient::ComputeServiceAccount for authorization, and pass it in. This code snippet should help:
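Roughly, something like this. The Google::APIClient::ComputeServiceAccount authorization fetches its token from the instance metadata server; the :google_client option name is the piece I'm least sure about and may differ between Fog versions, and the project and application names are placeholders:

```
require 'fog'
require 'google/api_client'

# Build an API client whose authorization comes from the instance's
# service account; the token is fetched from the GCE metadata server.
client = Google::APIClient.new(
  :application_name    => 'my-app',
  :application_version => '0.0.1'
)
client.authorization = Google::APIClient::ComputeServiceAccount.new
client.authorization.fetch_access_token!

# Hand the pre-built client to Fog instead of an email address and key.
# NOTE: the option name (:google_client here) may vary by Fog version.
compute = Fog::Compute.new(
  :provider       => 'google',
  :google_project => 'my-project-id',
  :google_client  => client
)

puts compute.servers.map(&:name)
```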

Follow Fog issue #2945 and assume this post to be outdated when it gets closed.

Replace annual reviews with individual retrospectives

In the past several decades, and particularly in the past few years, many forward-thinking managers have come to the conclusion that traditional yearly performance appraisals are a waste of time at best, or a net negative to morale at worst. This is a philosophy supported by many bright management thinkers, including W. Edwards Deming:

Evaluation of performance, merit rating, or annual review… The idea of a merit rating is alluring. The sound of the words captivates the imagination: pay for what you get; get what you pay for; motivate people to do their best, for their own good. The effect is exactly the opposite of what the words promise.

Bob Sutton and Huggy Rao, authors of Scaling Up Excellence, wrote in their book about Adobe’s experiences eliminating yearly performance appraisals from their organization:

Since the new system was implemented, involuntary departures have increased by 50%: this is because, as Morris explained, the new system requires executives and managers to have regular “tough discussions” with employees who are struggling with performance issues—rather than putting them off until the next performance review cycle comes around. In contrast, voluntary attrition at Adobe has dropped 30% since the “check-ins” were introduced; not only that, of those employees who opt to leave the company, a higher percentage of them are “non-regrettable” departures.

Clearly, many managers and their organizations have found annual performance reviews to be an ineffective tool for managing teams. But what if we took the annual performance review and humanized it into a tool for good?

Retrospectives: a human approach

Performance reviews are a terrible source of anxiety and stress. A year’s worth of judgment, and the consequences of that judgment, are compressed and handed down in an instant. It’s often as nerve-wracking for the manager as for the subordinate.

So, when I worked as a manager, I used my annual meetings to do something slightly unconventional: to forsake any judgments or appraisals, and instead remind my staff of their accomplishments over the last year. An anniversary, if you will.

In technology, we rarely get the opportunity to think in time periods greater than a few months. If you work in an Agile shop, you might think in two-week sprints. A year ago is a world away, and for someone mired in a difficult project, it can be hard to slog through the impostor syndrome and remember everything they've done for the organization. We can't always see our professional development at a macro level, and an outside perspective with a long view can help us figure out where we're going.

The goal of management should be not just to improve short-term productivity, but to align the company's goals with its employees' career-development goals over the long term. Removing annual performance appraisals is a great step toward removing unnecessary stress from the workplace, but taking the long view of employees' career goals is still crucial to maintaining an effective team.

On hiring developers

This post from Alex MacCaw on Sourcing.io has been making the rounds over the past couple of weeks. It's a list of his favorite interview questions, which are largely technical in nature and focus heavily on the nuts and bolts of JavaScript. They aren't bad questions, but I do think they're the wrong questions.


An anecdote about job titles

Before starting with Rabbit, I worked with Cold Spring Harbor Laboratory as an IT manager. In mid-2013, our Web Development Manager position, a peer to mine, had been open for six months with very few qualified applicants. While the job was not a glamorous one — “CMS developer in academia” doesn’t have the sex appeal of a startup — we weren’t getting any bites on the job posting that HR was curating out on the Internet. My director came to me and asked what we should do about the position.

I mulled over the posting for a few days before making a few judicious edits. What I handed back had Web Development Manager crossed off and replaced with Lead Web Developer. Underneath the Requirements section, “at least two years of management experience” was replaced with “at least two years as a manager, team lead, or senior developer.” After some discussion, my changes were approved and HR uploaded the revised job posting.

We had an offer out to a candidate within two weeks.

Attracting great talent anywhere is hard. We tend to obsess over the job descriptions that we post, trying to find new and interesting ways to sell the company with unlimited vacation policies and fully-stocked fridges. Sometimes we appeal to the reader’s ego directly by using words like rockstar or ninja. But we tend to focus very hard on descriptive words, and we frequently ignore the deeper context buried in those words.

What we wanted was somebody to develop frontend and backend code and delegate tasks to two other team members. While there were management responsibilities in the job description, the core of the job was to be an individual contributor. When we put Manager in the job title and over-emphasized management experience in our requirements, we immediately communicated to anyone reading the posting that whoever filled the position would spend most of the day doing manager-y things like talking to stakeholders and curating Gantt charts. Everything else we put in that job description might as well not have even been there.

If you want to attract the best candidates you can, you might be taking the wrong approach by trying to sell them on the company first. Your goal should be to figure out why the day-to-day work is meaningful, tap into that, and tie the organization back into the pitch. And if the job title in the posting happens to be an impediment to getting that point across, don’t be afraid to change it.

Mega updates to Metricinga

After a couple of months of not receiving the TLC it deserves, I've pushed a major update to Metricinga on GitHub. Here are the highlights:

  • Completely rewritten. I wasn't really happy with the tight coupling of components in the old version; among other things, it made it really hard to write tests. The new version uses extremely loose coupling between Greenlets, so I can actually get around to writing proper regression tests. It should also make it a lot simpler to support writing metrics to multiple backends (StatsD, OpenTSDB, etc.), and writing to more than one at a time should be trivial.
  • Better inotify support. Having up-to-date information is really important for some metrics, so I've made it a point to have reasonably well-functioning inotify support in Metricinga. It will start dumping metrics the second a file is closed for writing or moved into the directory.
  • Better delete-file handling. In some cases, the old Metricinga could drop data if the file was prematurely deleted before all the parsed metrics were successfully offloaded into Graphite. We now reference-count metrics sourced from a particular file, so files are never deleted until they’re completely sent into Graphite successfully.
  • Init script for CentOS/RHEL. Yay!

Grab it, file bugs, file pull requests, let me know what you think!

MCollective, RabbitMQ, and the Case of the Missing Pings

I like robust management infrastructures. They make me happy. But sometimes, tiny behaviors can send you on a wild goose chase.

Being fairly inexperienced with both MCollective and RabbitMQ, though, I ran into an interesting issue with ours off and on over the last couple of weeks. One night, our MCollective installation, which had been working fine for weeks or months, started to exhibit the following behavior from our control node:

  1. Issuing an mco ping would return a list of all the nodes in our environment.
  2. Issuing another mco ping would cause no nodes at all to turn up.
  3. Restarting the MCollective agent on any one node would cause that node to show up in the next mco ping, but not any subsequent one.
  4. Any activity besides mco ping would fail.

This would continue for a little while, then magically resolve itself until it would randomly present itself again a few days down the road.

Turning up the MCollective logging level on both the client and server, I could see that the agent was putting messages into the reply queue, but the client wasn’t receiving them, with no good indication why.

Digging deeper, I ran netstat -an to look at the connection state. I saw high Recv-Q and Send-Q counters on the connections, so the Erlang VM (whose nodes register with epmd, the Erlang Port Mapper Daemon, not Erick and Parrish Making Dollars) wasn't even pulling the data out of the socket. I took a look at some traffic dumps of MCollective running with a single agent, with the aes_security plugin disabled to make the payload easy to inspect, but that didn't reveal much either because Wireshark doesn't have a dissector for STOMP.

So, I set up RabbitMQ on a temporary system to see what would happen. To my chagrin, that system’s MQ worked just fine. I poked around the logs on our production Puppet/MCollective/RabbitMQ system and found nothing of any value besides a bunch of notices that nodes had connected.

Since we had recently upgraded the whole VMware environment that houses Puppet, MCollective, and most of our other tools, I started to look into everything else. I upgraded the virtual hardware, VMware Tools, and accompanying drivers, trying to figure out whether it was related to our ESXi upgrade from 4.1 to 5.1. With the problem still occurring, I dumped the paravirtualized vmxnet3 driver entirely in favor of the standard emulated Intel e1000 driver. No dice. netstat continued to show high Recv-Q and Send-Q, and the RabbitMQ management interface showed no messages traversing the system.

Getting more frustrated, I completely trashed the RabbitMQ configuration and set it up again from scratch, which, it turns out, didn’t help at all. mco ping, one response. mco ping again, no response. Restart the MCollective agent and mco ping again, one response. In a last-ditch effort, I updated MCollective to 2.3.1 (development) and RabbitMQ 3.0.3 (stable, released literally that day) and tried again. No luck.

After a bunch of digging and asking others for their thoughts, the consensus was that RabbitMQ was deliberately dropping connections for some reason. Finally, I stumbled upon this stupid thing:

Disk-Based Flow Control

It turns out I didn't have enough disk free on the host. Because of disk hot-grow quirks in Linux, we have Linux VMs with very small root partitions (5 GB) and separate partitions for data volumes (/var/lib/mysql, etc.), and having less than 1 GB free on the system is a really common occurrence. The default RabbitMQ configuration doesn't like this very much, and will throttle producers with exactly the behavior I was seeing.
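If you can't keep more disk free, you can also tune the threshold down. Here's a minimal sketch of the relevant stanza in /etc/rabbitmq/rabbitmq.config, using the classic Erlang-term syntax (the value is bytes, or a {mem_relative, Fraction} tuple), with the caveat that running a broker this close to full is still a bad idea:

```
%% /etc/rabbitmq/rabbitmq.config
[
  {rabbit, [
    %% Trigger the disk alarm (and producer throttling) only when free
    %% space drops below ~250 MB. Check the disk alarm documentation for
    %% your version's default before relying on this.
    {disk_free_limit, 250000000}
  ]}
].
```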

Dear RabbitMQ devs: a log message would be lovely when you start throttling messaging because of resource usage, thanks.

