<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>holyhandgrenade.org &#187; Jeff</title>
	<atom:link href="http://holyhandgrenade.org/blog/author/admin/feed/" rel="self" type="application/rss+xml" />
	<link>http://holyhandgrenade.org/blog</link>
	<description>Got my two fingers out the roof see me greppin&#039; out</description>
	<lastBuildDate>Fri, 17 Feb 2012 13:00:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Minor gotchas upgrading from Puppet 2.6 to Puppet 2.7</title>
		<link>http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/</link>
		<comments>http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 12:59:43 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1590</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/" title="Minor gotchas upgrading from Puppet 2.6 to Puppet 2.7"></a>Puppet is a fairly complicated little product once you start to look under the covers, and by now it&#8217;s pretty widely known that for larger environments, moving from 2.6 to 2.7 isn&#8217;t a particularly straightforward upgrade. Most of people&#8217;s various &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/" title="Minor gotchas upgrading from Puppet 2.6 to Puppet 2.7"></a><p>Puppet is a fairly complicated little product once you start to look under the covers, and by now it&#8217;s pretty widely known that for larger environments, moving from 2.6 to 2.7 isn&#8217;t a particularly straightforward upgrade. Most of people&#8217;s various pain points relate to the deprecation of dynamic scoping in favor of lexical scoping and parameterized classes, but there are some other gotchas that haven&#8217;t been as widely publicized. Here are a few.</p>
<h2>Undefined template variables have changed</h2>
<p>Previously, if you attempted to look up a variable from a template, and that variable did not exist, it would return a Ruby <em>nil</em>, which is a fairly intuitive and straightforward behavior that a lot of people came to rely on in their conditionals. In Puppet 2.7, however, this value is now the symbol <em>:undefined</em>. Ensure that all of your templates are not running under the assumption that undefined variables return the value <em>nil</em>.</p>
<h2>Globbing <em>import</em>s are now considered undefined behavior</h2>
<p>If you have this guy at the top of any of your manifests for some reason (like Puppet&#8217;s autoloader being horrendous until the 2.6 series):</p>
<pre>import '*.pp'</pre>
<p>Chances are that it will not work, and instead it will return an error that your class is not defined. Ensure that your classes and defines are all named <em>name.pp</em> and let the autoloader do its thing instead. It should work fine, even for nested classes inside subdirectories.</p>
<h2>&#8211;show_diff is no longer enabled by default in &#8211;noop mode</h2>
<p>Some people have operations toolchains that rely on Puppet&#8217;s &#8211;noop mode showing a diff for each file that it&#8217;s going to modify on the next real run. Do note that these scripts will need to be updated to explicitly specify the &#8211;show_diff option &#8212; the new default behavior is now to log these diffs to syslog instead.</p>
<p>Beyond these three, I had a fairly straightforward upgrade of our Puppet environment. Happy hunting!</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2012/02/minor-gotchas-upgrading-from-puppet-2-6-to-puppet-2-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New job!</title>
		<link>http://holyhandgrenade.org/blog/2011/11/new-job/</link>
		<comments>http://holyhandgrenade.org/blog/2011/11/new-job/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 13:23:25 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1561</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/new-job/" title="New job!"></a>The three of you who have been following this blog for a while have probably noticed that around February of this year, the number of topics I&#8217;ve blogged about has dropped pretty significantly. That&#8217;s because I left my jack-of-all-trades systems engineer &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/11/new-job/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/new-job/" title="New job!"></a><p>The three of you who have been following this blog for a while have probably noticed that around February of this year, the number of topics I&#8217;ve blogged about has dropped pretty significantly. That&#8217;s because I left my jack-of-all-trades systems engineer job to take a position as a systems integration lead with <a href="http://www.timeinc.com">Time Inc.</a>, a position dealing primarily with the difficult tasks of systems automation and configuration management.</p>
<p>While I love the job, and have a great deal of fondness for the people I work with, I do have to say that the amount of new technology I&#8217;ve gotten exposure to has been fairly limited. Though I&#8217;ve learned a lot about how a really well-oiled machine runs things, most of my technical posts have been about the somewhat generic subjects of Puppet and Linux, and they haven&#8217;t been as varied in scope or dimension as I&#8217;d really like.</p>
<p>However, I&#8217;ve just accepted a position as Systems and Storage Manager for <a href="http://www.cshl.edu">Cold Spring Harbor Laboratory</a>, where I expect to be spending a lot more time working with the open-source community and working with a team to develop clever solutions to the problems faced by many cash-constrained IT organizations. Being that we have a mission to better the world through scientific research, what better place to contribute to open-source?</p>
<p>(CSHL has a long and storied history of contributions to open-source software, likely dating back even further than Lincoln Stein&#8217;s still-used <a href="http://search.cpan.org/~lds/CGI.pm-2.45/">CGI.pm CPAN module</a>).</p>
<p>Expect a lot more good stuff on this blog towards the end of the year as I get to finish up several things I never thought I&#8217;d have a chance to (IBM SAN data recovery, I&#8217;m looking at you).</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/11/new-job/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Repo updates aplenty</title>
		<link>http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/</link>
		<comments>http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 12:25:04 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Repo]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1527</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/" title="Repo updates aplenty"></a>I&#8217;ve just pushed a pile of important updates into the holyhandgrenade repo. Here&#8217;s a quick rundown of the most important changes: -thirdparty repo I&#8217;ve added another repository, holyhandgrenade-thirdparty, in which I redistribute rebuilds from other people&#8217;s SRPMs in an attempt &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/" title="Repo updates aplenty"></a><p>I&#8217;ve just pushed a pile of important updates into the holyhandgrenade repo. Here&#8217;s a quick rundown of the most important changes:</p>
<h2>-thirdparty repo</h2>
<p>I&#8217;ve added another repository, holyhandgrenade-thirdparty, in which I redistribute rebuilds from other people&#8217;s SRPMs in an attempt to cut down on the number of unnecessary dependencies. In particular, I&#8217;m trying to kill all the dependencies on the RBEL repo, which I&#8217;ve become increasingly unhappy with on account of them doing everything totally differently from Fedora upstream (shit, there are unmodified rebuilds of <em>openSUSE</em> packages in there!)</p>
<p>I could probably push most of this stuff into the main repo, but I don&#8217;t want to seem like I&#8217;m taking authorship credit away from some people who really deserve it, like T.C. Hollingsworth who has put a ton of work into his packages in the Node.JS ecosystem. Since he hasn&#8217;t provided builds for RHEL 6, I have.</p>
<p>Moving right along.</p>
<h2>Node.JS packages</h2>
<p>I&#8217;ve started keeping a large supply of Node.JS packages supporting Etsy&#8217;s statsd and some other endeavors. As it stands, it&#8217;s more than enough to support Node.JS standalone, but not enough to daemonize it the de facto standard way (the <em>Forever</em> library). Stay tuned, as this is where I&#8217;ll be focusing most of my packaging attention in the next few weeks.</p>
<p>Many of these are in -thirdparty, but a large number that I&#8217;m writing will start to make their way into the main -stable and -testing repos.</p>
<h2>More statsd and Graphite goodness</h2>
<p>I&#8217;ve added a bunch of other statsd/Graphite-related packages, specifically:</p>
<ul>
<li>collectd-carbon (collectd Python plugin to export statistics to Graphite)</li>
<li>collectd-graphite (collectd Perl plugin to export statistics to Graphite)</li>
<li>python-statsd (synchronous statsd client for Python)</li>
<li>python-gstatsd (asynchronous/Twisted statsd client/server for Python)</li>
</ul>
<p><span class="Apple-style-span" style="line-height: 18px;">Additionally, the Graphite packages (carbon, whisper, and graphite-web) received significant updates.</span></p>
<p>A pure-C implementation of statsd will be pushed as soon as I get around to checking it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/11/repo-updates-aplenty/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RHEL/CentOS init scripts for Carbon</title>
		<link>http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/</link>
		<comments>http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 05:03:27 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Repo]]></category>
		<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[initscripts]]></category>
		<category><![CDATA[packages]]></category>
		<category><![CDATA[repo]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1528</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/" title="RHEL/CentOS init scripts for Carbon"></a>As part of the recent set of updates I&#8217;m pushing to the holyhandgrenade-testing repo, I pushed some updated Graphite packages which contain three init scripts for Carbon: carbon-aggregator carbon-cache carbon-relay As before, I&#8217;m making a special post to draw search &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/" title="RHEL/CentOS init scripts for Carbon"></a><p>As part of the recent set of updates I&#8217;m pushing to the holyhandgrenade-testing repo, I pushed some updated Graphite packages which contain three init scripts for Carbon:</p>
<ul>
<li>carbon-aggregator</li>
<li>carbon-cache</li>
<li>carbon-relay</li>
</ul>
<p><span class="Apple-style-span" style="line-height: 18px;">As before, I&#8217;m making a special post to draw search engine attention to these in case they end up being useful for anyone not using my packages. As usual, you can find these scripts on GitHub:</span></p>
<ul>
<li><a href="https://github.com/jgoldschrafe/rpm-carbon/blob/master/SOURCES/carbon-aggregator.init">carbon-aggregator init script</a></li>
<li><a href="https://github.com/jgoldschrafe/rpm-carbon/blob/master/SOURCES/carbon-cache.init">carbon-cache init script</a></li>
<li><a href="https://github.com/jgoldschrafe/rpm-carbon/blob/master/SOURCES/carbon-relay.init">carbon-relay init script</a></li>
</ul>
<p><strong><span class="Apple-style-span" style="line-height: 18px;">Note<strong>:</strong></span></strong><span class="Apple-style-span" style="line-height: 18px;"> These are specific to my Graphite packages, which means they specify carbon-{aggregator,cache,relay}.py files in <strong>/usr/bin</strong> instead of <strong>/opt/graphite</strong>. If you are using the default /opt/graphite hierarchy, you must change the <strong>$exec</strong> variables in the scripts.</span></p>
<p><span class="Apple-style-span" style="line-height: 18px;">Happy graphing! </span></p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/11/rhel-centos-init-scripts-for-carbon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CentOS/RHEL init script for uWSGI</title>
		<link>http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/</link>
		<comments>http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 04:40:44 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1519</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/" title="CentOS/RHEL init script for uWSGI"></a>I created this as part of the uWSGI package that I&#8217;m publishing later this week, but I thought this might also be useful to people not using the package, so here&#8217;s a separate post! Hopefully it saves somebody some work. &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/" title="CentOS/RHEL init script for uWSGI"></a><p>I created this as part of the uWSGI package that I&#8217;m publishing later this week, but I thought this might also be useful to people not using the package, so here&#8217;s a separate post! Hopefully it saves somebody some work.</p>
<p>This script, inspired by many scripts before it for Mongrel and other app servers, looks through /etc/uwsgi and launches an instance for each .ini/.json/.xml/.yaml/.yml file it finds. It expects the directories /var/log/uwsgi and /var/run/uwsgi to exist.</p>
<p>You can find the script on my GitHub page for the RPM:</p>
<p><a href="https://github.com/jgoldschrafe/rpm-uwsgi/blob/master/SOURCES/uwsgi.init">https://github.com/jgoldschrafe/rpm-uwsgi/blob/master/SOURCES/uwsgi.init</a></p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/10/centosrhel-init-script-for-uwsgi/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>FHS-compliant Graphite packages for RHEL/CentOS 6</title>
		<link>http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/</link>
		<comments>http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 03:30:32 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1490</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/" title="FHS-compliant Graphite packages for RHEL/CentOS 6"></a>Well, it took me a number of hours of beating on it, but I wrestled Graphite into being FHS-compliant and packaged it up on the holyhandgrenade-testing repo. They&#8217;re largely untested and a bit rougher around the edges than I&#8217;d like, &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/" title="FHS-compliant Graphite packages for RHEL/CentOS 6"></a><p>Well, it took me a number of hours of beating on it, but I wrestled Graphite into being FHS-compliant and packaged it up on the <a href="http://holyhandgrenade.org/blog/yum-repository/">holyhandgrenade-testing repo</a>. They&#8217;re largely untested and a bit rougher around the edges than I&#8217;d like, but they seem to work.</p>
<p><del>The current version in the repo is 0.9.7c, as it was much easier to rip apart the version I was already using. I&#8217;m hoping to have the latest 0.9.9 version up soon.</del></p>
<p><strong>Update:</strong> The packages in the holyhandgrenade-testing repo are now up to date with version 0.9.9.</p>
<p>As with my other packages, you can track changes to the specs through the GitHub repos:</p>
<ul>
<li><a href="https://github.com/jgoldschrafe/rpm-graphite-web">rpm-graphite-web</a></li>
<li><a href="https://github.com/jgoldschrafe/rpm-python-carbon">rpm-python-carbon</a></li>
<li><a href="https://github.com/jgoldschrafe/rpm-python-whisper">rpm-python-whisper</a></li>
</ul>
<p><span class="Apple-style-span" style="line-height: 18px;">Note the following changes from the standard distribution:</span></p>
<div>
<ul>
<li><span class="Apple-style-span" style="line-height: 19px;">Python libraries, including Django templates, are installed into the standard Python sitelib.</span></li>
<li><span class="Apple-style-span" style="line-height: 19px;">Static assets are in /usr/share/graphite-web.</span></li>
<li><span class="Apple-style-span" style="line-height: 19px;">Configuration files, including <strong>local_settings.py</strong>, are in /etc/graphite.</span></li>
</ul>
<div>With a tiny bit of love, they could be backported to RHEL 5, but be aware that they require Python 2.6 or higher, so you&#8217;ll have to tweak the package name and the %{__python} macro to have it build appropriately.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/10/fhs-compliant-graphite-packages-for-rhelcentos-6/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Introducing the holyhandgrenade yum repo</title>
		<link>http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/</link>
		<comments>http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/#comments</comments>
		<pubDate>Thu, 06 Oct 2011 04:15:56 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1446</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/" title="Introducing the holyhandgrenade yum repo"></a>You&#8217;ve probably figured out by now that I&#8217;m completely insane. I typically don&#8217;t let this leak out and affect other people, but it seems that a chunk of my home lab has found its way onto the Internet. As a &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/" title="Introducing the holyhandgrenade yum repo"></a><p>You&#8217;ve probably figured out by now that I&#8217;m completely insane. I typically don&#8217;t let this leak out and affect other people, but it seems that a chunk of my home lab has found its way onto the Internet. As a result, now I have a yum repo.</p>
<p>It&#8217;s just for RHEL6 and derivatives right now, and only on x86_64 (is anyone still using i386?), but I&#8217;ll probably start cross-compiling for CentOS 5 if anyone has a need.</p>
<p>Right now, the holyhandgrenade repo contains Ruby Enterprise Edition (existing packages on the Internet don&#8217;t build for RHEL6) and all Rubygem prerequisites for Chef built as RPM against Ruby Enterprise Edition. RBEL is still needed for the things this repo doesn&#8217;t contain (CouchDB, RabbitMQ, etc.).</p>
<p>As a bonus, if you install Chef from this repo, it will actually work. As of this writing, that&#8217;s not the case with RBEL. Hooray!</p>
<p>You can install the repo with:</p>
<pre>rpm -Uvh http://repo.holyhandgrenade.org/rhel/stable/6/x86_64/holyhandgrenade-release-1.0-2.el6.hhg.noarch.rpm</pre>
<p>I was in a rush to get this live, so there&#8217;s no GPG signing of packages yet. That will happen soon, I promise.</p>
<p>I&#8217;ve also created separate GitHub projects for each package. You can view my GitHub page <a href="https://github.com/jgoldschrafe">here</a>.</p>
<p>Now I can start work on that Chef tutorial.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/10/introducing-the-holyhandgrenade-yum-repo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disk Performance, Part 2: RAID Layouts and Stripe Sizing</title>
		<link>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/</link>
		<comments>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 02:54:56 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1212</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/" title="Disk Performance, Part 2: RAID Layouts and Stripe Sizing"></a>In Part 1, I discussed how storage performance is typically measured in random IOPS, and talked about how to calculate them for a single spinning disk and a RAID array. Today, I&#8217;m going to get into the nitty-gritty of striping &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/" title="Disk Performance, Part 2: RAID Layouts and Stripe Sizing"></a><p>In Part 1, I discussed how storage performance is typically measured in random IOPS, and talked about how to calculate them for a single spinning disk and a RAID array. Today, I&#8217;m going to get into the nitty-gritty of striping in RAID-5 and RAID-6, and discuss how to determine the optimal stripe width for your server configuration.</p>
<p>For a lot of workloads, this will be premature optimization. I&#8217;d advise you not to think too hard about your storage subsystem unless you&#8217;re actually worried that you will be I/O-constrained. Most of these considerations, implemented appropriately, will cut down on your total number of disk operations, but won&#8217;t make things faster on an undersubscribed system, where rotational latency and seek times are probably your only pertinent bottlenecks. It&#8217;s a better idea to invest your time elsewhere, like finding ways to make your systems easier to manage.</p>
<p>Also note that this article won&#8217;t tell you how to get all the numbers you need to properly size your array (I plan on getting to that in the near future), but I hope to give you an understanding of what to watch out for as well as a starting point for figuring out how to profile your own applications.</p>
<p><span id="more-1212"></span></p>
<h2>Revisiting nomenclature</h2>
<p>From Part 1:</p>
<ul>
<li><strong>Segment size:</strong> The amount of data written to a <em>single disk</em> within a RAID stripe.</li>
<li><strong>Stripe width:</strong> The amount of data contained in a single RAID stripe (segment size × number of data-bearing disks).</li>
</ul>
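<p>To make the arithmetic concrete, here&#8217;s a minimal Python sketch of the stripe width definition above. The function name and the parity table are my own illustration rather than anything from a vendor tool, and it assumes one parity segment per stripe for RAID-5 and two for RAID-6.</p>

```python
# Parity segments per stripe for each RAID level (illustrative table).
PARITY_DISKS = {"raid0": 0, "raid5": 1, "raid6": 2}


def stripe_width_kib(segment_kib, total_disks, raid_level):
    """Stripe width = segment size x number of data-bearing disks."""
    data_disks = total_disks - PARITY_DISKS[raid_level]
    if data_disks >= 1:
        return segment_kib * data_disks
    raise ValueError("not enough disks for this RAID level")


if __name__ == "__main__":
    print(stripe_width_kib(64, 5, "raid5"))   # 4 data disks
    print(stripe_width_kib(64, 6, "raid6"))   # 4 data disks
    print(stripe_width_kib(128, 4, "raid0"))  # 4 data disks
```

<p>A five-disk RAID-5 and a six-disk RAID-6 with 64k segments both end up with a 256k stripe, since each has four data-bearing disks.</p>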
<h2>Understand your application</h2>
<p>If there&#8217;s one thing I need to hammer on over and over and over, it&#8217;s that <em>you need to understand your application</em> in order to make storage decisions. In particular, there are a few details you should pay attention to, and in the next few days I&#8217;ll cover how to find them.</p>
<p>First, if you&#8217;re running a commercially-supported application, your vendor probably has some advice on how your RAID array should be configured. That should be your starting point. If you don&#8217;t find any specific recommendations, you may be able to find some information about the software.</p>
<p>For now, though, I&#8217;ll cover a few basic things you should be asking.</p>
<ul>
<li><strong><span class="Apple-style-span" style="line-height: 24px;">Type of I/O:</span></strong><span class="Apple-style-span" style="line-height: 24px;"> Is your workload predominantly sequential or random? What&#8217;s your percentage of reads to writes?</span></li>
<li><strong><span class="Apple-style-span" style="line-height: 24px;">Size of I/O:</span></strong><span class="Apple-style-span" style="line-height: 24px;"> What is the typical read or write size in your application? How much data does the application read or write, buffered, at once?</span></li>
<li><strong><span class="Apple-style-span" style="line-height: 24px;">Coalescing:</span></strong><span class="Apple-style-span" style="line-height: 24px;"> Does your application batch writes in order to cut down on the number of discrete I/O operations before sending them to disk? Does your OS? Does your filesystem?</span></li>
<li><span class="Apple-style-span" style="line-height: 24px;"><strong>Alignment:</strong> It doesn&#8217;t matter if your application requests data in nice, even, stripe-sized chunks if those don&#8217;t line up <em>perfectly</em> with your data on disk. Much of the time, despite the best efforts of application developers, the underlying filesystem, volume manager, or partition table can introduce unwanted alignment problems that split your I/O over disks or between RAID stripes. I&#8217;ll be covering this more in Part 3.<br />
</span></li>
</ul>
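<p>As a rough illustration of the alignment point, the sketch below checks whether a partition&#8217;s starting offset lands on a stripe boundary. The helper names are hypothetical, and it assumes 512-byte sectors; check what your disks and controller actually report before trusting the numbers.</p>

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def misalignment_bytes(start_sector, stripe_width_bytes):
    """How far past a stripe boundary the partition starts (0 = aligned)."""
    return (start_sector * SECTOR_BYTES) % stripe_width_bytes


def next_aligned_sector(start_sector, stripe_width_bytes):
    """First sector at or after start_sector that sits on a stripe boundary."""
    sectors_per_stripe = stripe_width_bytes // SECTOR_BYTES
    stripes = -(-start_sector // sectors_per_stripe)  # ceiling division
    return stripes * sectors_per_stripe


if __name__ == "__main__":
    stripe = 64 * 1024
    # Old DOS-style partition tables start the first partition at sector 63.
    print(misalignment_bytes(63, stripe), next_aligned_sector(63, stripe))
    # A 1 MiB starting offset (sector 2048) is already aligned.
    print(misalignment_bytes(2048, stripe), next_aligned_sector(2048, stripe))
```

<p>With a 64k stripe, a partition starting at sector 63 is off by 32,256 bytes, so every stripe-sized request straddles two stripes. Starting at sector 128 or at sector 2048 puts it back on a boundary.</p>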
<h2><span class="Apple-style-span" style="line-height: 24px;">Understand your vendor</span></h2>
<p>For the remainder of this post, I&#8217;m basically going to ignore caching. It&#8217;s incredibly important &#8212; maybe more important to your performance than all the disk-level recommendations in here combined &#8212; but each vendor does it so differently that it&#8217;s impossible to make useful generalizations. The important thing is that your controller has a battery-backed <em>write-back</em> cache, that the battery is installed, and that your cache is working.</p>
<p>Please, don&#8217;t take anything I say here as gospel. There are huge variances in the way things are implemented between RAID controllers. Certain optimizations may work on one type of card that don&#8217;t work on another. Certain controllers, interfaces, or storage networks flat-out might not perform well on certain configurations.</p>
<p>Bottom line: read your documentation, and consider your vendor&#8217;s recommendations.</p>
<h2>Mixing workload types: don&#8217;t</h2>
<p>In my first draft of this post, I forgot this. It&#8217;s important.</p>
<p>There are two main types of workloads: sequential and random. <em>Do not mix these on the same array because your random I/O will screw up your sequential I/O by making your drives seek all over the place.</em></p>
<p>If possible, keep your reads and writes separate as well &#8212; this generally reduces contention. For example, if you&#8217;re running a database with a separate transaction log, like Microsoft SQL Server or Oracle, keep it on a separate volume. If you&#8217;re running an XFS filesystem that&#8217;s doing a lot of random I/O, you can keep the journal device on another array for better performance. (Note that this may add another point of failure for your volume, and that may not be acceptable.)</p>
<p>If you&#8217;re using a SAN that allows you to create multiple LUNs backed by the same physical array, keep in mind that your LUNs are backed by the same set of disks, and from a disk performance perspective it makes almost no difference whether one LUN or a hundred are being written to. Contention on the array will be contention on the array regardless of whether it&#8217;s the same filesystem or not.</p>
<h2>Segment sizing</h2>
<p>Segment sizes have different impacts, and are arrived at in different ways, depending on what type of array you&#8217;re using.</p>
<h4>Striping without parity (RAID-0, RAID-0+1)</h4>
<p>Because you&#8217;re not calculating parity, stripe width is literally irrelevant. That makes calculating your ideal segment size a whole lot easier. I&#8217;m going to go over the four main kinds of I/O, and my recommendations for how to deal with them.</p>
<p><strong>Sequential reads:</strong> If your workload requests very large I/O sizes for long periods of time, like processing very large files by using very large reads, you&#8217;ll benefit from keeping your segment size smaller so you can stream off of multiple disks at once. Aim for the largest block size that will still allow you to saturate all of your disks. If your workload instead synchronously asks for small pieces of data at a time, you&#8217;ll get better concurrency if you use larger block sizes and leave your other disks free to service requests from other processes/threads.</p>
<p>To arrive at a number, start with a large block size and incrementally decrease it until the next step down doesn&#8217;t get any faster. If you&#8217;re not sure, 128k-512k is usually a good range.</p>
<p><strong>Sequential writes: </strong>Like the above, if your application streams sequential data very quickly to disk, set your segment size a bit smaller so your controller will be able to saturate multiple disks at once. If it doesn&#8217;t issue very large writes to the controller, you&#8217;ll benefit from a very large block size; this will help to keep your other disks free while one disk is being written at a time.</p>
<p>To arrive at a number, start with a large block size and incrementally decrease it until the next step down doesn&#8217;t get any faster. If you&#8217;re not sure, 128k-512k is usually a good range.</p>
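<p>The step-down procedure for sequential reads and writes can be sketched as a small loop. This is only an outline under my own assumptions: <em>measure_throughput</em> is a hypothetical callable standing in for whatever benchmark you run after rebuilding the array at each size, and the 5% <em>min_gain</em> threshold is an arbitrary cutoff for &#8220;didn&#8217;t get any faster.&#8221;</p>

```python
def pick_segment_size(candidates_kib, measure_throughput, min_gain=0.05):
    """Start at the largest size and step down until it stops paying off.

    measure_throughput(size_kib) is supplied by the caller and returns a
    throughput figure (for example MB/s) for that segment size.
    """
    sizes = sorted(candidates_kib, reverse=True)
    best = sizes[0]
    best_rate = measure_throughput(best)
    for size in sizes[1:]:
        rate = measure_throughput(size)
        if rate > best_rate * (1 + min_gain):
            best, best_rate = size, rate
        else:
            break  # the next step down was not meaningfully faster
    return best


if __name__ == "__main__":
    # Made-up numbers purely to exercise the loop, not real benchmark data.
    fake_results = {1024: 300, 512: 380, 256: 420, 128: 425, 64: 400}
    print(pick_segment_size(fake_results, fake_results.get))  # 256
```

<p>With the made-up figures above, the loop settles on 256k: dropping to 128k gains about 1%, which isn&#8217;t worth it.</p>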
<p><strong>Random reads:</strong> The important detail with random reads is that each read operation should come from as few disks as possible. You want to set your segments to be at least as large as your average read size to minimize the number of disks needed for any particular read. For the vast majority of applications, setting it too large won&#8217;t have nearly as much of an impact as setting it too small.</p>
<p>Profiling your application will get your ideal numbers, but anything smaller than 32k generally isn&#8217;t recommended &#8212; in addition to the disk I/O penalties, segment sizes this small tend to overburden the controller and cause latency problems. If you&#8217;re not sure, 64k-128k will get you good all-around performance with most applications that are heavy on small random reads. If your random reads are larger and pseudo-sequential, like in Microsoft Exchange 2010, you may want to go as high as 256k.</p>
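<p>To see why small segments hurt random reads, here&#8217;s a rough sketch that counts how many disks a single read touches. It assumes plain round-robin striping with no parity; the function name <code>disks_touched</code> and the sample sizes are made up for illustration, not taken from any particular controller.</p>

```python
def disks_touched(offset, length, segment_size, data_disks):
    """Count how many distinct disks one I/O touches on a striped array.

    offset and length are in bytes; segment_size is the per-disk segment
    size in bytes. Assumes simple round-robin striping with no parity.
    """
    first_segment = offset // segment_size
    last_segment = (offset + length - 1) // segment_size
    segments = last_segment - first_segment + 1
    # An I/O can never touch more disks than the array has.
    return min(segments, data_disks)


KIB = 1024

# An aligned 16 KiB read against 8 KiB segments spans two disks...
print(disks_touched(0, 16 * KIB, 8 * KIB, 4))
# ...while the same read against 64 KiB segments stays on one disk.
print(disks_touched(0, 16 * KIB, 64 * KIB, 4))
# A read that straddles a segment boundary still touches two disks.
print(disks_touched(60 * KIB, 16 * KIB, 64 * KIB, 4))
```

<p>The last case is why larger segments only reduce, rather than eliminate, multi-disk reads: the bigger the segment relative to the read, the less often a read lands on a boundary.</p>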
<p><strong>Random writes: </strong>As with reads, each write should go to as few disks as possible; sizing your segments too small causes unnecessary seeks and latency. Your software documentation should help you determine the best size for random writes. If you&#8217;re not sure, a 64k-128k segment size usually works very well, with some vendors recommending 256k or even higher. Again, run your own benchmarks and draw your own conclusions.</p>
<h4>Striping with parity (RAID-5, RAID-6)</h4>
<p>With RAID-5 and RAID-6 and mixed read/write workloads, you should typically determine your optimal stripe width, and then use that number to calculate the appropriate segment size. This can be complicated to do correctly, so it will take me the next few sections to completely explain.</p>
<h2>How does RAID-5 really work?</h2>
<p>Warning: there be math and binary numbers ahead.</p>
<p>RAID-5 uses <em>parity</em>, a sort of binary checksum, to facilitate drive rebuilds.</p>
<p>Consider the following programming problem that I&#8217;ve been asked a few times at job interviews:</p>
<blockquote><p>You&#8217;re given a list of 99 integers from 1 to 100 inclusive. Each integer in the list can occur only one time. Find which integer is missing from the list.</p></blockquote>
<p>If you&#8217;re a math nerd, this should be very straightforward: you take the sum of numbers from 1 to 100 (which is 5050), and subtract the sum of all numbers in the list, and you&#8217;ll end up with the one that&#8217;s missing. After all, we know from middle school algebra that if 5050 &#8211; <em>x</em> = 5011, there can only be one value for <em>x</em>.</p>
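<p>As a quick sketch of that interview answer in Python (the function name <code>find_missing</code> and the choice of 39 as the missing number are just for illustration):</p>

```python
def find_missing(numbers, upper=100):
    """Return the one integer from 1..upper that is absent from numbers."""
    # The sum of 1..upper is upper * (upper + 1) / 2, which is 5050 for 100.
    expected_total = upper * (upper + 1) // 2
    return expected_total - sum(numbers)


# Build the list of 99 integers with one value left out.
numbers = [n for n in range(1, 101) if n != 39]
print(find_missing(numbers))  # 39
```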
<p>RAID-5 works on the same principle, but instead of plain addition and subtraction, it uses a special binary operation called XOR (exclusive or). XOR has the following truth table:</p>
<table>
<thead>
<tr>
<th>XOR</th>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
<td>1</td>
</tr>
<tr>
<th>1</th>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
<p>One way to think of it is that for <em>X</em> XOR <em>Y</em>, if <em>Y</em> is 1 then you <em>flip the value of X</em>.</p>
<p>Wikipedia notes the following important property of <em>exclusive or</em> operations.</p>
<blockquote><p>If using <a title="Binary numeral system" href="http://en.wikipedia.org/wiki/Binary_numeral_system">binary</a> values for true (1) and false (0), then <em>exclusive or</em> works exactly like <a title="Addition" href="http://en.wikipedia.org/wiki/Addition">addition</a> <a title="Modular arithmetic" href="http://en.wikipedia.org/wiki/Modular_arithmetic">modulo</a> 2.</p></blockquote>
<p>This is exactly what we&#8217;re doing: we&#8217;re taking the sum of each bit across the stripe, and throwing out everything except the least-significant digit.</p>
<p>So let&#8217;s start with an over-simplified case. We have a bunch of bytes in a RAID block. In real life, a block would be several kilobytes large, but I don&#8217;t have room for that in a table. We&#8217;ll pretend that each block is one byte instead. The algorithm stays the same.</p>
<p>To calculate the parity for a stripe, you simply XOR all of its data blocks together. In the following chart, <em>Old Parity</em> is the <strong>running total</strong> up to this point (i.e. each block&#8217;s <em>Old Parity</em> is the previous block&#8217;s <em>New Parity</em>), and <em>New Parity</em> is the value after XORing that data block into the running total.</p>
<table>
<thead>
<tr>
<th>RAID Block</th>
<th>Example Byte</th>
<th>Old Parity</th>
<th>New Parity</th>
</tr>
</thead>
<tbody>
<tr>
<th>Block 1</th>
<td>10101010</td>
<td>00000000</td>
<td>10101010</td>
</tr>
<tr>
<th>Block 2</th>
<td>11001100</td>
<td>10101010</td>
<td>01100110</td>
</tr>
<tr>
<th>Block 3</th>
<td>11011011</td>
<td>01100110</td>
<td>10111101</td>
</tr>
<tr>
<th>Block 4</th>
<td>00010001</td>
<td>10111101</td>
<td>10101100</td>
</tr>
<tr>
<th>Parity Block</th>
<td><strong>10101100</strong></td>
</tr>
</tbody>
</table>
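<p>The running total in the table can be reproduced in a few lines of Python. This is only a sketch using the one-byte blocks from the example; a real controller works on whole segments, and the <code>parity</code> helper is a name chosen for illustration.</p>

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all blocks together, starting from an all-zero parity value."""
    return reduce(xor, blocks, 0)


# The four one-byte data blocks from the table above.
data_blocks = [0b10101010, 0b11001100, 0b11011011, 0b00010001]
print(format(parity(data_blocks), "08b"))  # 10101100
```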
<p>Then a disk fails, and we don&#8217;t know the contents of one block in the stripe:</p>
<table>
<thead>
<tr>
<th>RAID Block</th>
<th>Example Byte</th>
</tr>
</thead>
<tbody>
<tr>
<th>Block 1</th>
<td>10101010</td>
</tr>
<tr>
<th>Block 2</th>
<td style="color: red;">?</td>
</tr>
<tr>
<th>Block 3</th>
<td>11011011</td>
</tr>
<tr>
<th>Block 4</th>
<td>00010001</td>
</tr>
<tr>
<th>Parity Block</th>
<td>10101100</td>
</tr>
</tbody>
</table>
<p>We just reverse the process and XOR all the remaining numbers together to get our disk&#8217;s contents back:</p>
<table>
<thead>
<tr>
<th>RAID Block</th>
<th>Example Byte</th>
<th>Block 2 Before XOR</th>
<th>Block 2 After XOR</th>
</tr>
</thead>
<tbody>
<tr>
<th>Block 1</th>
<td>10101010</td>
<td>00000000</td>
<td>10101010</td>
</tr>
<tr>
<th>Block 3</th>
<td>11011011</td>
<td>10101010</td>
<td>01110001</td>
</tr>
<tr>
<th>Block 4</th>
<td>00010001</td>
<td>01110001</td>
<td>01100000</td>
</tr>
<tr>
<th>Parity Block</th>
<td>10101100</td>
<td>01100000</td>
<td>11001100</td>
</tr>
<tr>
<th>Block 2</th>
<td><strong>11001100</strong></td>
</tr>
</tbody>
</table>
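<p>The rebuild is the same XOR loop run over whatever survived. A minimal sketch, again using the one-byte blocks from the example (<code>rebuild_missing</code> is a hypothetical helper name):</p>

```python
def rebuild_missing(surviving_blocks, parity_block):
    """Recover the one lost block by XORing the survivors with the parity."""
    recovered = parity_block
    for block in surviving_blocks:
        recovered ^= block
    return recovered


# Blocks 1, 3 and 4 survived; Block 2 was on the failed disk.
survivors = [0b10101010, 0b11011011, 0b00010001]
print(format(rebuild_missing(survivors, 0b10101100), "08b"))  # 11001100
```

<p>XOR is commutative, so the order in which the surviving blocks and the parity block are combined doesn&#8217;t matter.</p>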
<p>The way that parity is calculated can put all sorts of extra strain on your disks.</p>
<h2>Stripe widths and the performance impact of parity</h2>
<p>A common misconception among many system administrators is that because most hardware RAID cards perform these XOR operations in hardware using specialized accelerator chips, RAID-5 writes should be fast. This isn&#8217;t true; there&#8217;s actually substantial <em>disk-level slowdown</em> involved with parity calculations, and those performance hits will never go away.</p>
<h4>The layout of data on a RAID array</h4>
<p>Recapping the above, a five-disk RAID-5 array might look like this:</p>
<table>
<thead>
<tr>
<th></th>
<th>Disk 1</th>
<th>Disk 2</th>
<th>Disk 3</th>
<th>Disk 4</th>
<th>Disk 5</th>
</tr>
</thead>
<tbody>
<tr>
<th>Stripe A</th>
<td>A<sub>1</sub></td>
<td>A<sub>2</sub></td>
<td>A<sub>3</sub></td>
<td>A<sub>4</sub></td>
<td>A<sub>p</sub></td>
</tr>
<tr>
<th>Stripe B</th>
<td>B<sub>1</sub></td>
<td>B<sub>2</sub></td>
<td>B<sub>3</sub></td>
<td>B<sub>p</sub></td>
<td>B<sub>4</sub></td>
</tr>
<tr>
<th>Stripe C</th>
<td>C<sub>1</sub></td>
<td>C<sub>2</sub></td>
<td>C<sub>p</sub></td>
<td>C<sub>3</sub></td>
<td>C<sub>4</sub></td>
</tr>
<tr>
<th>Stripe D</th>
<td>D<sub>1</sub></td>
<td>D<sub>p</sub></td>
<td>D<sub>2</sub></td>
<td>D<sub>3</sub></td>
<td>D<sub>4</sub></td>
</tr>
<tr>
<th>Stripe E</th>
<td>E<sub>p</sub></td>
<td>E<sub>1</sub></td>
<td>E<sub>2</sub></td>
<td>E<sub>3</sub></td>
<td>E<sub>4</sub></td>
</tr>
<tr>
<th>Stripe F</th>
<td>F<sub>1</sub></td>
<td>F<sub>2</sub></td>
<td>F<sub>3</sub></td>
<td>F<sub>4</sub></td>
<td>F<sub>p</sub></td>
</tr>
<tr>
<th>&#8230;</th>
<td colspan="5">&#8230;</td>
</tr>
</tbody>
</table>
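<p>The rotation in the table follows a simple pattern: the parity block moves one disk to the left with each stripe and wraps around. Here&#8217;s a sketch that regenerates the table under that assumption. Real controllers may rotate parity differently, and <code>stripe_layout</code> is a made-up name.</p>

```python
def stripe_layout(stripe_index, disks=5):
    """Return the block label held by each disk for one stripe.

    Parity starts on the last disk and moves one disk to the left per
    stripe; data blocks fill the remaining disks from left to right.
    """
    parity_disk = disks - 1 - (stripe_index % disks)
    layout = []
    data_block = 1
    for disk in range(disks):
        if disk == parity_disk:
            layout.append("p")
        else:
            layout.append(str(data_block))
            data_block += 1
    return layout


for index, name in enumerate("ABCDEF"):
    print("Stripe", name, " ".join(name + cell for cell in stripe_layout(index)))
```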
<h4>Boundary crossings</h4>
<p>Any of the lines in the above table is a <em>boundary</em> &#8211; if you&#8217;re reading from or writing to more than one block, you&#8217;re making a boundary crossing. Each of these boundary crossings (inter-disk or inter-stripe) incurs a different performance hit, which I&#8217;ll describe momentarily.</p>
<p>The parity block of a given stripe has to always be consistent with the data in it. This means that it&#8217;s recalculated, updated, and stored again on every single write to the stripe.</p>
<p>Now, recall from the above section that the parity of a stripe is calculated by XORing its blocks together, and that the result is reversible. When a write changes one block, we can update the parity without recalculating it from scratch: we remove the block&#8217;s old contribution by XORing the parity block with the block&#8217;s old value, then XOR the result with the new value. To do this, though, we need to know the old value of the block we&#8217;re replacing, as well as the old parity. In other words: in order to calculate the parity, <strong>we need to read each block we&#8217;re overwriting before it&#8217;s written</strong>. <em>Ouch.</em></p>
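<p>Here&#8217;s a small sketch of that parity update, reusing the one-byte blocks from the earlier example. The replacement value for Block 2 is arbitrary, and <code>update_parity</code> is a name picked for illustration.</p>

```python
def update_parity(old_parity, old_data, new_data):
    """Swap one block's contribution to the parity: XOR out old, XOR in new."""
    return old_parity ^ old_data ^ new_data


blocks = [0b10101010, 0b11001100, 0b11011011, 0b00010001]
old_parity = 0b10101100

# Overwrite Block 2 with an arbitrary new value.
new_block_2 = 0b11110000
new_parity = update_parity(old_parity, blocks[1], new_block_2)
print(format(new_parity, "08b"))  # 10010000

# Cross-check against recalculating the parity from all four blocks.
blocks[1] = new_block_2
from_scratch = 0
for block in blocks:
    from_scratch ^= block
print(new_parity == from_scratch)  # True
```

<p>Note that both inputs besides the new data, the old block and the old parity, have to come off the disks first. That&#8217;s where the extra reads come from.</p>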
<p>Let&#8217;s say you have a 5-disk RAID-5 array (4 data blocks and 1 parity block), with a 32 KiB segment size on each disk, giving you a 128 KiB stripe width. You then write 64 KiB of data to the beginning of the stripe, which is enough to completely overwrite the first two blocks, but not enough to overwrite the entire RAID stripe.</p>
<p>Because you need to read every block you&#8217;re writing, your three apparent disk operations (2 data writes and 1 parity write) become six operations instead (2 data reads, 1 parity read, 2 data writes, and 1 parity write). You&#8217;re literally <em>doubling</em> the amount of I/O to facilitate a single operation. You&#8217;re cutting your write performance <em>in half</em> because of the disk I/O overhead in updating the parity block.</p>
<p>(Sidebar: A good RAID implementation will never need to read more than half the disks in a stripe to calculate parity. You can either read the disks you&#8217;re writing, and adjust the parity block accordingly, or you can read the disks you aren&#8217;t writing and just calculate a new parity block from scratch.)</p>
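<p>To put numbers on the two strategies from the sidebar, here&#8217;s a rough operation count for a write that lands inside a single RAID-5 stripe. It ignores caching and assumes one disk operation per block; the function and strategy names are illustrative rather than any vendor&#8217;s terminology.</p>

```python
def raid5_write_ops(data_disks, blocks_written):
    """Count disk operations for one write within a single RAID-5 stripe."""
    if not 1 <= blocks_written <= data_disks:
        raise ValueError("blocks_written must be between 1 and data_disks")
    writes = blocks_written + 1                      # new data plus new parity
    rmw_reads = blocks_written + 1                   # old data plus old parity
    reconstruct_reads = data_disks - blocks_written  # the untouched data blocks
    return {
        "read-modify-write": rmw_reads + writes,
        "reconstruct-write": reconstruct_reads + writes,
    }


# The 64 KiB write from the example: 2 of 4 data blocks overwritten.
print(raid5_write_ops(4, 2))
# A full-stripe write: nothing needs to be read to rebuild the parity.
print(raid5_write_ops(4, 4))
```

<p>For the 64 KiB example, read-modify-write costs six operations and reconstructing the parity from the untouched blocks costs five. For a full-stripe write the reconstruct path needs no reads at all.</p>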
<p>It&#8217;s much easier to throw out the parity information altogether and calculate it from scratch, which we can do when we make <em>full-stripe writes</em>. When writing an entire stripe to disk, the controller already has all of that stripe&#8217;s data in memory, and can just calculate the parity without performing any extra reads. Being able to perform nothing but full-stripe writes is the holy grail of RAID-5 write performance, but it can hurt your read performance.</p>
<h4>Array sizing for full-stripe write performance</h4>
<p>Most performance-sensitive database applications will write blocks or pages that are 2^n bytes large, e.g. 4k, 8k, 32k, and so forth. In an ideal scenario, you want your stripe width to match your write size in order to eliminate stripe boundary crossings and take advantage of full-stripe writes. If you can&#8217;t do that because the writes are too small, you want your per-disk segment size to match your write size in order to limit the number of disks that have to be read when recalculating parity.</p>
<p>In order to maximize those full-stripe writes, you have to carefully consider the number of drives in your array, and not just the segment size. If your main application writes randomly in 32k chunks, a 6-disk RAID-5 (with 5 data blocks per stripe) will never be able to have a 32k stripe width. A 5-disk RAID-5 (with 4 data blocks per stripe) can achieve this easily, though, with an 8k segment size.</p>
<p>In order to properly size and stripe your array, you need to do the following things:</p>
<ol>
<li>Profile your server&#8217;s workload to determine your typical write size</li>
<li>Calculate your target stripe width, which should generally be 2^n, based on your typical write size</li>
<li>Figure out what segment size and disk count will get you to that number</li>
</ol>
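<p>The last step can be sketched as a small calculation. This assumes the controller only offers power-of-two segment sizes, which is common but worth checking against your own hardware; <code>segment_size_for</code> is a hypothetical helper.</p>

```python
def segment_size_for(write_size, total_disks, parity_disks=1):
    """Return the segment size (bytes) that makes stripe width == write_size.

    Returns None when no power-of-two segment size fits, which is the
    assumption made here about what the controller will accept.
    """
    data_disks = total_disks - parity_disks
    if data_disks <= 0:
        raise ValueError("array needs at least one data disk")
    segment, remainder = divmod(write_size, data_disks)
    is_power_of_two = segment > 0 and segment & (segment - 1) == 0
    if remainder or not is_power_of_two:
        return None
    return segment


KIB = 1024
print(segment_size_for(32 * KIB, 5))  # 8192: a 5-disk RAID-5 with 8k segments
print(segment_size_for(32 * KIB, 6))  # None: 5 data disks can't divide 32k evenly
```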
<div>This is an <em>ideal</em>. For lots of applications, write sizes can be unpredictable &#8212; it&#8217;s a fact of life. With luck, a well-designed application, and a good filesystem, hopefully you can minimize these variances.</div>
<h4>The big tradeoff</h4>
<p>You&#8217;ve probably figured out by now that RAID-5 tends to have better write performance when stripe widths are small (but not so small that they cause the controller latency issues), and better read performance when stripe widths are large. You will never, ever get great performance at both. Don&#8217;t even try. But hopefully, the several thousand useless words I&#8217;ve just spit out on RAID-5 will get you good enough app performance where you won&#8217;t want to hang yourself.</p>
<h2>Next steps</h2>
<p>I&#8217;m hoping that Part 3 will cover disk alignment, and Part 4 will cover how to profile your applications on Linux, Solaris and Windows.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disk Performance, Part 1: How Performance Is Measured</title>
		<link>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/</link>
		<comments>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 20:12:11 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1162</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/" title="Disk Performance, Part 1: How Performance Is Measured"></a>When we as computer users think of disk performance, we usually think about streaming, sequential performance, otherwise known as throughput. Desktop operating systems have trained us to think in this way, because the most prominent display of disk speed that &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/" title="Disk Performance, Part 1: How Performance Is Measured"></a><p>When we as computer users think of disk performance, we usually think about streaming, sequential performance, otherwise known as <em>throughput</em>. Desktop operating systems have trained us to think in this way, because the most prominent display of disk speed that your average person sees is an Explorer or Finder window showing file copy progress &#8212; we know that our music collection is being copied at 25 MB per second, for example. This measurement is a good fit for the task, because it gives us the best approximation of how long it will take until the file copy is finished.</p>
<p>In the server world, though, this generally isn&#8217;t how disk performance is measured. Servers are shared resources that do much more complicated things with data than typical desktop systems. Most database access is highly random &#8212; you pull a record here, a record there, and piece them together in the application. Rows in a MySQL table are usually no more than a couple of kilobytes each, and the rows you need to join together to service one complex query typically live all over the disk. For most other server-side applications, small files are accessed a lot, and large files are accessed infrequently. So instead of throughput, which is measured in bytes/sec, we typically work with a different measurement called <em>IOPS</em> (pronounced i-ops).</p>
<p>IOPS stands for I/O Operations Per Second, and it refers to the average number of <em>random small reads and writes</em> that a disk drive can perform in one second. Let&#8217;s start looking at some numbers and calculating something useful.</p>
<p><span id="more-1162"></span></p>
<h2>IOPS for a single [spinning] disk</h2>
<p>Even though SSDs are gaining a lot of traction on servers because they supply a <em>lot</em> of random IOPS, I&#8217;m going to ignore them here. Why? Because their electronics are too sophisticated to model like spinning disks. To get the IOPS numbers for an SSD, your best bet is to look up some independent benchmarks. If those are unavailable, run your own. If that&#8217;s not doable, well, you&#8217;ll just have to trust the numbers that your vendor gives you.</p>
<p>To calculate IOPS for a single spinning disk, though, you just need two numbers: the disk&#8217;s rotational speed, and its seek time. Below is a table I&#8217;ve shamelessly lifted from <a href="http://ronnyegner.files.wordpress.com">Ronny Egner</a>:</p>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Formula</th>
<th colspan="3">Disk Drive</th>
</tr>
</thead>
<tbody>
<tr>
<td>RPM (revolutions per minute)</td>
<td>See vendor data sheet</td>
<td>7200</td>
<td>10000</td>
<td>15000</td>
</tr>
<tr>
<td>RPS (revolutions per second)</td>
<td>RPM / 60 seconds per minute</td>
<td>120</td>
<td>166.67</td>
<td>250</td>
</tr>
<tr>
<td>RPms (revolutions per millisecond)</td>
<td>RPM / 60000 milliseconds per minute</td>
<td>0.12</td>
<td>0.17</td>
<td>0.25</td>
</tr>
<tr>
<td>Full rotation time in ms</td>
<td>1 / RPms</td>
<td>8.33</td>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td>Avg. rotational latency in ms</td>
<td>½ full rotation time</td>
<td>4.17</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>Avg. seek time in ms</td>
<td>See vendor data sheet</td>
<td>10</td>
<td>5</td>
<td>4</td>
</tr>
<tr>
<td>IO time in ms</td>
<td>Avg. rotational latency + avg. seek time</td>
<td>14.17</td>
<td>8</td>
<td>6</td>
</tr>
<tr>
<td>IOPS</td>
<td>(1 / IO time) * 1000</td>
<td><strong>70.59</strong></td>
<td><strong>125</strong></td>
<td><strong>166.67</strong></td>
</tr>
</tbody>
</table>
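<p>The table&#8217;s arithmetic fits in one short function. This is a sketch of the same formulas, with the seek times copied from the table; the function name is mine, and real drives will deviate from the model.</p>

```python
def spinning_disk_iops(rpm, avg_seek_ms):
    """Estimate random IOPS from rotational speed and average seek time."""
    full_rotation_ms = 60_000 / rpm
    avg_rotational_latency_ms = full_rotation_ms / 2
    io_time_ms = avg_rotational_latency_ms + avg_seek_ms
    return 1000 / io_time_ms


# RPM and average seek time (ms) for the three drives in the table.
for rpm, seek_ms in [(7200, 10), (10000, 5), (15000, 4)]:
    print(rpm, round(spinning_disk_iops(rpm, seek_ms), 2))
```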
<p>As you can see, an average 7200 RPM disk will net you about 70 IOPS, whereas a 15,000 RPM disk will give you closer to 170. These numbers can be affected by NCQ, caching, and other drive features. (In a RAID array, many of these drive features are disabled, because the controller does better when performance is more deterministic.)</p>
<h2>IOPS for a RAID array</h2>
<p>I&#8217;m going to ignore caching for simplicity.</p>
<p>Things get complicated here because of the huge variance in the ways that RAID arrays work &#8212; I&#8217;m sure some people will disagree with the observations I&#8217;ve made here. I&#8217;m going to assume you have a passing familiarity with the most common RAID levels &#8212; if not, Wikipedia has a <a href="http://en.wikipedia.org/wiki/RAID#Standard_levels">decent reference</a>. However, I&#8217;m going to clarify a handful of definitions I&#8217;m going to be using, because the storage industry can&#8217;t agree on nomenclature:</p>
<ul>
<li><strong>Segment size:</strong> The amount of data written to a <em>single disk</em> within a RAID stripe.</li>
<li><strong>Stripe width:</strong> The amount of data contained in a single RAID stripe (segment size × number of data-bearing disks).</li>
</ul>
<p>When calculating stripe width for RAID-4/5/6, <em>do </em><em>not include disks used for storing parity</em>.</p>
<h3>RAID-0 (striping without mirroring)</h3>
<p>RAID-0 doesn&#8217;t need to do any special calculations, and it doesn&#8217;t need to write anything twice. As a result, all of the disks can be used at the same time for random reads and writes with no penalties. All stripe width calculations should be made with regards to sequential I/O, rather than random I/O.</p>
<p><strong>IOPS:</strong> (IOPS per disk × number of disks)</p>
<h3>RAID-0+1 (striping with mirroring)</h3>
<p>I&#8217;m folding RAID-1 in here.</p>
<p>Most controllers will interleave reads between the mirrored disks in the array, which will double your random read performance versus a single disk. Note that this only speeds up random access, because rotational latency rather than seek time is your bottleneck for sequential I/O. Write performance is slightly worse than a single disk because the same data needs to be written to two disks &#8212; for a given write, whichever drive has the longer seek time will be slowing you down. Caching usually makes this irrelevant.</p>
<p><strong>Read IOPS:</strong> (IOPS per disk × number of disks)<br />
<strong>Write IOPS: </strong>(IOPS per disk × (number of disks / 2))</p>
<h3>RAID-5 (striping with distributed parity)</h3>
<p>RAID-5 is a really complicated case that&#8217;s heavily reliant on the relationship between your stripe width and your average I/O size, and that complication is why most major database vendors like Oracle will recommend never running on RAID-5.</p>
<p><strong>Read IOPS:</strong> (IOPS per disk × (number of disks &#8211; 1))<br />
<strong>Write IOPS:</strong> (IOPS per disk × (number of disks &#8211; 1) × RAID-5 write penalty)</p>
<p>The write penalty is a <em>scaling factor</em> that varies depending on how well your workload matches your array configuration. At best, there&#8217;s virtually no penalty at all. At worst, you might be getting 20% of your expected disk performance. I&#8217;m going to go over why that is, and how to optimize your RAID-5 arrays, in Part 2.</p>
<p>(Duncan Epping over at Yellow Bricks has a <a href="http://www.yellow-bricks.com/2009/12/23/iops/">post</a> where he uses some constant ratios for his RAID write penalties. I have a philosophical disagreement with this idea, but I&#8217;ll link to it because my answer of &#8220;it depends&#8221; isn&#8217;t really constructive either.)</p>
<h3>RAID-6 (striping with double parity)</h3>
<p>Performance characteristics are almost identical to RAID-5, with two significant differences. First, the parity calculations take an order of magnitude more processing power, because the algorithms are much more sophisticated. Second, two disks in each stripe are reserved for parity data &#8212; these disks will not contribute to your IOPS.</p>
<p><strong>Read IOPS:</strong> (IOPS per disk × (number of disks &#8211; 2))<br />
<strong>Write IOPS:</strong> (IOPS per disk × (number of disks &#8211; 2) × RAID-6 write penalty)</p>
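<p>Pulling the formulas for all four RAID levels into one place, here&#8217;s a rough calculator. The <code>write_penalty</code> argument is the scaling factor described above, and the 0.5 used in the examples is an arbitrary placeholder, not a measured value.</p>

```python
def raid_iops(level, disk_iops, disks, write_penalty=1.0):
    """Estimate (read, write) random IOPS for an array, ignoring caching.

    write_penalty is a scaling factor (1.0 means no penalty) and only
    applies to the parity levels, RAID-5 and RAID-6.
    """
    if level == "0":
        read = write = disk_iops * disks
    elif level == "0+1":
        read = disk_iops * disks
        write = disk_iops * (disks / 2)
    elif level in ("5", "6"):
        parity_disks = 1 if level == "5" else 2
        read = disk_iops * (disks - parity_disks)
        write = read * write_penalty
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return read, write


# Eight 10,000 RPM disks at 125 IOPS each.
print(raid_iops("0", 125, 8))
print(raid_iops("0+1", 125, 8))
print(raid_iops("5", 125, 8, write_penalty=0.5))
print(raid_iops("6", 125, 8, write_penalty=0.5))
```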
<h2>Aggregating I/O profiles</h2>
<p>One final thing to note: if you have enough concurrent sequential I/O tasks running at the same time, your I/O profile turns from sequential to random. The array is trying to keep these requests from being starved, and slow is usually better than no data at all, so it starts seeking all over the place instead of streaming a nice, even line of consecutive blocks off the disk. Keep this <em>very much</em> in mind when determining which measurement, IOPS vs. sequential throughput, is a more useful measurement for the workload you&#8217;re trying to size.</p>
<p>In Part 2, I&#8217;ll go over the impact of stripe sizing and how almost everybody does it <em>completely wrong.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/08/disk-performance-part-1-how-performance-is-measured/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Runbooks are stupid and you&#8217;re doing them wrong</title>
		<link>http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/</link>
		<comments>http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 18:11:33 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=1129</guid>
		<description><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/" title="Runbooks are stupid and you&#039;re doing them wrong"></a>Well, maybe you are and maybe you aren&#8217;t. I have no idea. But if your shop is anything like the majority of IT shops I&#8217;ve seen, then this assessment is probably on the money. The runbook is one of the &#8230;<p class="read-more"><a href="http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/" title="Runbooks are stupid and you&#039;re doing them wrong"></a><p>Well, maybe you are and maybe you aren&#8217;t. I have no idea. But if your shop is anything like the majority of IT shops I&#8217;ve seen, then this assessment is probably on the money.</p>
<p>The <a href="http://en.wikipedia.org/wiki/Runbook">runbook</a> is one of the most pervasively mediocre, poorly thought-out and badly-implemented concepts in the entire IT industry. For those of you who are unfamiliar with the term, the runbook is basically a &#8220;how can grandma run this application?&#8221; document.</p>
<p>Their use should be very strongly scrutinized.</p>
<p><span id="more-1129"></span></p>
<h2>When all you have is a hammer, the whole world looks like a nail; or, don&#8217;t use a runbook when you need a script</h2>
<p>This is so obvious that it should never need documenting for anybody, for any reason, yet I&#8217;m constantly seeing people write runbooks that are just lists of actions for a system operator to take, one after the other, when something goes wrong. There are literally no decision points where a human needs to form an intelligent thought to execute this runbook. The runbook reads like a script.</p>
<p>A script. For a person.</p>
<p>Something has gone wrong. The train has flown off the rails.</p>
<p>Isn&#8217;t the entire point of technology to make people more productive? So why are we taking something that&#8217;s essentially a mechanical, computerized task, easily performed by a script or program, and turning it into a format that needs to be blasted from an output device into someone&#8217;s eyes, processed by a human brain, jammed onto a keyboard and mouse, and then back into the infrastructure? Shouldn&#8217;t we be skipping the middleman?</p>
<p>Working on fun, thought-provoking problems goes much further for staff happiness and retention than having a cabal of people whose job title is <em>Guy Who Pushes the Button</em>. I completely understand the need for staff engagement. But the right time for that is during build-out and engineering, not in the middle of a crisis where the business is losing money or pissing off customers. I assure you that they&#8217;re a lot madder about the outage than your ops team is about it being too easy to fix.</p>
<p>In a crisis, failures should be obvious and recovery should be automated as much as possible to minimize the impact of human error. This brings me to my next point.</p>
<h2>A good monitoring system, not a dumb manual process, should tell you what&#8217;s wrong</h2>
<p>There&#8217;s always going to be exceptions to this, of course. Computers are bad at deriving context about why an application&#8217;s performance profile has changed. If your page views are a hundred times higher today than they were yesterday because your site ended up on the front page of Digg or Reddit, your site will not be performing the same as it did yesterday. There will always be times where you need a human keeping an eye on the performance charts (humans are much better at reading graphs than computers are) and trying to figure out why things aren&#8217;t working the way they&#8217;re supposed to be.</p>
<p>(Those of you following the DevOps movement: look up that video about Etsy&#8217;s dashboards. The best and brightest ops people these days are keeping an eye on business metrics, like sales figures or numbers of code deployments, rather than low-level system metrics.)</p>
<p>But for a lot of other cases, the runbook is representative of somebody being lazy and not correctly integrating the process with the monitoring system. Any line saying &#8220;watch out for _____&#8221; should be immediately suspect. Human brains are really powerful, and really good at figuring out real problems. Your ops engineers should be focusing their time on <a href="http://en.wikipedia.org/wiki/There_are_known_knowns">unknown unknowns</a>. If you know what the performance criteria are that signal a problem, you should be monitoring for those conditions automatically. There are a lot of <a href="http://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity">statistical</a> <a href="http://en.wikipedia.org/wiki/Box%E2%80%93Jenkins">models</a> that can help you, if you&#8217;re willing to put in the effort to use them.</p>
<h2>Systems should be self-healing</h2>
<p>Even the best IT shops often simply <em>don&#8217;t do this</em> unless they&#8217;re integrating the component into a much bigger high-availability project. I&#8217;ve found two main reasons why.</p>
<p>The first is that admins and engineers seem to believe if they spend enough time building infrastructures correctly in the first place, there won&#8217;t be repeatable failures. If you&#8217;re going to put the effort into writing a bunch of code to make a system more reliable, shouldn&#8217;t you put that effort into just making sure it never happens?</p>
<p>Well, yes and no. Some failures are incredibly difficult to prevent but really easy to detect and really easy to recover from, especially if not all the factors are under your control. But other failures also have highly complex causes, and it may take several break-fix iterations before the problem actually disappears. If you&#8217;re building out a reliable service, isn&#8217;t it better to cut downtime by 95% for 90% of cases where the problem occurs, rather than eliminating 100% of downtime for 50% of cases?</p>
<p>I&#8217;m not saying that technical debt is somehow a good thing, but motivated operations people have accomplished really great things for their end-users with duct tape and staples. There&#8217;s nothing wrong with working around a problem as long as the fix isn&#8217;t fragile and it doesn&#8217;t impede your ability to maintain the application down the road. It doesn&#8217;t necessarily mean you&#8217;re avoiding the problem; rather, you&#8217;re finding better places to invest your time.</p>
<p>This brings us to the second reason: people don&#8217;t trust the idea that a system can automatically recover itself from failure. And, really, it ties into the first a little bit: we think our infrastructures are too good to suffer these minor outages, especially from obvious causes. But they aren&#8217;t, and a little creative engineering can keep a minor situation from turning into a minor outage, or a minor outage from turning into a major one. And we all have SLAs, even if there&#8217;s nothing formal and your boss&#8217;s idea of a service level is &#8220;keep the systems running well enough where I don&#8217;t feel compelled to fire you.&#8221;</p>
<p>Take special note of this if you don&#8217;t control your applications. As an aside, I used to work at a small web hosting business a number of years ago. We had a number of customers running ASP applications on top of IIS, which is Microsoft&#8217;s web server platform. Every once in a while, a customer&#8217;s website would suffer a crash of their application pool, because classic ASP wasn&#8217;t good at releasing resources if you weren&#8217;t a diligent coder. We couldn&#8217;t control the code our customers ran on their sites, but we could monitor their sites and restart the application pool if it started to toss errors in a very specific way.</p>
<p>Many operating systems, like Solaris and Windows, take a very pragmatic approach to the problem. If the service crashes, restart it. If it crashes more than X times, leave it down and let the admin deal with it. These are obvious. Some non-obvious things you might want to consider regardless of how you&#8217;re handling high-availability:</p>
<ol>
<li>When the filesystem containing /var/log is almost to capacity, compress or delete old logs before the volume fills up.</li>
<li>When daily cron job X fails because a network service is down, retry it a few times with an <a href="http://en.wikipedia.org/wiki/Exponential_backoff">exponential backoff</a> instead of waiting until cron runs it again tomorrow.</li>
<li>If an application crashes and writes a specific error message into the logs indicating what made it fail, identify the problem, fix it, and start the service back up without human intervention.</li>
</ol>
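<p>The second item above is easy to sketch. What follows is a minimal illustration in Python, not a drop-in tool: the function name <code>retry_with_backoff</code>, its parameters and the default delays are all assumptions made for the example, and <code>job</code> stands in for whatever your cron job actually does.</p>

```python
import time


def retry_with_backoff(job, attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call job(); on failure wait base_delay * 2**n seconds and try again.

    All names and defaults here are illustrative assumptions.
    Returns job()'s result, or re-raises the last exception once the
    attempts are used up so cron and monitoring still see the failure.
    """
    for attempt in range(attempts):
        try:
            return job()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: fail loudly instead of hiding it
            # Double the wait after every failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
```

<p>The <code>sleep</code> argument is only there so the waiting can be swapped out in tests. In a real job you would probably also add some random jitter to each delay, so a fleet of machines doesn&#8217;t hammer the recovering service at the same instant.</p>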
<p>Everyone has common, repeatable failures in their infrastructure, though the precise definition of &#8220;common&#8221; may vary from shop to shop. Not all of these issues will cause outages, especially if the infrastructure is designed for high availability, but let&#8217;s not pretend that all our applications are perfect. At the same time, let&#8217;s not delude ourselves into thinking that <em>eh, app crashed, restart the daemon</em> is always an adequate solution to a problem. Doing better takes some thought about the application and an understanding of how it fails.</p>
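<p>To make that a little more concrete, here is a rough Python sketch of the &#8220;recognize the error, fix it, restart&#8221; idea combined with a cap on restarts. Everything in it is hypothetical: the log signatures, the fix names, the <code>handle_crash</code> function and the choice to escalate on anything unrecognized are assumptions for illustration, not a description of any particular tool.</p>

```python
import re

# Hypothetical error signatures mapped to the name of a fix. Real entries come
# from knowing how your particular application fails.
KNOWN_FAILURES = [
    (re.compile(r"pid file .* already exists"), "remove_stale_pidfile"),
    (re.compile(r"No space left on device"), "compress_old_logs"),
]


def diagnose(log_lines):
    """Return the fix name for the most recent known signature, or None."""
    for line in reversed(log_lines):  # newest log line first
        for pattern, fix_name in KNOWN_FAILURES:
            if pattern.search(line):
                return fix_name
    return None


def handle_crash(log_lines, restarts_so_far, fixes, restart, escalate,
                 max_restarts=3):
    """Apply a known fix and restart, or hand the problem to a human.

    fixes maps fix names to callables; restart and escalate are callables
    supplied by the caller. Returns the fix name used, or "escalated".
    """
    if restarts_so_far >= max_restarts:
        escalate("restart limit reached")  # leave it down for the admin
        return "escalated"
    fix_name = diagnose(log_lines)
    if fix_name is None:
        escalate("unrecognized failure")  # don't blindly restart the unknown
        return "escalated"
    fixes[fix_name]()
    restart()
    return fix_name
```

<p>The point of the table is that each entry records something you actually learned about the application. A failure that doesn&#8217;t match anything, or one that keeps coming back, goes to a person instead of being restarted forever.</p>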
<h2>Runbooks aren&#8217;t always bad</h2>
<p>I can think of the following cases where runbooks are useful to an IT organization:</p>
<ol>
<li>Ensuring there&#8217;s a contingency plan if the script goes wrong and nobody knows how to fix it</li>
<li>Orienting and coordinating staff in an emergency, so <em>everyone knows the appropriate responsibilities, escalations and handoffs</em></li>
<li>Solidifying a process that has so many moving parts that, even though it may take days to document, it might take weeks, months or years to get automated properly</li>
</ol>
<div>Runbooks should only contain the pieces that are <em>relevant to people</em> and <em>help them communicate better</em>. If you can document the intent, you can translate it into code. Even if the code has bugs in it, they&#8217;re the same bugs everywhere, and a consistent behavior is almost always better than an ambiguous one.</div>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

