<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>holyhandgrenade.org &#187; nagios</title>
	<atom:link href="http://holyhandgrenade.org/blog/tag/nagios/feed/" rel="self" type="application/rss+xml" />
	<link>http://holyhandgrenade.org/blog</link>
	<description>System administration from the trenches.</description>
	<lastBuildDate>Wed, 28 Jul 2010 05:31:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Monitoring Windows MPIO through Nagios</title>
		<link>http://holyhandgrenade.org/blog/2010/05/monitoring-windows-mpio-through-nagios/</link>
		<comments>http://holyhandgrenade.org/blog/2010/05/monitoring-windows-mpio-through-nagios/#comments</comments>
		<pubDate>Sun, 30 May 2010 18:08:44 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[san]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=615</guid>
		<description><![CDATA[Sometimes, we need to do SAN maintenance &#8212; firmware upgrades, disruptive fabric changes, and the like. When these situations come up, it&#8217;s useful to know if anything is in a condition where it will break if it loses its connection to SAN storage, especially if you&#8217;re a lowly storage administrator without admin access to any [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, we need to do SAN maintenance: firmware upgrades, disruptive fabric changes, and the like. When these situations come up, it&#8217;s useful to know whether any host is in a state where it will break if it loses a connection to SAN storage, especially if you&#8217;re a lowly storage administrator without admin access to any of the Windows systems connected to the SAN.</p>
<p>I poked around and could not find a single utility for monitoring the Windows MPIO framework, so I whipped up a quick script using VBScript and WMI. The script is called like so:</p>
<p style="padding-left: 30px;">cscript.exe //NoLogo scripts\CheckMpioPaths.vbs /paths 4</p>
<p>(Four paths are expected here because the server is multipathed across two fabrics, and both of the active/passive controllers are attached to each fabric: 2 controllers on each of 2 fabrics gives 4 paths.)</p>
<p>This will cause the script to issue a Nagios CRITICAL if any multipath-registered LUN shows fewer than the given number of paths.</p>
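<p>The threshold logic can be sketched in a few lines of Python. This is only an illustration, not the VBScript from the repository: the function name, the status text, and the sample LUN names are made up, and the real script reads its path counts from WMI rather than from a dictionary. The exit codes 0 and 2 are the standard Nagios values for OK and CRITICAL.</p>

```python
# Standard Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def check_paths(path_counts, required):
    """Return (exit_code, status_line) for a mapping of LUN name to path count.

    Hypothetical helper: the real script gathers the counts through WMI.
    """
    # Collect every LUN that reports fewer paths than required.
    degraded = sorted(
        name for name, count in path_counts.items() if required > count
    )
    if degraded:
        detail = ", ".join(degraded)
        return CRITICAL, "MPIO CRITICAL - fewer than %d paths: %s" % (required, detail)
    return OK, "MPIO OK - all LUNs have at least %d paths" % required


# Made-up sample data: one healthy LUN and one that has lost two paths.
code, line = check_paths({"Disk 1": 4, "Disk 2": 2}, 4)
print(code, line)
```

<p>A real plugin would print the status line and exit with the returned code, which is what Nagios uses to set the service state.</p>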
<p>As usual, you can find the script in the <a href="http://github.com/jgoldschrafe/CheckMpioPaths">GitHub repository for CheckMpioPaths</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2010/05/monitoring-windows-mpio-through-nagios/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Charting performance data for IBM Midrange Storage Series SANs with PNP4Nagios</title>
		<link>http://holyhandgrenade.org/blog/2010/05/charting-performance-data-for-ibm-midrange-storage-series-sans-with-pnpnagios/</link>
		<comments>http://holyhandgrenade.org/blog/2010/05/charting-performance-data-for-ibm-midrange-storage-series-sans-with-pnpnagios/#comments</comments>
		<pubDate>Mon, 24 May 2010 16:08:49 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[pnp4nagios]]></category>
		<category><![CDATA[san]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=579</guid>
		<description><![CDATA[If you&#8217;ve used IBM SAN products, particularly the DS4000, DS5000 and DS6000 series (which are rebranded LSI), one of the most obnoxious things about them is how you&#8217;re pretty much forced to roll your own monitoring tools. Compared to many mainstream vendors (and Sun/Oracle in particular), IBM&#8217;s performance monitoring and modelling tools have been lackluster [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve used IBM SAN products, particularly the DS4000, DS5000 and DS6000 series (which are rebranded LSI), one of the most obnoxious things about them is how you&#8217;re pretty much forced to roll your own monitoring tools. Compared to many mainstream vendors (and Sun/Oracle in particular), IBM&#8217;s performance monitoring and modelling tools have been lackluster at best and nonexistent at worst. The best tool you&#8217;ve got is SMcli, which doesn&#8217;t supply a ton of good information, but at least gives you a starting point for capacity planning.</p>
<p>I had originally wanted to make something like this for Cacti, which probably has a much broader install base than the pnp4nagios addon, but the Nagios way was just so <em>easy</em>, and I&#8217;d like to share it with anyone who doesn&#8217;t want to roll their own basic performance aggregator for these arrays.</p>
<p>This tool gets the following statistics:</p>
<ul>
<li>IOPS</li>
<li>Throughput</li>
<li>Read percentage</li>
<li>Cache hit percentage</li>
</ul>
<p>It gets statistics at the following levels:</p>
<ul>
<li>Logical Unit</li>
<li>Physical Array</li>
<li>Controller</li>
<li>Unit</li>
</ul>
<p>It&#8217;s a little quick-and-dirty, but it works:</p>
<p><img class="alignnone size-medium wp-image-584" title="check_smcli_io" src="http://holyhandgrenade.org/blog/wp-content/uploads/2010/05/check_smcli_io-300x122.png" alt="check_smcli_io" width="300" height="122" /></p>
<p>Like my other projects, it&#8217;s hosted on GitHub, so check out the <a href="http://github.com/jgoldschrafe/check_smcli_io">GitHub project for check_smcli_io</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2010/05/charting-performance-data-for-ibm-midrange-storage-series-sans-with-pnpnagios/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nagios plugin: check_sa.pl</title>
		<link>http://holyhandgrenade.org/blog/2009/11/nagios-plugin-check_sa-pl/</link>
		<comments>http://holyhandgrenade.org/blog/2009/11/nagios-plugin-check_sa-pl/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 21:41:48 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=410</guid>
		<description><![CDATA[There are a lot of useful Nagios addons out there. One of them, pnp4nagios, allows you to create graphs of all of your Nagios performance data with zero configuration. This is pretty nice, because your monitoring configurations are kept in one place, rather than having to separately maintain configurations for Nagios and Cacti (or whatever you [...]]]></description>
			<content:encoded><![CDATA[<p>There are a lot of useful Nagios addons out there. One of them, <a href="http://www.pnp4nagios.org">pnp4nagios</a>, allows you to create graphs of all of your Nagios performance data with zero configuration. This is pretty nice, because your monitoring configurations are kept in one place, rather than having to separately maintain configurations for Nagios and Cacti (or whatever you use).</p>
<p>I&#8217;ve always wanted to be able to monitor things like number of open sockets, page faults, context switches, and other performance counters. Some of them are available through SNMP; others aren&#8217;t. Of the ones that are available, not all are broken out by device. I wanted a little bit more detail.</p>
<p>The other problem with SNMP queries is that a Nagios check reads a single point-in-time value rather than an average, and something that spikes for a minute is not the same as a condition that persists for several minutes or hours. I wanted to leverage the built-in accounting in sysstat to pull together something Nagios can actually make a little bit of sense out of.</p>
<p>Anyway, I went ahead and created a Nagios plugin that parses the output of sadf (which is a frontend to sa/sar performance counters). You can query multiple counters in one run, specifying separate alert thresholds for each (or none at all, if you just want performance data). You can specify, via shell-style glob patterns, which devices you want to include or exclude, so that you can, for example, exclude all &#8220;lo&#8221; and &#8220;tun*&#8221; devices from network statistic monitoring. You can also pick the sampling period, so if you want an average of the last 30 minutes the plugin will produce it.</p>
<p>You can do stuff like this:</p>
<blockquote><p><strong>./check_sa.pl -i -C %usr -C %soft -C %sys -C %idle -D all</strong><br />
SA OK &#8211; All counters within specified thresholds. | %idle[cpu0]=96.84;; %idle[cpu1]=96.31;; %idle[cpu2]=97.23;; %idle[cpu3]=95.8;; %soft[cpu0]=0;; %soft[cpu1]=0.01;; %soft[cpu2]=0;; %soft[cpu3]=0.01;; %sys[cpu0]=0.4;; %sys[cpu1]=0.46;; %sys[cpu2]=0.36;; %sys[cpu3]=0.63;; %usr[cpu0]=2.67;; %usr[cpu1]=3.13;; %usr[cpu2]=2.27;; %usr[cpu3]=3.46;;</p></blockquote>
<p>Or, if you prefer to summarize:</p>
<blockquote><p><strong>./check_sa.pl -i -C %usr -C %soft -C %sys -C %idle -d all</strong><br />
SA OK &#8211; All counters within specified thresholds. | %idle[all]=96.54;; %soft[all]=0;; %sys[all]=0.46;; %usr[all]=2.89;;</p></blockquote>
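<p>Everything after the pipe character is Nagios performance data: space-separated <em>label=value;warn;crit</em> tokens, with the threshold fields left empty here. As a rough illustration of how a consumer might read it, here is a small Python sketch. The function name is made up, and it deliberately ignores parts of the perfdata format that do not appear in the output above, such as units of measure and quoted labels containing spaces.</p>

```python
def parse_perfdata(plugin_output):
    """Split plugin output into (status_text, {label: value}).

    Simplified sketch: assumes unquoted labels and plain numeric values.
    """
    # Text before the pipe is the human-readable status; after it is perfdata.
    status, _, perfdata = plugin_output.partition("|")
    values = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        # The value is the first semicolon-separated field; warn/crit follow.
        values[label] = float(rest.split(";")[0])
    return status.strip(), values


sample = (
    "SA OK - All counters within specified thresholds. | "
    "%idle[all]=96.54;; %soft[all]=0;; %sys[all]=0.46;; %usr[all]=2.89;;"
)
status, values = parse_perfdata(sample)
print(status)
print(values["%idle[all]"])
```

<p>pnp4nagios does this kind of parsing for you; the sketch is only meant to show how the line is structured.</p>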
<p>It&#8217;s still a tiny bit slow &#8212; it takes about 500-600 ms to run on the systems I&#8217;ve tested &#8212; but this should be good enough to be useful without bogging down Nagios too badly.</p>
<p>The script requires the Text::Glob module to be installed, so it can convert shell-style globs into regular expressions to match against.</p>
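<p>For anyone curious what glob-based device filtering looks like, here is a rough Python equivalent using the standard <em>fnmatch</em> module, which plays the same role Text::Glob does in Perl. The function name and the include-then-exclude order are assumptions made for the example, not a description of how check_sa.pl is written.</p>

```python
import fnmatch
import re


def select_devices(devices, include=("*",), exclude=()):
    """Keep devices that match any include glob and no exclude glob."""
    # fnmatch.translate turns a shell-style glob into a regular expression.
    include_res = [re.compile(fnmatch.translate(g)) for g in include]
    exclude_res = [re.compile(fnmatch.translate(g)) for g in exclude]
    selected = []
    for dev in devices:
        if not any(r.match(dev) for r in include_res):
            continue
        if any(r.match(dev) for r in exclude_res):
            continue
        selected.append(dev)
    return selected


# Drop the loopback and tunnel interfaces, keep everything else.
print(select_devices(["lo", "eth0", "eth1", "tun0"], exclude=("lo", "tun*")))
```

<p>With the sample list above, only eth0 and eth1 survive the filter.</p>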
<p>View the project:</p>
<ul>
<li><a href="http://github.com/jgoldschrafe/check_sa">On GitHub</a></li>
<li><a href="http://monitoringexchange.org/inventory/Check-Plugins/Operating-Systems/Linux/check_sa">On MonitoringExchange</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2009/11/nagios-plugin-check_sa-pl/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
