<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>holyhandgrenade.org &#187; virtualization</title>
	<atom:link href="http://holyhandgrenade.org/blog/tag/virtualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://holyhandgrenade.org/blog</link>
	<description>System administration from the trenches.</description>
	<lastBuildDate>Wed, 28 Jul 2010 05:31:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Bottom-Up Virtualization</title>
		<link>http://holyhandgrenade.org/blog/2009/10/bottom-up-virtualization/</link>
		<comments>http://holyhandgrenade.org/blog/2009/10/bottom-up-virtualization/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 06:13:37 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[hpc]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://holyhandgrenade.org/blog/?p=235</guid>
		<description><![CDATA[Working in the life sciences industry, I often deal with users who have requests that might be considered strange in other fields. For example, my organization has users asking for systems with 2 terabytes of RAM. We have other users asking for systems with 12 terabytes of RAM. To a normal system administrator who doesn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Working in the life sciences industry, I often deal with users who have requests that might be considered strange in other fields. For example, my organization has users asking for systems with 2 terabytes of RAM. We have other users asking for systems with <strong>12</strong> terabytes of RAM. To a normal system administrator who doesn&#8217;t run OLTP systems for a bank or brokerage, where you might find huge-memory database systems, this technical requirement seems silly. However, for gene sequence assembly and analysis, this much memory is really a requirement with longer sequences. Short read assemblers like <a href="http://www.ebi.ac.uk/~zerbino/velvet/">Velvet</a> can chew through this in as much time as it takes the system to allocate all that memory.</p>
<p>You don&#8217;t have to be on the forefront of bleeding-edge server technology to know that x86 systems with 12 terabytes of RAM simply don&#8217;t exist. With RAM density as it is right now, there&#8217;s simply no way to fit that many DIMMs on a board. However, some inventive software steps up to the plate.</p>
<p>We&#8217;ve been meeting with a company called <a href="http://www.scalemp.com/">ScaleMP</a>. ScaleMP is, in strict terms, a virtualization software vendor. However, unlike companies like VMware, ScaleMP specializes in using virtual machine monitors to aggregate CPU and memory resources among a number of InfiniBand-connected hosts, presenting a logical system with all of the combined CPU and memory resources of the aggregated physical machines. Through their black magic technology, they apparently do this without substantial overhead to the host, and the resulting virtual machine performs on par with a parallelized MPI solution utilizing operating systems running atop bare metal on the physical nodes. The difference, of course, is that you have a very large coherent block of memory to work with. If you&#8217;re familiar with Isilon&#8217;s storage architecture, the pattern should look familiar.</p>
<p>There&#8217;s an <a href="http://www.hpcwire.com/features/Aggregating-Clusters-Through-Virtualization-Virtual-SMP-Benefits-36258739.html">article from HPCwire</a> written by Shai Fultheim, ScaleMP&#8217;s CEO, that sums up this approach a lot better than I could hope to. But the ten-cent version is that you can use this to virtualize compute, memory and I/O resources to present a very large system for single tasks that require tons of memory and extremely fast parallelism, or you can use it to aggregate an entire cluster into a single virtualized node that would completely eliminate the need for traditional cluster management tools.</p>
<p>I was thinking about this a little while ago when I posted <a href="http://holyhandgrenade.org/blog/2009/10/vmotionlive-migration-is-not-an-ha-feature/">VMotion/Live Migration is not an HA feature</a>.</p>
<blockquote><p>Maybe we’ll see cache-coherent shared-memory virtual infrastructures running over InfiniBand, removing the network overhead that was pointed to as a problem by Rational Survivability.</p></blockquote>
<p>It started out as a sidenote, but it really got me thinking about the big picture. Why isn&#8217;t this a direction we&#8217;re seeing existing virtualization vendors moving in, vendors who currently embrace the partitioning approach? Storage vendors have historically worked from the idea that true virtualization involves both aggregation and partitioning. It&#8217;s not enough to simply present disks to multiple hosts. You <em>aggregate</em> them into storage pools, and then you carve up LUNs and present them to your storage network. Why aren&#8217;t we trying to make compute cycles commoditized for generalized workloads, instead of just specific programs written for message-passing interfaces?</p>
<p>Vendors have heavy investments in distributed infrastructures, using tools like VMotion and DRS to balance resource utilization and maximize consolidation ratios. But is this really the optimal approach to this problem? What if you didn&#8217;t need to dynamically balance workloads because the hypervisor&#8217;s SMP scheduler would do it automatically on an enormous aggregated system? For day-to-day operations (as opposed to offsite migrations, where VMotion can still be rather useful), what if you were able to move virtual machines across an InfiniBand fabric as a simple in-memory copy, rather than sending the entire contents of a virtual machine&#8217;s memory over the network? What if all of your virtual page sharing was completely coherent across your virtualized compute grid, and you really could have one single OS instance in memory running your entire infrastructure?</p>
<p>Certainly there are a lot of complications and a lot of engineering in this approach. First, of course, is resiliency and failure isolation: how do you make sure that a single server failure doesn&#8217;t bring down every OS instance on the grid that happens to be running tasks on that system&#8217;s CPUs? (There are checkpointing approaches for existing large-scale SMP systems, which could probably be applied to the vSMP approach as well; however, this is a pretty academic discussion, and I&#8217;m not going to pretend to know how viable it is.)<a href="http://cs.binghamton.edu/~brood/HPDCHT07.pdf"></a> With resiliency in mind, what&#8217;s the best way of allocating and distributing resources so that a minimal amount of recovery has to occur in the event of a failure? It&#8217;s not useful to recover in this way if it takes longer than a regular clean boot.</p>
<p>This kind of engineering will take a very long time, but I think it&#8217;s inevitable. Virtualization vendors have gotten the host resource partitioning part down to the point where I don&#8217;t know if anything new can even happen in that space, but there are a lot more exciting things that can happen once the aggregation piece is layered underneath the hypervisor as we know it today.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2009/10/bottom-up-virtualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VMotion/Live Migration is not an HA feature</title>
		<link>http://holyhandgrenade.org/blog/2009/10/vmotionlive-migration-is-not-an-ha-feature/</link>
		<comments>http://holyhandgrenade.org/blog/2009/10/vmotionlive-migration-is-not-an-ha-feature/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 16:45:22 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[commentary]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www-new.holyhandgrenade.org/wordpress/?p=191</guid>
		<description><![CDATA[I&#8217;m a couple of weeks behind the ball here, but I was a bit inspired by this (somewhat controversial) post over at Standalone Sysadmin: I’m sorry. I know you probably paid a lot for that license, but if your infrastructure is relying on a machine’s ability to transition between VM hosts without rebooting as the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a couple of weeks behind the ball here, but I was a bit inspired by this (somewhat controversial) <a href="http://www.standalone-sysadmin.com/blog/2009/09/vm-live-migration-is-the-wrong-tactic/">post</a> over at <a href="http://www.standalone-sysadmin.com">Standalone Sysadmin</a>:</p>
<blockquote><p>I’m sorry. I know you probably <a href="http://itmanagement.earthweb.com/netsys/article.php/3831561/A-Virtual-Infrastructure-Saves-Money-But-It-Aint-Free.htm">paid  a lot</a> for that license, but if your infrastructure is relying on a machine’s  ability to transition between VM hosts without rebooting as the crux of your  high availability plan, you might want to reconsider.</p>
<p>Yesterday, <a href="http://www.rationalsurvivability.com/">Rational Survivability</a> (a great  all-over-the-place IT blog) had a post titled <em><a href="http://www.rationalsurvivability.com/blog/?p=1391">The Emotion of  VMotion</a></em>. It didn’t occur to me before reading this that my own previous  search for a hypervisor that would do live migration was working directly  against my own beliefs that <a href="http://www.standalone-sysadmin.com/blog/2009/09/modern-uptime-measured-from-the-outside-in/">uptime  should only matter for services</a>. Essentially, the infrastructure should be  designed so that a single server down doesn’t contribute to the loss of  availability.</p>
<p>That being said, live migration is a neat idea, and eventually it’s going to  get to the point that it’s nearly instantaneous. When that happens, failovers  will be next to invisible. Maybe we’ll have to reevaluate our approach in that  case.</p>
<p>Until then, I read posts from people trying to rely on it to <a href="http://communities.vmware.com/thread/47097">keep their infrastructures  up</a> and I worry that their approach is flawed.</p>
<p>Please, build your services for reliability, not just the underlying systems.</p></blockquote>
<p>Now, I need to preface this by saying that I&#8217;m not missing the point of Matt&#8217;s post. There are a lot of administrators out there who do treat live migration as a panacea for whatever reliability problems ail them. Anyone who has attempted to design real high-availability infrastructures is very aware that application-level clustering is more robust and typically more reliable than OS-level clustering, which is more robust than hypervisor-level clustering. But these features don&#8217;t compete with each other. They each function as a different piece of the datacenter puzzle. And as Matt implies, the cost savings aren&#8217;t right for everyone &#8212; but they are right for some people.</p>
<p>Absolutely, without a doubt, clustered services are a wonderful, great idea &#8212; that&#8217;s why people have been using them for decades, and continue to use them. And even though VMotion makes it very easy to add some server-level resiliency to any host or service, the application-level clusters are becoming much easier to configure and maintain at the same time, thanks to great configuration management tools like Puppet, Chef, and Cfengine.</p>
<p>But the big picture is the entire ecosystem in which VMotion thrives. The big cost driver for virtualization in large datacenter environments is consolidation, and being able to run multiple workloads on the same piece of physical hardware is only the first step. Consolidation ratios are improved substantially when you can transparently load-balance workloads in terms of network traffic, compute power and disk I/O &#8212; you don&#8217;t have to worry about a single bottleneck breaking your carefully-designed system. In addition to the raw server consolidation gains, you substantially save on engineering power, as there&#8217;s a lot less manual labor required to design a viable virtualized infrastructure, and a lot fewer things go wrong if you get it wrong. And if you require compute capacity on demand &#8212; say that the majority of your processing occurs during normal business hours and your servers stay mostly idle afterwards &#8212; a solution like DRS can actually completely power down your unused VMware hosts until your compute capacity is needed again.</p>
<p>Sure, this isn&#8217;t appropriate for everyone. In a pie-in-the-sky IT infrastructure, grid services would provide uniform access to compute capacity and storage on demand using commodity hardware, like Google or Facebook or other players who rely heavily on things like Hadoop or MapReduce in order to scale their operations. But for most real businesses, which have a real investment in commercial off-the-shelf software like databases, ERP systems, CRM and other necessities, we need hypervisors to abstract away the problem and do the work that the COTS vendors won&#8217;t, even if the result isn&#8217;t as elegant as it should be. And I&#8217;m sure that as the hypervisor marketplace matures and consolidates, VMware, Citrix, Microsoft, Red Hat and other vendors will begin to do things with their platforms that we haven&#8217;t even thought of yet. Maybe we&#8217;ll see cache-coherent shared-memory virtual infrastructures running over InfiniBand, removing the network overhead that was pointed to as a problem by Rational Survivability. The possibilities are endless.</p>
<p>It seems like in this instance, Matt is railing more against the idea of boot-from-SAN than he is against VMotion itself, as boot-from-SAN is another way of solving the same problem &#8212; it adds resiliency against hardware failure, but not a ton else. In various ways, he&#8217;s right: if you ignore maintenance of your systems documentation and proper server rebuild procedures in favor of a magical black box, your environment will become an unmaintainable mess as a result. It&#8217;s the same argument that Luke Kanies has been making about using Puppet or other configuration management systems versus golden master images. In this respect, I think Matt is right to want to know his systems well enough to rebuild them from scratch. It also makes upgrades and other migrations much simpler and smoother.</p>
<p>But every tool is just that: a tool. Tools should be used as tools, and evaluated in terms of their effectiveness as tools. You shouldn&#8217;t throw away a perfectly good tool because it doesn&#8217;t live up to the hype you were promised. You should use it if it delivers a real return on investment.</p>
]]></content:encoded>
			<wfw:commentRss>http://holyhandgrenade.org/blog/2009/10/vmotionlive-migration-is-not-an-ha-feature/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
