
Calling custom functions from other custom functions in Puppet

This post probably describes a bug, but I haven’t had the time yet to determine if this still exists in Puppet 2.6.x. Instead, here’s a post that will hopefully help out someone else having the same problem.

The other day, I was writing some custom parser functions for our Puppet 0.25.x install. In the interest of reusability, the idea was to keep the functions small and composable, and have them call each other in a nice, maintainable way in order to prevent code duplication.

As per the Puppet documentation, Puppet functions are called from Ruby by prefixing the function name with function_:

module Puppet::Parser::Functions
    newfunction(:my_function) do |args|
        # Puppet functions expect an array containing all of the arguments,
        # so we have to wrap our single string argument
        function_notice(["Called notice() from my_function()"])
    end
end

This worked great for calling built-in functions, but when I tried to call one of my own functions from another of my own functions, Puppet would just hang:

my_function.rb:

module Puppet::Parser::Functions
    newfunction(:my_function) do |args|
        function_my_other_function(args)
    end
end

my_other_function.rb:

module Puppet::Parser::Functions
    newfunction(:my_other_function) do |args|
        # Never get here
    end
end

After a bunch of debugging, I found that the Puppet autoloader seemed to be spinning itself into an infinite loop trying to locate and load my_other_function when it was called from my_function.rb. The solution was to manually require the file containing that function:

require File.join([File.expand_path(File.dirname(__FILE__)), 'my_other_function.rb'])
 
module Puppet::Parser::Functions
    newfunction(:my_function) do |args|
        function_my_other_function(args)
    end
end

The above assumes, of course, that the functions are in the same directory as one another.

With the dependency loaded, the custom function should work the same as any other parser function.
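The path arithmetic in that require line can be pulled out into a tiny helper. This is only a sketch: sibling_path is a made-up name, not something Puppet provides, and it assumes (as above) that both function files live in the same directory.

```ruby
# Hypothetical helper (not part of Puppet): absolute path of a file that
# sits in the same directory as calling_file.
def sibling_path(calling_file, name)
  File.expand_path(name, File.dirname(calling_file))
end

# Inside my_function.rb it would be used like this:
#   require sibling_path(__FILE__, 'my_other_function.rb')
```

On Ruby 1.9.2 and later, require_relative 'my_other_function' does the same job without a helper.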

DevOps still sort of misses the big picture

It’s been long enough since I’ve updated this blog that I’m just going to assume everyone knows what’s up with DevOps. It’s a movement I’ve had a love-hate relationship with. I think that it really works well in Web 2.0-style shops where all of the development work and all of the sysadmin work takes place in-house. Unsurprisingly, it also works really poorly when the applications you’re supporting are opaque black boxes that don’t expose how they work under the covers. (Taking the “Dev” out of “DevOps” just leaves you with “Ops,” which puts us right back where we started.)

I was doing some light reading today, and I happened to catch a particular article by Grig Gheorghiu over at Agile Testing comparing and contrasting systems monitoring with unit testing. His thoughts can be summarized with the following quote:

Good developers are test-infected. It doesn’t matter too much whether they write tests before or after writing their code — what matters is that they do write those tests as soon as possible, and that they don’t consider their code ‘done’ until it has a comprehensive suite of tests. And of course test-infected developers are addicted to watching those dots in the output of their favorite test runner.

Good ops engineers are monitoring-infected. They don’t consider their infrastructure build-out ‘done’ until it has a comprehensive suite of monitoring checks, notifications and alerting rules, and also one or more dashboard-type systems that help them visualize the status of the resources in the infrastructure.

This is true. What I think is problematic is that for all the communication the last few years have brought on, developers are still leaving it to ops to figure out how to monitor the thing, when the role of ops should strictly be trying to figure out what went wrong and how to fix it.

Let’s say your company is developing a Big Internet Thing. You have a sizeable, reasonably complex application with a lot of system dependencies. The developers have already put in a ton of work to write all of the unit tests for it. They already have all of the plumbing in place to catch every conceivable minor regression at every step of the application. Why are the sysadmins, the people responsible for rolling this thing into production, being forced to reinvent the wheel? Why can’t the same unit tests the developers are already using be tuned to provide usable metrics for the ops team? Can’t tests just be idempotent?

As DevOps matures, I think this integration has to continue tightening. The unnecessary duplication of effort undertaken by sysadmins every day because software developers don’t publish their test suites is probably costing the world countless billions of dollars in lost productivity every year.
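As a rough sketch of what reusing a test as a monitoring check could look like: the wrapper below runs a read-only assertion and reports the result with Nagios-style exit codes plus a timing metric. None of this comes from Grig's article; run_as_check and the output format are assumptions for illustration.

```ruby
# Nagios-style plugin exit codes (OK = 0, CRITICAL = 2).
EXIT_OK = 0
EXIT_CRITICAL = 2

# Hypothetical wrapper: run a read-only assertion block and report the
# outcome the way a monitoring check would, including how long it took.
def run_as_check(name)
  started = Time.now
  yield
  elapsed = Time.now - started
  [EXIT_OK, format('OK - %s | time=%.3fs', name, elapsed)]
rescue StandardError => e
  # Any failed assertion (raised exception) becomes a CRITICAL result.
  [EXIT_CRITICAL, "CRITICAL - #{name}: #{e.message}"]
end
```

A check script would call it with the body of an existing test, print the message, and exit with the returned status. This only works if the test has no side effects, which is the idempotency point above.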

Exchange 2010 SP1 Update Rollup 2 supports multiple public folder databases

I wasn’t going to bother posting this, because the release was already posted about on the MSExchangeTeam blog, but I saw that the nonchalant way they tagged this feature in the release notes really didn’t do it justice:

2409597 Implement OpenFlags.AlternateServer for PublicLogon

What this really means is that one of the biggest annoyances about Exchange 2010 has finally been resolved (sort of). The public folders database (which also stores Free/Busy information for calendaring) is not part of the DAG, making it the one piece that’s not accessed through the Client Access Server — instead, MAPI clients connect to the Public Folders database on the mailbox server directly. This has been a nightmare for high availability, because it seriously limited the scope of business-hours maintenance. It was possible to redirect clients to an alternate database copy, but that required editing their mailbox properties in batches via LDAP, and required all connected clients to close and reopen Outlook in order to reconnect to the new copy.

In Exchange 2010 SP1 UR2, the server finally tells the connecting client about all of the available public folder database copies. When connectivity to the primary is lost, Outlook will, after a delay of about 30 seconds in our testing, attempt to reconnect to another available server.

It would be great if they would finally make public folders part of the DAG, accessible through the CAS array like any mailbox resource, but they’re sending mixed messages about the role of public folders and generally seem to be pushing SharePoint for this type of information. Hopefully we’ll see public folders as full DAG members in a future Exchange release.

Sharing code between Facter facts

Spurred on by Jordan Sissel’s post about nodeless Puppet configurations, which are purely fact-driven, I’ve started writing a lot of custom facts for our environment. One thing that’s been driving me nuts has been how Facter doesn’t have a real, supported mechanism for code reuse between facts. I can understand this decision: facts are supposed to be orthogonal. At the same time, there’s a lot of flexibility to be found in custom facts, and let’s face it, there’s only so much you can do without real code reuse.

However, it is pretty easy to stick shared code into a library to be pulled by Puppet along with the Facter facts themselves, because Puppet happens to synchronize all Facter .rb files before it starts to actually run any of them. For this example, we’re going to write a fact called httpd_running that checks the process table for httpd processes, and returns true if it finds any. (Since it crawls /proc, it only runs on Linux. Sorry, cross-platform Puppet guys.)

When Puppet resynchronizes facts from the server, it dumps them all into the folder /var/lib/puppet/lib/facter (or equivalent). /var/lib/puppet/lib is part of the RUBYLIB for any Puppet invocation. This means that any .rb file synchronized in this way is loadable using

require 'facter/mylibname'

First, we’re going to need a module path for our libraries. You can really put it anywhere, but for the sake of convenience I put these all in my Facter class under modules/facter/lib/facter.

Next, we’ll create modules/facter/lib/facter/lib_process_running.rb:

module FacterShared
  def self.process_running?(process)
    running_pids.each do |pid|
      return true if process_name(pid) == process
    end
 
    return false
  end
 
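  def self.process_name(pid)
    IO.foreach("/proc/#{pid}/status") do |line|
      if line =~ /Name:\s*(\w+)/
        return $1
      end
    end
 
    return ''
  rescue Errno::ENOENT, Errno::ESRCH
    # The process exited between listing /proc and reading its status file
    return ''
  end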
 
  def self.running_pids
    Dir.entries("/proc").reject { |x| x =~ /\D/ }
  end
end

We stick this into a module so our functions don’t pollute the global namespace. I picked FacterShared because, well, it was completely arbitrary.
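If you want to poke at this logic without a Puppet run, here is a sketch of a variant that can be pointed at a fake /proc tree. The FacterSharedSketch name, the proc_root parameter and the with_fake_proc helper are assumptions added for testing; the module above always reads the real /proc.

```ruby
require 'tmpdir'
require 'fileutils'

# Variant of the shared module with an injectable proc root.
# proc_root is an assumption added so the code can be tested off-box.
module FacterSharedSketch
  def self.process_running?(process, proc_root = '/proc')
    running_pids(proc_root).any? { |pid| process_name(pid, proc_root) == process }
  end

  def self.process_name(pid, proc_root = '/proc')
    File.foreach(File.join(proc_root, pid, 'status')) do |line|
      return $1 if line =~ /\AName:\s*(\S+)/
    end
    ''
  rescue Errno::ENOENT, Errno::ESRCH
    # The process went away (or never existed); treat it as nameless.
    ''
  end

  def self.running_pids(proc_root = '/proc')
    # Keep only all-digit entries, which are the per-process directories.
    Dir.entries(proc_root).reject { |entry| entry =~ /\D/ }
  end
end

# Hypothetical test helper: build a throwaway directory that mimics
# /proc/<pid>/status for the given pid => name pairs, then yield its path.
def with_fake_proc(names_by_pid)
  Dir.mktmpdir do |root|
    names_by_pid.each do |pid, name|
      FileUtils.mkdir_p(File.join(root, pid))
      File.write(File.join(root, pid, 'status'), "Name:\t#{name}\nState:\tS (sleeping)\n")
    end
    yield root
  end
end
```

Calling with_fake_proc('101' => 'httpd') { |root| FacterSharedSketch.process_running?('httpd', root) } exercises the lookup against the temporary tree instead of the live process table.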

Next, we need a custom fact to actually call this, so let’s create modules/httpd/lib/facter/httpd_running.rb:

require 'facter/lib_process_running'
 
Facter.add("httpd_running") do
  # The shared code crawls /proc, so only resolve this fact on Linux
  confine :kernel => :linux
  setcode do
    FacterShared.process_running? 'httpd'
  end
end

I’d love to hear about an official way to do this, but here’s something for those of you who need a way to do this here and now.

Changing your vCenter Update Manager database credentials

We recently outgrew the SQL Express instance on our vCenter server, and moved on to an HA cluster with two SQL Server instances and a log-shipped mirror at our DR site in preparation for clustering two vCenter instances. We’re using SQL Server authentication rather than Windows (Kerberos) authentication. It was difficult enough to find information on how to update the database username/password in vCenter 4.0+ without reinstalling, but the process for vCenter Update Manager is, as far as I know, completely undocumented.

I didn’t intend for this posting to be a full how-to: if you need information on how to detach/reattach the vCenter database in SQL Server and add back the maintenance tasks, VMware’s KB has a good article on this. It just happens to not touch on VUM. I’ll pick up where that KB left off.

The first issue is that vCenter Update Manager is a 32-bit application in 4.1, in contrast to the rest of vCenter which is now a 64-bit application. As a result, you’ll need to adjust your vCenter DSN in the 32-bit ODBC manager as well:

C:\Windows\SysWOW64\odbcad32.exe

Go through and update your DSN just as you would any ordinary DSN.

If you’re running Windows authentication, reconfiguring the Update Manager service is as simple as going into the Services view in Windows, finding the service and updating the username/password it starts with. If you’re running with SQL Server authentication and DB-local logins, you’ll need to add some obviously-named but poorly-documented elements to your vCenter Update Manager configuration file.

That file can be found here:

C:\Program Files (x86)\VMware\Infrastructure\Update Manager\vci-integrity.xml

Once inside, you’ll need to locate the <database> section, which looks like this:

  <database>
    <dbtype>SQL Native Client</dbtype>
    <dsn>VMware VirtualCenter</dsn>
    <initialConnections>20</initialConnections>
    <maxConnections>40</maxConnections>
  </database>

The XML elements you want to add are called username and password, as you might expect.

  <database>
    <dbtype>SQL Native Client</dbtype>
    <dsn>VMware VirtualCenter</dsn>
    <initialConnections>20</initialConnections>
    <maxConnections>40</maxConnections>
    <username>YourUsername</username>
    <password>YourPassw0rd</password>
  </database>

I found these names through trial and error (well, really, just “trial,” since these names happened to work the first time through). Restart the VUM service and you should be good to go.
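If you have more than one VUM box to fix, the edit can be scripted. Below is a minimal sketch using Ruby's REXML library; set_vum_db_credentials is a made-up helper name, and the only things taken from this post are the database, username and password element names. It works on the XML text only, so reading and writing vci-integrity.xml and restarting the service are left to you.

```ruby
require 'rexml/document'

# Hypothetical helper: set (or add) <username> and <password> children on
# the first <database> element found in the given XML text, and return the
# modified XML as a string.
def set_vum_db_credentials(xml_text, username, password)
  doc = REXML::Document.new(xml_text)
  database = REXML::XPath.first(doc, '//database')
  raise ArgumentError, 'no <database> element found' if database.nil?

  { 'username' => username, 'password' => password }.each do |name, value|
    # Reuse an existing element if present so repeated runs don't duplicate it.
    element = database.elements[name] || database.add_element(name)
    element.text = value
  end

  doc.to_s
end
```

Back up the original file first; REXML may not preserve the exact whitespace of the original.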

Issues with Exchange 2010’s new-TestCasConnectivityUser.ps1 script

Today, I ran into this little bugger when trying to run Exchange 2010’s new-TestCasConnectivityUser.ps1 script to create test mailboxes for SCOM:

CreateTestUser : Mailbox could not be created. Verify that OU ( Users ) exists and that password meets complexity requirements.

Most of the Internet pointed to this being a problem with the OU specified, and recommended manually specifying the OU (including the domain), but in our case, it didn’t help. I dug through the script and found where they had suppressed the error reporting:

new-Mailbox -Name:$UserName -Alias:$UserName -UserPrincipalName:$UserPrincipalName -SamAccountName:$SamAccountName -Password:$SecurePassword -Database:$mailboxDatabaseName  -OrganizationalUnit:$OrganizationalUnit -ErrorVariable err -ErrorAction SilentlyContinue

Conveniently, they subsequently overwrite $err instead of checking it and displaying something useful on the screen. Being no stranger to Microsoft’s style of admin utility scripting since the IIS 5 days, I plodded forward, printed the suppressed error, and got this:

MailboxRecovery is a recovery database. Mailboxes can't be enabled on a recovery database.

How peculiar. Where is it getting this database name from?

#
# If there are multiple mailbox databases on this server, the user will be created in the last database returned
#
$mailboxDatabaseName = $null;
get-MailboxDatabase -server $mailboxServer | foreach {$mailboxDatabaseName = $_.Guid.ToString()}

Oh. I see how this is going to be. Let’s try this my way instead.

$mailboxDatabaseName = 'InfoTech'

Hello, working script.

Much better.

Using system GTK+ with VMware Workstation on Linux

VMware Workstation for Linux doesn’t seem to respect GTK+ themes on many Linux distributions. The reason for this is that it bundles a version of GTK+ that’s too old to load most modern system themes:

(vmware-modconfig:27223): Gtk-WARNING **: GModule (/usr/lib/gtk-2.0/2.10.0/engines/libxfce.so) initialization check failed: Gtk+ version too old (micro mismatch)

Workstation really tries to load the system GTK+ rather than its own, it really does. The problem is that it uses a set of C++ bindings called gtkmm that aren’t often installed on people’s desktop systems. If you’re getting the ugly old GTK+ that ships with VMware, you don’t have them installed, so it falls back on its bundled gtkmm (which links against the bundled GTK+). To fix this, just install the gtkmm libraries into your library path. On Ubuntu:

aptitude install libgtkmm-2.4-1c2a

And you’re all done. Adjust as necessary for your distro of choice.

Link redux: 10/15/2010

Things seem to be slow, so there are only two particularly notable links this week:

  • High Scalability: I, Cloud
    Lori MacVittie speculates on the transformation of the IT industry, particularly in terms of job roles and skills, as a result of cloud computing. I usually hate these posts, but this is really quite good, and helps to firm up answers to some questions I’ve had burning in my head for the last week or so. I agree with her assertion that devops is on the fast track to replacing traditional systems administration, as more sysadmins find themselves in cloud-centric roles.
  • VMguy: Separating the Windows Page File for Site Recovery Manager Replication
    I hadn’t really considered the impact of this, but it definitely seems like something virtualization and storage architects should keep in mind in order to reduce their replication traffic, particularly over asynchronous WAN links. This probably isn’t a useful consideration for those of us running campus/metro clusters.

An annoying and non-obvious rpmbuild “feature enhancement”

Specifically, under certain circumstances, it can dump debuginfo files into /usr/lib/debug and /usr/src/debug under your buildroot, neglect to build the corresponding -debuginfo package, and then have the gall to complain about the unpackaged files it dumped there.

I have a confession to make: I’m anal-retentive enough about the systems I administer that I need to build RPM packages for everything so they can be easily updated, but I’m lazy enough that I usually just grab source RPMs out of the most recent Fedora repositories and modify the specfiles until they work on CentOS 5. This can lead to some interesting issues, because RPM and rpmbuild are not quite the same in CentOS as they are in Fedora. Sometimes you’re never quite sure if something is a bugfix or a feature enhancement, and this was one of those lovely times.

This week, I got a request from a user to build a more recent version of gnuplot than the 4.0 version that ships with CentOS 5. Simple enough, right? I took the F13 SRPM, bumped the underlying source tarball to 4.4.1, made a couple of config fixes for the distro change and version bump, and then fired up Mock to build it for CentOS. The compile would succeed, and then the RPM packaging would bomb out with errors like the following in Mock’s build.log:


RPM build errors:
    Installed (but unpackaged) file(s) found:
   /usr/lib/debug/usr/bin/gnuplot-minimal.debug
   /usr/lib/debug/usr/bin/gnuplot-wx.debug
   /usr/lib/debug/usr/libexec/gnuplot/4.4/gnuplot_x11.debug
   /usr/src/debug/gnuplot-4.4.1/src/alloc.c
   /usr/src/debug/gnuplot-4.4.1/src/axis.c
   /usr/src/debug/gnuplot-4.4.1/src/axis.h
...

It took me two to three days of looking at this issue off and on to determine that the problem was related to a single innocuous line buried deep inside the package spec:

BuildArch: noarch

Interestingly, this was on a subpackage, which apparently is enough to trip up rpmbuild for all packages listed in the spec.

After removing that line, the -debuginfo was generated fine.

In short: on older rpmbuild versions, don’t build arch-specific binary packages that have noarch subpackages. This does work fine on newer rpmbuild versions.
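To spot this pattern before burning a Mock run, here is a rough sketch of a lint that flags noarch subpackages in a spec whose main package is arch-specific. noarch_subpackages is a made-up helper, the list of section markers it recognizes is not exhaustive, and it does not expand macros or conditionals, so treat a hit as a hint rather than a verdict.

```ruby
# Hypothetical lint: return the names of subpackages that declare
# "BuildArch: noarch" in a spec whose main package does not.
def noarch_subpackages(spec_text)
  current = :main            # preamble we are currently reading
  noarch = Hash.new(false)   # package name => declared noarch?

  spec_text.each_line do |line|
    case line
    when /\A%package\s+(?:-n\s+)?(\S+)/
      current = $1           # entering a subpackage preamble
    when /\A%(?:description|prep|build|install|check|clean|files|changelog|pre|post|preun|postun)\b/
      current = nil          # left the preamble; tags no longer apply
    when /\ABuildArch:\s*noarch\b/i
      noarch[current] = true unless current.nil?
    end
  end

  # A fully noarch package is not the problem case described above.
  return [] if noarch[:main]
  noarch.keys.reject { |name| name == :main }
end
```

Feeding it File.read of the spec returns an array of subpackage names; an empty array means nothing suspicious was found.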

Hope this helps somebody, somewhere.

Update: IBM DS4000/5000 replication on big LUNs works again with hotfix firmware

A couple of weeks ago, I posted about my issues with replication of >2TB LUNs on IBM SANs not working correctly using Enhanced Remote Mirroring. Well, IBM got me to install some hotfix firmware (version 07.60.40.00), and the problem appears to be resolved, though I’m still having issues with Flash Copies of one of the affected mirror LUNs showing up to Windows as an empty, uninitialized disk. I’m getting married in a week and am too busy polishing documentation before I take 2 weeks off to open yet another case with IBM. C’est la vie.

They’re probably going to kill me for calling this “hotfix firmware,” since I was assured this firmware was GA but not uploaded to the website because of some release engineering red tape. (Whatever, guys, I can’t download it without calling you, so it’s a hotfix as far as I’m concerned.)

Anyway, if you’re having this issue or are planning on replicating large LUNs with IBM Enhanced Remote Mirroring, contact your IBM support engineers and request that they send you firmware >=07.60.40.00.