EBS Snapshot script

Like it says

Over the last 18 months, the one key thing I have learnt about Amazon is: don’t use EBS, in any way, shape or form. In most cases it will be okay, but if you start relying on it, it can ruin even the best-architected service and reduce it to rubble. So you can imagine how pleased I was to find I’d need to write something to make EBS snapshots.

For those of you that don’t know, Alfresco Enterprise has an AMP to connect to S3, which is fantastic. It makes use of a local cache while it’s waiting for the S3 buckets to actually write the data, and if you’re hosting in Amazon this is the way to go. It means you can separate the application from the OS & data, which is important for the following reasons:

1, EBS volumes suck, so where possible don’t use them for storing data, or for your OS
2, Having data elsewhere means you can delete nodes without prejudice and your data is safe
3, It forces you to build an environment that can be rapidly rebuilt

So in short, keeping data off the server means you can scale up and down easily and rebuild easily. The key is always to keep the distinctly different areas separate and not merge them together.

So, facing this need to back up EBS volumes, I thought I’d start with snapshots. I did a bit of googling and came across a few EBS snapshot programs that seem to do the job, but I wanted one in Ruby and I’ve used Amazon’s SDKs before, so why not write my own?

The script

#!/usr/bin/ruby

require 'rubygems'
require 'aws-sdk'

#Get options
access_key_id=ARGV[0]
secret_access_key=ARGV[1]



if File.exist?("/usr/local/bin/backup_volumes.txt")
  puts "File found, loading content"
  ec2 = AWS::EC2.new(:access_key_id => access_key_id, :secret_access_key=> secret_access_key)
  File.open("/usr/local/bin/backup_volumes.txt", "r") do |fh|
    fh.each do |line|
      # Split once on the first comma: "volume id,description"
      volume_id, volume_desc = line.chomp.split(',', 2).map(&:strip)
      puts "Volume ID = #{volume_id} Volume Description = #{volume_desc}"
      v = ec2.volumes[volume_id]
      if v.exists? 
        puts "creating snapshot"
        date = Time.now
        backup_string="Backup of #{volume_id} - #{date.day}-#{date.month}-#{date.year}"
        puts "#{backup_string}" 
        snapshot = v.create_snapshot(backup_string)
        sleep 1 until [:completed, :error].include?(snapshot.status)
        snapshot.tag("Name", :value =>"#{volume_desc} #{volume_id}")
      else
        puts "Volume #{volume_id} no longer exists"
      end
    end
  end
else
  puts "no file backup_volumes.txt"
end
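
One thing to watch in the script is the "sleep 1 until ..." line: if a snapshot never reaches completed or error, the script waits forever. A minimal sketch of a polling helper with a timeout is below. The helper name wait_until and the 600 second figure are my own choices for illustration, not part of the SDK.

```ruby
# Poll the given block until it returns true or the timeout (seconds) expires.
# Returns true if the condition was met, false if we gave up waiting.
def wait_until(timeout, interval = 1)
  deadline = Time.now + timeout
  until yield
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

# Hypothetical use with the snapshot object from the script above:
#   done = wait_until(600) { [:completed, :error].include?(snapshot.status) }
#   puts "Gave up waiting for #{volume_id}" unless done
```

Returning false rather than raising keeps the loop over the remaining volumes going; whether that is the right call depends on how you want the cron job to fail.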

I started writing it with the idea of having it just back up every EBS volume that ever existed, but I thought better of it. So I added a file, “backup_volumes.txt”; the script reads this and looks for a volume id and a name for it on each line, i.e.

vol-1264asde,Data Volume
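
If you want the file handling to be a bit more forgiving, something like the sketch below would do. It assumes the same "id,description" layout; skipping blank lines and "#" comment lines, and falling back to the id when there is no description, are my own additions rather than anything the script does today.

```ruby
# Parse one "volume_id,description" line from backup_volumes.txt.
# Returns [volume_id, description], or nil for blank lines and "#" comments.
def parse_volume_line(line)
  line = line.strip
  return nil if line.empty? || line.start_with?("#")

  # Split on the first comma only, so descriptions may contain commas.
  volume_id, description = line.split(",", 2).map(&:strip)
  unless volume_id =~ /\Avol-[0-9a-z]+\z/
    raise ArgumentError, "bad volume id: #{volume_id.inspect}"
  end

  [volume_id, description || volume_id]
end

p parse_volume_line("vol-1264asde,Data Volume")
```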

If you wanted to back up everything it wouldn’t take much to extend this, i.e. change the following:

v = ec2.volumes["#{volume_id}"]

To

ec2.volumes.each do |v|

or at least something like that…
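
To make "something like that" a little more concrete, here is a rough sketch. It is written against anything enumerable whose items respond to id and create_snapshot, so it can be tried without AWS; I am assuming ec2.volumes from the SDK version used above behaves that way, so check before trusting it. The method name snapshot_all is made up.

```ruby
# Snapshot every volume in a collection. `volumes` is anything enumerable
# whose items respond to `id` and `create_snapshot(description)`.
# Returns the list of whatever create_snapshot returned.
def snapshot_all(volumes, date = Time.now)
  volumes.map do |v|
    description = "Backup of #{v.id} - #{date.day}-#{date.month}-#{date.year}"
    puts "creating snapshot: #{description}"
    v.create_snapshot(description)
  end
end

# With the SDK this would (assumption) be called as:
#   snapshot_all(ec2.volumes)
```

Bear in mind this really does mean everything in the region the keys can see, which is why I went with the file in the first place.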

Anyway, the script takes the keys via the CLI as params, which makes it quite easy to run it on one server in several cron jobs with different keys if needed.

It’s worth mentioning at this point that within AWS you should be using IAM to restrict the policy down to the bare minimum. Something like this is a good start:

{
  "Statement": [
    {
      "Sid": "Stmt1353967821525",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:DescribeSnapshots",
        "ec2:DescribeTags",
        "ec2:DescribeVolumeAttribute",
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeVolumes"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

For what it’s worth, you can spend a while reading loads of stuff to work out how to set up the policy, or just use the policy generator.

Right, slightly back on topic: the script also tags the snapshot with the name of the volume for you, because we all know the description field isn’t good enough.

Well, that is it. Short and something, something… Now the disclaimer: I ran the script a handful of times and it seemed good, so please test it :)

Vagrant & Chef

Sooo……

Last week I was starting to play with Chef, and as part of that I was convinced by Tom (one of our sysadmins) to stop building servers in Amazon and using them as development boxes. Traditionally I like to build things from the ground up and in stages: get the OS right, get the app right, get the tools right, roughly in that order. By building the box from scratch, making a few changes in Puppet, re-running the scripts and continuing to iterate over the deployment, you gradually build up to something that works. The downside is that if you don’t redeploy you never know for sure whether it will work from scratch, in case a step was missed; it can also be costly, and you are dependent on the network, be it local or internet, being available.

So over the last few months, on and off, there had been various attempts to get Vagrant to work with VirtualBox on my Mac (10.6.8). For whatever reason it never quite worked and was therefore pointless. However, a swift reset of the laptop and an upgrade to Lion (10.7.5) seems to have resolved my issues, and now that I’m unblocked on the Vagrant front it was worth giving it a go.

Set up

It’s really quite straightforward. Firstly you need VirtualBox, secondly you need Vagrant, then you need a well-constructed getting started guide like This

That’s pretty much it to get up and running. You may want to get a few more boxes for CentOS or something else, but other than that getting a simple box up and working is easy, and to log in you just type “vagrant ssh”. Simple.

But what about making it useful? With the help of Tom we were able to hook Vagrant into Chef. To do this we set a number of Chef options that define where the various cookbooks or roles are, which lets the virtual guest access the files so you can run them locally; try following This

I was using this with roles, but just defining a list of recipes also works well. Best of all, you can now open up 3 terminals: one logged in to the Vagrant box, one in the Vagrant box’s directory and one in your Chef repo. I found this worked well, as I was able to make changes to my recipes in the Chef repo, run “vagrant provision” or “vagrant reload” as needed in the second, and tail any logs or watch any starts in the Vagrant image. All in all it works quite well: you have a disposable box, so if it all goes wrong you just start again, and an easy way to update / test the configuration before committing it or pushing it anywhere near production.
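
For anyone wondering what the Chef options look like in practice, here is a rough Vagrantfile sketch. The box name, the paths and the role / recipe names are all placeholders I made up, and it uses the newer Vagrant.configure("2") syntax; as far as I know the 1.0.x releases around at the time of writing used Vagrant::Config.run with config.vm.provision :chef_solo instead, so adjust to whatever your version expects.

```ruby
# Vagrantfile (sketch). Box name, paths and role/recipe names are placeholders.
Vagrant.configure("2") do |config|
  config.vm.box = "centos-dev"                        # hypothetical box name

  config.vm.provision "chef_solo" do |chef|
    # Where the cookbooks and roles live on the host (assumed repo layout)
    chef.cookbooks_path = "../chef-repo/cookbooks"
    chef.roles_path     = "../chef-repo/roles"

    chef.add_role "dev_server"                        # hypothetical role
    # chef.add_recipe "rssh"                          # or list recipes directly
  end
end
```

With something like this in place, “vagrant provision” re-runs Chef against the running box, which is what makes the three terminal workflow tick.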

Gotcha

So overall, after set up, all is good. Unless, for example, your role file has something in it that works perfectly locally and not remotely. I’m not talking about recipes not working, I’m talking about the role file having slight differences, which was annoying, particularly when you’re new to it.

In particular I had

"json_class": "Chef::Role",

in my role file, and this worked fine locally and then failed remotely. I’m not sure what was causing this to be an issue, but at least it is easy to resolve by just removing it.

According to This it’s needed (as below)

json_class
This should always be set to Chef::Role.
This is used internally by Chef to auto-inflate this type of object. It should be ignored if you are re-building objects outside of Ruby, and its value may change in the future.

But it did cause me problems, though in some ways it was much less annoying than the other issue I had. I had run Chef, but it seemed to fail to update my yum repos; they already existed beforehand and it just ignored them. I kept re-provisioning and nothing, and was very confused by all of this. So I stopped playing around and went for a vagrant reload. Still nothing. It turns out Chef has a stupid setting in its provider for yum repos: in short, for files in yum.repos.d it will not replace a file if it already exists, which for a configuration management tool is pretty poor. Every other type of file seems fine, but they are “protecting” yum repo files for an unknown reason. I can only assume it’s to stop people nuking their box, but that’s not Opscode’s call.

You can see a bit more detail Here

That was annoying, an annoying Chefism, but at least it is possible to disable it easily as mentioned in the link above, or simply to remove the file first.

Summary

All in all I’m quite pleased with the local development. I hit some issues when I deployed to an actual box, which is to be expected, but other than that it’s all been quite good. Going forward I’m going to carry on, and I will also see how I get on trying Vagrant with Puppet, seeing as it can do that too, so it should help a lot with development. Unfortunately, because of the laptop rebuild I am yet to reap the rewards of this new efficiency, but I can definitely see it helping in the longer term, particularly when I’m without internet access. I’d recommend everyone at least spends an hour or two having a play with this as it could simplify your life, especially if you are not able to build servers in 5 mins, or if you just want to work “off grid”.

Playing with Chef

It was bound to happen somewhen

Over the last few months we have gradually been building up a Chef installation alongside our Puppet configuration. Totally insane, you may think; well, it is. In our team I am pretty good with Puppet and Tom is pretty good with Chef, so the only way to come up with a good solution is to use both and evaluate, and that is where we are. Currently we are using Puppet for all of our application-based configuration and Chef for our infrastructure. Over the last few months I’ve poked it a couple of times but not too much; well, today I’ve been poking it a lot more and doing stuff of use.

So I have to learn how to use Chef, else I can’t decide between the two; so far so good. It has some issues, but nothing major or nothing that’s bitten me yet. I should probably clarify that we run both our Puppet and Chef distributed, so we can’t make use of the more powerful / useful server side; either way it’s handy.

Things I like so far

One of the things I liked about Chef instantly is the fact it is just Ruby. It sounds silly, but it means I can do things like …

# The block variable is called "path" so it does not shadow the "link" resource
node["rssh"]["chroot_files"].each do |path|
  link "#{node["rssh"]["chroot_jail"]}#{path}" do
    to path
    link_type :hard
  end
end

Now in Puppet you would have to create a define and call it with an array of “names” to do the same thing, but the Chef code is more readable, especially as I do know some Ruby.

In addition to this, inside cookbooks you have an attributes directory, which is exactly what I try to do within Puppet with my params stuff (Here). Because it’s a well-known structure it’s used more, which means people can write cookbooks in a standard-ish way, or at least it seems that way, and it is also a lot easier to maintain.
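
As a rough illustration, the attributes file behind the rssh loop above might look something like this. The file path follows the usual cookbook layout, but the values are made up for the example and are not what we actually run.

```ruby
# cookbooks/rssh/attributes/default.rb (sketch, values are made up)
default["rssh"]["chroot_jail"]  = "/var/chroot"
default["rssh"]["chroot_files"] = [
  "/usr/bin/scp",
  "/usr/libexec/openssh/sftp-server"
]
```

The recipe then just reads node["rssh"]["chroot_files"], and a role or environment can override the defaults without touching the recipe.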

Things I don’t like

At the moment I’m not too sure about having to have a recipes directory with everything in that one place. Sometimes the cookbooks I write may have many recipes, and they look messy with so many in one directory. I don’t know if you can put them in folders, but it doesn’t look like it; at least Puppet will recursively go through folders to load the files.

Error messages. Chef’s are seemingly pointless; it may as well say “Error in file at line”. There is almost no interpretation of the actual error, but lots of information, as below.

[2012-11-14T16:39:05+00:00] INFO: Processing template[/etc/rssh.conf] action create (rssh::default line 14)

================================================================================

Error executing action `create` on resource 'template[/etc/rssh.conf]'

================================================================================


Chef::Mixin::Template::TemplateError
------------------------------------
undefined local variable or method `nodes' for #<Erubis::Context:0x0000000458f9b8 @node=node[localhost]>

Resource Declaration:
---------------------
# In /tmp/vagrant-chef-1/chef-solo-2/cookbooks/rssh/recipes/default.rb

 14: template "/etc/rssh.conf" do
 15:   source "rssh.conf.erb"
 16:   owner  "root"
 17:   group  "root"
 18:   mode   0644
 19: end
 20: 

For those confused, it’s line 12 of that output (the “undefined local variable or method `nodes'” line) with the actual error, and that is only half the message, so it’s easy to miss one line out of the 40-odd.
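
If you want to see the same class of error outside of Chef, the sketch below reproduces it with the ERB library from Ruby’s standard library. This is a stand-in: Chef renders templates with Erubis, the TemplateContext class is something I made up to mimic a context where only node exists, and the chrootpath line is just an illustrative bit of rssh.conf.

```ruby
require 'erb'

# Stand-in for Chef's template context: only `node` is defined,
# just as in a real template.
class TemplateContext
  attr_reader :node

  def initialize(node)
    @node = node
  end

  # Render the template source with this object's methods in scope.
  def render(source)
    ERB.new(source).result(binding)
  end
end

context = TemplateContext.new({ "rssh" => { "chroot_jail" => "/var/chroot" } })

good = 'chrootpath = <%= node["rssh"]["chroot_jail"] %>'
bad  = 'chrootpath = <%= nodes["rssh"]["chroot_jail"] %>'  # typo: nodes

puts context.render(good)

begin
  context.render(bad)
rescue NameError => e
  # The one useful line: it names the misspelt variable.
  puts "#{e.class}: #{e.message}"
end
```

The fix in the real template is the same one character change, nodes to node; the hard part is spotting that line in the wall of output.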

Now to be fair, Puppet’s errors are sometimes stupid and typically don’t have enough information to actually be useful, but they are mostly straightforward. These errors are not as clear, but at least they do carry more information. I guess the real failing here is me not understanding how to interpret the error messages; either way it was a little annoying.

Summary

All in all, I quite like it. I’ve not seen anything in my limited play with it to say it’s going to be impossible to use, and it has some nice features. Hopefully over the next few months I’ll start using it a bit more so I can understand which one is best for us to use longer term, or we keep them both and develop a nice hybrid solution… hopefully with clearly defined boundaries :)

Now for something a little petty

Text is nice, graphs are pretty

Over the last few weeks I have been writing more and more metric gathering tasks to identify how the systems we use are being used and what is valuable about them. All of this wonderful metric information is text-based and mailed out at the moment. Text representation works fine for things that are tangible, such as the cost or number of users, but what if you want to know how many users are online at 8 am or 10 am or 1 pm? Well, this is where something pretty comes into the mix.

As part of the metrics gathering I have been looking at various graph drawing tools, and there are quite a few out there; some, although technically brilliant, are ugly, and some are pretty but limited. Over the longer term it will probably make sense to use some JavaScript library to draw the graphs, but I wanted something now, and we had a Graphite server which was being used for some more generic stuff but which I hadn’t done anything with.

Get some data in

Graphite is pretty cool: you just send some very basic information to it and it tracks it. It can then take care of displaying the information and of certain functions like the average or max of a metric. All of the graphs are drawn on the fly, so you can change the time frame, add extra plots and all this good stuff. Initially I was put off by Graphite because it looks messy, but I decided that now was a good time to learn.

The first challenge I had was putting some data into it. Because it just takes a text string you could update it using netcat if you really wanted, but I decided to go for a pure Ruby implementation.

#
# Graphite DAO
#

require 'socket'

class GraphiteDAO

  def initialize (server, port)
    @server = server
    @port = port
  end

  def put_graphite(metric_name, metric_value, date = Time.now.to_i)
    string = metric_name.to_s + " " + metric_value.to_s + " " + date.to_s
    #puts "string = #{string}"
    write(string)
  end


  private
  def write(data)
    socket = TCPSocket.open(@server, @port)
    socket.puts "#{data}"
    socket.close
  end
end

This is very functional and there is no error checking, but it’s enough to get it working. As you can see, I just take the metric name (i.e. the path in Graphite) and the value; you can optionally pass in the date if you wish, but it made more sense to just use “now” as the date in most cases.
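
If you did want a little error checking, a minimal sketch might look like the one below. It uses the block form of TCPSocket.open so the socket is closed even if the write fails, and it rescues connection errors instead of blowing up. The method name send_to_graphite is made up, and returning true / false is just one way of reporting the failure.

```ruby
require 'socket'

# Send one line to Graphite's plaintext listener.
# Returns true on success, false if the connection or write failed.
def send_to_graphite(server, port, line)
  # The block form closes the socket for us, even when puts raises.
  TCPSocket.open(server, port) do |socket|
    socket.puts line
  end
  true
rescue SystemCallError, SocketError => e
  warn "Could not write to #{server}:#{port}: #{e.message}"
  false
end

# Hypothetical use; the host name is a placeholder and 2003 is the usual
# port for the plaintext listener:
#   send_to_graphite("my-graphite-server", 2003, "prod.application.concurrent_users 42 #{Time.now.to_i}")
```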

The metric name is a path in Graphite, so the metric name I pass in may be “prod.application.concurrent_users”. This allows the data to be structured logically within Graphite and makes it easier to recall later.
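
To show exactly what goes over the wire, here is a small sketch that builds one line in the “path value timestamp” form the DAO sends. The whitespace check is my own addition (a space in the name would break the three field layout), and graphite_line is a made-up name.

```ruby
# Build one line of Graphite's plaintext format: "<path> <value> <timestamp>".
def graphite_line(metric_name, metric_value, time = Time.now)
  name = metric_name.to_s
  if name.empty? || name =~ /\s/
    raise ArgumentError, "metric name must be non-empty and contain no whitespace"
  end
  "#{name} #{metric_value} #{time.to_i}"
end

puts graphite_line("prod.application.concurrent_users", 42, Time.at(1352911145))
# prints: prod.application.concurrent_users 42 1352911145
```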

Getting something out

By far the most useful thing I learnt about Graphite is changing the line mode to connected; it turns the tiny line specks that I can’t see into a connected line… so now graphs are visible. In short, Graphite lets you select any number of metrics to plot onto one graph, and the default view is sort of a graph builder. This typically works fine if you only have one metric or you are comparing the same metric from three sources. However, if the range between the lowest metric and the highest is too great, you just end up with a flat line for each; this is not a Graphite issue but a general issue with graphs.

Now for the really useful stuff: you can get Graphite to render specific graphs for you by calling a URL, so once you know the path to the metric you can do all sorts of stuff.

for example…

http://my-graphite-server/render?target=prod.application.concurrent_users&width=800&height=600&lineMode=connected&title=Concurrent%20Users

Because it is just a matter of adding options or multiple targets it is easy to use, and the documentation is okay (and Here)
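
Since the URL is just a list of target and option pairs, it is easy to build from Ruby. Below is a rough sketch using URI.encode_www_form from the standard library; the render_url name is made up, and note that it encodes the space in the title as “+” rather than “%20”, which means the same thing in a query string.

```ruby
require 'uri'

# Build a Graphite render URL from a list of targets and a hash of options.
def render_url(host, targets, options = {})
  params = targets.map { |t| ["target", t] } +
           options.map { |key, value| [key.to_s, value] }
  "http://#{host}/render?#{URI.encode_www_form(params)}"
end

puts render_url("my-graphite-server",
                ["prod.application.concurrent_users"],
                width: 800, height: 600,
                lineMode: "connected", title: "Concurrent Users")
```

Adding a second metric to the same graph is then just a case of passing another entry in the targets array.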

The other element that is really useful is drawing graphs from data that doesn’t exist in Graphite, for example

http://my-graphite-server/render?graphType=pie&target=alpha:111&target=beta:222&target=gamma:333

The rendering engine is good and I like it. Considering this whole thing only took 25 mins to work out, it was well worth it: a quick return for not much effort.