Configuration management alone is not the answer

Everything in one place

Normally when businesses start out building s product, especially those that don’t have the pre-existing knowledge of configuration management, tend to just throw the config on the server and then forget what it is. This is all fine, it’s a way of life and progression and sometime just bashing it out could prove very valuable indeed, but typically this becomes a nightmare to manage. Very quickly when there is then 100 servers all manually built it’s a pain in the arse so then everyone jumps into configuration management.

This is sort of phase 1, everything has become too complicated to manage, no one knows what settings are on what boxes and more time is spent working out if box 1 is the same as box 2. This leads to the need to have some consistency which leads to configuration management, the sensible approach is to move an application at a time into configuration management fully, not just the configuration files.

During this phase of execution it is critical to be pedantic and get as much as possible into configuration management, if you only do certain components there will always be the question of does X affect Y which isn’t in configuration management? and quite frankly, every time you have that conversation a sysadmin dies due to embarrassment.

Reduce & Reuse

After getting to Phase 1, probably in a hack and slash way, the same problems that caused the need for Phase 1 happen. 100 servers in configuration management lots of environments with variables set in them, and servers, and in the manifests themselves and the question starts to be come well is that variable overriding that one, why is there settings for var X in 5 places, which one wins? Granted in configuration management systems there are hierarchies that determine what takes precedence but that requires someone to always look through multiple definitions. On top of having the variables set in multiple locations, it is probably becoming clear that more variables are needed, more logic is needed, what was once a sensible default is now crazy.

This is where phase 2 comes in, aim to move 80%+ of each configuration into variables, have chunks of configuration turned on or off through key variables being set and set sensible defaults inside a module/cookbook. This is half of phase 2, the second half and probably the more important side is to reduce the definitions of the systems down to as few as possible. Back in the day, we use to have a server manifest, an environment manifest and a role manifest each of these set different variables in different places, how do you make sure that your 5 web servers in prod have the same config as the 5 in staging? that’s 14 manifests! why not have 1? just define a role and set the variables appropriately, this can then contain the sensible defaults for that role, all other variables would need to be externalised in something like hiera, or you would need to push them into Facter / ohai.

By taking this approach to minimising the definitions of what a server should be and reducing it down to one you are able to reuse the same configuration so all of your roleX servers are now identical except what ever variables are set in your external data store which can now easily be diff’d.

build, don’t configure

By this point, phase 1 & 2 are done, all is well with the world but still there’s some oddities Box X has a patch level y and box A has a patch level z, or there’s some left over hack to solve a prod issue which causes a problem on one of the servers. Well treat your servers as configurable and throw-away-able, There’s many technologies to help with this be it cloud based with Amazon and OpenStack or maybe VMWare, even physical servers with cobbler. This is Phase 3, build everything from scratch every time, at this point the consistency of the environment is pretty good leaving only the data in each environment to contend with.

Summary

Try and treat configuration management as something more than just config files on servers and be persistent about making everything as simple as possible while trying to get everything into it. If you’re only going to manage the files you might as well use tar’s and if that sounds crazy it’s the same level as phase 1 which is why you have to get everything in and I realise it can seem a massive task but start with the application stack you’re running and then cherry pick the modules/cookbooks that already exist for the main OS components like ntp, ssh etc

Cut down, deliver early and often

Deliver all the things

There’s idealistic people in the world and that’s fine (thanks for reading by the way :) ), and there’s pragmatic people, I want to go through how you can provide solutions that give a pragmatic approach to delivering value and doing it in a way that gets it done on time but still helps you get to your idealistic goals.

Often when sitting down and planning with senior management bods a feature list as long as your arm comes out, the reality is even if this is thought to be the bear minimum list, in reality it probably isn’t and is instead a bloated minimum, there’s always room to cut out features, so agree some prioritisation on the features and operate a bucket approach, one in one out.

After the features are agreed you can now start about delivering them, cutting where necessary.

Deadlines for Deadlines sake

When it comes to deadlines people take them a bit like marmite, you either love it or you hate it. Some people feel that having a deadline is a sure fire way of creating a bad product as corners are cut, others think that if you have a deadline you can at least work towards something with an end in sight rather than running off into the wild forever and ever.

To deliver the features needed it’s better to have a deadline, and one that stretches you and forces you to make some cuts, it’s not about not delivering or delivering badly it’s just about delivering what is needed, worse case scenario you have to deliver everything, but this way at least you do it in stages.

Certainly with a deadline you can help focus people on delivering what is important, I think sometimes people get caught up in trying to deliver everything perfectly for the deadline rather than delivering the value they already have. Some of my colleagues and mysef are currently working on a monitoring and metrics platform that integrates fully with nagios style checks but also allows you to write them in the web browser and test them on a server of your choice before distributing. The idea being that you can take the monitoring up to a real time level while reporting back business level reporting and everything in between so you have one place to go to to find out why something isn’t working and how well it has been doing, how many people have signed up; it’s a devops dashboard really.

Anyway, for a couple of months we have been identifying the core technologies and implementing various key functionality to the product but with at no point was any of it “working” some bits sort of worked but not quite, some bits just weren’t there. There was no real end date to this project as it’s something that will keep involving until it works and is useful however we need something to work towards and after a couple of months of sorting out the technology a deadline was set to do a demo and within a week we had the product up and working with the pages we needed with the correct functionality and everything working fine. Writing a nagios check on the fly, pushing it to the server distributing it to all the others and then reporting all in less than 30 seconds, wonderful.

I’m not saying what we did in that week was “production ready” but if our livelihood depended on it, it was good enough, and thats what being agile and lean is about. What is the least amount of work I can do to get me to the minimum product I need in the least amount of effort. The key is to obviously not get stuck delivering bear minimum all the time, with every sprint you need to improve upon what was there as well as add the new stuff; I think it is necessary to always fix something up when adding new features to get the product better and it certainly works for us.

Alternatively, of course, we could have not had a deadline and kept drifting aimlessly into the distance ensuring that the technology was “just right” all the time but the reality is we have to deliver something somewhere.

Iterate

Anyone that is familiar with Agile, Scrum, Extreme programming etc knows it’s better to deliver in small bite size pieces than in large chunks, you can provide value back to the business quicker and you focus on doing the task rather than doing it well. Not all tasks can be done by cutting a few corners but there’s normally a quick way, a good way and the right way fo doing it, so choose one and go for it, if it is a bit of angular that pulls down a list of plugins, go the quick way, if it’s a graphing engine that needs to draw lots of graphs and is used everywhere do it the right way; you’re sensible people, find a balance.

I’ve been talking all about software development which is where most of these methodologies come from, but they can be applied to systems administration as well, I think the same goes for sysadmins as it does programmers, they tend to get stuck in doing the best solution rather than the solution the business needs. Just to dispel any hopes and dreams, maybe save some time by realising that the business cares it works and is stable not how elegant or easy to maintain it is. So when coming up with a load balancing solution, maybe version 1 is haproxy with basic config and version 2 is a bit more in depth, version 3 is F5 & haproxy, version 4 is F5, haproxy and caching…. By all means have the hopes and dreams of the gold solution, but deliver the bronze one okay. If people really use the system and it provides more value iteratively make it better, maybe a bronze + a bit of silver, litle chunks, often.

Summary

Don’t get stuck in the end goal, think about what does the client really need or the business really need, bear minimum; deliver that, measure usage, iterate and improve.