The mystic art of Estimation

Estimating your way out of trouble

In IT you are always under pressure to say how long something will take, and typically it is something you have no idea how long it will take you to do. As you can imagine this can be a little annoying. Luckily experience plays a big part here, but there are a few things that can be done to help you out, mainly by not putting the pressure on yourself to deliver in an unrealistic time frame.

Next time you are presented with a “how long would it take for…” question pause and have a think.

  • How much research time do you need?
  • How long would it take you if you were to do it perfectly?
  • How much time would you need to hack something together?
  • What else have you got on?
  • What else could go wrong?

Always trust your gut instinct as well, and whatever else happens, add on about a third extra time; things go wrong. Worst case scenario, you have some room for movement when people start pushing the deadlines down.
This approach means you really have thought about what you are trying to do and you have put in a number of mitigations if they are needed. If they are not needed then you have some slack which can be used for making it better, delivering early or, my favourite, finishing the work early and starting the next piece before you were meant to. Work is a lot more fun when you are working to your own deadline.
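
To make the "add a third" rule concrete, here is a minimal sketch; the task breakdown and the hours are entirely made up for illustration.

#!/usr/bin/ruby
# Rough estimate: answer the questions above in hours, then pad by a third.
research = 4.0    # how much research time do you need?
build    = 16.0   # how long to do it properly?
other    = 8.0    # allowance for everything else you have on

base   = research + build + other
padded = base * (1 + 1.0 / 3)   # add about a third, because things go wrong

printf("base: %.1f hours, with slack: %.1f hours\n", base, padded)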

Get ahead

So you’ve spent some time estimating your work out and you are comfortable with your time frames, so you just need to meet each little daily milestone to get to your end goal? Wrong! You could do that, but then you stand the risk of being behind again. Never get behind, as it’s a slippery slope!

Especially in the early stages of delivering a project, get ahead: put in the extra hours, be a little more pragmatic on the solution if possible, and try to build up as much spare time as possible. By having the spare time you can then start kicking off bits of the project that are a few days / weeks away, so that when you get there you have gained back some time that may have been lost.

This will take the pressure off the tasks later and it also means you can still do the ad-hoc tasks that come up as and when needed. One of the reasons I like being ahead in what I do is that I can then typically choose to spend a bit more time on areas I really want to be better, or on trying out some new technology as part of the process. Normally when doing these I work a bit more iteratively than normal, so that I can still deliver the basic goal even if the new thing doesn’t work out.
There is also the opposite problem of giving yourself too much slack. I know I said to build some in, but you have to manage the delivery of the task so you don’t deliver it all in one big chunk nice and early.

Summary

Be realistic with your estimations and be cautious. There may be people in the team that believe they can do the work quicker, better and so on; let them. We’re focusing on what you can control: if you think it needs a bit more time then so be it. If it comes down to arguing over a bit of time here or there, give away some of the slack you built in; if it keeps getting tighter, change the way it is being done, get two people involved, split some tasks out, agree a smaller scope. You cannot keep taking time out of what you are doing. You are being paid to deliver solutions, not tasks, which means it needs to be slightly more robust than a matchstick but not as robust as a Zippo.

You also have to be cautious that your time frames are not silly: if you think it will take 30 mins, allow an hour, but be prepared to discuss in detail what you are doing and why, and all the checks that will be done subconsciously as you are doing the work. You do not want to get a reputation for not being able to estimate well. You need some slack, not much, and you need to manage the delivery of the tasks to appear in keeping with the time frame while using any spare time to get ahead.

Cloud deployment 101 – Part 3

The final instalment

Over the last couple of weeks I have posted a foundation to what the cloud really is and how to make the best use of your cloud. This week is about tying off loose ends, better ways of working, dispelling a few myths and setting some things straight.

Infrastructure as code

DevOps is not a silver bullet, but it is a framework that encourages teamwork across departments and gives you rapid agility in deploying code to a production environment.

  • Agile development
    • Frequent releases of minor changes
    • Often the changes are simpler as they are broken down into smaller pieces
  • Configuration management
    • This allows a server (or hundreds) to be managed by a single sysadmin and produce reliable results
    • No need to debug 1 faulty server out of 100; rebuild it and move on
  • Close, co-ordinated partnership with engineering
    • Mitigates “over the wall” mentality
    • Encourages a team mentality to solving issues
    • Better utilises the skills of everyone to solve complex issues

Infrastructure as code is the foundation of rapid deployment. Why hand build 20 systems when you can create an automated way of doing it? Utilising the API tools provided by cloud providers it is possible to build entire infrastructures automatically and rapidly.
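
As a flavour of what that looks like, here is a minimal sketch using the aws-sdk Ruby gem (the v1 API of the era); the credentials, AMI ID and key pair name are all placeholders.

#!/usr/bin/ruby
# Build a server through the provider's API rather than by hand.
require 'rubygems'
require 'aws-sdk'

AWS.config(:access_key_id => 'YOUR_KEY', :secret_access_key => 'YOUR_SECRET')

ec2 = AWS::EC2.new

# launch an instance from a stock image
instance = ec2.instances.create(
	:image_id      => 'ami-xxxxxxxx',   # placeholder image
	:instance_type => 'm1.small',
	:key_name      => 'emergency-key'   # placeholder key pair
)

sleep 5 while instance.status == :pending
puts "#{instance.id} is #{instance.status}"

Wrap that in a loop and 20 systems is no more work than one.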

Automation through code is not a new concept; sysadmins have been doing this for a long time through the use of Bash, Perl, Ruby and other such languages. As a result, the ability to program and understand complicated object-oriented code is becoming more and more important within a sysadmin role; typically this was the domain of the developer, and a sysadmin just needed to "hack" together a few commands. Likewise in this new world, development teams are being utilised by the sysadmins to fine tune the configuration of application platforms such as Tomcat, or to make specific code changes that benefit the operation of the service.

Through using an agile delivery method, frequent changes are possible. At first this can seem crazy: why would you make frequent changes to a stable system? Well for one, when the changes are made they are smaller, so between each iteration a total outage is less likely. It also means that if an update does have a negative impact it can be very quickly identified and fixed, again minimising the total outage of a system.

In an ideal world you’d be rolling out every individual feature rather than a bunch of features together. This is a difficult concept for development teams and sysadmins to get used to, especially as they are more used to the on-premise way of doing things.

Automation is not everything

I know I said automation is key, and the more we automate the more stable things become. However, automating everything is not practical, can be very time consuming, and can also lead to large scale disaster.

  • Automation, although handy can make life difficult
    • Solutions become more complex
    • When something fails, it fails in style
  • Understand what should be automated
    • Yes you can automate everything, but ask yourself: should you?
    • Automate boring, repetitive tasks
    • Don’t automate largely complex tasks, simplify the tasks and then automate

We need to make sure we automate the things that need to be automated: deployments, updates, DR.
We do not want to spend time automating a solution that is complex; it needs to be simplified first and then automated. The whole point of automation is to free up more time, and if you are spending all of your time automating you are no longer saving the time.
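
As a trivial example of the sort of boring, repetitive task worth automating, here is a sketch that snapshots every volume tagged for backup; it uses the aws-sdk Ruby gem (v1 API), assumes credentials are configured elsewhere, and the tag name is made up.

#!/usr/bin/ruby
# Snapshot every volume tagged for nightly backup - run from cron.
require 'rubygems'
require 'aws-sdk'

ec2 = AWS::EC2.new

ec2.volumes.each do |vol|
	next unless vol.tags['backup'] == 'nightly'   # hypothetical tag
	snap = vol.create_snapshot("nightly backup of #{vol.id}")
	puts "created #{snap.id} for #{vol.id}"
end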

Failure is not an option!

Anyone that thinks things won’t fail is being rather naïve. The most important thing to understand about failures is what you will do when there is one.

  • Things will fail
    • Data will be lost
    • A server will crash
    • An update that reduces functionality will make it through QA and into production
    • A sysadmin will remove data by accident
    • The users will crash the system
  • Plan for failures
    • If we know things will fail we can think about how we should deal with them when they happen.
    • Create alerts for the failure situations you know could happen (see the sketch after this list)
    • Make sure the common failures are well understood, so everyone knows how to fix them
  • You can not plan for everything
    • Accept this, have good processes in place for DR, Backup and partial failures
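
For the known failure situations, a simple monitoring check goes a long way. Here is a sketch of a Nagios-style check for a file system filling up; the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL) and the thresholds are made up.

#!/usr/bin/ruby
# Nagios-style disk usage check for the root file system.
WARN = 80
CRIT = 90

# parse the use% column from df for /
usage = `df -P /`.split("\n").last.split[4].to_i

if usage >= CRIT
	puts "CRITICAL - / is #{usage}% full"
	exit 2
elsif usage >= WARN
	puts "WARNING - / is #{usage}% full"
	exit 1
else
	puts "OK - / is #{usage}% full"
	exit 0
end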

Following a process makes it quick to resolve an issue, so creating run books and DR plans is a good thing. Having a wash up after a failure, to understand what happened, why, and how you can prevent it in the future, means the mitigations get put in place to stop it happening again.
Regularly review operational issues to ensure that the important ones are being dealt with, there’s little point in logging all of the issues if they are not being prioritised appropriately.

DR, Backup and Restoration of service are the most important elements of an operational service; no one cares about them until there is a failure, so get these sorted first.
Deploying new code and making updates are a nice to have. People want new features, but they pay for uptime and availability of the service. This is kind of counter-intuitive for DevOps, as you want to allow the most rapid of changes to happen, but it still needs control, testing and gatekeeping.

Summary

Concentrate on the things that no one cares about unless there’s a failure. Make sure that your DR and backup plan is good, test that it works regularly, and ensure your monitoring is relevant and timely. If you have any issues with any of these, fix them quickly and put the controls in place to ensure they stay up to date.

As for automation, just be sensible about what you are trying to do; if it needs automating and is complicated, find a better way.

Do you know where you are going?

Well, Do you?

I wouldn’t say I was odd, but I do like a certain amount of order in my life. I’m totally happy with having no idea what’s happening in the short term, which is why I manage my time using my Franklin Covey in conjunction with other tools, but I have always got a long term plan for myself. Why? Well, I want to know where I’m going to be in a few years, I want to get everything I want, and the only way of doing that sort of thing is with a plan.

I think it is important for people to have a rough plan of what they want to do and where they want to be. It helps focus your mind somewhat, and it helps take the personal element out of leaving a job: it never becomes you leaving because you dislike it, it’s more that it doesn’t fit with your long term goals.

Set a target

Okay, so a simple starting point: pick an age, pick a job title, pick a lifestyle, pick the sort of person you want to be, with the house you want and all that fluffy surrounding stuff. It is more or less the same process marketing folk use to work out a target audience or demographic, except in this case it is all about you, not other people.

As with any good plan, it starts with a goal, an objective: the sort of person you want to be, with the house you want and the car you want and so on. This is not enough, though, and you have to add some element of realism to what you want to do. I am not saying that you cannot go from being a butcher to being a brain surgeon, but there is training involved in any of those steps which you need to account for, and which affects your time line; being the world’s best butcher is going to take more than 5 minutes with a knife in the kitchen.

At this point you can now start working backwards from your goal. I’d suggest that your objectives are set 10 years apart at this point, and as you get closer to where you are the gaps should decrease. There are good reasons for this, mainly not being able to predict the future.

So if you want to be CEO of a small-ish company by the time you’re 50, what do you need to have done / be doing by the time you are 40? I’d imagine being in a senior management position in a number of large / small companies with a variety of roles. Which probably means by 30 you need to be in a management position of some sort, taking responsibility. But you need to get to the senior role in multiple companies by 40!? Don’t worry about it, it is a long way away. By this point 30 isn’t too far away, so we reduce the time frame down to 2 year chunks.

If you need to be a manager by 30 then you need to do X, Y and Z by 28, and you need to be demonstrating certain skills and personality traits, acting in a certain way and so on.

I would say that in the short term view you need to set some decent objectives so you know what “there” looks like, but more importantly so you can bear in mind what you need to do now to help you get to “there”, and also what you need to start doing now to get to your target 10+ years away.

For myself I’m quite interested in business, psychology, planning and project stuff, one of the reasons I’m into a lot of that stuff is it helps me out on my long term goals, I don’t need most of that stuff now, but it doesn’t hurt to be ahead of the curve.

Summary

If you were expecting something more complicated then you’re out of luck; it really is as simple as making sure you know what you want: write it down, work out high level milestones and just work backwards in more and more detail until you end up with a credible plan.

When you get to a milestone it is important that you are objective in assessing it, did you meet your targets? Did it matter? Did you get most of the way there? Does it matter?

Things move on, so even the objectives you set 2 years ago are relatively pointless in some cases. Be objective, and don’t be afraid about moving on to “new opportunities” if it helps you achieve your goals; don’t be afraid of missing them either. If you are happy where you are and it doesn’t affect the overall plan, stay a while longer.

It’s probably also worth mentioning your short term goals to those that are in a position to help you out; if you don’t make it clear to them what you want to do, they cannot help you. Be aware though that some managers will just tell you what you need to hear to keep you around, so be objective: bring it up, give them gentle reminders on any actions every now and then. If 3 months later nothing has changed, bring it up again and re-stress the point; 6 months in, give up, hand your notice in and move on.

Cloud deployment 101 – Part 2

In last week’s episode…

Last week I covered what the cloud was and what the cloud wasn’t: some very basic concepts that people seem to forget about when choosing to go to the cloud. This week my focus is on how to make the best use of the cloud, to save the most money and utilise the flexibility that it can provide.

How to make the best use of your cloud

To start with, here are some bullet points (quite a few), and after those some explanation around them…

  • Understand the limitations of the environment and mitigate against them all
    • Ensure your application can scale based on performance automatically
    • Build in spare capacity (no more than 40% utilisation)
    • Make it stateless
  • Utilise the flexibility
    • Carry out regular deployments of your environment and test DR plans
    • Implement systems across multiple regions and availability zones
    • Make use of the infrastructure tools for balancing traffic, caching data, storing data
  • Know when to compromise
    • On security
    • On functionality
    • On performance
  • Automation, Autonomy and Automatically
    • Automate the deployment and configuration of systems and applications
      • Puppet, Chef, Red Hat Satellite
      • Capistrano, Mcollective, Fabric
    • Through automation of the system you can automatically scale the environment as performance and DR require, or deploy new environments and backups through monitoring actions
    • Autonomy of day to day tasks is important; the system needs to look after itself, and monitoring tools can help with this
      • Nagios, Swatch to react based on log events or Monitoring statuses
  • DR
    • Make use of the different regions and automated snapshots
  • KISS
    • Keep it simple
  • De-couple
    • Each specific component of a Service should be de-coupled
    • Where possible even functions within an application should be de-coupled
  • Offload tasks
    • Just because you can do something doesn’t mean you Should…
    • Utilise the cloud providers services where possible

You need to build in spare capacity at each stage within the solution; this is so that when the host one of your critical systems is on comes under load, the others are able to cope. As a result the application needs to be stateless, and where possible transactions need to flow through all systems so a weak performing node does not affect the overall performance; ideally a scatter gun approach to load balancing. As your tools and understanding become better you may even start taking poorly performing nodes out of service and re-provisioning them.

Utilise the flexibility and scalability of the cloud infrastructure. Why waste your time trying to work out how to load balance the data when they can do it for you? Utilise the scalable storage and make use of all the redundancy they offer; this simplifies the tasks at hand to, hopefully, running a few OSs and your desired application without the added hassle of clustered DBs.

You have to know when to compromise and when not to. Within a cloud environment you just don’t have the same control as you do on your own hardware. If you do not learn to compromise you will hit a wall that means your whole environment will need to be redeployed in a different way.

Automation is key; everything should be automated: the deployment, the upgrades, the maintenance tasks. Automation leads to the path of stability and reproducible results. This is necessary for rapid deployments and to offer a stable service.

Important aspects of cloud solutions should always be simple; if the solution is overly complicated it can be hard for everyone to support. It is vital that the solution only be as technically challenging as needed.

As many components as possible within a solution need to be isolated; we need to do this for performance, scalability and stability reasons.

We are not experts at everything. Luckily the cloud providers often hire those experts, so for the sake of paying a few dollars for a solution, where possible we should always strive to use the built-in services, and if they are not suitable, let the cloud provider know they need to be made better. They are often happy to help make improvements, although on their time line.

Automatic recovery and scalability

This is an area I’m really interested in. The idea of automation to a level where you don’t look at every problem, but instead have a mechanism that flags re-occurring issues for further investigation and lets you fix known issues by executing certain scripts, is useful. It’s with this level of tooling that a handful of sysadmins could effectively look after thousands of servers.

  • Utilise auto scaling for
    • Performance
    • DR
  • Active – Active or Active – Passive?
    • Active – Active configuration provides the best reliability and most tested DR scenario
    • Active – Passive configuration is easier to maintain and implement, but does not offer the same rewards in performance
    • None, Why bother when you can re-build the whole thing in an hour?

A lot of people will preach auto-scaling as the be all and end all for Amazon uptime. Auto scaling will deploy an image and fire it up, and this has a few downsides. It means every time a change is made to the OS you have to create a new image, update the auto scaling and (ideally) do some tests to make sure it all works. From an operational point of view that is not ideal: you spend so long going through a change process that releases start taking weeks if not longer, even for the simplest of changes. Of course you could bypass the auto-scaling and make the changes to the boxes as and when, but do this at your peril; inconsistent service lies down this route, and that’s hard to troubleshoot and to explain to clients… “What do you mean each system isn’t identical…” This type of scaling is known as the Full bake (everything exists in 1 AMI).

So how about the best of both? As you recall, we mentioned Puppet in an earlier post. Using tools like this you can do what is known as a “From scratch” or “Half baked” solution.

The From scratch solution means taking a stock AMI with nothing on it and using the configuration management tool to configure the OS and the application in one go. The downside of this is it can take longer to build out a solution or get a box up and working as part of the auto-scaling, which could mean that by the time the box has provisioned, the need for it could have disappeared.

The Half bake is about compromise: the OS and the application are both on the AMI to a reasonable level. From this point onwards the configuration tool just has to make sure the latest configuration is in place and go from there. This would still require the AMI to be kept up to date, but only when a new application is released, not necessarily with every configuration change.
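
To make the half bake concrete, here is a minimal sketch using the aws-sdk Ruby gem (v1 API): launch from an image with the OS and application already baked in, and hand first boot over to the configuration management tool. The AMI ID is a placeholder, and the user data assumes Puppet is already on the image.

#!/usr/bin/ruby
# Launch a half-baked image and let Puppet finish the configuration.
require 'rubygems'
require 'aws-sdk'

ec2 = AWS::EC2.new

first_boot = <<EOS
#!/bin/bash
# pull the latest configuration; the app itself is already on the image
puppet agent --test
EOS

instance = ec2.instances.create(
	:image_id      => 'ami-xxxxxxxx',   # half-baked image, placeholder
	:instance_type => 'm1.small',
	:user_data     => first_boot
)
puts "launched #{instance.id}"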

Summary

Don’t overreach with what you are trying to do. The most important aspect is simplicity: by keeping the solution simple to start with you can do a lot of the funky scalability with a full baked solution. As time progresses and the needs develop you can start implementing all of the other elements that will improve the solution.

There is but one more part to this cloud deployment 101 post spree but that will be at least another week away.

Developers in a sysadmin world

Where to start

As I spend more time immersed in the world of DevOps, there have been a number of occasions where something has not felt right with the relationship between operations and development. Within the team at work we will be hiring developers to work on the integration back with the development team, improving the build process, and re-factoring the code to make it quicker to code / build / deploy. All of which is good stuff.

However, first and foremost we run a service; that is the main reason for the existence of the group. Without the correct support framework we will not be offering a service but instead offering a really fancy technology exercise, so we can say we do X or Y or Z, and I worry that is the route we are destined for.

Can Developers do Sysadmin tasks?

Of course they can, why couldn’t they? In the same way a mechanic can paint a car, we are not in the business of stopping people achieving their full potential, so have-a-go Jo is welcome here. The bigger question is whether they should be doing it; in much the same way as I am not the right person to code a large enterprise product, developers are not the right people to be making decisions about service restarts or process niceness.

So I believe that a developer can do the tasks of a sysadmin, and I believe that with enough training they can get to a point where they are not making random changes to a system to fix a specific problem without understanding the consequences. However, I also believe a good graduate would bring the same level of risk and knowledge to the table, so having an understanding of programming is a plus, but sysadmins aren’t in the business of random changes.

Can a Sysadmin be a Developer?

Sure, why not, same role in reverse? Almost. I can program, or have programmed, in a number of languages, which I tend not to bring up, so…

  • Pascal
  • Delphi
  • C/C++
  • Java
  • PHP
  • Javascript
  • Perl
  • Ruby (as of a couple of weeks ago)
  • Bash
  • Awk

So I have quite a few languages in which, within about 10 mins and a few nudges on Google, I can write something reasonable, and I have made a lot of different applications (another list, I hear you cry out for!):

  • Maze solvers
  • Text based adventure games
  • Arkanoid
  • Web based route planner; granted, I only drew the maps from 6 million data points, with zoom functionality
  • Content management system…
  • Geoblog with google maps and email updates / geo tagged pics
  • Web shops
  • System monitor with averages and weekly summaries
  • bit stream cypher
  • cd to mp3 encoder with CDDB lookups

Just 10 things I’ve written, so I would say I know enough about programming, probably more than necessary for my role. And yet I still have no interest in being a programmer. So, shoe on the opposite foot, maybe I should be doing some more developer focused work. It’s been a while, but it could be just what I’m after.

Based on that, I already am a part time developer, much in the same way that a developer is a part time sysadmin, I mean their programs run on systems right…

Who should do what?

Well, developers code, sysadmins admin… I don’t think it gets harder than that. I think it is easy for everyone to agree that the developer will be best spent writing code and helping out with specific system scripts or Puppet manifests / Capistrano. It is also very easy for everyone to say that the sysadmin should check the RAM utilisation, RAID configuration, disk layout and so on.

If all of that was correct this blog would end here; however, over the past few months something has been niggling at me. Every now and then I’m involved in a conversation with a developer which ultimately amounts to “It’s not that hard, just do X or Y”, and it’s this which I have the biggest issue with.

Let’s take rolling out a new Amazon AMI.

Developer’s approach

  • Deploy new server with AWS tools
  • Login
  • Done

Sysadmin’s approach

  • Start deploying new server with AWS tools
  • Pause, because deploying keys isn’t a good idea or secure if everyone is using the same one…
  • Continue, but with a generic “emergency key” configured
  • Check file system layout
  • Realise it’s all on one 6GB volume, fix issue
  • Create individual users
  • and so on…

Different skill sets. Any monkey in a suit can click on a few buttons in a web UI; knowing why splitting out the Linux file system onto different partitions matters, or at least understanding the impact of not doing so, is important.

Summary

I think the two skill sets can work harmoniously, but there is still a boundary, caused by experience and expertise, and for DevOps it’s about using each other’s strengths and avoiding the weaknesses. There have been times when I’ve been doing OOP PHP, working with inheritance or writing a very complicated script, where having someone around to ask “is this better than that?” would have been good. I imagine it works both ways, especially when it comes to configuring the system.

Developers tend to be very focused, from my experience, and because sysadmins are more generalists they are looked down on; I hope that changes as DevOps becomes more commonplace and the realisation of harmony comes about. Let’s see if in 6 months there’s another post about how disastrous or successful this integrated approach becomes; it is new ground and it will be interesting to find out what happens, if nothing else.

Cloud deployment 101 – Part 1

The myth behind cloud

Cloud is probably bigger than it has been in the last 5 years; it’s got to the point of maturity and a lot of people are finally starting to adopt and embrace what it means. But have they really thought about what cloud is? There are so many places out there preaching how good “Cloud” is and how much money it saves, but whether it is right for you depends on your use case. Do not just use cloud because it is there; if you do, you fall into the well established Bad sysadmin! category.

So what is the cloud?

What the cloud is

  • Flexible deployment
    • Time based
    • CPU usage based
    • Auto scaling
  • Utility based pricing
    • Cost per CPU hour based on performance of server
  • API based interaction
    • APIs to deploy / manage and maintain environments
  • Simple
    • To set up accounts
    • Understand the costs

Flexible deployment: deploy your systems wherever you want (US, Europe, Asia), spin them up and down at different times of the day, and scale your clusters based on CPU or memory usage.
Utility based pricing: costs are cheap, really cheap; even the largest boxes are less than $5 per hour (normally cheaper than $2), so if you need a lot of CPU for a short period of time this is good for you.
API based interaction: to be a true cloud service it has to offer you API control over what the environment does. You should be able to write applications that turn things on or off, or scale them up or down at certain times; if you can’t do that it’s not really a cloud, just hosting hanging on the coat tails of the buzz word.
Simple: it is so easy for anyone to sign up and build systems that it gives the marketing teams the opportunity to fire up demo systems without the need for IT to get in the way.
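
As a flavour of that API control, here is a minimal sketch using the aws-sdk Ruby gem (v1 API) that stops tagged systems out of hours; the tag name is made up, and you would run it from cron at 6pm with a matching start script in the morning.

#!/usr/bin/ruby
# Stop every instance tagged as office-hours only.
require 'rubygems'
require 'aws-sdk'

ec2 = AWS::EC2.new

ec2.instances.each do |i|
	next unless i.tags['schedule'] == 'office-hours'   # hypothetical tag
	i.stop if i.status == :running
end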

What the cloud is not!

  • Cheap
    • Costs can escalate quickly
    • Time can be wasted working around the limitations of the cloud
  • Predictable
    • Cloud environments are complicated, as a result your systems may change
  • Practical
    • In situations where you would normally connect to a console or attach a USB drive you can not
  • Secure
    • Yes you can secure them however…
      • You have no control over who has access to the physical hardware
      • You have no access to secure the Virtual Host
      • You can not stop Joe Random hammering his service and causing an effective DoS on your system

Cheap: everyone says how cheap the cloud is, but did anyone ever look at the costs? Cloud solutions are expensive unless your usage is sporadic and specific. On a continuous usage basis, cloud environments are more expensive than traditional hosting.
Predictable: kernel updates, faulty hardware and software updates are all out of your control. One day a system may need a reboot because of a kernel upgrade, another day the DB could be down for an automated update, the next your hardware may die and a reboot is needed again. If your system is meant to always be on, this can be annoying. This can all be mitigated, but in “traditional hosting” things are on your schedule, not someone else’s.
Practical: you have less access and control, and as such you lose out on flexibility and practical usage; for example, no console to log in to to fix a networking issue or recover the system.
Secure: the lack of physical access control, usage of the systems and patch schedules means you have little chance to control access to the fundamental layers your secure system may be built on; there’s no point building a castle on a pile of sand (other than sand castles…).

Summary

You have to work out what you want to use the cloud for. If you are planning on shutting down your data centre and moving everything to the cloud, think carefully. If you can turn off all your servers at 6pm every night until 6am every morning, and off totally at the weekends, you could save money; but the reality is you can’t turn off all of your servers, so look at the costs carefully.
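
As a back-of-envelope illustration of that saving (the hourly rate here is a made-up example):

#!/usr/bin/ruby
# Compare always-on against 12 hours a day, weekdays only.
rate      = 0.40       # $/hour, example figure
always_on = 24 * 7     # 168 hours per week
office    = 12 * 5     # 60 hours per week

printf("always on: $%.2f/week, office hours: $%.2f/week (%.0f%% saved)\n",
	always_on * rate, office * rate,
	100.0 * (always_on - office) / always_on)

That is roughly a 64% saving per machine, but only if the machine really can be off for all of those hours.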

I’d highly recommend reading why Mixpanel moved to the cloud and why, a year later, they moved off. On a side note, Mixpanel is a pretty good tool…

In next week’s post I will cover how to make the best use of your cloud.

One month on, Where does the time go

Where does the time go?

I started this blog on the 11th of February 2012 with the modest expectation that I might get a handful of views and not much more than that. You can imagine how surprised I was when I got 27 views on the first day! That was in keeping with my old personal blog, although, in fairness, I used to link through that site for images and things, so the number of views was possibly inflated thanks to some carefully placed links on forums.

Well, for those that don’t know, I also pre-write all of my updates, normally weeks in advance, so even this one was optimistically written a couple of weeks ago. For those of you that thought I was pondering away and writing them just before they were posted, shame on you… says a lot about your time management skills…

Which leads nicely on to…

Make some more time, go on, give it a go…

It’s impossible to have more time than is actually available, and as such you must not waste a precious moment of it! I do several things to help me organise my time, and I use them all differently for different things, sometimes for the same thing if it’s really important!

Here’s a list of the different tools I use:

  • Franklin Covey organiser
  • Phone Calendar
  • Shared Calendar

That’s kinda everything I need. I haven’t yet ventured into the world of electronic time management systems, mainly because what I have works.

I use the Franklin for tracking the “Day to Day” tasks that I need to do; typically these are very much work related and are simply reminders that I need to do X or Y. Occasionally personal errands enter this realm, and it’s important to realise that if you have personal tasks to do you make sure they get done; the sooner they are done, the sooner you can get back to doing your day job. Certainly when I have a task that has to be done during work times, like calling solicitors or doctors, I make a note and just get them done and out of the way. I also only ever really set 3 priorities for tasks: A, B or C. A tasks need doing as soon as possible, C tasks need doing at some point.

I will always carry over tasks to the next day if they don’t get done, but if something is an A priority it’s because it needs doing that day, so I very rarely carry over A tasks. That, in a nutshell, is how I deal with day to day tasks, but it really only serves as a way for me to track what I am doing; the real benefit of a system like this is scheduling. If a task cannot be done until a certain date I will schedule it on that date. This keeps the day to day list to a minimum and means I don’t even have to think about the future stuff.

I use both of the calendars mainly to schedule appointments or to block out specific times of the day where I am meant to be doing a task. Occasionally I use them as a way of reminding me to do a task, especially if it is time bound and not date bound. Also recurring events; I do like setting reminders for months in the future to do X or Y.

By combining these I spend no time worrying about what it is I am meant to be doing or when; all I have to do is remember to write it down in the appropriate place on the appropriate day, no more, no less.

All of the above gives you a better way to track what you are meant to be doing and when; as a result you free up a surprising amount of time by not worrying about it all. However, to get real time back you have to do one thing and one thing alone.

Don’t be afraid to bin a task. If you notice the pressure for a task dwindling and your boss stops asking about it, bin it. You do need to be conscious that you can’t bin every task, but you need to identify the tasks that are not going anywhere and just stop them.

Plan daily

Every morning when I get into work I do the same things.

1. Set up laptop (don’t login)
2. Grab a coffee
3. Login to my laptop and grab my Franklin
4. Skim my emails for anything that needs doing; if it’s a short 5 min thing I typically do it, else I write it down in the Franklin
5. Check my calendar in case a meeting or something fell through the net (I hate it when that happens)
6. Validate the tasks in my Franklin for the day, add any new tasks
7. Crack on.

It typically takes me 30 mins in the morning to go through this routine, and I occasionally shake up the order if I have an early morning change to do.
Almost every evening I try to get ahead of the next morning by moving any unfinished tasks from that day to another date where achieving them is possible.

But all of that is not enough. If there’s a larger task or a set of tasks, I estimate how long they will take in my head and then, working backwards from the delivery date, schedule the tasks as needed, even including time to think about the tasks in detail.

I find that all of that helps me keep ahead of what I am meant to be doing, and as such keeps tasks on time.

A bold new Ruby world

It’s something different

For a long time now I’ve been put off by Ruby; my interactions have been limited and most of my understanding of Ruby comes from Puppet. I’ve found it a bit of a pain, but the truth is that has nothing to do with Ruby as a language; it was more the packaging of gems and so forth. I really like the idea of yum repos and packaging systems, but Ruby uses gems, and I still have no idea what they really are other than libraries to be used by your Ruby program. Either way, sometimes, because of the quirks of yum repos and the lack of maintenance, you aren’t always able to get the right version of the rubygems that you need for the application you are running. This alone was enough for me to avoid looking at Ruby as a go-to programming language of choice.

In the past I’ve traditionally done my scripting in Bash; if things got difficult in Bash or it wasn’t quite suitable I’d fall back to Perl or PHP (PHP is definitely my go-to language), but with that said I can count the number of scripts I’ve had to write in Perl on my hands, and where possible I always go with Bash. Why? Well, why not? It’s easier for most sysadmins without programming backgrounds to follow, as in most cases you are using system commands combined within a framework of programming.

Which leads onto an interesting side note: why are there sysadmins that can’t program? I guess it happens, and to be honest I was once described as having “no natural programming ability” by one of my college tutors, so I’m not saying I’m good. I do think that every sysadmin needs a fundamental understanding of conditionals, operators, looping and scoping… Again, not saying I’m brilliant, but I’ve had to learn, and I also force myself to learn by writing scripts for things. A sysadmin who can’t write a script is as good as a lithium coated paper umbrella in a thunderstorm.

Moving along

So what was my first venture into Ruby? A simple monitoring script for Solr indexing. I thought about doing it in Bash, then quickly changed my mind; in short, I was dealing with JSON and needed a slightly better way of dealing with the output and an easy way to fetch the JSON in the first place.

This is something I’ve done in the past, but in PHP, so I thought it’d be a good comparison. I can honestly say I was rather surprised at how easy it was to get working; I managed to Google for some code that got the JSON data and understood its use really easily, and it wasn’t all obfuscated like some Perl stuff can be.

From what I can see thus far it is quite a reasonable language; it’s got some useful features and some flexibility, but rather than being like Perl with hundreds of ways to do the same thing, it has a small selection of ways to do each thing, so you can choose an appropriate style or just one that suits your coding style.

I’m tempted to start writing something a little more complicated to see how it is with that. I have no doubt it’ll be okay, but until I try I will not know.

So what did my first adventure into Ruby look like?

#!/usr/bin/ruby
require 'rubygems'
require 'json'
require 'net/http'

def get_metric(query, base_url)
	url = base_url + "?" + query
	resp = Net::HTTP.get_response(URI.parse(url))
	data = resp.body

	# we convert the returned JSON data to a native Ruby
	# data structure - a hash
	result = JSON.parse(data)

	# if the hash has 'Error' as a key, we raise an error
	if result.has_key? 'Error'
		raise "web service error"
	end
	return result
end

#
#	Get arguments or default
#
url = nil
index = nil
query = "action=SUMMARY&wt=json"

if ARGV.length == 0
	print "You must specify one of the following options\n\n-u\thttp://example.com/path\tREQUIRED\n\n-q\taction=REPORT&wt=xml\n\n-i\tindexname\tREQUIRED\n"
	exit
else
	# walk the argument list manually so each flag can consume
	# the value that follows it
	count = 0
	while count < ARGV.length
		case ARGV[count]
		when "-i"
			if ARGV[count + 1] != nil
				index = ARGV[count + 1]
				count += 1
			else
				print "No argument for option -i\n"
				exit
			end
		when "-q"
			if ARGV[count + 1] != nil
				query = ARGV[count + 1]
				count += 1
			end
		when "-u"
			if ARGV[count + 1] != nil
				url = ARGV[count + 1]
				count += 1
			else
				print "No argument for option -u\n"
				exit
			end
		end
		count += 1
	end
	if (url == nil || index == nil)
		print "You must specify a URL with -u <url> and -i index\n"
		exit
	end
end

rs = get_metric(query, url)
lag = nil
begin
	lag = rs["Summary"][index]["Lag"]
rescue
	print "Invalid Index\n"
	exit 1
end

# pull the leading digits out of the lag value
regex = Regexp.new(/\d+/)
lag_number = regex.match(lag)
print lag_number, "\n" if lag_number != nil
    

Something like that. I know it’s not brilliant, but it’s a starting point; the next thing I write is probably going to make this look rather small by comparison.
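
For reference, a hypothetical invocation would look something like this (the script name, URL and index name are all made up):

./solr_lag.rb -u http://solr.example.com:8983/solr/admin/dataimport -i myindex

With no arguments it prints the usage text, and -q lets you override the default SUMMARY query.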

Anywho, that’s all on Ruby, wonder if it’ll catch on.

Understanding Risk

The short version

Stuff happens, move on.

The long version

Risk management is a really interesting topic. I know there will be lots of people out there falling asleep at just the thought of risk management; well, to you I say Hah! If you find risk management dull you’ve probably never had the fun of thinking through 101 different ways in which something could fail, and that requires a great use of imagination!

When considering risk there is a tendency, from a sysadmin point of view, to get stuck in the technical detail, i.e. if node X dies we lose service Y; which is fine, that is a valid risk, but moving past this is kind of vital, predominantly as most technical risks can be avoided with change processes or redundancy and high availability. After the technical risks you end up in environmental risks, the “what if…” risks, for example “What if a power failure occurs?” Great, these are environmental and you’ve chosen a provider that has UPSs. Wonderful. Do they have generators? Diesel stored on site? In multiple containers? With a delivery schedule with multiple suppliers in the event of an emergency? Divergent power sources?

Okay, nothing to panic about here, these are just common sense issues; regardless of all of the mitigations that are in place you could just run 2 sites, 30 miles apart. So what if you are using the same provider for your 2 sites? What about the financial collapse of your hosting provider?

Okay, so being totally paranoid, you have 2 providers, each 30 miles apart, each with UPS, redundant generators, divergent power sources, SLAs with fuel suppliers, and a free air cooled data centre with backup air conditioners. Great, good job… Wrong! Where are the backups? Are they both in the same country? Same planet?

I guess the laboured point is that you can’t mitigate everything; even if you think you can, you can’t.

So what do you do?

Kick back and relax, the problems will solve themselves! Not quite, but not far from the truth either. You have to be pragmatic; you have to consider what level of risk is affordable and justifiable. Remember that mitigating risk often costs money, and it is very easy for senior management bods to pull you over hot coals when something fails and ask “How did this happen?”. It’s probably worth noting at this point that you do not want to reply with “We didn’t have a suitable DR plan”; that’s not going to wash.

Luckily for you, you just have to come up with all the risks you can, and a number of solutions that mitigate varying numbers of those risks; let someone else make the call about what is an acceptable amount of risk and what can be lived with.

It may also help to plot your risk management strategy against your one year or three year strategy, or against growth of the solution, so there are known points at which a certain amount of resilience is needed.

For example, you launch a new website; you don’t know if it will be popular or not, you don’t know if it will be profitable. So for this solution, what is wrong with just ensuring you have a decent backup? Even if it is to local disk and not “offsite”, that’s better than nothing. NB I would highly recommend you at least make a regular local copy, or better yet store the website in SVN as well and back that up…
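
To show how little effort that first step is, here is a minimal sketch of a timestamped local backup with a short retention window; the paths and retention are made up, and you would run it from cron.

#!/usr/bin/ruby
# Nightly tarball of the site to local disk, keeping a week's worth.
require 'fileutils'

site = '/var/www/mysite'     # hypothetical site root
dest = '/backup/mysite'      # hypothetical local backup directory
keep = 7                     # days of backups to keep

FileUtils.mkdir_p(dest)
stamp = Time.now.strftime('%Y%m%d')
system('tar', '-czf', "#{dest}/site-#{stamp}.tar.gz", site)

# prune anything older than the retention window
Dir.glob("#{dest}/site-*.tar.gz").each do |f|
	File.delete(f) if Time.now - File.mtime(f) > keep * 24 * 3600
end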

This solution has a cheap and reasonable risk management policy; it may occasionally go down for an unknown period of time. Worst case scenario, you have to apologise to all the users, promise to make it better and actually make it better (always do what you say you are going to do…).

As time goes on you can always add in additional sites and better backups. Always go for the solution that gives you the best bang for your buck, i.e. if you need off site backups, why not run two sites in high availability and do local backups in each: more throughput and better resilience.

Summary

You can not mitigate everything, so don’t try. Look at what really is important and make sure you can recover. Have a plan so that if customers hit number X, or the solution’s profitability reaches y%, you’ll add in the additional risk mitigation.