20 Jun 2013

Continuity and Cloud – Episode 1: Defining Disaster

The reality of Information Technology today is that your users expect continuous operations with no downtime, and they expect you to deliver that for less than you spent last year. Without an unlimited budget and resources, how do you protect your IT assets against disaster cost-effectively? The first step is to ask your business owners to evaluate realistically how they use information technology and what would matter most to them in the event of a catastrophic failure. Be realistic here, because every requirement has a budget consequence. In most disaster recovery solutions, this comes down to two metrics:
  1. Recovery Point Objective (RPO)
    1. How old will our data be when it is recovered?
    2. How much work will we have to redo because of the loss of our computer systems?
  2. Recovery Time Objective (RTO)
    1. How long will it take to restore access to the systems from the time they failed?
    2. When will our staff be able to work again?
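To make the two metrics concrete, here is a minimal sketch that checks a protection scheme against agreed RPO and RTO targets. The class `ProtectionScheme`, the function `meets_objectives` and all the durations are hypothetical and purely illustrative. The model assumes that the worst-case data loss equals the interval between recoverable copies. It ignores backup job duration and the time needed to get copies offsite. It also assumes that recovery time is simply the data restore time plus the time needed to bring applications back to users.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionScheme:
    """Hypothetical, simplified model of one way of protecting a system."""
    name: str
    backup_interval: timedelta   # time between recoverable copies
    restore_duration: timedelta  # time to restore data from the copy
    service_overhead: timedelta  # time to bring applications back to users

    def worst_case_rpo(self) -> timedelta:
        # Assumption: a failure just before the next copy loses a full interval of changes.
        return self.backup_interval

    def estimated_rto(self) -> timedelta:
        # Assumption: users wait for the data restore plus the application bring-up.
        return self.restore_duration + self.service_overhead


def meets_objectives(scheme: ProtectionScheme, rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True only if the scheme satisfies both agreed business objectives."""
    return scheme.worst_case_rpo() <= rpo_target and scheme.estimated_rto() <= rto_target


# Illustrative figures only, not measurements.
nightly_tape = ProtectionScheme("nightly tape", timedelta(hours=24), timedelta(hours=8), timedelta(hours=4))
hourly_replica = ProtectionScheme("hourly replication", timedelta(hours=1), timedelta(hours=1), timedelta(hours=1))

rpo_target, rto_target = timedelta(hours=4), timedelta(hours=8)
for scheme in (nightly_tape, hourly_replica):
    print(scheme.name, scheme.worst_case_rpo(), scheme.estimated_rto(),
          meets_objectives(scheme, rpo_target, rto_target))
# nightly tape 1 day, 0:00:00 12:00:00 False
# hourly replication 1:00:00 2:00:00 True
```

The value of writing the targets down this way is that the business owner's answers to the questions above become numbers that any candidate solution can be checked against.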
It is important to remember that there are costs associated with reducing both the amount of data you lose and the time it takes to recover it. These costs are generally not linear; they rise steeply as you approach zero data loss and instant recovery. Controlling them frequently means working above the infrastructure layer and relying on application-level recovery. So before diving into three ways to improve your RPO and RTO beyond a baseline of tape backup, here are some things to consider as you plan:
  1. Recovering data is not a disaster recovery plan: Consider your applications and services and how you will deliver them back to your users. Get the business units to think through how they will deal with a disaster. This will focus them on what they need and when.
  2. Realistic Recovery Point and Time Objectives: Have a frank and realistic discussion with business units and owners about how quickly they need data restored and systems operational.
  3. There is no “golden hammer”: Not every server and system in your production environment needs to be recovered in 5 minutes. Consider ways to tier your applications so that the most important systems are recovered first and less important services later.
  4. Some things are not worth the cost of recovery: Consider which servers you can afford not to recover at all, such as secondary domain controllers or template Citrix and Terminal Servers. These can often be rebuilt, if required, after the main production services are back online.
  5. Look at your current production systems: Evaluate where you are now. Do your current production systems have the capacity to support demanding replication or application recovery processes?
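The tiering and "not worth recovering" advice above can be captured in a simple inventory that produces a recovery order. This is a minimal sketch, not a prescribed method. The `Service` class, the `recovery_plan` function, the tier numbers and the service names are all hypothetical. The sketch assumes tier 1 is recovered first and that a tier of `None` marks systems to rebuild later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Service:
    """Hypothetical inventory entry agreed with a business unit."""
    name: str
    tier: Optional[int]  # 1 = recover first; None = do not recover, rebuild later


def recovery_plan(services: List[Service]) -> Tuple[List[str], List[str]]:
    """Return (ordered recovery list, rebuild-later list)."""
    # sorted() is stable, so services within the same tier keep their listed order.
    to_recover = sorted((s for s in services if s.tier is not None), key=lambda s: s.tier)
    rebuild_later = [s.name for s in services if s.tier is None]
    return [s.name for s in to_recover], rebuild_later


inventory = [
    Service("Email", 2),
    Service("Finance database", 1),
    Service("Secondary domain controller", None),
    Service("Intranet", 3),
    Service("Template Citrix server", None),
    Service("Order processing", 1),
]

order, later = recovery_plan(inventory)
print(order)  # ['Finance database', 'Order processing', 'Email', 'Intranet']
print(later)  # ['Secondary domain controller', 'Template Citrix server']
```

Keeping the "rebuild later" systems as an explicit list matters. It shows they were deliberately excluded from the recovery budget rather than forgotten.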
In an age when consumer services are always up and running, ask yourself why you aren’t taking advantage of the Cloud to provide continuity for your systems.