The overwhelming priority for any business is to have systems that work.
New features, platforms and applications can drive competitive advantage, but they need to be robust and reliable.
When they aren’t, you can virtually guarantee that they’ll interrupt your business at some point – something for which there’s never a good moment.
So what can you do to make your systems reliable? One question worth asking is whether you’re thinking about reliability in the right way.
Business resilience is about maintaining continuous operations when something threatens to disrupt them.
Rather than just responding to an outage or interruption, business resilience seeks to enable normal operations during one.
It’s a philosophy that’s most popular at the big end of town, but an increasing number of SMEs are paying attention.
It moves the emphasis away from ‘defensive’ or ‘after event’ measures such as backup (which nevertheless remains pivotal) and towards proactive measures that respond to failures in real or near-real time.
The idea is that it’s best to have systems that can deal with unexpected failures and maintain operations, rather than ones for which failures require intervention in order to restore them to working order.
Cost vs reward
Business resilience can, however, be costly, so it’s important to assess the risks you face and the costs that will be incurred.
Generally, resilience is best applied where the cost of an interruption will be highest. It involves measures such as redundancy (something virtualisation has made easier), which, depending on the applications involved, sees multiple systems running in parallel, often across multiple physical locations.
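To make the cost-versus-risk comparison concrete, here is a minimal sketch in Python. It assumes a deliberately simple model (outages per year, hours per outage, cost per hour), and every figure in it is hypothetical; a real assessment would use your own numbers and probably a richer model.

```python
def expected_annual_outage_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly loss from interruptions, in the same currency as cost_per_hour."""
    return outages_per_year * hours_per_outage * cost_per_hour


def resilience_pays_off(expected_loss, loss_with_resilience, annual_resilience_cost):
    """True when the loss avoided each year exceeds the yearly cost of the resilience measures."""
    avoided_loss = expected_loss - loss_with_resilience
    return avoided_loss > annual_resilience_cost


# Hypothetical figures for illustration only.
baseline = expected_annual_outage_cost(outages_per_year=2, hours_per_outage=4, cost_per_hour=5000)
with_redundancy = expected_annual_outage_cost(outages_per_year=2, hours_per_outage=0.5, cost_per_hour=5000)

print("Expected loss without redundancy:", baseline)
print("Expected loss with redundancy:", with_redundancy)
print("Worth the spend?", resilience_pays_off(baseline, with_redundancy, annual_resilience_cost=20000))
```

The point of the exercise is the comparison rather than the precision: if the loss you avoid is smaller than what the redundancy costs, that system is probably not where resilience spending belongs.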
Staying up to date with patches and regular updates will also improve the reliability of your systems.
Even if your systems are functioning well, be sure to have rigorous processes to ensure that updates – especially those addressing system security – are carried out.
In addition to their security benefits, updates will save you from trouble down the track by keeping your systems supportable.
Proactive monitoring is one of the most powerful tools in the reliability arsenal.
From storage to memory consumption, watching for early warning signs and acting before your systems encounter trouble is a must.
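As a rough illustration of watching for early warning signs, the Python sketch below compares usage readings against warning thresholds. The threshold values and the function names are assumptions made for this example, not recommendations; in practice a dedicated monitoring tool would do this job and raise alerts for you.

```python
import shutil


def usage_warnings(readings, thresholds):
    """Return a warning string for every metric at or above its threshold.

    Both arguments map a metric name to a percentage (0-100).
    """
    warnings = []
    for metric, value in readings.items():
        limit = thresholds.get(metric)
        if limit is not None and value >= limit:
            warnings.append(f"{metric} at {value:.0f}% (threshold {limit}%)")
    return warnings


def disk_percent_used(path="/"):
    """Percentage of the disk holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


# Hypothetical thresholds; a real deployment would tune these per system.
THRESHOLDS = {"disk": 80, "memory": 90}

if __name__ == "__main__":
    readings = {"disk": disk_percent_used()}
    for warning in usage_warnings(readings, THRESHOLDS):
        print("WARNING:", warning)
```

Setting the thresholds well below 100% is what makes the monitoring proactive: the warning arrives while there is still time to act.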
Of course, should a failure happen, how you react can determine the extent of the damage.
In this instance, make sure that your IT team or provider follows best-practice procedures, such as those defined in the ITIL framework.
ITIL, for example, emphasises that the true problem isn’t a technology failure, but a business outage. Thus, the first step in solving any failure is getting the business back online, rather than a painstaking exploration of the root cause of the problem.
Make it someone else’s problem
Both resilience and reliability can often be a question of resources. And, in the age of the cloud, one shortcut is to opt for cloud-based services with strong Service Level Agreements (SLAs).
Managed services can also offer more reliable replacements for those managed internally.
The upside (and potential downside) is that someone else will be responsible for ensuring that your systems are in continuous working order. This means choosing a provider carefully. However, if you find the right one, it’s likely that, dollar for dollar, their resources and experience will translate into levels of reliability and robustness higher than your business can achieve in-house.
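When comparing providers, it can help to translate an SLA’s availability percentage into the downtime it actually permits. The Python sketch below does that arithmetic, assuming availability is measured over a 365-day year; real SLAs often measure it monthly and exclude planned maintenance, so check the terms of the agreement itself.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Maximum downtime a provider can incur while still meeting the stated availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


for sla in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% availability allows about {hours:.2f} hours of downtime a year")
```

A 99.9% commitment, for instance, still leaves room for roughly 8.76 hours of downtime a year, which may or may not be acceptable depending on what the system does for your business.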
(This blog post was first published on the SmartCompany website on July 26, 2012)