by Tony Campbell, ChiefIT Correspondent
Why does it take a catastrophic event like the Amazon Web Services (AWS) Sydney outage in June to remind executives that they need to invest in business continuity planning?
The extended downtime – a result of power failures following severe storms in Sydney – has prompted some of AWS’s biggest local customers to reconsider their architectures to guard against similar problems in the future.
A simple crisis planning exercise incorporated into the requirements elicitation phase of their cloud transformation program would have unearthed this as a plausible event worth planning for. So what went wrong?
In 1999, when I was designing systems for the UK government, 99.999 percent was a real availability requirement for data centre services, and these were customer-owned data centres supported on premises.
SLAs would be breached if we didn’t provide this level of service (equating to no more than 5.26 minutes of downtime per year), so our solutions, while expensive, were engineered to cater for every eventuality. We’d architect resilience at all levels of the solution, from the physical infrastructure through to the application and data layers, as well as model the business continuity requirements of people and business processes to ensure operations would continue through even the most catastrophic of events.
This was an expensive approach and today’s IT executives have different imperatives. The promise of cloud computing has been the salvation of many a CIO. IT managers can now employ fewer staff to achieve the same outcomes, while pushing service providers to the limits of productivity on reduced headcount and giving developers the keys to the kingdom.
Cloud computing has also encouraged new service paradigms for rapid, continual deployment of system improvements, pushing responsibilities for managing non-functional service aspects into the developer arena. Since the cloud takes care of the infrastructure, innovation now happens at the application and data layer, and no-one cares about the commodity of tin and operating systems, storage or networks.
But who’s kidding who? You still need your systems to perform under all conditions. You still need them to be secure and to facilitate your business strategy. You still need to properly design these systems to meet your needs, factoring all of those -ility words (availability, reliability, scalability and the rest) into your architectures and requirements gathering.
In 2013, Forbes reported that Amazon.com loses US$66,240 per minute if its US website goes down. If this is the kind of revenue your company will lose during an outage, it might be worth planning for a crisis.
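To make that arithmetic concrete, here is a minimal sketch that multiplies a per-minute loss by the length of an outage. The function name and the outage durations are hypothetical, chosen purely for illustration, and the calculation assumes revenue is lost at a constant rate, which is a simplification. Substitute your own company's figure for the Forbes number.

```python
# Per-minute loss quoted from Forbes (2013) for Amazon.com's US website.
# Replace this with an estimate for your own business.
LOSS_PER_MINUTE_USD = 66_240


def outage_cost(minutes_down, loss_per_minute=LOSS_PER_MINUTE_USD):
    """Estimate revenue lost during an outage, assuming a constant loss rate."""
    if minutes_down < 0:
        raise ValueError("minutes_down must not be negative")
    return minutes_down * loss_per_minute


if __name__ == "__main__":
    # Hypothetical outage durations, not figures from any real incident.
    for minutes in (5, 60, 240):
        print(f"{minutes:>4}-minute outage: US${outage_cost(minutes):,}")
```

At that rate, a single hour offline would cost close to US$4 million, which puts the price of a crisis planning exercise into perspective.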
Look at AWS’s service commitment:
“AWS will use commercially reasonable efforts to make Amazon EC2 and Amazon EBS each available with a Monthly Uptime Percentage (defined below) of at least 99.95%”
It also says it will pay higher service credits, up to 30 percent, should the service level fall below 99.0 percent in any given billing cycle. From this you can see that AWS is effectively acknowledging there is a chance, albeit small, that the service could be unavailable for 7.20 hours in a month.
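The downtime figures quoted in this article all come from the same simple calculation, sketched below. The function name is hypothetical, and the sketch assumes a 365-day year and a 30-day billing month; AWS's own definition of Monthly Uptime Percentage may differ in detail.

```python
HOURS_PER_YEAR = 365 * 24    # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24    # 720 hours, assuming a 30-day billing month


def allowed_downtime_minutes(availability_percent, period_hours):
    """Return the minutes of downtime permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * period_hours * 60


if __name__ == "__main__":
    targets = [
        ("99.999% over a year", 99.999, HOURS_PER_YEAR),
        ("99.95% over a month", 99.95, HOURS_PER_MONTH),
        ("99.0% over a month", 99.0, HOURS_PER_MONTH),
    ]
    for label, percent, hours in targets:
        minutes = allowed_downtime_minutes(percent, hours)
        print(f"{label}: {minutes:.2f} minutes ({minutes / 60:.2f} hours)")
```

Under those assumptions, five nines allows about 5.26 minutes a year, the 99.95 percent commitment allows about 21.6 minutes a month, and the 99.0 percent threshold corresponds to 7.2 hours a month.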
When architecting your systems, AWS provides a variety of options to help you design resilient, fault-tolerant architectures. I’ve not met a single cloud salesperson who doesn’t agree that cloud platforms are just as susceptible to outages as any other platform. We therefore can’t blame the cloud vendors for this shift of mindset away from good engineering.
So, if it’s not the cloud service providers that are fogging the views of the business, why have we ditched good sense and good planning in our race to adopt this new service paradigm? Is it the developers?
No. These guys just do what they are asked to do, and if you give them the directive to spin up AWS instances and Azure services to meet their development needs, without enforcing security standards, architecture good practices or any kind of governance, then that’s what they will do.
Who’s left? The C-suite? Maybe. CIOs need to start pushing back on the business – a directive to cut costs does not mean cutting out the practices of architecture and design for the sake of expediency. Leadership is what is missing and this is what needs addressing.
Without strong IT leadership, companies will be in for a rough ride that no cloud, BYOD, mobile or big data buzz-term will fix. It’s time to stop following the crowd and start delivering substance over soundbites.