Getting Out in Front of Your Application Downtime Problem

They say that all good businesses stem from someone trying to solve a real-world problem. Uptime.ly is the living, breathing, incarnation of that concept.

It’s simply a fact of life: cloud applications go down. Whether that application is running within your corporate firewall on your own private servers, or out in the “ethers”, how you react to downtime is critical. It doesn’t matter who your end-users are: employees, suppliers, or customers; communicating the status of unexpected and planned system events has a direct impact on your company’s top and bottom line.

Arrows in the Back

For the last thirteen years, the founders of Uptime.ly have been responsible for running one of the Internet’s original Software-as-a-Service applications. During that time:

the product we sold was built and re-deployed thousands of times (long before slipstream deployments were commonplace and helped reduce downtime).
we had seemingly countless “scheduled maintenance” events where we had to upgrade either our hardware, software, or some other infrastructural component. You would be surprised how many times we had to announce scheduled maintenance because our co-location facility told us that they would be doing the same.
and as much as we hate to admit it, just like every other cloud-based application that has ever been built, we suffered our share of downtime and application performance problems as a result of unforeseen circumstances.

For years our SaaS company tried to figure out the most appropriate way to manage the downtime communication process with our customers. We had our tech team maintain an emergency e-mail contact list of all our customer’s administrators, we posted scheduled maintenance on our website, and we tweeted. The communication was never the same way twice.

As the years went by and our support teams changed hands, the process (or lack thereof) changed as well. The amount of time we provided notice before scheduled maintenance events was never set in stone. The language used within unexpected system outages rarely found the right balance between providing the customers too much or too little information. And to top it off, no matter what we did, nothing seemed to reduce the number of irate customers calling our help desk, even if we gave them weeks of advance notice.

Lessons Learned

One of the best pieces of advice we can offer any burgeoning startup, or any IT manager that has application uptime on their mind: If you don’t have a plan of attack for how you communicate with your application users when something has, or is about to, bring your system down, then you are hurting your company. This happens by creating increased help desk costs, reducing employee productivity, contributing to lost revenue, and significantly diminishing your company’s brand loyalty & reputation. These problems are consistent with both cloud based application providers, and self-hosted applications managed by your IT staff:

Overstressed help desks. When applications become unavailable, the natural response of its users are to reach out and find what’s wrong. Any application with more than a handful of users is going to quickly inundate your help desk team with inbound support requests. The expense of having your help desk respond to each of these requests with (hopefully) the same message over and over again, should not be overlooked.
Lost employee productivity – Frustrated and idle employees are a nightmare and costly. The Aberdeen Group’s estimates an average size company loses $110,000 an hour when an application becomes unavailable. Your goal as someone managing downtime should be to make those times frictionless for your users. This means having a process in place that proactively keeps your users in the know so they can be as efficient as possible. If you sell a SaaS, frustrated customers don’t translate to lost employee productivity, they translate to ex-customers.

Going Forward

If you and your team are committed to improving the way you handle application downtime, there are a number of simple, effective steps that you should take:

Create a culture of communication. It should be hardwired into your team’s DNA: when something goes wrong, before we even start looking at the problem, let the customers know.
Create multiple channels for customers to find out what’s going on. Don’t expect every customer to be sitting at their desk reading e-mails, or browsing their Twitter feed.
Decide on your level of transparency up front. Employees need to feel empowered to communicate with their customers without having to worry about backlash from their manager. Clear guidelines as to what words should and shouldn’t be used are important. Does your company want to communicate in broad stroke terminology, such as “We are currently experiencing a problem”, or do you want to let your customers know exactly what’s going on “Hard drive B in our Meta Cluster is reporting a bad sector”.
Decide on your tone. Is uptime communication to your customers and application users going to be used as an opportunity to build brand and relationships? If so, your uptime messages should take a friendlier tone. Is the success of your user base tightly bound to your application uptime? If so, communicating thorough details in stark black-and-white may be more appropriate.
Keep a record. Stop relying on uptime monitoring services to determine your SLA. We’d be willing to bet that every month your manager is filtering through your uptime reports tweaking the final output used in determining an actual SLA report.

As I mentioned at the beginning, we’ve lived this process for many years and have serviced thousands of customers during every kind of application uptime event that you can ever think of. We’d love to hear about your downtime stories and some of the lessons you’ve learned in managing it. Feel free to contact us and let us know! Maybe we will publish your story.

Jasen Fici
Co-founder, Uptime.ly

Products

Features

Solutions

Resources

Featured

Navigating IT Incidents - The Role Of The Status Page

Combating IT Alert Fatigue

The Causes Of IT Incidents

Getting Out in Front of Your Application Downtime Problem

Arrows in the Back

Lessons Learned

Going Forward

Products

Products

Features

Explore

Resources

Products

Features

Explore