StatusCast Top Picks: 10 More Awesome Customer IT Status Pages


Transparency goes hand-in-hand with good governance 

IT services are a critical backbone to the operations and functioning of most businesses and organizations, hence the need for awesome status pages.

As more and more IT departments have embraced the need for good governance, they have driven greater transparency and accountability, and status pages have become a powerful tool for delivering both.

From the perspective of IT service management, this has manifested itself as much greater openness when communicating about IT service availability.

Status Pages are a key element of this. For many stakeholders, they are the single most important touchpoint with the IT function.

A couple of years ago we showcased how some of our customers were using StatusCast to communicate with their internal and external service users. Since then we have continued to evolve and refine our platform, enabling further enhancements.

Today’s Status Pages provide an even richer source of information and the page builder tools available within the platform enable even greater control over how you display and present your information. 

Coupled with this, StatusCast can wrap your status page in the brand of your company or organization, so that it matches the rest of your family of communication assets.

10 more awesome Status Pages from the current crop!

Let’s get into it and go through ten awesome Status Pages that show how much control you can have over how you present and display your information.

#1 – SUPPORTING TRANSPARENCY

https://alliedsolutions.statuscast.com

Allied Solutions, LLC is a financial services business, providing value-added support to 4,000 B2B clients. Its range of solutions includes insurance, consumer lending, employee benefits and technology to improve the efficiency of lending operations.

What’s great about this page?

One of the key characteristics of the Allied Solutions status page is that it provides a detailed history of incidents.

Clicking ‘View Uptime History Report’ shows the uptime history, and using the drop-down date range control you can view any week, month or year in the previous 2 years.

Putting such granular levels of detailed reporting right up front helps to demonstrate Allied Solutions attaches a great deal of importance to providing a high level of transparency.

#2 – SUPPORTING YOUR BRAND

https://ns1status.com

NS1 is an internet services business that uses network technologies to connect the world’s applications with audiences. This supports faster, more reliable, and more secure application performance across the internet.

What’s great about this page?

NS1 has used our advanced support for web design to place a gradient across the status legend top menu, showing how StatusCast really enables customers to tune their status pages to fit with the branding of their existing online media assets.

The NS1 status page uses the menu item on the right to let stakeholders view scheduled maintenance events, so they can see upcoming planned infrastructure work that may affect availability and adjust their workflows and schedules accordingly.

#3 – AVAILABILITY PER APPLICATION AT A GLANCE 

https://swtc.status.page

Southwest Wisconsin Technical College is a technical college in Fennimore, Wisconsin. The college's district includes the area covered by 30 school districts, and enrolls around 1,500 undergraduate students.

What’s great about this page?

SouthWest Tech makes great use of StatusCast’s ability to connect and display data from a variety of sources. This status page lists all the services that are hosted by SouthWest Tech as well as the external third-party service providers that supply services to the college.

To break up the list and make it more aesthetically pleasing, and therefore easier to read, the services are organized under categories. Clicking on a service that has experienced, or is currently experiencing, disruption gives a highly detailed view of the incident. Hovering over a calendar incident brings up a pop-up with instant information.

#4 – PREVENTING UNAUTHORIZED ACCESS

https://centralstatus.sophos.com

Sophos Group plc is a British-based security software and hardware company. Sophos develops products for communication endpoint, encryption, network security, email security, mobile security and unified threat management. Sophos is primarily focused on providing security software to 1- to 5,000-seat organizations.

What’s great about this page?

Sophos is a respected brand in the IT security field and is naturally very security conscious, especially with respect to EU GDPR. In keeping with the ethos of Sophos as an organization, it has implemented two-factor authentication (2FA), which requires subscribers to present two pieces of information (such as a key fob and a password) in order to access the status page.

Once again, Sophos has opted for transparency by presenting information for its full range of internet security services that partners and customers need to know the status of, such as email scanning.

#5 – INTEGRATING STATUS PAGES WITH YOUR WEB PRESENCE

https://navblue.status.page

NAVBLUE, an amalgamation of Navtech, Airbus LUCEM and Airbus ProSky, is Airbus’ flight operations software subsidiary. NAVBLUE provides products which include software for flight planning, aircraft performance, flight data analysis, aeronautical charts, crew planning, electronic flight bag and navigational data.

What’s great about this page?

NAVBLUE provides an easy-to-view 60-day history that can be viewed through three lenses: Service Status, Events and Calendar. Single-click expand and collapse provides fast access to information.

NAVBLUE links out to external resources and provides links to news on the main NAVBLUE website as well as the support portal, making the status page seem very much an integrated part of its web presence.

#6 – OVERCOMING LANGUAGE BARRIERS

https://sherweb.status.page

Sherweb is an IT services and consulting company. Combining reseller and solution selling, Sherweb enables its customers to obtain tailored cloud solutions by providing services and software that integrate multiple third-party cloud technologies from different vendors.

What’s great about this page?

Sherweb caters to a global audience and appreciates that not everyone has English as a first language. Sherweb’s status page incorporates Google Translate, allowing language translation on the fly.

Sherweb has also chosen to fully embrace social tools by providing RSS integration and social links to Twitter, LinkedIn, Facebook, Instagram and YouTube.

To integrate the status page with other elements of support, the page links to the knowledge base that provides comprehensive support for third-party services.

#7 – SUPPORTING ‘FIVE NINES’ UPTIME

https://8x8status.status.page/

8x8 is a leading business communications platform providing business phone, video meeting and team chat services. It also offers contact center functionality, elevating it above many other solutions on the market. 8x8 guarantees ‘five nines’ (99.999% availability) to its customers.

What’s great about this page?

There is quite a lot going on with the 8x8 status page! Firstly, 8x8 implements Google Translate to meet the need to communicate with a global audience that speaks different languages.

Secondly, it uses a scheme of region tabs to simplify the presentation of information for stakeholders operating in different parts of the world. This simple structure provides a very clean interface that lets stakeholders know immediately what is going on.

Thirdly, the page links out to other resources to fetch information from third parties and supports refined design with customized CSS.

#8 – MAKING IT VISUAL

https://cofcstatus.statuscast.com

The College of Charleston is a public liberal arts college in Charleston, South Carolina. Founded in 1770 and chartered in 1785, it is the oldest college in South Carolina, the 13th oldest institution of higher learning in the United States, and the oldest municipal college in the country. The college received top marks in the latest edition of the U.S. News Best Colleges 2021 rankings.

What’s great about this page?

To communicate with a diverse group of stakeholders the College of Charleston uses a simple scheme to present information and make it easy to use.

Notably, all the service components have their brand icon alongside their written name to provide at-a-glance recognition.

Rather than laying out different information panels vertically, which creates longer pages, the College of Charleston elected to go for a two-column scheme which lays out services and the calendar of events side by side.

#9 – THIRD-PARTY AND CUSTOM CONTENT SUPPORT

https://trust.veeva.com

Veeva Systems Inc. is an American cloud-computing company. Headquartered in Pleasanton, California, it was founded in 2007 by Peter Gassner and Matt Wallach. It is a leading global provider of cloud-based software solutions for regulated industries such as consumer goods, chemical, cosmetics, and life sciences.

What’s great about this page?

Veeva made great use of StatusCast’s support for third-party external content by adding a link to an external status page.

This status page also provides exceptional detail of incident history, accessed via the calendar of events.

Veeva also added custom content related to their own third party processors for the purposes of supporting GDPR compliance.

#10 – SUPPORTING INNOVATION

https://trust.controlm.com/#!/

BMC is an innovator, specializing in advancing organizations through automation and data, something it terms the Autonomous Digital Enterprise. This helps organizations to be as efficient and joined up as possible while maximizing the value of their digital assets, software, hardware and data.

What’s great about this page?

BMC innovated by turning the view upside down! Instead of listing services at the top, it summarizes and then lets status page visitors drill down into each service.

This status page is also noteworthy because it is heavily customized with custom icons and branding, showing how StatusCast ensures your status pages remain integrated with your web media assets.

Transform your company’s status pages with StatusCast

If you like what you see in these status page examples, perhaps you’re ready to transform your company’s own status pages. Simply book a demo to take a deeper dive with one of our product specialists, or sign up for a free 14-day trial and discover how it works for yourself.

You’ve heard it so many times: Transparent communication is the key to any successful relationship. The banking industry learned this lesson when cyber attacks began to plague their customers, and the official line for many financial institutions was to deny there was a problem. That is until the hacks became so profound and so persistent that it became impossible to cover them up any longer. Eventually, it became obvious that when breaches occurred, clear communication that addressed what was happening and what was being done to mitigate it was the best practice to preserve trust with the customer. Why then, when it comes to IT disruptions, do so many organizations fail to heed this example?

SLA Building Blocks: The Basics Of A Service Level Agreement

The ability to achieve positive outcomes after IT disruptions (yes, they can improve the relationship between the client and provider!) will hinge on the most important piece of insurance your organization has, the Service Level Agreement, or SLA. Service Level Agreements are contracts that outline how a service is delivered to a client, made popular decades ago in the telecommunications industry. With the rise of cloud-based providers, and the fact that over 90% of businesses use the cloud, the importance of SLAs has multiplied. They are so important because when there are disruptions or failures, a realistic and clear SLA can be the difference for a service provider between keeping a client and losing them; and for the client, it provides the foundation for a trusting relationship regarding a crucial function for their business.

A Service Level Agreement is a binding arrangement, usually initiated by the provider, that sets the expectations, timetables, and priorities for services or applications being provided by an IT company to its client, defines acceptable parameters for continuous and efficient provision of these same services or applications, and further provides for some form of SLA Reporting. This agreement requires the provider to measure and meet minimum uptime thresholds and other requirements on a periodic (usually monthly) basis in exchange for a fee. SLA Reporting documents uptime statistics, issues that have been addressed, and other information pertinent to the provision of services or applications, usually online in the form of a dashboard.

When constructing an SLA for your client, here are some basic components you should include:

Why 9 Is Your Lucky Number

The more 9’s the better. This refers to the “Table of Nines”, a calculation used to determine the accepted measure of your reliability as an IT provider: uptime. Typically, uptime (also known as availability) is measured by “nines”. This measurement is the total expected uptime within a given period of time, expressed as a percentage. In the Table of Nines, the more “nines” you have, the better. However, maintaining the top level of uptime efficiency is challenging, so setting reasonable expectations and response mechanisms in your SLA is crucial to maintaining something equally as valuable as uptime, and that is trust. Try this calculator to see what your “9X% Factor” is in projected downtime: https://uptime.ly
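To make the “nines” concrete, here is a minimal sketch of the arithmetic behind such a table. The function name and the 365-day year are assumptions chosen for illustration; they are not taken from any particular calculator.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_percent: float, period_days: float = 365) -> float:
    """Return the downtime (in minutes) a period can absorb while still meeting the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    # Downtime budget = total minutes in the period x the unavailable fraction.
    return period_days * MINUTES_PER_DAY * (100 - availability_percent) / 100


# Print a small "table of nines" for a 365-day year (an assumed period length).
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, three nines (99.9%) leaves roughly 8.76 hours of downtime per year, while five nines (99.999%) leaves only about 5.26 minutes.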

Putting A Stake Down When The Ground Is Moving

Remember, for many businesses, moving to the Cloud seems to make sense but there are many concerns. Aside from the standard legal considerations such as overall liability, third-party indemnification, and data confidentiality, businesses are very worried about uptime availability and your responsibilities – and their remedies – if your service or application goes down. Understand that your clients may depend on you to facilitate mission-critical processes. If your service or application is unavailable, then you are damaging their business. Since the perception of many businesses is that when you move to the Cloud the risk of an application outage increases, you should then begin to see that your client’s concerns are justifiable.

With an appreciation for your client’s application availability concerns now in mind, you can choose to either ignore them and haggle over contractual details during your SLA negotiation – or you can proactively address their concerns. Ignoring them isn’t a good idea; demonstrating that you understand your client’s concerns and addressing them before your client even needs to ask is the client-focused approach to SLA negotiation. This will get your relationship with your client off on the right foot.

Effective Communication Methods

Ensure you have a communications platform in place to effectively alert your clients of service performance issues. This means more than just posting individually to Twitter or Tumblr. If your client’s employees are forced to take time out of their day to call your support staff and wait in a support queue, then they are wasting corporate dollars and they will become frustrated with your organization. Minimally, you should provide application uptime status by having an end-user facing, self-serve status page with the ability to subscribe to SMS, Twitter and e-mail alerts.

Well-Defined Service Levels

At the most basic level, a decent SLA should define how long it will take for you as the provider to give an initial report of a suspected issue. Sometimes you won’t proactively know that there is a problem unless your end-user submits a ticket. Once this initial response is received, or an issue is reported, you should continue to report how much time has elapsed since the last action took place and ultimately until it is closed. Critical issues should obviously get faster response times and have escalation procedures in place, whereas minor service impairments allow more leeway for less urgent resolution times.
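As a rough sketch of how such service levels might be written down and checked, the example below pairs each severity with an initial response target and a maximum gap between updates. The severity names and every time value are hypothetical placeholders; the real figures belong in your own SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceLevel:
    initial_response: timedelta  # time allowed before the first report goes out
    update_interval: timedelta   # longest acceptable gap between follow-up reports


# Hypothetical targets for illustration only; real values belong in your SLA.
SERVICE_LEVELS = {
    "critical": ServiceLevel(timedelta(minutes=15), timedelta(minutes=30)),
    "minor": ServiceLevel(timedelta(hours=4), timedelta(hours=24)),
}


def initial_response_met(severity: str, reported_at: datetime, responded_at: datetime) -> bool:
    """True if the first report went out within the target for this severity."""
    return responded_at - reported_at <= SERVICE_LEVELS[severity].initial_response


def update_overdue(severity: str, last_action_at: datetime, now: datetime) -> bool:
    """True if more time has elapsed since the last action than the SLA allows."""
    return now - last_action_at > SERVICE_LEVELS[severity].update_interval
```

Keeping the targets in one table makes it easy to give critical issues tighter limits than minor impairments, and to report elapsed time against the same numbers the client agreed to.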

Uptime Percentages

There are many different ways you can calculate your actual uptime statistics. Be as clear as possible about what service level agreement metrics you use.

Describe how you factor in emergency vs. regular outages, scheduled maintenance, and what application features are critical to your formula. Depending on the service you provide, your minimum uptime guarantee should be relevant to your average client’s need. We recommend that you try to guarantee 99.9% uptime or greater. However, if your Internet hosting provider can’t guarantee more than 99.9%, then you have a problem, since you can only be as good as your weakest link. Don’t provide a metric you’ll never meet; this will only result in mismatched expectations and will fuel a lack of trust.
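The hosting-provider point can be illustrated with a little arithmetic. The sketch below assumes your application is down whenever any one of its dependencies is down and that failures are independent; both are simplifying assumptions made for this example, and the function name is hypothetical.

```python
from math import prod


def composite_availability(*availability_percents: float) -> float:
    """Availability (%) of a chain of services that must all be up at the same time.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return 100 * prod(p / 100 for p in availability_percents)


# An application that is itself 99.95% available, hosted on a 99.9% platform.
print(composite_availability(99.95, 99.9))
```

Under these assumptions the combined figure comes out at about 99.85%, lower than either component on its own, which is why a guarantee above what your hosting provider offers is hard to honor.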

Eliminate Inaccurate Calculations

To gain an advantage, make your potential clients aware of the inconsistencies in your competitors’ agreements. For example, if you count scheduled maintenance and emergency service after a certain amount of time passes, and your competitor doesn’t, their uptime stats will look better. Don’t play that game of service level calculation. You need a balanced approach: if you promise the world but don’t make your metrics, your client will probably leave you; on the other hand, if you don’t maintain a reasonable uptime, you’ll never earn their business. Make sure your client can easily compare apples to apples. In other words, your 99.9% uptime might actually be higher than a competitor’s 99.98% if they are excluding all emergency downtime from their uptime metric.
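To see how much the counting rules matter, here is a small sketch that computes uptime for the same month under two different rules. The outage categories and durations are made up for illustration, and the function is a hypothetical helper rather than any vendor's formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    minutes: float
    kind: str  # "unplanned", "emergency", or "scheduled" (illustrative categories)


def uptime_percent(period_minutes: float, outages, excluded_kinds=frozenset()) -> float:
    """Uptime for a period, ignoring any outage whose kind is in excluded_kinds."""
    counted = sum(o.minutes for o in outages if o.kind not in excluded_kinds)
    return 100 * (1 - counted / period_minutes)


# One hypothetical 30-day month with three outages.
month = 30 * 24 * 60
outages = [Outage(10, "unplanned"), Outage(25, "emergency"), Outage(20, "scheduled")]

everything_counted = uptime_percent(month, outages)
exclusions_applied = uptime_percent(month, outages, {"emergency", "scheduled"})
print(f"{everything_counted:.3f}% vs {exclusions_applied:.3f}%")
```

The same month of service yields two different headline numbers, so an uptime figure only means something when the exclusions behind it are stated alongside it.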

Enforcement

Live by your promise. You may need to provide higher uptimes to more elite clients. Don’t be afraid to charge premiums for this – it is just like insurance. For example, perhaps you perform most of your scheduled maintenance on non-business hours (such as late Sunday) but you have a customer who absolutely needs 24 hours per day without interruption. In today’s world, it’s possible to roll these customers over to temporary virtual environments that will not be affected. If you have any outages that require billing credits, make sure your SLA outlines that distinctly (for example, in red). Even though theoretically each customer can have different uptime metrics (based on their usage and a particular feature set and/or server locations), it should not be the duty of the client to ascertain the billing. As the provider, you will want to address the possibility (however remote you think it is) that you cannot meet some element(s) of the SLA requirements. You might include a clause that spells out the customer’s right to terminate the contract or ask for a refund for losses incurred from a failure of service. Work with your client and be generous if you mess up. It might just save your renewal the next time it comes due.
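One way to keep the client from having to work out billing credits themselves is to spell out a credit schedule that can be applied mechanically. The tiers below are entirely hypothetical and only show the shape such a schedule might take.

```python
# Hypothetical schedule: (minimum measured uptime %, credit as % of the monthly fee).
# Tiers are listed from best to worst uptime.
CREDIT_TIERS = [
    (99.9, 0),
    (99.0, 10),
    (95.0, 25),
]
FLOOR_CREDIT_PERCENT = 50  # applied when uptime falls below every tier (assumed value)


def service_credit(measured_uptime_percent: float, monthly_fee: float) -> float:
    """Return the billing credit owed for a month, based on the measured uptime."""
    for minimum_uptime, credit_percent in CREDIT_TIERS:
        if measured_uptime_percent >= minimum_uptime:
            return monthly_fee * credit_percent / 100
    return monthly_fee * FLOOR_CREDIT_PERCENT / 100


print(service_credit(99.5, 1000))
```

With a schedule like this in the SLA, the provider can calculate and apply the credit without the client having to ask.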

Getting Started: An SLA Blueprint And Beyond

Now that we have thoroughly addressed the mechanics of SLAs, it’s time to return to the point about communication we made at the beginning of this article. An SLA is not a cure-all. It is not going to protect you or your customers from frustration and disappointment if it does not proactively address concerns about how application downtime is communicated to end users. So what is a viable and uncomplicated option for communicating this critical information to customers? An application status page provides the communications platform your customers may not even know to ask for during the SLA negotiation phase of the sales process. Application downtime will certainly come up in these negotiations, and given the long timeframe SaaS companies operate on, downtime is a question of “when” rather than “if”. What may not come up in these conversations, however, is how application downtime is communicated.

So why should you plan to include an application status page in the service level agreement template you put in front of new customers?

Even though the customer may not think to ask for it, an application status page is a self-service communications tool that tells an alternate version of the application downtime story. Customers can use it to directly access information about the current status of your application, be reminded of the application’s otherwise excellent track record, and elect to automatically receive SMS, Twitter and/or email alerts at the end-user level. By including an application status page in your standard service level agreement template, you are letting customers know that not only will your application be up and running more than 99% of the time, but also that for the unfortunate less than 1% of the time they will not have to wonder what’s going on. The inconvenience of application downtime will not be compounded by a time-consuming, confusing communications process.

Sample SLA

Request a sample SLA document.

Developing a customer-focused SLA that includes the elements mentioned above will set you apart from the competition as a proactive provider, and will help you to establish a more trusting, lasting relationship with your clients.

Remember, your customers are becoming smarter every day, and many of them are learning never to accept the standard SLA; they want an agreement that follows service level agreement best practices.

They know that in most cases the standard SLA is vendor-focused. They know that there is always something better—if they fight. Why make them fight?

Sources: CIO.com, Upwork.com

Black Friday/Cyber Monday weekend at the end of last month saw big-name brands such as Neiman Marcus and Target suffer total site outages. Even some online retailers that didn’t crash (Walmart, for instance) experienced performance issues that cost them sales from customers who would not wait for the slow checkout process to resolve.

Non-retailers have a lesson to learn here too. The same disruptions to a seamless user experience affect software and hardware adoption just as they affect online shopping.

Convenience is a key factor in converting new customers and developing the relationship necessary to retain their continued business, as are transparency and responsiveness about uptime and performance issues.

Newegg (whose site also went down that weekend) was responsive to customer inquiries on Twitter, for which their customers were grateful. Neiman Marcus was similarly responsive to tweets from concerned customers, but the questions just kept coming in. One customer nailed the approach the designer apparel store should have taken.

Reducing Frustration with a Status Page

End users (consumers in this case) who care to receive site or application uptime status updates should have been able to subscribe to those updates via their preferred communication channel (e.g. Twitter, SMS/text, email, Slack, etc.). These notification capabilities are standard to any status page tool (a status page is no longer just an “is the site/app down?” webpage).

This proactive communication frees your users to pay attention to other things while they wait for the issue to resolve, rather than frustratedly reloading over and over, hoping your site/application comes back up, and getting increasingly irritated as it continues to fail to do so.

This also frees your own staff to fix the downtime or slowness issue, rather than focusing on communicating about it back and forth with each frustrated end user.

Not Just Application Uptime Status, but Slowness

Earlier this year, Aberdeen Group prominently re-iterated the sentiment “slow is the new downtime.” Even if your site or app isn’t down, if it’s experiencing slowness, you need to be accountable and transparent about that via your status page.

For a sense of what constituted slow for online retailers on Black Friday/Cyber Monday weekend, check out this thorough series of reports from Dynatrace.

While ideally you won’t experience disruptions to site or application uptime or even to performance more generally, when they do occur you need to be prepared to handle them professionally and efficiently. See how a status page can help you do that here.

Bringing cloud-based software to healthcare hasn’t been an entirely smooth process, despite the federal government’s massive investment in EHR and EMR technology. In a recent interview with Bob Wachter, a physician and professor of medicine at the University of California, Dr. Wachter cited a “short-term hump period”, red tape, localization, extensive user testing (or rather a lack thereof), and a complex diversity of user roles as contributing factors to the notable lag between the advancement and adoption of new software in the field of medicine vs other industries.

Added to these issues is the challenge of application downtime, which can be particularly troublesome when tools designed to “seamlessly integrate” with your EHRS go down or even just experience minor performance problems.

Measuring Uptime Status for Hospital Management Systems

Using a status page can keep your team informed in two ways about what software and device integrations are experiencing normal uptime and which require attention from IT or from your medical technology vendor. First, having a centralized status page to track the uptime status of the various devices and integrations your hospital relies upon makes it as easy as one quick glance to confirm that everything is operating normally. Second, a status page can broadcast changes in uptime status to subscribed staff, via their preferred communication method.

Broadcasted changes in uptime status can be organized by software component or hardware device, so that only the relevant staff receive the notification. The status page can also be programmed to send notifications immediately, on a delay, or pending manual approval – so minor issues won’t trigger false alarms and create alert burnout.
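The component-based targeting described above could be modeled roughly as follows. This is only an illustrative sketch: the `Subscriber` class, the component names and the staff roles are hypothetical and do not represent the StatusCast API.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    name: str
    channel: str  # preferred communication method, e.g. "email" or "sms"
    components: set = field(default_factory=set)  # components this person follows


def recipients_for(component: str, subscribers) -> list:
    """Return only the subscribers who follow the affected component."""
    return [s for s in subscribers if component in s.components]


# Hypothetical hospital staff and the systems each one cares about.
staff = [
    Subscriber("radiology-lead", "sms", {"imaging-integration"}),
    Subscriber("front-desk", "email", {"scheduling", "ehr"}),
    Subscriber("it-on-call", "sms", {"imaging-integration", "scheduling", "ehr"}),
]

for person in recipients_for("imaging-integration", staff):
    print(f"notify {person.name} via {person.channel}")
```

Filtering by component before sending means a problem with one integration reaches only the people who depend on it.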

Your Status Page as a Communications Bridge and Log

By connecting the systems already built in to monitor your software and hardware with a status page tool like StatusCast, you are bridging the gap between tech-savvy staff and non-tech-savvy staff. StatusCast broadcasts changes in uptime status to administrative staff and others who have a stake in knowing which software and devices are up and running and which are experiencing problems, but who do not need the technical details because they will not be the ones troubleshooting the fix.

Once the issue is repaired and uptime status is restored, subscribed staff will also receive a notification informing them that everything is back to 100%, tightening the communication loop further (and providing documentation of the frequency, duration and severity of incidents, should a more serious conversation with a vendor be necessary).

Lean DevOps is garnering increasing attention from enterprise organizations because of its promise of rapid innovation without compromising quality. While there are many elements to lean DevOps, I’d like to focus on reducing waste.

Waste could take the form of mismanagement of resources (time, money, staff, tools, etc.) or of inconsistent or inefficient processes. John Rakowski at AppDynamics succinctly articulates this in the context of the APM world with his example: “multiple overlapping monitoring tools in a typical siloed enterprise mean physical waste (licenses etc), an inconsistency in the way they are used, and ultimately absurdity as alerts are not representative of the business.”

But there is an element to the waste issue he doesn’t consider, one that absolutely represents a mismanagement of resources and likely an inefficient process as well—when things go wrong, how is downtime communication handled?

Is it the DevOps team who is responsible for communicating the incident to customers – when the team’s attention is needed most on resolving the problem? How are the executive team, customer support team and other teams informed when there’s a disruption and when full functionality is restored?

Using a Status Page to Efficiently Communicate Downtime

A DevOps team that strives for the lean ideal needs a status page to support them by handling communication during time-sensitive incidents. StatusCast enables end users to subscribe to updates via their preferred communication channel (e.g. Twitter, email, SMS/text message, etc.) so that they receive alerts when your application is experiencing performance issues, and updates when service is partially or wholly restored.

The language of these alerts can be crafted ahead of time, likely in collaboration with Marketing – to keep the message in terms non-tech-savvy users can understand.

The alerts sent by your status page can also be set to one of three protocols: automatic and instant, automatic but delayed, or pending manual confirmation – as not every bump in the road is cause for a customer communication. Similarly, alerts can be sent out that are tailored to specific components (maybe only east coast servers are experiencing issues, or maybe only users of a certain product are having trouble accessing the application). In this manner, you can ensure the right users are getting the right message at the right time, without placing any additional demands on your DevOps team.
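The three alert protocols can be thought of as a small decision rule. The sketch below is a hypothetical model of that rule, not StatusCast's implementation; the class names and the five-minute default delay are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Mode(Enum):
    INSTANT = "instant"  # automatic and instant
    DELAYED = "delayed"  # automatic but delayed
    MANUAL = "manual"    # pending manual confirmation


@dataclass
class Alert:
    component: str
    detected_at: datetime
    still_failing: bool = True  # whether the problem persists at decision time
    approved: bool = False      # set when a person confirms the alert


def should_send(alert: Alert, mode: Mode, now: datetime,
                delay: timedelta = timedelta(minutes=5)) -> bool:
    """Decide whether an alert goes out to subscribers under the chosen protocol."""
    if mode is Mode.INSTANT:
        return True
    if mode is Mode.DELAYED:
        # Send only if the problem has outlasted the delay window.
        return alert.still_failing and now - alert.detected_at >= delay
    return alert.approved
```

With a delay or a manual gate in place, a brief blip that resolves itself never reaches customers, while a persistent problem still goes out without anyone on the DevOps team having to write the message.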

By keeping your DevOps team focused on troubleshooting issues and anticipating and avoiding future problems, you are actually facilitating a second element of lean DevOps: continuous improvement. As Jez Humble, of Chef, noted in a presentation last month: “DevOps is not a goal, but a never-ending process of continual improvement.”

You can learn more about how a status page can help you become a lean DevOps organization here.

Content marketing is a marketing strategy that operates from the premise that less sales-y, more inherently valuable content will be more effective at capturing attention and building trust with customers and prospects.

A hosted status page represents a real-time report of application uptime (often capturing both present/immediate status and uptime percentage overall). It can also contain information about upcoming scheduled maintenance and any other information you’d like to communicate to your end users.

It’s easy to see how the transparency provided by a hosted status page can help build trust, but what value could a customer or prospect find in a hosted status page that would make it good SaaS content marketing material?

Your Hosted Status Page as Content

Because your hosted status page can keep track of application uptime issues by component (e.g. geographic regions or products, depending on how your servers are structured), it provides a natural framework by which to target specific populations within your customer base. While you won’t have something new to say to each audience every week, once per month or two it can be valuable to share historical uptime status as a chart (especially in comparison to previous time periods or industry averages), upgrades included in recent scheduled maintenance, and any comments users themselves may have offered that are worth sharing more broadly with your community of current and prospective end users (if you use a tool like Disqus that allows for such feedback).

Communicating via Hosted Status Page (Status Page as Delivery)

Arguably, the hardest part of content marketing is getting your message out there. It is definitely not an “if you build it, they will come” scenario – you need to put in a lot of work distributing your content through channels where your target audience is likely to discover it. A hosted status page will not increase your reach in this manner, but it can increase your engagement.

End users can subscribe for updates and alerts via their preferred communication channel (e.g. email, text, Twitter), meaning that they are more likely to notice and consume your content. The challenge here is that these alerts should only relate to performance or accessibility issues (including updates or upgrades to your application); otherwise you are using the channel disingenuously and are likely to damage your relationship with your end users.

By keeping the message appropriate to the channel, however, you can ensure your end users stay excited about the continuing improvements to your application’s performance and that the foundation of transparency and communication you’ve built continues to strengthen your business relationship with them.

Going Beyond SaaS Content Marketing

Tomasz Tunguz has outlined 9 Marketing Disciplines of Great SaaS Companies—content is only one of these nine. The others include Evangelism (leverage enthusiastic users), Customer Lifecycle (upsell/cross-sell to unlock the other 50%+ of revenue potential from existing customers), and Communications (brand strategy, brand narrative and public relations).

A hosted status page helps develop both the brand narrative (transparency, reliability, partnership) and the customer relationship necessary to capitalize on these key aspects of effective SaaS marketing.

You can get a free trial of the StatusCast hosted status page tool here.

Last month, APMdigest did a series on 18 Ways to Ensure Application Performance Before Rollout, featuring advice from well-respected members of the tech industry on when and how to check server uptime (among other aspects of application performance) prior to live deployment. While much of it is intuitive, sometimes it can be valuable to have an external, credible source to reinforce what we already know we should be doing (for instance, communicating application status to end users).

Methods of Testing and Logging to Check Server Uptime

Common themes among the advice brought together across APMdigest’s series included using Network Emulators to simulate real-world conditions, testing latency and realistic load (or cloud testing and capacity planning), establishing baselines with performance monitoring, analyzing application logs, and testing as early in the development cycle as possible.

Network Emulators allow you to test on a duplicate of the production environment, so there are no surprises when the application actually goes live. Another commentator offered this additional piece of advice: not only test in the same environment but test using the same profiling tool you will use in the actual production environment, and make sure it can “find meaningful correlations at scale” as “A top methods list will only show you where your time went processing your synthetic load.” (Joe Rustad, Manager, Software Development & Architecture, Dell Software)

Cloud testing and capacity planning is most effective when it is designed to simulate user transactions and includes stress testing, which not only tells you the limits of what your system can support but also what happens at the limits. One commentator particularly noted that “Organizations adopting SaaS apps like Office 365 or Google Apps often don’t realize that their internet connectivity isn’t up to the increased traffic.” (Patrick Carey, VP Product Management and Marketing, Exoprise)

Establishing baselines prior to production is helpful for being able to demonstrate what “normal” performance is to stakeholders on other teams and for having a point of reference grounded in the structured data that supports your infrastructure dependencies. This can also make it easy to detect performance issues (on both the application and the network side) at the development or staging level rather than the production level (which amounts to a cheaper resolution for a number of reasons). Arguably the most important aspect of establishing baselines is checking them regularly against real-time performance in production.
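As a rough illustration of checking a baseline against production readings, here is a minimal Python sketch. The function names, the sample numbers, and the "three standard deviations above the mean" rule are all assumptions made for the example; none of them come from the APMdigest series, and real monitoring tools use more sophisticated comparisons.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize pre-production response times (in milliseconds)."""
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def deviates_from_baseline(baseline, current_ms, tolerance=3.0):
    """True when a reading is more than `tolerance` standard deviations
    above the baseline mean (an assumed rule of thumb, not a standard)."""
    return current_ms > baseline["mean"] + tolerance * baseline["stdev"]


# Hypothetical response times collected in staging.
staging_samples = [120, 130, 125, 135, 128]
baseline = build_baseline(staging_samples)

# Hypothetical production readings checked against that baseline.
for reading in (140, 200):
    status = "investigate" if deviates_from_baseline(baseline, reading) else "normal"
    print(reading, status)
```

With these sample values the 140 ms reading falls inside the tolerance band and the 200 ms reading falls outside it, which is the kind of signal worth surfacing on a status page before end users report it.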

Creating and analyzing application logs is what facilitates early testing, by informing what pre-production issues there are and providing a tight feedback loop to guide your (and Development’s, depending on how your teams are structured) troubleshooting efforts.

Check Server Uptime to Strengthen Application Performance

Regardless of how you check server uptime (simulated environments, latency and load testing, baselines, and/or application logs), the application performance issues you’re catching now are more expensive headaches you’re avoiding later.

The most important thing a status page can do is reach your end user customers through their preferred means of communication, in a timely manner. Instantaneous is usually the speed they’re expecting! A status page that fails to communicate your software application’s status via text message, email, or whatever the end user’s preferred mode of communication may be fails to fulfill its purpose.

Your Status Page: Internal vs. External Audience

There is a second audience to consider when communicating application downtime – internal stakeholders. These are your support, executive, and any other customer-facing teams. These are the folks who are going to create a second crisis, a communications crisis, if they aren’t equipped with the right messaging, and who are likely to slow down an “as quickly as possible” fix by distracting IT with rapid, endless requests for a status update. How can you communicate most effectively with both this audience and your end users?

Internal communications are most easily handled through a corporate portal or email. These communications should include instructions for how to talk about the progress on the application issue with customers/end users. “Broadcasting” updates to either of these communication mediums can be easily accomplished through your status page tool.

Using your status page software to communicate with customers is just as straightforward. Create premade messages that will go out via SMS text, email, Google Calendar, etc. or that are installed on a page on your own website via a widget. You can even choose whether messages go out in real time or whether someone reviews and edits each communication before it goes out.

A Few Basic Considerations When Setting Up a Status Page

Use your status page tool to create a few premade messages, covering anticipated maintenance periods, unexpected outages, disruptions in typical performance and informational messages that would be appropriate to share with end users who have opted into notifications from your status page.
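The four kinds of premade message listed above could be sketched as simple templates. The Python below is only an illustration: the template wording, the `TEMPLATES` dictionary, and the `render` helper are hypothetical and are not how any particular status page tool stores its messages.

```python
from string import Template

# One hypothetical template per message type mentioned above.
TEMPLATES = {
    "maintenance": Template(
        "Scheduled maintenance for $component begins at $start "
        "and is expected to last $duration."
    ),
    "outage": Template(
        "We are investigating an unexpected outage affecting $component."
    ),
    "degraded": Template(
        "$component is responding more slowly than usual. We are looking into it."
    ),
    "info": Template("$component: $note"),
}


def render(kind, **fields):
    """Fill in a premade message.

    Raises KeyError if the message type or one of its fields is missing,
    so an incomplete message is never sent by accident.
    """
    return TEMPLATES[kind].substitute(**fields)


print(render("outage", component="Checkout API"))
print(render("maintenance", component="Reporting",
             start="02:00 UTC", duration="30 minutes"))
```

Writing the messages ahead of time means the wording is settled before anything breaks, and only the component name and timing need to be filled in during an incident.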

You’ll want to make sure to have the marketing team supply whatever art you’d like to use to ensure the page is consistent with the rest of your company’s brand. Though it’s a page about technical updates, the status page is meant to be customer-friendly.

You might also distinguish service components in a way that addresses specific geographic regions, application layers or business components, so each end user better understands what the disruption means for them.

A status page increases efficiency for your IT team and creates more and better opportunities to smooth over a potentially negative experience with your customers. You can read more about the cost of downtime here.

As many of you may already know by reading my blogs, I am not only the co-founder of StatusCast, but also a co-founder of another successful Software as a Service company.  For years we built and maintained our own set of server racks in a hosted environment.  Due to the nature of our solution, it is critical that the uptime availability for this other company is stellar.  In fact, it has maintained 99.98% or greater for over 10 years.

Uptime Availability: Crucial to Business

My other company decided to rely on not just one uptime monitoring company but two, since uptime availability is so crucial to our business. We currently use Pingdom and BinaryCanary. These system dashboard tools are great because they alert our IT and DevOps teams to impending doom before it can affect our customers. However, once we receive an alert, we then need to figure out if there is an issue with our servers, applications, databases, various hardware components (load balancers, firewalls, etc.) or if it is actually a problem downstream.

One of our primary data centers is located on the East Coast. The hosting provider we use doesn’t have an easily accessible application status page. They have some type of Support Management Console, but it hardly acts as a business intelligence dashboard, rarely communicating what we need for our business, and there is no easy way for us to get the data we need fast enough.

There have been several times over the years where the staff of my other company was first to alert this hosting provider that a problem existed, and even then getting proper system updates from them on the reported issue was more than brutal. Downstream issues can’t always be caught with as much advance notice as your classic application monitoring services so it makes good sense to ensure that your hosting provider feeds you uptime information as well.

Uptime Transparency is Vital to Our Brand

Uptime transparency is vital to our brand and the trusting relationship we have with our customers, as it should be for all SaaS companies. The problem is that the hosting provider we use doesn’t seem to want to help us out, so we’ve been rapidly migrating our customer base over to Microsoft Azure. This is no small feat either, because major portions of our application needed to be rewritten. Nonetheless, we feel a hosting company worth its salt needs a system status dashboard. Check out how Microsoft displays their uptime: http://www.windowsazure.com/en-us/support/service-dashboard/

StatusCast is in the process of integrating the status feeds of several top hosting providers. If you’re choosing a hosting provider, and they do not provide this data, you should demand it or go elsewhere. For anyone considering StatusCast to assist them in creating their customer-facing system status page, let me know, as we’ll be more than happy to build out an integration from your hosting provider’s system dashboard (or even provide the dashboard for your hosting provider – as we can key off our own dashboards as well). Remember, if your hosting company doesn’t have an accessible system status application page, they aren’t only failing you, but the customers (or end-users) that put their trust in your company.

They say that all good businesses stem from someone trying to solve a real-world problem. Uptime.ly is the living, breathing, incarnation of that concept.

It’s simply a fact of life: cloud applications go down. Whether that application is running within your corporate firewall on your own private servers, or out in the “ethers”, how you react to downtime is critical. It doesn’t matter who your end-users are: employees, suppliers, or customers; communicating the status of unexpected and planned system events has a direct impact on your company’s top and bottom line.

Arrows in the Back

For the last thirteen years, the founders of Uptime.ly have been responsible for running one of the Internet’s original Software-as-a-Service applications. During that time:
  • the product we sold was built and re-deployed thousands of times (long before slipstream deployments were commonplace and helped reduce downtime).

  • we had seemingly countless “scheduled maintenance” events where we had to upgrade either our hardware, software, or some other infrastructural component.  You would be surprised how many times we had to announce scheduled maintenance because our co-location facility told us that they would be doing the same.

  • and as much as we hate to admit it, just like every other cloud-based application that has ever been built, we suffered our share of downtime and application performance problems as a result of unforeseen circumstances.

For years our SaaS company tried to figure out the most appropriate way to manage the downtime communication process with our customers. We had our tech team maintain an emergency e-mail contact list of all our customers’ administrators, we posted scheduled maintenance on our website, and we tweeted. The communication was never done the same way twice.

As the years went by and our support teams changed hands, the process (or lack thereof) changed as well. The amount of time we provided notice before scheduled maintenance events was never set in stone. The language used within unexpected system outages rarely found the right balance between providing the customers too much or too little information. And to top it off, no matter what we did, nothing seemed to reduce the number of irate customers calling our help desk, even if we gave them weeks of advance notice.

Lessons Learned

One of the best pieces of advice we can offer any burgeoning startup, or any IT manager that has application uptime on their mind: If you don’t have a plan of attack for how you communicate with your application users when something has brought, or is about to bring, your system down, then you are hurting your company. This happens by creating increased help desk costs, reducing employee productivity, contributing to lost revenue, and significantly diminishing your company’s brand loyalty & reputation. These problems are common to both cloud-based application providers and self-hosted applications managed by your IT staff:
  • Overstressed help desks.  When applications become unavailable, the natural response of their users is to reach out and find out what’s wrong.  Any application with more than a handful of users is going to quickly inundate your help desk team with inbound support requests.  The expense of having your help desk respond to each of these requests with (hopefully) the same message over and over again should not be overlooked.

  • Lost employee productivity – Frustrated and idle employees are a nightmare and costly.  The Aberdeen Group estimates an average-size company loses $110,000 an hour when an application becomes unavailable.  Your goal as someone managing downtime should be to make those times frictionless for your users.  This means having a process in place that proactively keeps your users in the know so they can be as efficient as possible.   If you sell a SaaS, frustrated customers don’t translate to lost employee productivity, they translate to ex-customers.
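To put rough numbers on the two costs above, here is a back-of-the-envelope sketch in Python. The $110,000 per hour figure is the Aberdeen Group estimate quoted above; the ticket count, handling time, and agent rate are hypothetical inputs invented for the example, and the function names are our own.

```python
AVG_COST_PER_HOUR = 110_000  # Aberdeen Group estimate quoted above, in US dollars


def downtime_cost(minutes_down, cost_per_hour=AVG_COST_PER_HOUR):
    """Estimated business cost of an outage lasting `minutes_down` minutes."""
    return minutes_down / 60 * cost_per_hour


def help_desk_cost(tickets, minutes_per_ticket, agent_hourly_rate):
    """Estimated cost of answering the same question `tickets` times."""
    return tickets * minutes_per_ticket / 60 * agent_hourly_rate


# A 45-minute outage at the quoted average rate.
print(downtime_cost(45))          # 82500.0

# Hypothetical: 200 inbound tickets, 6 minutes each, agents at $30 an hour.
print(help_desk_cost(200, 6, 30))  # 600.0
```

The help desk figure looks small next to the productivity figure, but it is the one you can reduce most directly by telling users what is happening before they ask.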

Going Forward

If you and your team are committed to improving the way you handle application downtime, there are a number of simple, effective steps that you should take:

  • Create a culture of communication.  It should be hardwired into your team’s DNA: when something goes wrong, before we even start looking at the problem, let the customers know.

  • Create multiple channels for customers to find out what’s going on.  Don’t expect every customer to be sitting at their desk reading e-mails, or browsing their Twitter feed.
  • Decide on your level of transparency up front.  Employees need to feel empowered to communicate with their customers without having to worry about backlash from their manager.  Clear guidelines as to what words should and shouldn’t be used are important.  Does your company want to communicate in broad stroke terminology, such as “We are currently experiencing a problem”, or do you want to let your customers know exactly what’s going on: “Hard drive B in our Meta Cluster is reporting a bad sector”?
  • Decide on your tone.   Is uptime communication to your customers and application users going to be used as an opportunity to build brand and relationships?  If so, your uptime messages should take a friendlier tone.  Is the success of your user base tightly bound to your application uptime?  If so, communicating thorough details in stark black-and-white may be more appropriate.
  • Keep a record.  Stop relying on uptime monitoring services to determine your SLA.  We’d be willing to bet that every month your manager is filtering through your uptime reports tweaking the final output used in determining an actual SLA report.

As I mentioned at the beginning, we’ve lived this process for many years and have serviced thousands of customers during every kind of application uptime event that you can ever think of. We’d love to hear about your downtime stories and some of the lessons you’ve learned in managing it. Feel free to contact us and let us know! Maybe we will publish your story.

Jasen Fici
Co-founder, Uptime.ly

© Copyright StatusCast 2022 |Terms & Conditions |Privacy Policy
