IT services are a critical backbone to the operations and functioning of almost every business and organization.
As more and more IT departments have embraced the need for good governance, this has driven greater transparency.
From the perspective of IT service management, this has manifested itself as much greater openness when communicating about IT service availability.
Status Pages are a key element of this. For many stakeholders, they are the single most important touchpoint with the IT function.
A couple of years ago we showcased how some of our customers were using StatusCast to communicate with their internal and external service users. Since then we have continued to evolve and refine our platform, enabling further enhancements.
Today’s Status Pages provide an even richer source of information and the page builder tools available within the platform enable even greater control over how you display and present your information.
Coupled to this, StatusCast is able to wrap your status pages with the brand of your company or organization, enabling you to match your status page with the rest of your family of communication assets.
https://alliedsolutions.statuscast.com
Allied Solutions, LLC is a financial services business, providing value-added support to 4,000 B2B clients. A range of solutions include insurance, consumer lending, employee benefits and technology to improve the efficiency of lending operations.
What’s great about this page?
One of the key characteristics of the Allied Solutions status page is that it provides a detailed history of incidents.
Clicking ‘View Uptime History Report’ shows the uptime history, and using the drop-down date range control you can view any week, month or year in the previous 2 years.
Putting such granular levels of detailed reporting right up front helps to demonstrate Allied Solutions attaches a great deal of importance to providing a high level of transparency.
NS1 is an internet services business that uses network technologies to connect the world’s applications with audiences. This supports faster, more reliable, and more secure application performance across the internet.
What’s great about this page?
NS1 has used our advanced support for web design to place a gradient across the status legend top menu, showing how StatusCast really enables customers to tune their status pages to fit with the branding of their existing online media assets.
NS1 uses the menu item on the right to let stakeholders view scheduled maintenance events, so they can see upcoming planned infrastructure works that may impact availability and adjust their workflows and schedules accordingly.
Southwest Wisconsin Technical College is a technical college in Fennimore, Wisconsin. The college's district includes the area covered by 30 school districts, and enrolls around 1,500 undergraduate students.
What’s great about this page?
Southwest Tech makes great use of StatusCast’s ability to connect and display data from a variety of sources. This status page lists all the services that are hosted by Southwest Tech as well as the external third-party service providers that supply services to the college.
To break up the page, making it more aesthetically pleasing and therefore easier to read, the services are organized under categories. Clicking on a service that has experienced, or is currently experiencing, disruption gives a highly detailed view of the incident. Hovering over a calendar incident opens a pop-up with instant information.
https://centralstatus.sophos.com
Sophos Group plc is a British based security software and hardware company. Sophos develops products for communication endpoint, encryption, network security, email security, mobile security and unified threat management. Sophos is primarily focused on providing security software to 1- to 5,000-seat organizations.
What’s great about this page?
Sophos is a respected brand in the IT security field and is naturally very conscious of security and privacy, especially with respect to EU GDPR. In keeping with that ethos, it has implemented two-factor authentication (2FA), which requires subscribers to present two pieces of information (such as a key fob and a password) to access the status page.
Once again, Sophos has opted for transparency by presenting status information for the full range of internet security services that partners and customers need to track, such as email scanning.
NAVBLUE, an amalgamation of Navtech, Airbus LUCEM and Airbus ProSky, is Airbus’ flight operations software subsidiary. NAVBLUE provides products which include software for flight planning, aircraft performance, flight data analysis, aeronautical charts, crew planning, electronic flight bag and navigational data.
What’s great about this page?
NAVBLUE provides an easy-to-view 60-day history that can be viewed through three lenses: Service Status, Events and Calendar. Single-click expand and collapse provides fast access to information.
NAVBLUE links out to external resources and provides links to news on the main NAVBLUE website as well as the support portal, making the status page seem very much an integrated part of its web presence.
Sherweb is an IT services and consulting company. Combining reseller and solution selling, Sherweb enables its customers to obtain tailored cloud solutions by providing services and software that integrate multiple third-party cloud technologies from different vendors.
What’s great about this page?
Sherweb caters to a global audience and appreciates that not everyone has English as a first language. Sherweb’s status page incorporates Google Translate, allowing language translation on the fly.
Sherweb has also chosen to fully embrace social tools by providing RSS integration and social links to Twitter, LinkedIn, Facebook, Instagram and YouTube.
To integrate the status page with other elements of support, the page links to the knowledgebase that provides comprehensive support for third-party services.
https://8x8status.status.page/
8x8 is a leading business communications platform providing business phone, video meeting and team chat services. It offers contact center functionality, elevating it above many other solutions on the market. It guarantees ‘five nines’ (99.999% availability) to its customers.
What’s great about this page?
Quite a lot going on with the 8x8 status page…! Firstly 8x8 implements Google Translate to meet the need to communicate with a global audience that speaks different languages.
Secondly, it uses a scheme of region tabs to simplify the presentation of information for stakeholders operating in different parts of the world. This simple structure provides a very clean interface that lets stakeholders know immediately what is going on.
Thirdly, the page links out to other resources to fetch information from third parties and supports refined design with customized CSS.
https://cofcstatus.statuscast.com
The College of Charleston is a public liberal arts college in Charleston, South Carolina. Founded in 1770 and chartered in 1785, it is the oldest college in South Carolina, the 13th oldest institution of higher learning in the United States, and the oldest municipal college in the country. The college received top marks in the latest edition of the U.S. News Best Colleges 2021 rankings.
What’s great about this page?
To communicate with a diverse group of stakeholders the College of Charleston uses a simple scheme to present information and make it easy to use.
Notably, all the service components show their brand icon alongside their written name to provide at-a-glance recognition.
Rather than laying out different information panels vertically, which creates longer pages, the College of Charleston elected to go for a two-column scheme that lays out services and the calendar of events side by side.
Veeva Systems Inc. is an American cloud-computing company. Headquartered in Pleasanton, California, it was founded in 2007 by Peter Gassner and Matt Wallach. It is a leading global provider of cloud-based software solutions for regulated industries such as consumer goods, chemical, cosmetics, and life sciences.
What’s great about this page?
Veeva made great use of StatusCast’s support for third-party external content by adding a link to an external status page.
This status page also provides exceptional detail of incident history, accessed via the calendar of events.
Veeva also added custom content related to their own third party processors for the purposes of supporting GDPR compliance.
https://trust.controlm.com/#!/
BMC is an innovator, specializing in advancing organizations through automation and data, something it terms the Autonomous Digital Enterprise. This helps organizations to be as efficient and joined up as possible while maximizing the value of their digital assets, software, hardware and data.
What’s great about this page?
BMC innovated by turning the view upside down! Instead of listing services at the top, it summarizes and then lets status page visitors drill down into each service.
This status page is also noteworthy because it is heavily customized with custom icons and branding, showing how StatusCast ensures your status pages remain integrated with your web media assets.
If you like what you see in these examples then perhaps you’re ready to transform your company’s own status pages? Simply book a demo to take a deeper dive with one of our product specialists, or you can sign up for a free 14 day trial and discover how it works for yourself.
The ability to achieve positive outcomes after IT disruptions (yes, they can improve the relationship between the client and provider!) will hinge on the most important piece of insurance your organization has: the Service Level Agreement, or SLA. Service Level Agreements, made popular decades ago in the telecommunications industry, are contracts that outline how a service is delivered to a client. With the rise of cloud-based providers, and the fact that over 90% of businesses use the cloud, the importance of SLAs has multiplied. When there are disruptions or failures, a realistic and clear SLA can be the difference for a service provider between keeping a client and losing them; for the client, it provides the foundation for a trusting relationship regarding a crucial function of their business.
A Service Level Agreement is a binding arrangement, usually initiated by the provider, that sets the expectations, timetables, and priorities for services or applications being provided by an IT company to its client, defines acceptable parameters for continuous and efficient provision of these same services or applications, and further provides for some form of SLA Reporting. This agreement requires the provider to measure and meet minimum uptime thresholds and other requirements on a periodic (usually monthly) basis in exchange for a fee. SLA Reporting documents uptime statistics, issues that have been addressed, and other information pertinent to the provision of services or applications, usually online in the form of a dashboard.
When constructing an SLA for your client, here are some basic components you should include:
Remember, for many businesses, moving to the Cloud seems to make sense but there are many concerns. Aside from the standard legal considerations such as overall liability, third-party indemnification, and data confidentiality, businesses are very worried about uptime availability and your responsibilities – and their remedies – if your service or application goes down. Understand that your clients may depend on you to facilitate mission-critical processes. If your service or application is unavailable, then you are damaging their business. Since the perception of many businesses is that when you move to the Cloud the risk of an application outage increases, you should then begin to see that your client’s concerns are justifiable.
With an appreciation for your client’s application availability concerns now in mind, you can choose to either ignore them and haggle over contractual details during your SLA negotiation – or you can proactively address their concerns. Ignoring them isn’t a good idea; demonstrating that you understand your client’s concerns and addressing them before your client even needs to ask is the client-focused approach to SLA negotiation. This will get your relationship with your client off on the right foot.
Now that we have thoroughly addressed the mechanics of SLAs, it’s time to return to that point about communication we made at the beginning of this article. An SLA is not a cure-all: it is not going to protect you or your customers from frustration and disappointment if it does not proactively address concerns about how application downtime is communicated to end users. That raises the question: what is a viable and uncomplicated option for communicating this critical information to customers? An application status page provides the communications platform your customers may not even know to ask for during the SLA negotiation phase of the sales process. Application downtime will certainly come up in these negotiations, and given the long timeframe SaaS companies operate on, it is a question of “when” rather than “if” application downtime happens. What may not come up in these conversations, however, is how application downtime is communicated.
So why should you plan to include an application status page in the service level agreement template you put in front of new customers?
Even though the customer may not think to ask for it, a self-service communications tool like an application status page offers an alternate version of the application downtime story. Customers can use it to directly access information about the current status of your application, be reminded of the application’s otherwise excellent track record, and elect to automatically receive SMS, Twitter and/or email alerts at the end-user level. By including an application status page in your standard service level agreement template, you are letting customers know that not only will your application be up and running more than 99% of the time, but also that for the unfortunate less than 1% of the time they will not have to wonder what’s going on: the inconvenience of application downtime will not be compounded by a time-consuming, confusing communications process.
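To put uptime percentages like these in concrete terms, here is a minimal Python sketch that converts an uptime target into a downtime budget for a billing period. The function names and the 30-day period are illustrative assumptions; they are not part of any SLA standard or of the StatusCast product.

```python
# Illustrative sketch: function names and the 30-day period are assumptions.

def allowed_downtime_minutes(uptime_target_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period while still meeting the target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_target_pct) / 100


def measured_uptime_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Uptime percentage actually achieved, given observed downtime in the period."""
    total_minutes = period_days * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def meets_sla(downtime_minutes: float, uptime_target_pct: float, period_days: int = 30) -> bool:
    """True if the observed downtime stays within the budget for the target."""
    return downtime_minutes <= allowed_downtime_minutes(uptime_target_pct, period_days)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per 30 days")
```

At a 99% target, a 30-day month leaves roughly 7.2 hours of permitted downtime; at 99.9% the budget shrinks to about 43 minutes, which is why the way that time is communicated matters so much.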
Developing a customer-focused SLA that includes the elements mentioned above will set you apart from the competition as a proactive provider, and will help you to establish a more trusting, lasting relationship with your clients. Remember, your customers are becoming smarter every day, and many of them are learning to never accept the standard SLA. They know that in most cases the standard SLA is vendor-focused. They know that there is always something better—if they fight. Why make them fight?
Sources: CIO.com, Upwork.com
Black Friday/Cyber Monday weekend at the end of last month saw big-name brands such as Neiman Marcus and Target suffer total site outages. Even some online retailers that didn’t crash experienced performance issues that cost them sales from those customers who would not wait for the slow checkout process to resolve (for instance Walmart).
Non-retailers have a lesson to learn here too. The same disruptions to a seamless user experience impact software and hardware adoption just as they impact online shopping.
Convenience is a key factor in converting new customers and developing the relationship necessary to retain their continued business, as are transparency and responsiveness about uptime and performance issues.
Newegg (whose site also went down that weekend) was responsive to customer inquiries on Twitter, for which their customers were grateful. Neiman Marcus was similarly responsive to tweets from concerned customers, but the questions just kept coming in. One customer nailed the approach the designer apparel store should have taken.
End users (consumers in this case) who care to receive site or application uptime status updates should have been able to subscribe to those updates via their preferred communication channel (e.g. Twitter, SMS/text, email, Slack, etc.). These notification capabilities are standard to any status page tool (a status page is no longer just an “is the site/app down?” webpage).
This proactive communication frees your users to pay attention to other things while they wait for the issue to resolve, rather than frustratedly reloading over and over, hoping your site/application comes back up, and getting increasingly irritated as it continues to fail to do so.
This also frees your own staff to fix the downtime or slowness issue, rather than focusing on communicating about it back and forth with each frustrated end user.
Earlier this year, Aberdeen Group prominently re-iterated the sentiment “slow is the new downtime.” Even if your site or app isn’t down, if it’s experiencing slowness, you need to be accountable and transparent about that via your status page.
For a sense of what constituted slow for online retailers on Black Friday/Cyber Monday weekend, check out this thorough series of reports from Dynatrace.
While ideally you won’t experience disruptions to site or application uptime or even to performance more generally, when they do occur you need to be prepared to handle them professionally and efficiently. See how a status page can help you do that here.
Bringing cloud-based software to healthcare hasn’t been an entirely smooth process, despite the federal government’s massive investment in EHR and EMR technology. In a recent interview with Bob Wachter, a physician and professor of medicine at the University of California, Dr. Wachter cited a “short-term hump period”, red tape, localization, extensive user testing (or rather a lack thereof), and a complex diversity of user roles as contributing factors to the notable lag between the advancement and adoption of new software in the field of medicine vs other industries.
Added to these issues is the challenge of application downtime, which can be particularly troublesome when tools designed to “seamlessly integrate” with your EHRS go down or even just experience minor performance problems.
Using a status page can keep your team informed in two ways about what software and device integrations are experiencing normal uptime and which require attention from IT or from your medical technology vendor. First, having a centralized status page to track the uptime status of the various devices and integrations your hospital relies upon makes it as easy as one quick glance to confirm that everything is operating normally. Second, a status page can broadcast changes in uptime status to subscribed staff, via their preferred communication method.
Broadcast changes in uptime status can be organized by software component or hardware device, so that only the relevant staff receive the notification. The status page can also be programmed to send notifications immediately, on a delay, or pending manual approval – so minor issues won’t trigger false alarms and create alert burnout.
Once the issue is repaired and uptime status is restored, subscribed staff will also receive a notification informing them that everything is back to 100%, tightening the communication loop further (and providing documentation of the frequency, duration and severity of incidents, should a more serious conversation with a vendor be necessary).
Waste could take the form of mismanagement of resources (time, money, staff, tools, etc.) or of inconsistent or inefficient processes. John Rakowski at AppDynamics succinctly articulates this in the context of the APM world with his example: “multiple overlapping monitoring tools in a typical siloed enterprise mean physical waste (licenses etc), an inconsistency in the way they are used, and ultimately absurdity as alerts are not representative of the business.”
But there is an element to the waste issue he doesn’t consider, one that absolutely represents a mismanagement of resources and likely an inefficient process as well—when things go wrong, how is downtime communication handled?
Is it the DevOps team who is responsible for communicating the incident to customers – when the team’s attention is needed most on resolving the problem? How are the executive team, customer support team and other teams informed when there’s a disruption and when full functionality is restored?
The language of these alerts can be crafted ahead of time, likely in collaboration with Marketing – to keep the message in terms non-tech-savvy users can understand.
The alerts sent by your status page can also be set to one of three protocols: automatic and instant, automatic but delayed, or pending manual confirmation – as not every bump in the road is cause for a customer communication. Similarly, alerts can be sent out that are tailored to specific components (maybe only east coast servers are experiencing issues, or maybe only users of a certain product are having trouble accessing the application). In this manner, you can ensure the right users are getting the right message at the right time, without placing any additional demands on your DevOps team.
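The three alert protocols and the component-level targeting described above could be modeled roughly as follows. This is a hypothetical Python sketch: the class names, fields, and the 10-minute delay are assumptions made for illustration and do not reflect StatusCast's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """The three alert protocols (names are illustrative)."""
    INSTANT = "automatic and instant"
    DELAYED = "automatic but delayed"
    MANUAL = "pending manual confirmation"


@dataclass
class Subscriber:
    name: str
    components: set[str]  # components this subscriber wants to hear about


@dataclass
class Incident:
    component: str
    message: str  # wording can be drafted ahead of time
    policy: Policy
    minutes_open: int = 0
    approved: bool = False  # only consulted for the MANUAL policy


def should_send(incident: Incident, delay_minutes: int = 10) -> bool:
    """Decide whether an alert goes out yet, based on the incident's policy."""
    if incident.policy is Policy.INSTANT:
        return True
    if incident.policy is Policy.DELAYED:
        # Hold back short blips; alert only if the issue persists.
        return incident.minutes_open >= delay_minutes
    return incident.approved


def recipients(incident: Incident, subscribers: list[Subscriber]) -> list[str]:
    """Only subscribers to the affected component receive the alert."""
    return [s.name for s in subscribers if incident.component in s.components]
```

With a delayed policy, an incident that resolves within the delay window never reaches subscribers, which is one way to avoid alerting customers about every bump in the road.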
By keeping your DevOps team focused on troubleshooting issues and anticipating and avoiding future problems, you are actually facilitating a second element of lean DevOps: continuous improvement. As Jez Humble, of Chef, noted in a presentation last month: “DevOps is not a goal, but a never-ending process of continual improvement.”
You can learn more about how a status page can help you become a lean DevOps organization here.
A hosted status page represents a real-time report of application uptime (often capturing both present/immediate status and uptime percentage overall). It can also contain information about upcoming scheduled maintenance and any other information you’d like to communicate to your end users.
It’s easy to see how the transparency provided by a hosted status page can help build trust, but what value could a customer or prospect find in a hosted status page that would make it good SaaS content marketing material?
Arguably, the hardest part of content marketing is getting your message out there. It is definitely not an “if you build it, they will come” scenario – you need to put in a lot of work distributing your content through channels your target audience is likely to discover. A hosted status page will not increase your reach in this manner, but it can increase your engagement.
End users can subscribe for updates and alerts via their preferred communication channel (e.g. email, text, Twitter), meaning that they are more likely to notice and consume your content. The challenge here is that these alerts should only relate to performance or accessibility issues (including updates or upgrades to your application); otherwise you are using the channel disingenuously and are likely to negatively impact your relationship with your end users.
By keeping the message appropriate to the channel however, you can ensure your end users stay excited about the continuing improvements to your application’s performance and that the foundation of transparency and communication you’ve built continues to strengthen your business relationship with them.
Tomasz Tunguz has outlined 9 Marketing Disciplines of Great SaaS Companies—content is only one of these nine. The others include Evangelism (leverage enthusiastic users), Customer Lifecycle (upsell/cross-sell to unlock the other 50%+ of revenue potential from existing customers), and Communications (brand strategy, brand narrative and public relations).
A hosted status page helps develop both the brand narrative (transparency, reliability, partnership) and the customer relationship necessary to capitalize on these key aspects of effective SaaS marketing.
You can get a free trial of the StatusCast hosted status page tool here.
Common themes among the advice brought together across APMdigest’s series included using Network Emulators to simulate real-world conditions, testing latency and realistic load (or cloud testing and capacity planning), establishing baselines with performance monitoring, analyzing application logs, and testing as early in the development cycle as possible.
Network Emulators allow you to test on a duplicate of the production environment, so there are no surprises when the application actually goes live. Another commentator offered this additional piece of advice: not only test in the same environment but test using the same profiling tool you will use in the actual production environment, and make sure it can “find meaningful correlations at scale” as “A top methods list will only show you where your time went processing your synthetic load.” (Joe Rustad, Manager, Software Development & Architecture, Dell Software)
Cloud testing and capacity planning is most effective when it is designed to simulate user transactions and includes stress testing, which not only tells you the limits of what your system can support but also what happens at the limits. One commentator particularly noted that “Organizations adopting SaaS apps like Office 365 or Google Apps often don’t realize that their internet connectivity isn’t up to the increased traffic.” (Patrick Carey, VP Product Management and Marketing, Exoprise)
Establishing baselines prior to production is helpful for being able to demonstrate what “normal” performance is to stakeholders on other teams and for having a point of reference grounded in the structured data that supports your infrastructure dependencies. This can also make it easy to detect performance issues (on both the application and the network side) at the development or staging level rather than the production level (which amounts to a cheaper resolution for a number of reasons). Arguably the most important aspect of establishing baselines is checking them regularly against real time performance in production.
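Checking a baseline against live performance can be sketched in a few lines of Python. The function names, the sample response times, and the three-standard-deviation tolerance are assumptions chosen for illustration; real monitoring tools apply their own statistics.

```python
from statistics import mean, stdev


def build_baseline(samples_ms: list[float]) -> tuple[float, float]:
    """Summarize pre-production response times as (average, standard deviation)."""
    return mean(samples_ms), stdev(samples_ms)


def is_regression(current_ms: float, baseline: tuple[float, float],
                  tolerance: float = 3.0) -> bool:
    """Flag a measurement that exceeds the baseline average by more than
    `tolerance` standard deviations (the tolerance is an assumed default)."""
    average, spread = baseline
    return current_ms > average + tolerance * spread


if __name__ == "__main__":
    # Hypothetical response times (ms) collected in staging.
    baseline = build_baseline([100, 110, 90, 105, 95])
    for observed in (120, 130):
        status = "investigate" if is_regression(observed, baseline) else "normal"
        print(f"{observed} ms -> {status}")
```

Running the same comparison on a schedule against production measurements is what turns a one-off baseline into an ongoing point of reference.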
Creating and analyzing application logs is what facilitates early testing, by informing what pre-production issues there are and providing a tight feedback loop to guide your (and Development’s, depending on how your teams are structured) troubleshooting efforts.
Internal communications are most easily handled through a corporate portal or email. These communications should include instructions for how to talk about the progress on the application issue with customers/end users. “Broadcasting” updates to either of these communication mediums can be easily accomplished through your status page tool.
Using your status page software to communicate with customers is just as straightforward. Create premade messages that will go out via SMS text, email, Google calendar, etc. or that are installed on a page on your own website via a widget. You can even set preferences as to whether you’d prefer real-time communication or you’d prefer to have someone review and edit the communication before it goes out.
You’ll want to make sure to have the marketing team supply whatever art you’d like to use to ensure the page is consistent with the rest of your company’s brand. Though it’s a page about technical updates, the status page is meant to be customer-friendly.
You might also distinguish service components in a way that addresses specific geographic regions, application layers or business components, so each end user better understands what the disruption means for them.
A status page increases efficiency for your IT team and creates more and better opportunities to smooth over a potentially negative experience for your customers. You can read more about the cost of downtime here.
As many of you may already know by reading my blogs, I am not only the co-founder of StatusCast, but also a co-founder of another successful Software as a Service company. For years we built and maintained our own set of server racks in a hosted environment. Due to the nature of our solution, it is critical that the uptime availability for this other company is stellar. In fact, it has maintained 99.98% or greater for over 10 years.
One of our primary data centers is located on the East Coast. The hosting provider we use doesn’t have an easily accessible application status page. They have some type of Support Management Console, but it hardly acts as a business intelligence dashboard, rarely communicating what we need for our business, and there is no easy way for us to get the data we need fast enough.
There have been several times over the years where the staff of my other company was first to alert this hosting provider that a problem existed, and even then getting proper system updates from them on the reported issue was more than brutal. Downstream issues can’t always be caught with as much advance notice as your classic application monitoring services so it makes good sense to ensure that your hosting provider feeds you uptime information as well.
StatusCast is in the process of integrating the status feeds of several top hosting providers. If you’re choosing a hosting provider, and they do not provide this data, you should demand it or go elsewhere. For anyone considering StatusCast to assist them in creating their customer-facing system status page, let me know, as we’ll be more than happy to build out an integration from your hosting provider’s system dashboard (or even provide the dashboard for your hosting provider – as we can key off our own dashboards as well). Remember, if your hosting company doesn’t have an accessible system status application page, they aren’t only failing you, but also the customers (or end users) that put their trust in your company.
It’s simply a fact of life: cloud applications go down. Whether that application is running within your corporate firewall on your own private servers, or out in the “ethers”, how you react to downtime is critical. It doesn’t matter who your end-users are: employees, suppliers, or customers; communicating the status of unexpected and planned system events has a direct impact on your company’s top and bottom line.
the product we sold was built and re-deployed thousands of times (long before slipstream deployments were commonplace and helped reduce downtime).
we had seemingly countless “scheduled maintenance” events where we had to upgrade either our hardware, software, or some other infrastructural component. You would be surprised how many times we had to announce scheduled maintenance because our co-location facility told us that they would be doing the same.
and as much as we hate to admit it, just like every other cloud-based application that has ever been built, we suffered our share of downtime and application performance problems as a result of unforeseen circumstances.
As the years went by and our support teams changed hands, the process (or lack thereof) changed as well. The amount of time we provided notice before scheduled maintenance events was never set in stone. The language used within unexpected system outages rarely found the right balance between providing the customers too much or too little information. And to top it off, no matter what we did, nothing seemed to reduce the number of irate customers calling our help desk, even if we gave them weeks of advance notice.
Overstressed help desks. When applications become unavailable, the natural response of their users is to reach out and find out what’s wrong. Any application with more than a handful of users is going to quickly inundate your help desk team with inbound support requests. The expense of having your help desk respond to each of these requests with (hopefully) the same message over and over again should not be overlooked.
Lost employee productivity – Frustrated and idle employees are a nightmare and costly. The Aberdeen Group estimates that an average-sized company loses $110,000 an hour when an application becomes unavailable. Your goal as someone managing downtime should be to make those times frictionless for your users. This means having a process in place that proactively keeps your users in the know so they can be as efficient as possible. If you sell a SaaS, frustrated customers don’t translate to lost employee productivity, they translate to ex-customers.
Create a culture of communication. It should be hardwired into your team’s DNA: when something goes wrong, before we even start looking at the problem, let the customers know.
Jasen Fici
Co-founder, Uptime.ly