Where Should I Start to Make My Application Reliable?

Priyanka Dey
3 min read · Jan 6, 2022

In a world of high expectations for availability, the challenge that often worries service providers and application owners is how to make applications more reliable.

Every time a customer logs in to a website and finds an "unavailable" message, the business risks losing that customer. From this we can derive a simple theory: reliability translates into financial gains.

So where should organisations start to make their applications reliable? Isn't that the question everyone running a business wants answered? It is the question that gets asked at every meeting, small or big.

Before we answer that question, let’s establish what we mean by reliability. Reliability, as my best friend Google defines it, is “the quality of being trustworthy or of performing consistently well”.

The key phrase is performing consistently well. So if I am the CEO of an e-commerce website that hosts pictures, let’s call it Piciee, making the website available for my customers to upload their pictures will be the number one item on my agenda.

To make decisions about how your application is performing, you need data, or metrics. These metrics then tie back to the consumer requirement. SLAs, SLOs and SLIs are tools that help a business evaluate the current state of a system, identify how to improve its performance, and reach the desired state of the application.

Service Level Objectives: these give us a reason to believe that a platform can be as reliable as expected. Reliability is deeply associated with the health of the application: how healthy should the application be at all times?

So, to define a Service Level Objective: it is the percentage of time the application is expected to be healthy. I say healthy because when the services in the application are healthy, they are available for consumption.

A Service Level Objective is the common availability target that developers, release managers, SREs and other stakeholders agree is achievable.

Of course, before we begin the process of deciding the SLO, through hours of gruelling discussion, it is assumed that the application is healthy and available. We cannot set a target for a system that is not available; only once it is functional can we decide on a target availability.

So let’s go back to Piciee, the picture hosting website in our article. Let’s walk through one of the user stories for Piciee:

  1. Piciee should be available to users for uploading pictures 90% of the time.

The SREs, developers and stakeholders have agreed that Piciee should be available 90% of the time in a month, which means the application should be healthy 90% of the time. Assuming a 30-day month, the website should be up and running for 27 days, and for the remaining 10%, about 3 days, the website can be unavailable. This allowed percentage of unavailability is defined as the Error Budget.
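The error budget arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of any monitoring tool: the function name `error_budget` and its parameters are assumptions made for this example, while the 90% target and the 30-day month come from the Piciee story.

```python
from datetime import timedelta


def error_budget(slo_percent: float, period: timedelta) -> timedelta:
    """Return the downtime allowed within `period` for an availability target.

    `slo_percent` is the agreed availability target, e.g. 90 for 90%.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The error budget is whatever share of the period the SLO does not cover.
    return period * (100 - slo_percent) / 100


month = timedelta(days=30)          # assumed 30-day month, as in the article
budget = error_budget(90, month)    # Piciee's 90% availability target
uptime_target = month - budget

print(f"Allowed downtime: {budget}")
print(f"Expected uptime:  {uptime_target}")
```

For Piciee this gives 3 days of allowed downtime and 27 days of expected uptime. Raising the target shrinks the budget quickly, which is why the target has to be something every stakeholder agrees is achievable.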

These 3 days a month can be used for maintenance windows, downtime for new releases, and rolling out fixes.

However, if Piciee is down for more than 3 days in a month, it’s time to re-evaluate the performance of the services. Time to go back to the drawing board, review the architectural design, make changes, and find the root cause of what is exhausting the Error Budget, or in other words, what is making the application unavailable more often.
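One way to notice that the budget has been exhausted is to add up the recorded downtime for the month and compare it with the allowance. The sketch below assumes a hypothetical list of outage durations; the function name `budget_status` and the sample outages are invented for illustration and are not real Piciee data.

```python
from datetime import timedelta


def budget_status(slo_percent: float, period: timedelta,
                  outages: list[timedelta]) -> dict:
    """Compare total recorded downtime against the error budget."""
    budget = period * (100 - slo_percent) / 100
    used = sum(outages, timedelta())  # start the sum from a zero duration
    return {
        "budget": budget,
        "used": used,
        "remaining": budget - used,   # negative once the budget is exceeded
        "exceeded": used > budget,
    }


# Hypothetical outages recorded during one 30-day month.
outages = [timedelta(days=2), timedelta(hours=30)]
status = budget_status(90, timedelta(days=30), outages)

if status["exceeded"]:
    print(f"Error budget exceeded by {-status['remaining']}: time to review the design.")
else:
    print(f"Error budget remaining: {status['remaining']}")
```

With these sample figures the downtime adds up to 3 days and 6 hours, so the 3-day budget is exceeded by 6 hours and the review described above would be triggered.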

An SLO is a powerful tool, backed by metrics and data, that lets the SREs, developers and stakeholders understand whether the application is available and reliable, and how to improve it.

Service Level Indicators: the measurement you use to probe the performance of the service is the Service Level Indicator. If you want to know how to make your system reliable, start by measuring the rates of successful and unsuccessful queries to the application.
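As a rough illustration, an availability SLI can be computed as the share of successful requests and then compared with the SLO. The names `availability_sli` and `meets_slo`, and the request counts, are hypothetical and chosen only for this example.

```python
def availability_sli(successful: int, total: int) -> float:
    """Return the percentage of requests that succeeded."""
    if total <= 0:
        raise ValueError("total must be a positive number of requests")
    if not 0 <= successful <= total:
        raise ValueError("successful must be between 0 and total")
    return 100 * successful / total


def meets_slo(sli_percent: float, slo_percent: float) -> bool:
    """True when the measured indicator is at or above the objective."""
    return sli_percent >= slo_percent


# Hypothetical counts of upload requests observed over one month.
sli = availability_sli(successful=9_200, total=10_000)
print(f"SLI: {sli:.1f}%  meets 90% SLO: {meets_slo(sli, 90)}")
```

Counting requests rather than clock time is only one possible choice of indicator; what matters is that the SLI is something you can measure and that it maps back to what the customer experiences.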

SLIs become the KPIs that drive the SLO and ultimately tie back to the customer requirement.

Service Level Agreements, on the other hand, are agreements formed between the consumer and the service provider about the quality of service and the expected service availability.

If you are building your application from scratch, make sure to make SLOs and SLIs a part of the system requirements. They will drive home a better understanding of the workings of your application.


Priyanka Dey

I am an IT professional with 12 years of experience in data warehousing. I am very curious about learning new technology.