Blog post

How to Succeed with Site Reliability Engineering (SRE)

By Daniel Betts | October 04, 2022 | 1 Comment

Infrastructure and Operations Leaders | Infrastructure, Operations and Cloud Management

Many clients are considering site reliability engineering (SRE) but then often struggle to understand its prerequisites and implications.

Common questions include what SRE actually is, what an SRE does, what skills an SRE needs, and how to start, gain value from, and evolve an SRE practice.

Over the past two years, Gartner has discussed SRE with clients in well over 2,000 inquiry calls and has created the research showcased below to meet our clients' most pressing needs.


What Is Site Reliability Engineering?

  • SRE is a modern approach to operations that supports DevOps at scale by balancing the need for velocity against stability and risk.
  • SRE is a set of engineering principles and practices that focuses on improving customer experience and retention by leveraging service-level objectives to govern how services are managed.
  • Most importantly, SRE is not a simple rebranding of an existing operations team. It requires a collaborative engineering mindset and a demonstrated ability to continually learn, improve and share knowledge.
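
One common way to let a service-level objective (SLO) govern the trade-off between velocity and stability is an error budget. The post itself does not name this mechanism, so the following is only a minimal sketch under stated assumptions: the `Slo` class, the function names and the policy thresholds are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO measured over a window of requests."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def error_budget(slo: Slo, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates within the window."""
    return (1.0 - slo.target) * total_requests


def budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_requests)
    if budget == 0:
        # A 100% target leaves no budget: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


def release_policy(remaining: float) -> str:
    """Illustrative policy; the thresholds are assumptions, not Gartner guidance."""
    if remaining <= 0.0:
        return "freeze feature releases; prioritize reliability work"
    if remaining < 0.25:
        return "slow down; require extra review for risky changes"
    return "ship normally"


if __name__ == "__main__":
    checkout = Slo(name="checkout", target=0.999)
    remaining = budget_remaining(checkout, total_requests=1_000_000, failed_requests=400)
    print(f"{checkout.name}: {remaining:.0%} of error budget left -> {release_policy(remaining)}")
```

With a 99.9% target and one million requests, the budget is roughly 1,000 failed requests. While budget remains, teams can keep shipping; once it is spent, the policy shifts effort toward stability.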

You can take a deeper look at the definition of SRE in the following Gartner Quick Answer note:


Steps to Start and Evolve an SRE Practice

Organizations are under pressure to drive innovation, relying heavily on digital channels to reach their customers anywhere they are. Keeping up with market demands and customer needs is driving adoption of complex architectures, leading to a combination of cloud-native applications, SaaS, platform as a service and third-party services and dependencies.

But traditional operating models are not designed to keep pace with the ever-increasing velocity of digital business transformation. In addition, many I&O teams lack the skills required to handle emerging technologies and new ways of delivering software products. As a result, I&O teams are unable to achieve business or reliability objectives or meet customer expectations. This skill gap also leads to unnecessary friction between I&O, application development and product owners as they struggle to align goals and collaborate effectively.

Gartner has defined a seven-step approach to starting and evolving your SRE practice:


Site Reliability Engineer Job Description

Site reliability engineers (SREs) are responsible for improving system reliability and resilience to make it faster and easier to develop and deploy new software capabilities. SREs focus especially on building automation to reduce manual effort and prevent operations incidents.

Gartner provides a sample job description to give a representative overview of the site reliability engineer role. This is designed to be customized to the specific needs and requirements of your organization. It is based on an analysis of publicly available job descriptions from organizations representing a range of industries and geographies. Data was sourced from TalentNeuron, a Gartner tool that leverages analytics to analyze job postings and provide labor market insights. 

More insights into the SRE job description can be found here:


Improve the Reliability of Large, Complex and Distributed IT Systems by Leveraging SRE Principles

Organizations today rely on large, complex, distributed software products that create a supply chain of internal and external service and/or platform providers. The varying goals and business models of these nonaligned, disparate providers, however, make it difficult for organizations to mitigate and contain risks, including risks to reliability and resiliency.

Infrastructure reliability isn’t just a matter of creating high availability systems or ensuring adequate system performance. It goes beyond redundancy by aligning engineering practices, technology platforms and organizational practices across all relevant internal and external environments.

Therefore, to build the increased reliability needed to mitigate and contain risks and honor customer commitments, organizations need end-to-end transparency across their software development, procurement and deployment processes. A supply-chain mindset expands the perspective from the software factory into the network of internal and external suppliers. We then pragmatically leverage site reliability engineering (SRE) perspectives to further improve reliability:

  • Adopt a Supply-Chain Mindset to Build Reliable IT Products
  • Build and Maintain a Map of Components and Dependencies to Have End-to-End Visibility to Your IT Product Supply Chain
  • Partner With Suppliers to Optimize Interactions and Improve Reliability
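
The map of components and dependencies in the second item above can be sketched as a small graph. This is a minimal illustration, not a method from the Gartner research: the component names and availability figures are made up, and the estimate assumes every dependency is a hard dependency that fails independently of the others, which real systems rarely satisfy exactly.

```python
from __future__ import annotations

from math import prod

# Hypothetical dependency map: each component lists what it directly depends on.
DEPENDENCIES: dict[str, list[str]] = {
    "checkout": ["payments-api", "inventory"],
    "payments-api": ["third-party-psp"],
    "inventory": ["managed-database"],
    "third-party-psp": [],
    "managed-database": [],
}

# Hypothetical standalone availability of each component (made-up numbers).
AVAILABILITY: dict[str, float] = {
    "checkout": 0.9995,
    "payments-api": 0.999,
    "inventory": 0.999,
    "third-party-psp": 0.995,
    "managed-database": 0.9995,
}


def transitive_dependencies(component: str, dependencies: dict[str, list[str]]) -> set[str]:
    """Return every component reachable from `component`, directly or indirectly."""
    seen: set[str] = set()
    stack = list(dependencies.get(component, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(dependencies.get(current, []))
    return seen


def estimated_availability(
    component: str,
    dependencies: dict[str, list[str]],
    availability: dict[str, float],
) -> float:
    """Multiply availabilities along the whole chain (independent hard dependencies)."""
    chain = transitive_dependencies(component, dependencies) | {component}
    return prod(availability[name] for name in chain)


if __name__ == "__main__":
    deps = transitive_dependencies("checkout", DEPENDENCIES)
    estimate = estimated_availability("checkout", DEPENDENCIES, AVAILABILITY)
    print(f"checkout depends on: {sorted(deps)}")
    print(f"estimated end-to-end availability: {estimate:.4%}")
```

Even in this toy map, the end-to-end estimate is lower than the availability of any single component, which is why visibility into external suppliers matters as much as visibility into internal services.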

The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.


1 Comment

  • Great post, Daniel! Understanding the prerequisites and implications of site reliability engineering is key, and indeed, many I&O teams have to cope with a lack of skills to handle some key technologies or new ways of delivering software.

    It seems to me that it is increasingly crucial to focus on the right skills to be able to go beyond what traditional operating models offer. Focusing on what pertains to the role of the site reliability engineer (SRE) is fundamental: the SRE job description is therefore a must-read for all IT organizations looking to reduce manual effort and prevent operations incidents.