This Site Reliability Engineering Practitioner® (SREP) Certification course introduces ways to scale services economically and reliably across an organization. It explores strategies to improve agility, cross-functional collaboration, and transparency into service health, building resiliency through design, automation, and closed-loop remediation.
SRE Practitioner Certification Training Course Benefits
Successfully implement a flourishing SRE culture in your organization
Manage the organizational impact of introducing SRE
Build security and resilience by design in a distributed, zero-trust environment
Prepare for the DevOps Institute SRE Practitioner certification exam
Participate in unique exercises designed to apply concepts
Get sample documents, templates, tools, and techniques
Access additional value-added resources and communities
Continue learning and face new challenges with after-course one-on-one instructor coaching
SRE Practitioner Certification Training Outline
- It is highly recommended that learners attend the SRE Foundation course (Course 3694) before attending the SRE Practitioner course.
- An understanding and knowledge of common SRE terminology, concepts, principles, and related work experience are recommended.
- Passing the 90-minute examination, which consists of 40 multiple-choice questions and requires a score of 65%, leads to the SRE Practitioner certificate. The certification is governed and maintained by DevOps Institute.
- Rebranding Ops or DevOps or Dev as SRE
- Users notice an issue before you do
- Measuring until my Edge
- False positives are worse than no alerts
- Configuration management trap for snowflakes
- The Dogpile: Mob incident response
- Point fixing
- Production Readiness Gatekeeper
- Fail-Safe really?
- Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
- Define system boundaries in a distributed ecosystem so that SLIs are defined correctly
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Overall, reliability is only as good as the weakest link on your service graph
- Error thresholds when 3rd party services are used
- SREs and their role in building secure and reliable systems
- Design for Changing Architecture
- Fault-tolerant Design
- Design for Security
- Design for Resiliency
- Design for Scalability
- Design for Performance
- Design for Reliability
- Ensuring Data Security and Privacy
- Modern Apps are Complex & Unpredictable
- Slow is the new down
- Pillars of Observability
- Implementing Synthetic and End-user monitoring
- Observability-driven development
- Distributed Tracing
- What happens to monitoring?
- Instrumenting using Libraries and Agents
- Taking a platform-centric view solves organizational scalability challenges such as fragmentation, inconsistency, and unpredictability
- How do you use AIOps to improve resiliency?
- How can DataOps help you in the journey?
- A simple recipe to implement AIOps
- Indicative measurement of AIOps
- Key SRE responsibilities in incident response
- DevOps, SRE, and ITIL
- OODA and SRE Incident Response
- Closed Loop Remediation and the Advantages
- Swarming – Food for Thought
- AI/ML for better incident management
- Navigating Complexity
- Chaos Engineering Defined
- Quick Facts about Chaos Engineering
- Chaos Monkey Origin Story
- Who is adopting Chaos Engineering?
- Myths of Chaos
- Chaos Engineering Experiments
- GameDay Exercises
- Security Chaos Engineering
- Chaos Engineering Resources
- Key Principles of SRE
- SREs help increase reliability across the product spectrum
- Metrics for Success
- Selection of Target areas
- SRE Execution Model
- Culture and Behavioral Skills are key
- SRE Case study