Site Reliability Engineer
Engineers ensuring the reliability, scalability, and performance of production systems.
Skills
What companies are looking for in this role.
Designing and implementing monitoring, alerting, and observability systems for distributed infrastructure
Managing incident response, root cause analysis, and postmortem processes
Debugging production issues across application, system, and network layers
Implementing automation and tooling to reduce manual operational work
Operating and scaling Kubernetes clusters and container orchestration platforms
Designing and maintaining infrastructure as code for automated provisioning and deployment
Participating in on-call rotations and rapid incident response
Planning and executing infrastructure scaling across cloud providers and geographic regions
Designing resilience and fault tolerance for mission-critical systems at scale
Building internal platforms and tooling to simplify infrastructure operations
Optimizing system performance and resource utilization across the infrastructure
Defining and measuring service level objectives and error budgets
Managing multi-cloud and hybrid cloud infrastructure deployments
Designing telemetry pipelines and data collection for observability
Improving network reliability and designing failover mechanisms
Operating storage systems and managing data persistence at scale
Diagnosing and resolving issues at kernel and hypervisor levels
Implementing cloud security and identity management controls
Managing and operating AI-specific infrastructure for model inference and training workloads
Collaborating with product and engineering teams to embed reliability into system design
Establishing operational standards and best practices across engineering organizations
Leading and managing SRE teams through mentoring and coaching
Communicating complex technical concepts to diverse stakeholder audiences
Technology
The tools and technologies that define this role.
Open Jobs
74 open Site Reliability Engineer jobs across 30 companies.
Other Engineering roles
General-purpose software engineering roles focused on building and maintaining software systems. Covers generalist SWE positions that don't clearly fall into frontend, backend, fullstack, or other specialized tracks.
Engineers focused on server-side systems, APIs, services, and data processing pipelines. Includes roles explicitly labeled as backend or server-side development.
Engineers specializing in user-facing interfaces, web applications, and client-side development. Includes UI/UX engineering and web development roles.
Engineers working across the entire application stack, handling both frontend and backend responsibilities.
Engineers building and maintaining internal platforms, cloud infrastructure, compute systems, and developer tooling.