Applied Methods

Site Reliability Engineer

Engineers who ensure the reliability, scalability, and performance of production systems.

$ titles --canonical
Site Reliability Engineer, Senior SRE, Staff SRE, Production Engineer, Reliability Engineer, Infrastructure SRE
74 open jobs
30 companies hiring
$02

Skills

What companies are looking for in this role.

$ skills --core

Designing and implementing monitoring, alerting, and observability systems for distributed infrastructure · 95%
Managing incident response, root cause analysis, and postmortem processes · 93%
Debugging production issues across application, system, and network layers · 92%
Implementing automation and tooling to reduce manual operational work · 90%
Operating and scaling Kubernetes clusters and container orchestration platforms · 90%
Designing and maintaining infrastructure as code for automated provisioning and deployment · 88%
Participating in on-call rotations and rapid incident response · 88%
Planning and executing infrastructure scaling across cloud providers and geographic regions · 85%
Designing resilience and fault tolerance for mission-critical systems at scale · 85%
Building internal platforms and tooling to simplify infrastructure operations · 82%
Optimizing system performance and resource utilization across infrastructure · 80%
Defining and measuring service level objectives and error budgets · 80%
Managing multi-cloud and hybrid cloud infrastructure deployments · 78%
Designing telemetry pipelines and data collection for observability · 75%
Implementing network reliability and designing failover mechanisms · 72%
Operating storage systems and managing data persistence at scale · 70%
Diagnosing and resolving issues at kernel and hypervisor levels · 68%
$ skills --emerging

Implementing cloud security and identity management controls · 68%
Managing and operating AI-specific infrastructure for model inference and training workloads · 65%
$ skills --soft

Collaborating with product and engineering teams to embed reliability into system design · 83%
Establishing operational standards and best practices across engineering organizations · 78%
Leading and managing SRE teams through mentoring and coaching · 75%
Communicating complex technical concepts to diverse stakeholder audiences · 72%
$03

Technology

The tools and technologies that define this role.

$ tech --language
Python: very high
Bash: high
Go: high
Ruby: low
$ tech --platform
AWS: very high
Kubernetes: very high
Linux: very high
Azure: moderate
GCP: moderate
$ tech --tool
Terraform: very high
Ansible: high
Docker: high
Grafana: moderate
PagerDuty: moderate
Prometheus: moderate
Splunk: moderate
VictoriaMetrics: moderate
cert-manager: low
CoreDNS: low
Fluentbit: low
Gatekeeper: low
Jaeger: low
QEMU/KVM: low
QuickWit: low
Vector: low
$ tech --concept
CI/CD: high
HTTP/TLS/DNS: moderate
Kubernetes Operators: moderate
Service Mesh: moderate
Cloud Security Posture Management: low
$04

Open Jobs

74 open Site Reliability Engineer jobs across 30 companies.

Block · 2d
Senior Site Reliability Engineer
New York, NY, United States of America · Engineering

Block · 2d
Senior Site Reliability Engineer
Bay Area, CA, United States of America · Engineering

CoreWeave · 3d
Senior Site Reliability Engineer, Data Infrastructure
New York, NY / Bellevue, WA · Engineering

CoreWeave · 4d
Senior Production Engineer
Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA · Engineering

Crusoe · 5d
Senior Staff Engineer, Cloud Site Operations
San Francisco, CA - US · Engineering

Nscale · 1w
Senior Site Reliability Engineer - AI Infrastructure Operations
US · Engineering

Nscale · 1w
Senior Site Reliability Engineer - AI Infrastructure Operations
UK · Engineering

Together AI · 1w
Lead/Manager Site Reliability Engineering Team (Amsterdam)
Amsterdam · Engineering

Glean · 1w
Lead Site Reliability Engineer
San Francisco Bay Area · Engineering

Harvey · 1w
Staff Software Engineer, Site Reliability (SRE)
Bengaluru · Engineering

Figma · 1w
Software Engineer, Production Engineering (London, United Kingdom)
London, England · Engineering

Figma · 1w
Software Engineer, Production Engineering
San Francisco, CA • New York, NY • United States · Engineering

MongoDB · 1w
Site Reliability Engineer (Senior or Staff), Atlas
Austin; Boston; Chicago; New York City; Pittsburgh; United States · Engineering

MongoDB · 1w
Team Lead, Site Reliability Engineering - Fleet Management
Austin; Boston; Chicago; Denver; Miami; New York City; San Francisco; Seattle · Engineering

MongoDB · 1w
Site Reliability Engineer (Senior or Staff), Observability
Ireland · Engineering

MongoDB · 1w
Site Reliability Engineer (Senior or Staff), Observability
Austin; Boston; Chicago; New York City; Pittsburgh; United States; Washington DC · Engineering

MongoDB · 1w
Site Reliability Engineer (Senior or Staff), Infrastructure Security
Austin; San Francisco; Seattle; United States · Engineering

MongoDB · 1w
Team Lead, Site Reliability Engineering - Storage Layer Service
Boston; Charlotte; New York City; Philadelphia; Pittsburgh; Washington DC · Engineering

Nebius · 1w
Site Reliability Engineer
Amsterdam, Netherlands; Remote - Europe · Engineering

Nebius · 1w
Network Site Reliability Engineer (NetSRE)
Amsterdam, Netherlands; Remote - Europe · Engineering