Director of Reliability
A company is looking for a Director of Reliability.
Key Responsibilities
Implement robust monitoring, alerting, and automation strategies to prevent downtime and service disruptions
Optimize system performance to improve response times and enhance customer satisfaction
Lead the development of an effective incident response process to ensure quick resolution of outages
Required Qualifications
10+ years of experience in software engineering, DevOps, or Site Reliability Engineering, including at least 5 years in a leadership role
Proven experience with large-scale, mission-critical distributed systems, with a focus on reliability and scalability
Expertise in cloud platforms such as AWS, Azure, or Google Cloud
Strong background in observability tools like Prometheus, Grafana, or Datadog
Experience with infrastructure as code (Terraform, CloudFormation) and containerization (Docker, Kubernetes)