Director of Reliability

A company is looking for a Director of Reliability.

Key Responsibilities
- Implement robust monitoring, alerting, and automation strategies to prevent downtime and service disruptions
- Optimize system performance to improve response times and enhance customer satisfaction
- Lead the development of an effective incident response process to ensure quick resolution of outages

Required Qualifications
- 10+ years of experience in software engineering, DevOps, or Site Reliability Engineering, with at least 5 years in a leadership role
- Proven experience with large-scale, mission-critical distributed systems, with a focus on reliability and scalability
- Expertise in cloud platforms such as AWS, Azure, or Google Cloud
- Strong background in observability tools such as Prometheus, Grafana, or Datadog
- Experience with infrastructure as code (Terraform, CloudFormation) and containerization (Docker, Kubernetes)

Apr 22, 2025 - 22:50