Discogs Inc: Senior Site Reliability Engineer - Data (REMOTE)

May 21, 2025 - 19:40
Headquarters: Beaverton, Oregon
URL: https://www.discogs.com/about/careers

The Discogs Platform team is focused on several objectives: building and supporting performant, cost-effective, reliable infrastructure; developer experience tooling and mentorship; and creating "golden paths" for organization-wide standards and velocity. As a key member of the Platform team, the Senior Site Reliability Engineer - Data will be working closely with other Discogs engineering squads to develop and optimize scalable, well-planned relational database architectures, drive best practices and stability for our use of Kafka and change data capture, and contribute to the Platform team’s operations.

Location

This is a remote position, open to candidates located in OR, WA, CA, CO, TX, and IL.

Compensation

Starting Base Salary Range: $130,000 - $140,000 yearly

What You’ll Accomplish

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • Stewarding Discogs’ data stores as a key subject matter expert
  • Leading efforts on the reliability and design patterns of our Kafka and Kafka Connect implementations
  • Establishing data contracts and clear communication standards between CDC producers and consumers
  • Working closely with engineering squads to refactor and re-architect MySQL database schemas and indexing for long-term scalability, performance, and cost-effectiveness
  • Mentoring engineering squads on Platform best practices for MySQL, Kafka, and other software development lifecycle areas 
  • Writing documentation and runbooks that contribute to the engineering organization’s knowledge base
  • Working in a containerized, orchestrated environment
  • Contributing to the Platform team’s disciplines of site reliability and operations, supporting both our squads and Platform’s central infrastructure
  • Participating in on-call rotation, responding to incidents, and troubleshooting data and other operations issues

What You’ll Contribute

Minimum Education and Experience

  • A Bachelor's Degree in Computer Science or similar area of focus, or equivalent relevant work experience.
  • 5+ years of experience working with Kafka and relational database management systems (RDBMS).
  • 6+ years of experience in Ops, DevOps, Site Reliability, Platform, or other systems roles.

Required Skills & Abilities:

  • Relational database schema design, query performance optimization, administration (MySQL, Percona Server, AWS RDS)
  • Kafka: Cluster administration (Strimzi), Kafka Connect (Debezium, JDBC)
  • CI/CD (GitHub Actions)
  • GitOps (ArgoCD)
  • Kubernetes (EKS, Kustomize, Karpenter, administration, application manifests)
  • AWS and cloud development (VPC, EKS, RDS, S3)
  • Observability (Datadog, Sentry)
  • Scripting (Shell, Python)
  • Track record of collaboration and mentorship
  • Excellent written communication and documentation skills
  • Continuous learning
  • Ownership and proactive approach to solving large problems

Preferred:

  • Infrastructure-as-code (Terraform)
  • Elasticsearch (ECK administration, scaling, performance)
  • Python (SQLAlchemy, FastAPI)
  • GraphQL (schema design, Apollo federation)
  • REST API
  • HashiCorp Vault
  • Redis
  • Memcached
  • NoSQL Database
  • Data Lake/Warehouse
  • Data Governance
  • Data Security

The Platform team covers a wide range of technical topics and we'd love to hear about your skills beyond this list!

To apply: https://weworkremotely.com/remote-jobs/discogs-inc-senior-site-reliability-engineer-data-remote