We Work Remotely: DevOps & Sysadmin Jobs

Chipcolate: Site Reliability Engineer

May 22, 2025 - 14:00

Headquarters: Milan
URL: https://chipcolate.com

Job Summary

Chipcolate is looking for an experienced Site Reliability Engineer (SRE) to own the reliability, scalability and performance of our agent-oriented financial platform. You’ll design and automate resilient cloud infrastructure, keep latency low and uptime high, and equip our product team with the tooling they need to ship confidently at scale.

Salary: €65 000 – €85 000 gross / year
Location: Work from anywhere within ± 4 hours of Central European Time
Employment type: Full-time, permanent

About Us

Chipcolate is a small Italian company of craftsman-engineers who believe great software should be as elegant as it is robust. Historically we worked on embedded systems, web applications and 3D printing. We are currently working with a customer to build high-throughput financial services that power thousands of autonomous agents. We operate in a fairly unstructured and flexible way; for us, that is the only way to get both speed and quality.

Mission: Set our customer up for success from the get-go
Learn more: chipcolate.com · LinkedIn (@chipcolate) · GitHub (https://github.com/chipcolate)

Responsibilities

Architect, provision and maintain a distributed multi-provider cloud infrastructure that meets strict availability and latency targets.
Implement long-term solutions to support thousands of concurrently executing agents.
Own Postgres database performance and reliability, including OLAP applications.
Develop scalable observability stacks (Grafana / OpenTelemetry) with actionable SLOs.
Implement automated reliability measures: blue/green deploys, canary roll-outs, chaos testing and game days.
Partner with backend teams to profile services, eliminate bottlenecks and design for horizontal scaling.
Drive cost-efficient capacity planning and security best practices.

Experience & Qualifications

Must-have:

3+ years in SRE, DevOps or Production Engineering
Deep knowledge of Linux and containers
Deep knowledge of Postgres
Proficiency in at least one programming language (Node, Python, Go, or Rust)
Infrastructure-as-Code mastery (Ansible and Terraform)
Strong monitoring/alerting chops; you think in RED/USE metrics

Nice-to-have:

Experience with Grafana observability stack
Experience with event-driven or agent-based architectures
Experience running multi-region, active-active setups
Experience with Supabase
Experience with DuckDB
Experience managing a Kubernetes cluster at scale

No formal degree required—show us how you’ve built and kept complex systems alive.

Benefits

Flexible hours & fully remote
Fast-growing environment
Fun and innovative application domain
20 days paid leave + local public holidays
Competitive salary

Application Process

Apply online with a CV (or GitHub profile) and a short note on your proudest “save-the-day” incident.
Skill test.
Cultural chat and technical deep-dive with our CTO (60 min, system design and live problem-solving).
Offer within 7 working days or final interview.

Ready to make high-stakes infrastructure feel effortless? Apply today—let’s engineer reliability together.

To apply: https://weworkremotely.com/remote-jobs/chipcolate-site-reliability-engineer