Chipcolate: Site Reliability Engineer


Headquarters: Milan
URL: https://chipcolate.com
Job Summary
Chipcolate is looking for an experienced Site Reliability Engineer (SRE) to own the reliability, scalability and performance of our agent-oriented financial platform. You’ll design and automate resilient cloud infrastructure, keep latency low and uptime high, and equip our product team with the tooling they need to ship confidently at scale.
- Salary: €65 000 – €85 000 gross / year
- Location: Work from anywhere within ± 4 hours of Central European Time
- Employment type: Full-time, permanent
About Us
Chipcolate is a small Italian company of craftsman-engineers who believe great software should be as elegant as it is robust. Historically, we have worked on embedded systems, web applications and 3D printing. We are currently working with a customer to build high-throughput financial services that power thousands of autonomous agents. We operate in a fairly unstructured and flexible way; in our experience, that’s the only way to get both speed and quality.
- Mission: Set our customer up for success from the get-go
- Learn more: chipcolate.com · LinkedIn (@chipcolate) · GitHub (https://github.com/chipcolate)
Responsibilities
- Architect, provision and maintain a distributed multi-provider cloud infrastructure to meet strict availability and latency targets.
- Implement long-term solutions to support thousands of concurrently executing agents.
- Own Postgres database performance and reliability, including for OLAP applications.
- Develop scalable observability stacks (Grafana / OpenTelemetry) with actionable SLOs.
- Implement automated reliability measures: blue/green deploys, canary roll-outs, chaos testing and game days.
- Partner with backend teams to profile services, eliminate bottlenecks and design for horizontal scaling.
- Drive cost-efficient capacity planning and security best practices.
Experience & Qualifications
Must-have:
- 3+ years in SRE, DevOps or Production Engineering
- Deep knowledge of Linux and containers
- Deep knowledge of Postgres
- Proficiency in at least one programming language (Node, Python, Go, or Rust)
- Infrastructure-as-Code mastery (Ansible and Terraform)
- Strong monitoring/alerting chops; you think in RED/USE metrics
Nice-to-have:
- Experience with the Grafana observability stack
- Experience with event-driven or agent-based architectures
- Experience running multi-region, active-active setups
- Experience with Supabase
- Experience with DuckDB
- Experience managing a Kubernetes cluster at scale
No formal degree required—show us how you’ve built and kept complex systems alive.
Benefits
- Flexible hours & fully remote
- Fast-growing environment
- Fun and innovative application domain
- 20 days paid leave + local public holidays
- Competitive salary
Application Process
- Apply online with a CV (or GitHub profile) and a short note on your proudest “save-the-day” incident.
- Skill test.
- Cultural chat and technical deep-dive with our CTO (60 min, system design and live problem-solving).
- Offer within 7 working days, or a final interview.
Ready to make high-stakes infrastructure feel effortless? Apply today—let’s engineer reliability together.
To apply: https://weworkremotely.com/remote-jobs/chipcolate-site-reliability-engineer