
Site Reliability Engineer
- Amsterdam, Noord-Holland
- Permanent
- Full-time
- Operate and manage on-prem Kubernetes clusters, including upgrades, scaling, and monitoring
- Automate infrastructure tasks using CI/CD pipelines and infrastructure as code tools (e.g., Terraform, Helm, Ansible)
- Continuously monitor and improve system reliability, performance, and availability
- Take part in incident response, troubleshooting, and post-mortem analysis
- Collaborate on infrastructure planning, architecture design, and security reviews
- 5+ years of experience operating and troubleshooting Kubernetes, particularly in on-prem environments
- Experience managing both stateless and stateful workloads in Kubernetes
- Knowledge of Kubernetes RBAC and securing multi-tenant environments
- 2+ years of experience writing production-grade Python code
- Strong background in Linux systems administration
- Familiarity with observability tools such as Prometheus, Grafana, ELK/EFK
- Experience with CI/CD systems (e.g., GitLab CI, Argo CD)
- Proficiency in Infrastructure as Code and automation tools (Terraform, Helm, Ansible)
- Solid understanding of networking, load balancing, and container security practices
- Familiarity with SRE principles, including SLIs, SLOs, and error budgets
- A strong sense of ownership, initiative, and a collaborative mindset
- A keen interest in performance optimization and scalable system design