Role Description:
- You focus on the operation of scalable and reliable services in our evolving hybrid environment, combining public cloud (GCP), private cloud (OpenShift) and on-premise Linux/VM infrastructure
- Positioned at the interface between Engineering and Operations, you provide and implement reliability practices with a DevOps mindset
- You automate recurring activities, develop “Infrastructure as Code”, and use suitable tests, monitoring procedures and analysis tools to ensure the availability of our services and the protection of our customer data
- You implement and improve orchestration pipelines (deployment, canaries, rollback) and observability tooling (monitoring, logging, metrics, tracing)
- You enable teams to implement non-functional requirements and SRE best practices by providing SRE consulting across the PCDM organisation
- You participate actively in evolving the system architecture and cross-functional requirements to help build a state-of-the-art, flexible and transparent application landscape
- You enjoy working with and within an international team, and with departments across Europe
Qualifications:
- Multiple years of experience as an SRE working with public cloud, or a relevant certification
- Working knowledge of at least one "glue language", e.g. Python or Golang; Java is also welcome
- Experience with Linux environments (user level), virtualization and containerization (Docker)
- Knowledge of DevOps tools such as Terraform (Cloud), and experience working with Git, GitHub and YAML
- Familiarity with observability tools such as Datadog, Kibana, Prometheus, Logstash and Grafana
- Infrastructure experience or interest in the areas of networks, firewalls, load balancers and databases
- Enthusiasm for agile methods and teaching operations to "You Build It, You Run It" teams
- Ability to work in a team, resilience and openness to new challenges
- Very good written and spoken English, ideally also German