
Senior Site Reliability Engineer
London, Greater London, South East, England
Apply by 1 Jun 2025
£100,000 - £115,000 per annum, Benefits: 15% bonus, 7% pension
Job Ref.: BH-51847
Job Description
Requirements:
- Proven experience managing and optimizing a diverse infrastructure stack.
- Extensive knowledge of cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform, CloudFormation).
- Familiarity with service mesh technologies (Istio, Linkerd).
- Solid understanding of virtualization (VMware, Hyper-V) and of containerization and orchestration (Docker, Kubernetes).
- Understanding of storage solutions (SAN, NAS, cloud storage) and backup systems.
- Strong understanding of network protocols, routing, switching, and firewalls.
- Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools.
- Experience in DNS management and troubleshooting.
- Experience in network security best practices.
- Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk).
- Proficiency in at least one scripting language (Python, Bash) for automation.
- Experience with CI/CD pipeline management and DevOps practices.
- Strong understanding of disaster recovery and business continuity planning.
- Experience with performance tuning and capacity planning.
- Understanding of chaos engineering principles and practices.
- Skills in cost optimization for cloud infrastructure.
Specific Tools and Techniques:
- Experience using cloud-native monitoring tools like AWS CloudWatch, Azure Monitor, and Google Cloud Operations Suite.
- Experience with packet capture tools like Wireshark for troubleshooting network issues.
- Experience in using traceroute utilities and performance analysis tools like perf for identifying and resolving bottlenecks.
- Familiarity with tools such as ipconfig/ifconfig for viewing network configurations, flushing DNS, and diagnosing network issues.
- Experience with SNMP-based tools for network device monitoring and performance management.
- Experience in using NetFlow for network traffic analysis.
- Experience with tools like iostat, vmstat, and dstat for monitoring storage and system performance.
- Experience with tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions.
- Familiarity with tools like Prometheus and Grafana for monitoring and observability.