Permanent Site Reliability Engineer – Pepkor Vacancies
Job Description
Job Overview
The Site Reliability Engineer plays a critical role in ensuring the reliability, scalability, and performance of large-scale systems. This role combines strong software engineering expertise with operational excellence to design resilient architectures, automate processes, and lead incident management while driving continuous improvement across platforms.
Key Responsibilities
- Develop advanced proficiency in multiple scripting and programming languages to deliver robust, scalable solutions
- Design, build, and implement sophisticated automation tools and processes for managing complex systems at scale
- Lead and manage critical incident responses with efficiency, followed by detailed post-incident reviews and preventative actions
- Influence system architecture and design decisions through strong technical insight and strategic thinking
- Establish, promote, and enforce reliability standards to ensure sustainable and scalable operations
- Apply strategic planning and analytical thinking to support long-term system stability and business goals
- Provide technical leadership by influencing key decisions and collaborating effectively with cross-functional teams
- Mentor and coach junior and intermediate engineers, fostering a culture of learning, collaboration, and continuous growth
Minimum Requirements
- 8–10 years’ relevant experience in Site Reliability Engineering, DevOps, or Systems Engineering
- Matric qualification
- Strong proficiency in scripting languages
- Relevant certifications in Cloud, Oracle, or DevOps technologies
Technical Skills
- Continuous delivery and deployment practices
- Cloud platforms and best practices
- System and application observability, including performance monitoring
- Infrastructure as Code and configuration management
- Infrastructure as a Service implementation
- Containerization technologies
- Automation and orchestration tools
- Strong collaboration and communication skills
- Coding and scripting expertise
- Azure DevOps experience
- System uptime and availability management
- Service Level Objectives (SLO) management
- Latency monitoring and optimization
- Incident, outage, and change management
- Capacity planning and performance forecasting