Reliability Engineer, Global Reliability Intelligence Programs @ Amazon UK Services Ltd. - A10

London, England, GBR | Onsite | Full-time | Posted 1 day ago

About this role

A Reliability Engineer focused on RCA and FMEA hunts down the true causes of failures and eliminates them before they happen again. They lead high-impact investigations, turn data into clear actions, and drive measurable improvements in uptime and performance. This role also gets ahead of problems by identifying risks early through FMEA and building smarter, more reliable systems. If you enjoy solving complex problems, influencing decisions, and delivering real results at scale, this is where you do it. This role may require up to 50% travel.

Key job responsibilities

• Lead Root Cause Analysis (RCA) for high-impact and recurring failures, driving deep-dive investigations to identify true root causes and ensure effective, lasting corrective actions
• Develop, maintain, and continuously improve Failure Modes and Effects Analysis (FMEA) to proactively identify risks, prioritize mitigation, and prevent future failures
• Analyze equipment and operational data to identify trends, systemic issues, and performance gaps, translating findings into actionable reliability improvements
• Build and maintain BI dashboards, automated reports, and performance metrics (e.g., uptime, MTBF, failure rates) to enable data-driven decision-making
• Lead cross-functional execution of reliability improvements by partnering with operations, engineering, maintenance, and external vendors across multiple sites and regions
• Drive development and enhancement of RCA/FMEA tools and software by working closely with DevOps and technical teams, including requirements gathering, testing, and user feedback
• Establish and standardize reliability best practices, while supporting policy creation, training, and organizational adoption of RCA and FMEA methodologies

A day in the life

In this role, you will partner closely with DevOps teams to refine and improve tools and systems that support RCA and FMEA at scale. You will analyze failure trends to identify recurring issues, systemic gaps, and opportunities to improve reliability and performance. A key focus is supporting FMEA initiatives by helping teams proactively identify risks and implement effective mitigation strategies. You will also review high-impact events and completed RCAs to ensure quality, consistency, and actionable outcomes. In addition, you will collaborate with engineers, operators, and vendors across regions to align corrective actions and drive execution, strengthening how the organization learns from and prevents failures.
