About this role
Summary
We are seeking an experienced (Senior) Problem Analyst to join an IT organisation that supports multiple IT solutions critical to daily business operations. This role is pivotal in driving root cause analysis (RCA) across software and infrastructure layers. You will operate in a high-pressure, mission-critical environment, collaborating with cross-functional teams to eliminate recurring issues and strengthen system resilience.
Key Responsibilities
1. Problem Management
- Own the end-to-end problem management lifecycle (identification, investigation, RCA, resolution, closure).
- Act as coordinator across application, infrastructure, and third-party teams when necessary.
- Ensure adherence to the ITIL Problem Management process.
2. Root Cause Analysis (RCA) & Continuous Improvement
- Perform deep-dive analysis on recurring or critical incidents (software defects, infrastructure failures, integration issues).
- Use structured methods such as 5 Whys, Ishikawa (Fishbone), or Fault Tree Analysis.
- Produce clear, actionable RCA reports with remediation plans.
- Identify systemic weaknesses and drive permanent fixes, not just workarounds.
3. Production Stability & Reliability
- Monitor trends in incidents to detect patterns and systemic risks.
- Proactively initiate problem records based on data analysis.
4. Stakeholder Communication
- Provide clear, concise communication to IT leadership, Business stakeholders, and Operations teams.
- Facilitate post-incident reviews (PIRs) and ensure lessons learned are implemented.
- Communicate technical findings in business-understandable language.
5. Governance & Process Excellence
- Ensure compliance with:
  - ITIL best practices
  - Internal SLAs / OLAs
- Maintain and improve knowledge base articles (KBA).
- Contribute to runbooks/playbooks for incident handling.
6. Collaboration & Leadership
- Challenge teams constructively to drive accountability and quality.
- Mentor junior analysts and support organizational maturity.
Required Skills & Experience
- 5–10+ years in:
  - Incident / Problem Management
  - Production Support / Application Support
- Proven experience handling/contributing to major incidents in complex environments.
- Experience in global, multi-team, 24/7 operations.
Technical Skills
- Strong understanding of:
  - Distributed systems and enterprise applications
  - Cloud environments (Azure)
  - Databases (SQL)
  - APIs and integration layers
- Ability to analyse:
  - Logs (Splunk, ELK, Dynatrace, AppDynamics, etc.)
  - Monitoring dashboards
Process & Methodology
- Strong knowledge of ITIL (Incident, Problem, Change).
- Experience with tools such as ServiceNow / BMC Remedy / Jira Service Management.
- Experience implementing RCA best practices.
Soft Skills
- Excellent analytical and troubleshooting skills.
- Ability to remain calm under pressure.
- Strong communication and stakeholder management skills.
- Proactive mindset with a bias for action and ownership.
Working hours:
Monday to Friday, 08:00 to 17:00
Hybrid work
