About this role
Key Responsibilities
1. Real-Time Infrastructure Monitoring
• Perform 24x7 monitoring of critical facility systems across global data centers, including:
  • Electrical power systems
  • Mechanical systems
  • HVAC and cooling infrastructure
  • Fire detection and suppression systems
  • Water systems and supporting infrastructure
• Continuously monitor EPMS, BMS, DCIM, and centralized monitoring platforms.
• Detect abnormal operating conditions and alarms.
• Acknowledge and investigate alarms promptly.
• Track incidents and issues through to closure.
• Identify monitoring gaps and recommend improvements to monitoring coverage.
2. Incident Response and Coordination
• Provide first-level incident triage and technical assessment.
• Respond to facility alarms and operational events in real time.
• Execute escalation procedures according to defined protocols.
• Coordinate with internal teams, site personnel, vendors, and regional stakeholders to ensure timely issue resolution.
• Support major incident management activities for events such as:
  • Utility power failures
  • UPS and generator events
  • Cooling/HVAC failures
  • Fire alarm activations
  • Water leakage events
  • Security and environmental alerts
• Maintain end-to-end ownership of incidents until resolution.
3. Ticket Management and Change Coordination
• Create, update, and manage event tickets within established SLA targets.
• Process work orders and monitor completion quality.
• Track maintenance activities and change requests.
• Support change management processes and ensure operational compliance.
• Maintain accurate records of facility maintenance activities and change windows.
4. Compliance and Operational Governance
• Monitor and follow up on preventive maintenance activities and routine operational changes.
• Review technical documentation submitted by vendors and service providers, including:
  • Method of Procedure (MOP)
  • Risk Assessment (RA)
  • Standard Operating Procedure (SOP)
• Ensure maintenance activities comply with operational standards and freeze-period requirements.
• Support risk management and operational audit activities.
5. Monitoring Platform and Data Administration
• Maintain monitoring platform master data and infrastructure records.
• Ensure the accuracy, completeness, and timeliness of asset and alarm information.
• Support platform optimization and continuous improvement initiatives.
• Maintain facility logs, event records, and operational documentation.
6. Reporting and Data Analysis
• Analyze facility operational data and identify trends or recurring issues.
• Prepare operational reports and performance summaries.
• Provide recommendations to improve reliability and operational efficiency.
• Maintain records required for audit, compliance, and management reporting.
7. Operational Support and Continuous Improvement
• Participate in after-hours support and emergency escalations.
• Provide remote support for overseas data center operations when required.
• Support centralized cross-regional operations and collaboration.
• Contribute to process improvements and monitoring platform enhancements.
• Perform other duties as assigned to support business continuity and operational excellence.
Minimum Qualifications
• Associate Degree, Diploma, or higher in Engineering, Information Technology, Facilities Management, or related disciplines.
• Minimum 2 years of experience in data center operations, facility monitoring, NOC, command center, or mission-critical environments.
• Working knowledge of:
  • Electrical systems
  • Mechanical systems
  • HVAC and cooling infrastructure
  • Fire detection and suppression systems
  • Building Management Systems (BMS)
  • Electrical Power Monitoring Systems (EPMS)
  • DCIM or centralized monitoring platforms
• Experience working with incident management and escalation procedures.
• Strong communication and coordination skills.
• Ability to work in a 24x7 rotating shift environment.
• Ability to manage multiple priorities in high-pressure situations.
• Fluency in English.
• Chinese language proficiency (reading, writing, and verbal communication) is preferred, as the role involves Chinese-language alarm messages, documentation, and communications.
Preferred Qualifications
• Experience in:
  • Network Operations Center (NOC)
  • Facility Operations Center (FOC)
  • Data Center Operations
  • Critical Environment Operations
  • Mission Critical Facilities
• Experience supporting global or cross-regional operations.
• Familiarity with structured incident, change, and problem management processes.
• Understanding of data center capacity management (space, power, cooling).
• Experience working with CMMS, DCIM, EPMS, BMS, or ticketing platforms.
• Ability to perform root cause analysis and drive issue resolution.
Desired Competencies
• Strong sense of ownership and urgency.
• Excellent communication and stakeholder management skills.
• Detail-oriented with strong documentation practices.
• Analytical and problem-solving mindset.
• Ability to learn quickly and adapt to changing operational environments.
• Team-oriented with a proactive and customer-focused attitude.
Preferred Certifications
Candidates with the following certifications will have an advantage:
• CDCP – Certified Data Centre Professional
• CDCS – Certified Data Centre Specialist
• FSM – Facilities Systems Management
• Uptime Institute ATD
• ITIL Foundation
• DCCA or DCT certifications
• Electrical or Mechanical engineering certifications
Shift Requirements
• Must be willing to work a 24x7 rotating shift schedule.
• Participate in weekend, public holiday, and on-call duty rotations when required.
• Support emergency response activities and major incidents.
Key Performance Indicators (KPIs)
The successful candidate is expected to consistently achieve:
• 100% shift attendance and handover compliance.
• 24x7 continuous monitoring coverage.
• Alarm acknowledgement within 1 minute.
• Notification issued within 2 minutes.
• Event ticket creation within 10 minutes.
• Compliance with escalation and incident management SLAs.
• Zero service-impacting human errors.
• Accurate documentation and reporting.
• Continuous improvement contributions to operational processes and monitoring platforms.