Job Description – L2 Enterprise Monitoring Engineer

Role Overview

The L2 Enterprise Monitoring Engineer is responsible for advanced monitoring, incident analysis, and troubleshooting across infrastructure, applications, and network layers. This role acts as the primary resolver group for monitoring-triggered incidents and plays a key role in reducing alert noise, improving monitoring effectiveness, and driving faster resolution. L2 engineers are expected to go beyond SOPs: analyze, fix, and improve.

Key Responsibilities

Advanced Monitoring & Event Analysis
- Perform deep analysis of alerts generated from enterprise monitoring tools (SolarWinds, SCOM, Dynatrace, etc.)
- Correlate multiple alerts/events to identify underlying issues (avoid symptom-based handling)
- Fine-tune alert thresholds and suppress false positives
- Identify gaps in monitoring coverage and recommend improvements

Incident Troubleshooting & Resolution
- Take ownership of P2/P3 incidents and support P1 (Major Incidents)
- Perform detailed troubleshooting across:
  - Servers (Windows/Linux)
  - Network (connectivity, latency, packet loss)
  - Applications (availability, performance)
- Execute standard fixes, workarounds, and recovery actions
- Engage L3/OEM vendors when required with proper diagnostics

Major Incident Support (MIM)
- Support Major Incident calls by providing technical insights and updates
- Perform real-time troubleshooting and log analysis during outages
- Ensure quick identification of root cause or workaround
- Provide inputs for incident timelines and updates

Automation & Monitoring Optimization
- Create and enhance monitoring scripts, thresholds, and alert logic
- Automate repetitive tasks using scripting (PowerShell / Shell / Python, basic level)
- Drive reduction in alert noise and manual effort
- Contribute to continuous improvement initiatives

Knowledge Management & Documentation
- Create and update Knowledge Base (KB) articles and runbooks
- Document known errors and workarounds
- Ensure troubleshooting steps are reusable by L1 team

Collaboration & Escalation
- Act as technical escalation point for L1 team
- Guide L1 analysts on triage and handling improvements
- Coordinate with cross-functional teams (Infra, App, Network, Cloud)
- Ensure proper escalation to L3 with complete diagnostics

Shift & Operations
- Participate in 24x7 rotational shifts (including weekends/on-call if applicable)
- Ensure high-quality shift handovers with actionable insights

Required Skills & Qualifications

Technical Skills (Core Expectation)
- Strong hands-on experience in:
  - Windows & Linux server administration
  - Network fundamentals (DNS, TCP/IP, routing basics)
  - Application monitoring concepts (APM tools like Dynatrace/AppDynamics preferred)
- Strong working knowledge of monitoring tools: SCOM / SolarWinds / Dynatrace / Nagios / Zabbix
- Log analysis skills (Event Viewer, syslogs, basic Splunk/Kibana exposure preferred)
- Basic scripting skills: PowerShell / Bash / Python (any one)

Process & Frameworks
- Strong understanding of ITIL:
  - Incident Management
  - Event Management
  - Problem Management (basic involvement)

Soft Skills (Non-Negotiable)
- Strong communication: clear, structured, and confident (especially with US stakeholders)
- Analytical thinking (must move beyond checklist-based work)
- Ownership mindset: drives issues to closure
- Ability to work under pressure during incidents

Experience & Education
- 3–5 years of experience in monitoring / infrastructure support / NOC
- Bachelor’s degree in IT / Computer Science or related field
- ITIL Foundation (preferred)
- Relevant certifications (Azure/AWS/Monitoring tools): good to have

Senior Support Engineer @ Infinite Computer Solutions
