About this role
<div><div style="padding:10.0px 0.0px;border:1.0px solid transparent"><div style="font-size:16.0px;word-wrap:break-word"><H2 style="font-size:1.0em;margin:0.0px">Job Summary</H2> </div><div><div> <p><span style="font-family:arial, helvetica, sans-serif">The Customer Reliability Engineering (CRE) team at Keystone blends software engineering, SRE, and customer experience with a strong customer-first mindset, owning issues end-to-end and driving systemic reliability improvements.</span></p> <p><span style="font-family:arial, helvetica, sans-serif">You act as a bridge between customers, support, and engineering, resolving complex cross-system issues and improving reliability, observability, and customer experience across distributed systems such as subscription, activation, telemetry, and billing.</span></p> <p><span style="font-family:arial, helvetica, sans-serif">CREs take direct ownership of debugging, fixing, and enhancing services within the subscription lifecycle, resolving issues at the source and involving development teams only for major or architectural changes.</span></p> <p><span style="font-family:arial, helvetica, sans-serif"><strong>About the Team</strong></span></p> <p><span style="font-family:arial, helvetica, sans-serif">The NetApp Keystone team powers NetApp’s storage-as-a-service (STaaS) offering, enabling customers to consume storage across on-prem and cloud environments through a flexible subscription model.</span></p> <p><span style="font-family:arial, helvetica, sans-serif">The platform spans multiple distributed components, including Subscription Engine, Activation Workflows, Data Analytics, Processors, Collectors, ASUP, Sphere, and the Keystone Console, working together to deliver a seamless, reliable, and scalable customer experience.</span></p> </div></div></div><div style="padding:10.0px 0.0px;border:1.0px solid transparent"><div style="font-size:16.0px;word-wrap:break-word"><H2 style="font-size:1.0em;margin:0.0px">Job Requirements</H2> 
</div><div><ul> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">5–8 years of software development or customer engineering experience, with at least 3 years in backend or technical support engineering roles</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Strong proficiency in Go or Python (preferably both); ability to debug and contribute to production code</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Working knowledge of React and TypeScript for diagnosing UI-layer issues</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Strong understanding of distributed systems, microservices, and event-driven architectures</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Hands-on experience with Kubernetes and Docker (log analysis, debugging, deployments)</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Proficiency with REST and gRPC APIs; ability to isolate and debug failures</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Experience with PostgreSQL and at least one NoSQL database; ability to write diagnostic queries</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Familiarity with time-series databases (ClickHouse, InfluxDB, TimescaleDB)</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Experience with Kafka or NATS (consumer lag, offsets, message flow debugging)</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, 
helvetica, sans-serif">Hands-on experience with Prometheus, Grafana, and log aggregation tools</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Working knowledge of CI/CD pipelines and Git workflows</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Understanding of Agile/SCRUM/LEAN methodologies</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Strong written and verbal communication skills; ability to author clear RCA reports, runbooks, and customer updates</span></li> </ul> <p><span style="font-family:arial, helvetica, sans-serif"><strong>Role & Responsibilities:</strong></span></p> <div> <ul> <li>Own end-to-end resolution of customer issues across Keystone systems</li> <li>Perform RCA and act as DRI to drive incident resolution</li> <li>Deliver fixes/enhancements; involve dev teams for major changes</li> <li>Improve reliability, observability, and error handling</li> <li>Build diagnostics/runbooks to reduce MTTR and drive prevention</li> <li>Collaborate across teams and customers to enhance platform stability and experience</li> </ul> </div></div></div><div style="padding:10.0px 0.0px;border:1.0px solid transparent"><div style="font-size:16.0px;word-wrap:break-word"><H2 style="font-size:1.0em;margin:0.0px">Education</H2> </div><div><ul> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">IC - Typically requires a minimum of 5 years of related experience.</span></li> <li style="font-family:arial, helvetica, sans-serif"><span style="font-family:arial, helvetica, sans-serif">Bachelor of Science Degree in Computer Science, Electrical Engineering, or a related field; a Master’s Degree is preferred</span></li> </ul></div></div></div>