Machine Learning Engineer (Madrid, ES) @ ATOS International

Madrid, ES | Onsite | Full-time | Posted 1 day ago

About this role

<p><strong>About Bull</strong></p>
<p>Bull is a story: one with a century of European innovation and a working environment where experts design powerful, sustainable, and sovereign digital solutions, enabling states and industries to retain full control over their data and their AI.</p>
<p>Bull is also thousands of engineers, researchers and passionate tech people shaping the future of high‑performance computing, AI, and quantum technologies.</p>
<p>Every day, our teams push the boundaries of what is technologically possible – from next‑generation HPC architectures to exascale supercomputers – supported by world‑class R&amp;D, more than 1,600 patents, and unique end‑to‑end capabilities spanning hardware design, software engineering, data science and quantum research.</p>
<p>We are a people‑centric, innovation‑driven company, where collaboration spans Europe, the Americas and India. We share a common vision of responsible and sustainable innovation that delivers concrete impact for our customers.</p>
<p>We are searching for a <strong>Machine Learning Engineer</strong> to join Bull’s innovative R&amp;D team and contribute to the development of AI-driven solutions for infrastructure monitoring, reliability, and cybersecurity in High-Performance Computing (HPC) environments.</p>
<p><strong>Role description:</strong></p>
<p>The role focuses on leveraging large-scale operational telemetry, metrics, and logs to build predictive capabilities that improve system availability, detect anomalies, and support proactive operations.
<br>The selected candidate will be responsible not only for model development but also for rigorous validation, operationalization, and integration within a Kubernetes-based platform.</p>
<p><strong>Responsibilities:</strong>
<br>• Design and develop ML/DL models for predicting hardware failures and detecting software or behavioral anomalies in HPC systems.
<br>• Apply advanced analytics techniques such as time-series forecasting, anomaly detection, classification, and predictive maintenance using large-scale monitoring data.
<br>• Build and maintain data pipelines and features from infrastructure telemetry and logs.
<br>• Perform rigorous model validation to ensure robustness, reliability, and production readiness.
<br>• Deploy and operationalize models within a Kubernetes-based environment, including scalable inference services and lifecycle management.
<br>• Contribute to AI-driven cybersecurity use cases, such as detecting abnormal behaviors, potential intrusions, or security-related anomalies in infrastructure and system activity.
<br>• Work within an Agile/Scrum environment, participating in sprint planning, stand-ups, and retrospectives.
<br>• Collaborate with system administrators, support teams, and data engineers to translate operational challenges into data-driven solutions that enhance system reliability and automation.</p>
<p><strong>Education:</strong></p>
<p>• Master’s or PhD in Computer Science, Artificial Intelligence, Data Science, Telecommunications or a related field.</p>
<p><strong>Skills &amp; Competencies:</strong></p>
<p>• Strong experience with Machine Learning and Deep Learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
<br>• Experience working with time-series data and anomaly detection models.
<br>• Proficiency in Python and data science ecosystems.
<br>• Experience with Prometheus or similar monitoring/telemetry systems.
<br>• Familiarity with containerization and orchestration technologies, especially Kubernetes.
<br>• Experience building production-grade ML pipelines.
<br>• Experience handling large-scale monitoring and operational datasets.
<br>• Understanding of distributed systems and infrastructure monitoring.
<br>• Knowledge of HPC environments, GPUs, and high-speed interconnects (e.g., InfiniBand) is highly desirable.
<br>• Proficiency with Git-based version control systems (GitHub, GitLab).
<br>• Solid experience working in Linux environments.
<br>• Good understanding of Scrum methodology and experience with Jira and Confluence.</p>
<p><strong>Location</strong>: Spain</p>
<p><strong>Benefits:</strong></p>
<div>
<ul style="list-style-type:disc">
<li>Flexible Work Schedule: Half-day Fridays and an intensive summer workday, supporting work-life balance.</li>
<li>Learning and Growth: Opportunities to work with advanced AI technologies in an innovative and supportive R&amp;D environment.</li>
</ul>
</div>
<p style="padding-left:440.0px"><strong>Join us!</strong></p>
<p>Here, your ideas, your curiosity and your technical excellence directly shape the next era of advanced computing - unlocking enterprise value, accelerating scientific progress and driving positive impact for society.</p>
