Overview: This role focuses on building and operating the ML Ops / LLM Ops pipeline that closes the loop between production traffic and evaluation: ingest production signal, redact it, store it, slice it, classify it, surface the failures, mine new eval cases, and alert on regressions. You drive the toolchain decisions, the data-governance posture, and the day-to-day reliability of the pipeline itself. The Head of AI sets vision and priorities, and you own the technical execution end-to-end.

What will you do?

- Design and build a source-agnostic ingestion pipeline for production ML / LLM traffic
- Design storage tiering based on automotive and company requirements, policy-driven retention windows, and privacy requirements
- Build slicing dashboards and the query path engineers use to debug production at 11 p.m.
- Enable autoraters and lightweight LLM classifiers across production traffic
- Build the rule-based triage layer for obvious failures
- Stand up the eval-mining workflow and wire regression alerts to model and prompt deploys
- Implement PII redaction at the ingestion boundary and safety / abuse classification on inbound content
- Define dashboard architecture, wipeout mechanisms, and tool and hosting selection, and operate the pipeline end-to-end

What are we looking for?
Must Have

- Proven experience building and operating data or ML platform systems in production, covering ingest, schema, storage, access control, alerts, and on-call
- Hands-on experience building and running ML / LLM evaluation systems in production (offline regression sets, online autoraters, LLM-as-judge pipelines, golden datasets)
- Hands-on experience with LLM tracing and observability tooling
- Experience shipping PII redaction or comparable data-handling controls in a regulated or multi-tenant environment, with a pragmatic approach to data governance
- Strong understanding of how ML and LLM-based systems fail in production: hallucination, retrieval failures, agent loops that don’t terminate, ASR / TTS degradation, and prompt or model regressions across deploys
- Production Python proficiency; a hands-on engineer, not an advisor
- Comfortable leveraging AI in everything you build

Nice to Have

- Multi-tenant or white-label SaaS experience with per-tenant data isolation
- Azure experience and the ability to weigh the tradeoffs of self-hosting vs. managed SaaS
- Experience with autorater methodology and contamination defenses
- Knowledge of vector databases, embedding-based clustering, or unsupervised failure-mode discovery
- Experience with data-versioning tooling (LakeFS, DVC, Delta Lake)
- GDPR / right-to-erasure work
- Embedded, automotive, or other constrained-environment context
- Working knowledge of a language other than English, sufficient to validate non-English failure modes
- Prior experience with cloud platforms (Microsoft Azure and AWS)
- Prior experience with Claude Code
- Prior experience with GitHub
- Languages: Python primary, SQL, and some TypeScript for dashboards
- LLM APIs: Claude (Anthropic), OpenAI, and open-source models as needed
- Android / AAOS ecosystem as clients

What can you expect from us?
- A permanent job contract for a long-term project
- Tech equipment, a SIM card, and a personal smartphone
- Health and life insurance
- Social events and team-building activities
- A commitment to letting you grow with us and be rewarded accordingly
- A dynamic, young team that will always be there to support you
- Training in the latest technologies
- Coffee, fruit, snacks, and a warm welcome when you stop by the office

Senior ML Ops / LLM Ops Engineer @ Caixa Mágica Software

