About this role
The MLIL DataPlane team is looking for a Software Development Engineer to own the design and implementation of our inference data plane. We build the software that makes large models run efficiently on custom hardware, spanning model execution, memory management, data movement, and serving integration. Our work covers the full inference path: integrating serving engines with custom hardware, developing high-performance compute kernels, enabling efficient data movement, and taking models from early validation through production. We operate at frontier scale with large distributed models, and this is a ground-up effort on rapidly evolving hardware and software.

We are looking for an individual contributor (IC) who can write and optimize low-level code for custom hardware, validate model architectures end-to-end, build test and profiling infrastructure, and drive performance across the stack.

Key job responsibilities
- Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model (LLM) inference.
- Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end, from PyTorch model definition through distributed execution on custom hardware.
- Integrate custom accelerator backends into open-source ML frameworks and serving engines (vLLM, PyTorch), including scheduler extensions, memory management, and model parallelism.
- Build and maintain test infrastructure that validates model correctness across CPU, GPU, simulator, and hardware targets.
- Profile and optimize inference workloads: identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bring-up.
- Own features end-to-end, from design through implementation, testing, and integration into the broader software stack.
- Contribute to CI/CD pipelines that gate model and kernel changes on correctness and performance regressions.