About this role
Location: San Francisco, CA
Work Model: Hybrid
Industry: AI training data infrastructure
Compensation: $140K-$250K base, plus equity
About the Company
Our partner is a YC-backed company building a new kind of marketplace in the AI training data space. Rather than operating as a labor marketplace, they provide infrastructure that lets data producers transform their existing data into formats AI labs want and sell it directly to those labs. This democratized model unlocks far more high-value data sources, and the team is growing quickly to keep up with demand.
The Opportunity
This is the company's top hiring priority and a genuinely hard research problem. Because data flows through a decentralized marketplace, ensuring quality at scale is the single biggest bottleneck to growth. As a Research Engineer, you will build the automated systems that verify and assure data quality so that suppliers consistently deliver excellent data to buyers.
You will start by digging into the data manually to understand failure modes, then design systems to automate quality checks at scale, combining rule-based approaches with AI for fuzzier cases and human-in-the-loop review where it makes sense. This is fundamentally a research role focused on building automated systems, not manual QA.
Responsibilities
- Identify data quality issues including inconsistencies, formatting problems, and ingestion challenges
- Perform initial manual data quality review to deeply understand failure modes
- Build systems to automate quality checks at scale using rule-based and AI-driven approaches
- Design hybrid systems that balance automation with human-in-the-loop review where appropriate
- Continuously improve verification methods as the data landscape and AI tooling evolve
Requirements
- Deeply technical, with a strong learning slope and the ability to ramp quickly in a fast-moving field
- Background in AI/ML engineering, or software engineering at an AI-focused company with visible data ingestion and processing experience
- Ability to reason about likely data quality problems from first principles
- Comfortable owning ambiguous, open-ended problems end to end
- Comfortable working in person, full-time, in a San Francisco office
- Bonus: experience working with noisy or unstructured data, or judgment on when to use automation versus human-in-the-loop review
