I'm hiring: Software Engineer, RL Data at Anthropic
👉 Job board link
This is a foundational hire on a new team I'm leading, so you'd get to shape our technical direction and what we build first.
The team builds the systems that produce reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale.
We aim to differentially advance beneficial capabilities, and we expect to focus on AI safety and security research and on potentially beneficial deployments. We also expect to build QA that catches reward hacking and other outer-misalignment behaviours before they get reinforced in training. However, what we build is likely to be dual-use overall and is also likely to advance Claude's general capabilities.
I'm looking for a strong senior engineer, preferably with full-stack experience, who will own things end to end, including the unglamorous bits. Experience with LLM pipelines or RL on LLMs, or time spent as a forward-deployed engineer, founder, or early-startup engineer, would all be a bonus. Familiarity with AI safety and security problems, and a desire to work on safety-relevant problems, are also big pluses.
The role can be based in London (with me 🙌), San Francisco, Seattle, or New York.
I've recently updated my "Working at Anthropic" FAQs to cover some common questions you might have.