Software Engineer (RL Data)
Anthropic’s RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. Our goal is to make Claude genuinely great at complex, real-world work, and to point those capabilities at the things that matter most, including AI safety research and beneficial deployments of AI. (To be upfront: this is dual-use work. It advances general capabilities too, though we aim to differentially advance the beneficial ones.)
This is a foundational role on a new team: you’ll help shape our technical direction and what we build first. The work is hands-on and varied. Some weeks you’ll be deep in pipeline or infrastructure engineering; others you’ll be tuning prompts until the output is good, or sitting with a research team that depends on your systems and shipping the fixes they need. We’re looking for strong engineers who will also do whatever else it takes to make their systems succeed: reading transcripts, supporting users, and wrangling vendors.
Responsibilities
- Own significant parts of our stack end-to-end, from technical architecture through the unglamorous operational work that makes it succeed
- Build data collection pipelines, read the transcripts they produce, and iterate on prompts, evals, and graders until the output is good
- Develop and improve QA frameworks to catch reward hacking and ensure environment quality
- Build interfaces that make collecting human data fast and painless for the people providing it
- Harden execution environments — sandboxing, snapshotting, tool coverage — so tasks hold up at training scale
- Embed with the teams and domain experts who use our systems day-to-day: design pipelines and evals with them, support them directly, and ship the improvements they need
- Work with operations, security, and compliance partners to roll our systems out to new users, and manage technical relationships with external data vendors
Representative projects
- Take QA checks that a model has learned to game, and make them hold up under heavy optimization pressure
- Build a review flow that lets a busy expert check an RL task in under five minutes
- Cut the time from ‘rough task idea’ to ‘QA-passed RL task’ from days to hours
- Sit for a week with a team that uses our platform, then ship the fixes that help them most
- Harden a sandboxed environment so tasks behave correctly across millions of rollouts
- Onboard a new data vendor, and fix the rough edges they hit
Benefits
- Comprehensive health, dental, and vision insurance for you and your dependents
- Inclusive fertility benefits via Carrot Fertility
- 22 weeks of paid parental leave
- Flexible paid time off and absence policies
- Mental health support for you and your dependents
- Competitive salary and equity packages
- Optional equity donation matching at a 1:1 ratio, up to 25% of your equity grant
- Retirement plans with competitive matching
- Life and income protection plans
- $500/month flexible wellness and time saver stipend
- Commuter benefits
- Annual education stipend
- Home office stipends
- Relocation support for those moving for Anthropic
- Daily meals and snacks in the office