300,000 people are directly creating training data for AI
As of April 2025, at least 300,000 people work directly on creating training data for AI systems. This is roughly comparable to the population of a small island nation, somewhere between Barbados and Vanuatu.
The true number is likely higher, as many of these statistics may be up to a year out of date. My guess is that the true number is something like 400,000 to 500,000: between the populations of Iceland and Malta.
Hanging out at a Barbados beach bar sounds more fun than data annotation to me, but more people do the latter. Image by Unionville, under CC-0.
This is the total number of contractors, many of whom will be part-time. I have not attempted to evaluate the full-time equivalents of these.
Providers
Scale AI
Scale AI, which provides training data for OpenAI, Anthropic, Google DeepMind and Meta,[1] publicly claims the following numbers for its contractor platforms:
- Remotasks: “240,000+ total taskers”
- Outlier: “40,000 experts”
These numbers do seem accurate, based on:
- Conversations I’ve had with some people who contract for Scale. They said a few Slack groups[2] they were added to had hundreds of thousands of members (but they’ve since been moved to Discourse).
- A June 2024 article by The Information said “About 300,000 [taskers] take assignments through a Slack group run by Outlier, a Scale subsidiary”. The article suggests this might also be from inside sources.
- The unofficial Outlier subreddit has 44k members. This suggests Outlier has a lot more than 40,000 contractors, given that not all of them will have joined the subreddit.
All these estimates are several months old, and the websites have not been updated for a while (based on Internet Archive snapshots). I expect the true numbers to be higher given the increased investment going into AI.
Prolific
Prolific claims to have “200,000+ active taskers” for AI data annotation projects. However, it’s unclear whether all of these are actually working on AI projects, as Prolific does a range of different data work.
The unofficial Prolific subreddit has 43k members.
Surge AI
Surge AI, operating under the DataAnnotation brand, says it has “100k+ Members”. It hires contractors in the USA, Canada, Australia, New Zealand, UK, and Ireland, and works with OpenAI, Anthropic, and Google DeepMind.
The unofficial subreddit has 31k members, which supports this scale claim.
LabelBox
LabelBox, operating under the brand Alignerr, has an unofficial subreddit with 14k members. They hire contractors in the USA, Canada, Australia, and New Zealand.
I did not find a public claim from them about their number of taskers, but we can estimate it from the subreddit size. The ratios of claimed taskers to redditors are 4.7 for Prolific (200k / 43k) and 3.2 for Surge AI (100k / 31k). Assuming LabelBox is about the same, at maybe 4 taskers/redditor, gives an estimated 56k taskers (4 × 14k).
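The ratio arithmetic here can be reproduced in a few lines of Python. This is only a sketch: the variable names are my own, the inputs are the rounded figures quoted in this post, and the 4 taskers/redditor value is the rough guess from the text rather than anything measured.

```python
# Claimed taskers and unofficial-subreddit members, as quoted in this post.
claimed_taskers = {"Prolific": 200_000, "Surge AI": 100_000}
subreddit_members = {"Prolific": 43_000, "Surge AI": 31_000, "LabelBox": 14_000}

# Taskers per redditor for the platforms that publish a tasker count.
ratios = {
    name: claimed_taskers[name] / subreddit_members[name]
    for name in claimed_taskers
}

# Assumption from the text: LabelBox sits at roughly 4 taskers per redditor,
# i.e. somewhere between the two observed ratios.
assumed_ratio = 4
labelbox_estimate = assumed_ratio * subreddit_members["LabelBox"]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} taskers/redditor")
print(f"LabelBox estimate: {labelbox_estimate:,} taskers")
```

Changing `assumed_ratio` to either observed ratio shows how sensitive the estimate is: with these inputs it moves between roughly 45k and 65k taskers.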
Aggregating the above
Summing the above gets us about 280k (Scale AI: 240k Remotasks + 40k Outlier) + 200k + 100k + 56k = 636k taskers.
However, it’s likely that many contractors sign up to multiple platforms. The most conservative estimate would therefore take the maximum rather than the sum, which points to Scale AI with 280,000 contractors.
In practice, not every contractor will also have registered with Scale AI, and Scale’s public numbers are likely an underestimate for the reasons above. I think this gives us at least 300k as a lower bound. My best guess is 400k-500k.
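The two bounding cases in this section (no overlap between platforms versus complete overlap with the largest one) can be written out as a short Python sketch. The dictionary and variable names are illustrative, and the LabelBox figure is the subreddit-based estimate rather than a published number.

```python
# Per-provider contractor counts used in this post (LabelBox is an estimate).
taskers = {
    "Scale AI": 240_000 + 40_000,  # Remotasks + Outlier
    "Prolific": 200_000,
    "Surge AI": 100_000,
    "LabelBox": 56_000,
}

# No overlap: every contractor is registered with exactly one provider.
upper_bound = sum(taskers.values())

# Full overlap: every contractor is also registered with the largest provider.
lower_bound = max(taskers.values())

print(f"Upper bound (sum): {upper_bound:,}")
print(f"Lower bound (max): {lower_bound:,}")
```

The 300k floor and the 400k-500k best guess both sit between these two extremes; they come from judgment about overlap and stale figures, not from this calculation.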
Footnotes
1. OpenAI and Meta are listed openly on Scale AI’s website. Scale AI contractors discuss clients and their project codenames on Reddit, with multiple credible sources suggesting the following mappings:
   - Meta: Flamingo
   - ChatGPT: Ostrich
   - Google DeepMind: Bulba, Dolphin
   - Anthropic: Alpaca
2. The “Experts Project Support” and “Data Collectors Team” Slacks.