Crowdsourcing
Using a large group of distributed workers (often through platforms like Amazon Mechanical Turk or Scale AI) to perform data annotation and labeling tasks.
Why It Matters
Crowdsourcing enables annotation at scale but introduces quality challenges: workers vary in skill, attention, and consistency. Managing crowd-worker quality is a critical skill in ML data operations.
Example
Using Scale AI to distribute 100,000 image labeling tasks across thousands of workers, with quality checks and redundant labeling to ensure accuracy.
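One common quality check for redundant labeling is to collect several labels per item and resolve them by majority vote, flagging low-agreement items for expert review. A minimal sketch (the function name and agreement threshold are illustrative, not part of any platform's API):

```python
from collections import Counter

def aggregate_labels(worker_labels):
    """Resolve redundant labels for one item by majority vote.

    worker_labels: labels from different crowd workers for the same item.
    Returns the winning label and its agreement rate, so that items
    with low agreement can be routed to an expert reviewer.
    """
    counts = Counter(worker_labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(worker_labels)
    return label, agreement

# Three workers labeled the same image; two of three agree on "cat".
label, agreement = aggregate_labels(["cat", "cat", "dog"])
# label == "cat", agreement == 2/3 — below a strict 0.9 threshold,
# so this item might be sent for an additional review pass.
```

Real pipelines often weight votes by each worker's historical accuracy rather than counting them equally, but the simple majority vote above captures the core idea of redundancy-based quality control.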
Think of it like...
Like crowdfunding but for labor — instead of one expert spending months, hundreds of workers each contribute a small piece, completing the job quickly.
Related Terms
Annotation
The process of adding labels, tags, or metadata to raw data to make it suitable for supervised machine learning. Annotation can involve labeling images, transcribing audio, or tagging text.
Data Labeling
The process of assigning meaningful tags or annotations to raw data so it can be used for supervised learning. Labels tell the model what the correct answer should be for each training example.
Active Learning
A training strategy where the model identifies the most informative unlabeled examples and requests human labels only for those. This minimizes labeling effort by focusing on the examples that matter most.
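One simple way to pick "the most informative examples" is least-confidence sampling: send for labeling the examples where the model's top predicted class probability is lowest. A hedged sketch (function names are illustrative; real implementations query a trained model for the probabilities):

```python
def select_most_uncertain(probs, k):
    """Least-confidence sampling for active learning.

    probs: a list of per-example class-probability lists from the model.
    Returns the indices of the k examples whose top-class probability
    is lowest — the ones the model is least sure about — to be sent
    to human annotators.
    """
    confidence = [max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:k]

# The second example (50/50) is the least confident prediction,
# so it is the one worth paying a human to label.
probs = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
to_label = select_most_uncertain(probs, 1)  # -> [1]
```

Other acquisition criteria (entropy, margin between top two classes, committee disagreement) follow the same pattern: score each unlabeled example, then label only the highest-scoring ones.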
Data Quality
The degree to which data is accurate, complete, consistent, timely, and fit for its intended use. Data quality directly impacts the reliability and performance of AI models.