Model Collapse
A phenomenon where AI models trained on AI-generated content progressively lose quality and diversity, eventually producing repetitive, low-quality outputs. Each successive generation of the model degrades further than the last.
Why It Matters
Model collapse is a growing concern as AI-generated content floods the internet. Future models trained on this data could lose the diversity that makes them useful.
Example
A text model trained on AI-generated text producing increasingly generic, homogeneous outputs — losing the nuance and variety present in human-written training data.
Think of it like...
Like photocopying a photocopy repeatedly — each copy loses a bit of detail until the final version is blurry and degraded beyond usefulness.
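The photocopy effect can be demonstrated with a toy simulation (a deliberately simplified sketch, not any production training setup): treat the "model" as a Gaussian distribution, and let each generation be trained by fitting a mean and standard deviation to a small sample drawn from the previous generation. The sample size of 5 is an illustrative choice that exaggerates the effect.

```python
import random
import statistics

random.seed(0)

# Generation 0: the "real" data distribution.
mu, sigma = 0.0, 1.0
n = 5  # tiny per-generation sample, chosen to make the collapse visible quickly

for generation in range(100):
    # The current model "publishes" synthetic data...
    data = [random.gauss(mu, sigma) for _ in range(n)]
    # ...and the next generation is fit to that data alone.
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # maximum-likelihood std: shrinks on average

print(f"std after 100 generations: {sigma:.2e}")  # collapses toward zero
```

Each fit can only capture what happened to appear in the sample, so rare tail values are silently lost, and the estimated spread ratchets downward generation after generation — the statistical version of each photocopy losing a little detail.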
Related Terms
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Synthetic Data
Artificially generated data that mimics the statistical properties and patterns of real data. It is created using algorithms, simulations, or generative models rather than collected from real-world events.
Data Quality
The degree to which data is accurate, complete, consistent, timely, and fit for its intended use. Data quality directly impacts the reliability and performance of AI models.
Generative AI
AI systems that can create new content — text, images, music, code, video — rather than just analyzing or classifying existing data. These models learn patterns from training data and generate novel outputs that resemble the original data.