Chinchilla Scaling
Research by DeepMind (Hoffmann et al., 2022, "Training Compute-Optimal Large Language Models") showing that many LLMs were significantly undertrained: for a given compute budget, training a smaller model on more data yields better performance. The compute-optimal recipe scales parameter count and training tokens roughly in equal proportion, which works out to about 20 training tokens per parameter.
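The allocation rule above can be sketched in a few lines. This is a minimal sketch, assuming the common approximation that training compute C ≈ 6ND FLOPs (for N parameters and D tokens) and the ~20 tokens-per-parameter rule of thumb; `compute_optimal` is a hypothetical helper name, not from the paper.

```python
# Sketch of Chinchilla-style compute-optimal allocation.
# Assumes C ~= 6 * N * D (a standard training-FLOPs approximation)
# and D/N ~= 20 tokens per parameter (the Chinchilla rule of thumb).

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Return (parameters, tokens) for a given FLOPs budget."""
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's budget was roughly 5.88e23 FLOPs; the sketch recovers
# its published configuration of ~70B parameters and ~1.4T tokens.
params, tokens = compute_optimal(5.88e23)
print(f"params ~ {params:.2e}, tokens ~ {tokens:.2e}")
```

Note how doubling the budget raises both N and D by sqrt(2), rather than putting all the extra compute into a bigger model.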
Why It Matters
Chinchilla changed how the industry allocates compute, shifting focus toward training smaller models longer rather than building ever-larger models.
Example
Chinchilla (70B parameters) outperformed the much larger Gopher (280B parameters) at roughly the same training compute by training on over 4x more data (1.4 trillion vs. 300 billion tokens), demonstrating that data quantity matters as much as model size.
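A quick arithmetic check makes the "same compute" claim concrete. It uses the common C ≈ 6ND FLOPs approximation (an assumption here, not stated in this glossary) with the two models' published sizes and token counts:

```python
# Rough check that Chinchilla and Gopher consumed comparable training
# compute despite the 4x difference in model size, using the common
# approximation C ~= 6 * N * D (an assumption, not an exact figure).

def train_flops(n_params, n_tokens):
    return 6.0 * n_params * n_tokens

chinchilla = train_flops(70e9, 1.4e12)   # 70B params, 1.4T tokens
gopher = train_flops(280e9, 300e9)       # 280B params, 300B tokens

print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23
print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
```

Both land near 5-6 x 10^23 FLOPs, so the comparison isolates the allocation of compute, not its total amount.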
Think of it like...
Like discovering that running 5 miles daily is better training than running 20 miles once a week with the same total effort — distribution matters.
Related Terms
Scaling Laws
Empirical findings showing predictable relationships between model performance and factors like model size (parameters), dataset size, and compute budget. Performance improves as a power law with these factors.
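The power-law relationship can be written down directly. Below is a sketch of the parametric loss form fitted in the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta; the constants are the paper's reported fit, quoted here as assumptions since exact values depend on the fitting procedure:

```python
# Sketch of the Chinchilla parametric loss fit:
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N is parameter count and D is training tokens. The constants
# below are the paper's reported fitted values (treated as assumptions).

def predicted_loss(n_params, n_tokens,
                   E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss as a function of model and data size."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Increasing either N or D lowers the predicted loss, following a
# power law in each factor; E is the irreducible loss floor.
print(predicted_loss(70e9, 1.4e12))   # Chinchilla-scale configuration
print(predicted_loss(280e9, 300e9))   # Gopher-scale configuration
```

Under this fit, the 70B/1.4T-token configuration is predicted to reach lower loss than the 280B/300B-token one at similar compute, which is the Chinchilla result in miniature.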
Compute
The computational resources (processing power, memory, time) required to train or run AI models. Compute is measured in FLOPs (floating-point operations) and is a primary constraint and cost in AI development.
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Parameter
Any learnable value in a machine learning model that is adjusted during training. Parameters include weights and biases in neural networks. Model size is often described by parameter count.
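Parameter counting is simple arithmetic for a single dense layer: a weight for every input-output pair plus one bias per output unit. A toy illustration (real LLM counts sum many such layers plus embeddings and normalization parameters):

```python
# Parameter count of one dense (fully connected) layer:
# weights (in_features x out_features) plus one bias per output unit.

def dense_layer_params(in_features, out_features):
    weights = in_features * out_features
    biases = out_features
    return weights + biases

print(dense_layer_params(512, 256))  # 512*256 + 256 = 131328
```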