Compute-Optimal Training
Allocating a fixed compute budget optimally between model size and training data quantity, based on scaling law research like the Chinchilla findings.
Why It Matters
Compute-optimal training prevents wasting millions on undertrained large models or overtrained small ones — getting the most capability per dollar.
Example
Given a $10M compute budget, determining whether to train a 70B model on 260B tokens or a 30B model on 600B tokens — Chinchilla's rule of thumb of roughly 20 training tokens per parameter says the latter wins.
Think of it like...
Like optimizing a fixed travel budget between flight quality and hotel quality — the best trip comes from balancing both, not spending everything on one.
Related Terms
Chinchilla Scaling
Research by DeepMind showing that many LLMs were significantly undertrained — for a given compute budget, training a smaller model on more data yields better performance.
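As an illustrative sketch (not the paper's exact method), the rule of thumb can be solved in closed form. It assumes two approximations: dense-transformer training compute is C ≈ 6ND, and the compute-optimal ratio is about 20 tokens per parameter — both rough heuristics, not exact constants.

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a dense transformer: C ≈ 6 * N * D."""
    return 6 * n_params * n_tokens

def chinchilla_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Solve C = 6 * N * D with D = tokens_per_param * N for N and D.

    tokens_per_param ≈ 20 is the Chinchilla rule of thumb, not a law.
    """
    n = (flops_budget / (6 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

# A ~1.1e23 FLOP budget (what training 70B params on 260B tokens would cost)
budget = train_flops(70e9, 260e9)
n_opt, d_opt = chinchilla_split(budget)
print(f"budget    ≈ {budget:.2e} FLOPs")
print(f"optimal N ≈ {n_opt / 1e9:.0f}B parameters")
print(f"optimal D ≈ {d_opt / 1e9:.0f}B tokens")
```

For this budget the solver lands near 30B parameters and 600B tokens — the same compute spent on a smaller model trained on more data.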
Scaling Laws
Empirical findings showing predictable relationships between model performance and factors like model size (parameters), dataset size, and compute budget. Loss falls off as a power law in each of these factors.
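The Chinchilla paper made this power-law relationship concrete with a parametric loss fit of the form L(N, D) = E + A/N^α + B/D^β. A minimal sketch follows; the constants are the published Hoffmann et al. (2022) fits, quoted here as indicative values rather than exact figures:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Fitted power-law loss: L(N, D) = E + A / N**alpha + B / D**beta.

    Constants are the Hoffmann et al. (2022) parametric fit;
    treat them as indicative, not exact.
    """
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling up either parameters or data lowers predicted loss, with
# diminishing returns as each power-law term flattens out.
print(chinchilla_loss(10e9, 200e9))   # 10B params on 200B tokens
print(chinchilla_loss(70e9, 1.4e12))  # Chinchilla-scale: 70B on 1.4T tokens
```

The E term is the irreducible loss floor: no amount of parameters or data pushes predicted loss below it, which is why both scaling terms show diminishing returns.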
Compute
The computational resources (processing power, memory, time) required to train or run AI models. Compute is measured in FLOPs (floating-point operations) and is a primary constraint and cost in AI development.
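For a rough sense of scale, a common back-of-envelope estimate for training FLOPs is C ≈ 6 × parameters × tokens (an approximation for dense transformers that ignores attention overhead and hardware utilization). Applied to a GPT-3-sized run:

```python
# Back-of-envelope training compute via C ≈ 6 * N * D — an approximation
# for dense transformers, not a measured figure.
n_params = 175e9   # GPT-3-scale parameter count
n_tokens = 300e9   # GPT-3-scale training token count
flops = 6 * n_params * n_tokens
print(f"C ≈ {flops:.2e} FLOPs")  # on the order of 3e23
```

That 10^23-FLOP order of magnitude is why compute, rather than data or parameters alone, is the binding constraint in frontier-model budgets.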
Parameter
Any learnable value in a machine learning model that is adjusted during training. Parameters include weights and biases in neural networks. Model size is often described by parameter count.