Machine Learning

Vanishing Gradient Problem

A training difficulty in deep networks where gradients shrink exponentially as they are propagated back through many layers. Backpropagation multiplies per-layer derivatives together, so when those factors are less than 1 the product decays toward zero, making it nearly impossible for early layers to learn.

Why It Matters

The vanishing gradient problem limited neural networks to shallow architectures for decades. Solutions like ReLU, LSTMs, and residual connections unlocked deep learning.
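A minimal sketch of why ReLU helps: backpropagation multiplies per-layer activation derivatives, and the sigmoid's derivative never exceeds 0.25, while ReLU's derivative is exactly 1 for positive inputs. The function names and the sample input below are illustrative, not from any particular library.

```python
import math

def sigmoid_deriv(x):
    # Derivative of the logistic sigmoid: s(x) * (1 - s(x)), at most 0.25.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_deriv(x):
    # ReLU's derivative: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

layers = 50
x = 0.5  # an illustrative positive pre-activation, reused at every layer

# Product of per-layer derivative factors across 50 layers:
sigmoid_grad = sigmoid_deriv(x) ** layers  # shrinks toward zero
relu_grad = relu_deriv(x) ** layers        # stays exactly 1.0
```

Under these (simplified) assumptions the sigmoid chain collapses to a vanishingly small number, while the ReLU chain passes the gradient through unattenuated.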

Example

In a 50-layer network, the gradient signal reaching the first layer might be on the order of 0.0000001 (effectively zero), so those early layers' weights barely move from their initial values.
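The shrinkage above can be sketched numerically. Assuming, optimistically, that each layer contributes a factor equal to the sigmoid's maximum derivative of 0.25 (the function name and factor here are illustrative), the gradient reaching the first layer is bounded by that factor raised to the network depth:

```python
def gradient_magnitude(num_layers, per_layer_factor=0.25):
    # Upper bound on the gradient reaching the first layer when each
    # of num_layers layers multiplies in per_layer_factor.
    return per_layer_factor ** num_layers

print(gradient_magnitude(10))  # ~1e-6: already tiny at 10 layers
print(gradient_magnitude(50))  # ~8e-31: effectively zero at 50 layers
```

Because the factor compounds multiplicatively, adding layers makes the bound worse exponentially, not linearly.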

Think of it like...

Like playing telephone with 50 people: by the time feedback about the final message travels back to the first person, it is so faint and distorted that they cannot effectively adjust what they originally said.

Related Terms

ReLU, LSTM, Residual Connection, Backpropagation, Exploding Gradient Problem