Gradient Descent
An optimization algorithm used to minimize the error (loss) of a model by iteratively adjusting parameters in the direction of steepest descent — the negative of the gradient. It underpins the training of most machine learning models.
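As a minimal illustrative sketch (the function and example here are hypothetical, not from any particular library), the core loop repeatedly steps a parameter opposite its gradient:

```python
# Minimal gradient descent on a one-dimensional function.
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step opposite the gradient
    return x

# Example: f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3),
# so the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The same loop generalizes to millions of parameters; only the gradient computation changes.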
Why It Matters
Gradient descent is the engine that powers model training. Variants like Adam and SGD determine how quickly and effectively models learn.
Example
Imagine you are blindfolded on hilly terrain trying to reach the lowest valley — you feel the slope under your feet and take steps downhill until you reach the bottom.
Think of it like...
Like rolling a ball down a bumpy landscape — it naturally rolls toward the lowest point, just as gradient descent moves model parameters toward the lowest error.
Related Terms
Backpropagation
The algorithm used to compute gradients when training neural networks. It calculates how much each weight in the network contributed to the error by applying the chain rule backward from the output layer; gradient descent then uses those gradients to adjust the weights and reduce future errors.
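The chain-rule bookkeeping can be sketched by hand on a hypothetical two-layer scalar network (this toy model and its names are illustrative assumptions, not a real framework API):

```python
# Toy network: y_hat = w2 * relu(w1 * x), with squared-error loss.
# Backpropagation walks the chain rule from the output back to the input.
def forward_backward(x, y, w1, w2):
    # Forward pass
    h = w1 * x
    a = max(h, 0.0)          # ReLU activation
    y_hat = w2 * a
    loss = (y_hat - y) ** 2

    # Backward pass: propagate the error from output toward input.
    dloss = 2 * (y_hat - y)
    dw2 = dloss * a                      # w2's contribution to the error
    da = dloss * w2
    dh = da * (1.0 if h > 0 else 0.0)    # ReLU passes gradient only where h > 0
    dw1 = dh * x                         # w1's contribution to the error
    return loss, dw1, dw2
```

Frameworks automate exactly this bookkeeping for arbitrarily deep networks.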
Loss Function
A mathematical function that measures how far a model's predictions are from the actual correct values. The goal of training is to minimize this loss function, making predictions as accurate as possible.
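One common choice is mean squared error; a minimal sketch (the function name here is illustrative):

```python
# Mean squared error: average of squared prediction errors.
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# One prediction off by 1, one exact: loss = (1 + 0) / 2 = 0.5
error = mse([2.0, 3.0], [1.0, 3.0])
```

Gradient descent minimizes precisely this quantity with respect to the model's parameters.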
Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to errors during each training step. It determines the size of the steps taken during gradient descent optimization.
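Its effect can be seen on a simple quadratic (a hypothetical sketch, assuming f(x) = x² with gradient 2x): a small learning rate converges steadily, while one that is too large overshoots and diverges.

```python
# Run gradient descent on f(x) = x^2 and return the final distance from 0.
def descend(x, lr, steps=20):
    for _ in range(steps):
        x -= lr * 2 * x   # gradient of x^2 is 2x
    return abs(x)

small = descend(1.0, lr=0.1)   # each step shrinks x by a factor of 0.8
large = descend(1.0, lr=1.1)   # each step grows |x| by a factor of 1.2
```

This is why the learning rate is usually the first hyperparameter to tune.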
Stochastic Gradient Descent
A variant of gradient descent that updates model parameters using a single random training example (or small batch) at each step instead of the entire dataset. It is faster per update, and its noisy steps can help the optimizer escape shallow local minima.
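A minimal sketch, assuming a linear model y = w·x fit with one randomly sampled example per step (the function and data here are hypothetical):

```python
import random

# SGD for y = w * x: each update uses one random example, not the full set.
def sgd_fit(data, lr=0.05, steps=500, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)          # sample a single training example
        grad = 2 * (w * x - y) * x       # gradient of (w*x - y)^2 w.r.t. w
        w -= lr * grad
    return w

# Data generated from y = 2x, so SGD should approach w = 2.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
w = sgd_fit(data)
```

Mini-batch SGD replaces the single sampled example with an averaged gradient over a small batch.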
Adam Optimizer
An adaptive optimization algorithm that combines momentum and per-parameter adaptive learning rates. Adam maintains running averages of both gradients and squared gradients, using them to scale each parameter's step size.
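A single-parameter sketch of the Adam update with its standard default hyperparameters (illustrative only; real use goes through a library optimizer):

```python
import math

# Adam for one parameter: momentum (m) plus adaptive scaling (v).
def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Same toy objective as before: f(x) = (x - 3)^2, gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x=0.0)
```

The bias-correction terms matter early in training, when the running averages are still warming up from zero.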