Batch Size
The number of training examples processed together before the model updates its parameters. Batch size affects training speed, memory usage, and how noisy each parameter update is.
Why It Matters
Choosing a batch size involves a trade-off among training speed, memory constraints, and model quality. Larger batches make better use of hardware parallelism and give smoother gradient estimates, but very large batches can generalize less well.
Example
Processing 32 images at a time through a neural network, computing the average error across all 32, then updating the weights once — that is a batch size of 32.
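The averaging-then-updating step can be sketched in a few lines. This is a minimal illustration, not a real training loop: the one-parameter model y = w * x, the squared-error loss, and the data are all assumptions chosen to keep the arithmetic visible.

```python
# Minimal sketch of one mini-batch update: average the gradient over
# the whole batch, then change the weight once. Model y = w * x and
# the data below are illustrative assumptions.

def batch_update(w, batch, lr=0.01):
    """One update from one batch: average per-example gradients."""
    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    grads = [2 * (w * x - y) * x for x, y in batch]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# A batch of (input, target) pairs; the true relationship is y = 2x.
batch = [(x, 2 * x) for x in range(1, 9)]  # batch size 8
w = 0.0
for _ in range(100):
    w = batch_update(w, batch)
print(round(w, 3))  # prints 2.0
```

Note that the batch produces a single averaged gradient, so the weight moves once per batch no matter how many examples the batch contains.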
Think of it like...
Like grading papers — you could adjust your rubric after each paper (batch size 1) or after reading the whole stack (full batch), each approach giving different feedback quality.
Related Terms
Epoch
One complete pass through the entire training dataset during model training. Models typically require multiple epochs to learn effectively, with each pass refining the model's understanding.
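The relationship between epochs, batches, and update steps is simple bookkeeping, sketched below with illustrative numbers (1000 examples and batch size 32 are assumptions, not values from the text).

```python
# How epochs relate to parameter updates: one epoch covers every
# example once, in ceil(n_examples / batch_size) batches, and each
# batch triggers one update.
import math

def updates_per_epoch(n_examples, batch_size):
    return math.ceil(n_examples / batch_size)

def total_updates(n_examples, batch_size, epochs):
    return updates_per_epoch(n_examples, batch_size) * epochs

print(updates_per_epoch(1000, 32))   # 32 updates per epoch
print(total_updates(1000, 32, 10))   # 320 updates over 10 epochs
```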
Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to errors during each training step. It determines the size of the steps taken during gradient descent optimization.
Stochastic Gradient Descent
A variant of gradient descent that updates model parameters using a single randomly chosen training example (or a small batch) at each step instead of the entire dataset. Each update is cheaper to compute, and the noise in the updates can help the optimizer escape shallow local minima.
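A per-example update loop can be sketched as follows. The one-parameter model y = w * x, the squared-error loss, and the data are illustrative assumptions; the point is that the weight changes after every single example rather than once per pass.

```python
# Sketch of stochastic (per-example) updates: shuffle the data, then
# compute a gradient from ONE example at a time and update immediately.
# Model y = w * x and the data are illustrative assumptions.
import random

def sgd_epoch(w, data, lr=0.01):
    random.shuffle(data)            # visit examples in random order
    for x, y in data:
        grad = 2 * (w * x - y) * x  # gradient from one example
        w = w - lr * grad           # update right away
    return w

random.seed(0)
data = [(x, 2 * x) for x in range(1, 9)]  # true relationship y = 2x
w = 0.0
for _ in range(50):
    w = sgd_epoch(w, data)
print(round(w, 3))  # prints 2.0
```

With 8 examples, one pass through the data here performs 8 updates, where a full-batch method would perform only one.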
Gradient Descent
An optimization algorithm used to minimize the error (loss) of a model by iteratively adjusting parameters in the direction that reduces the loss most quickly. It is the primary method for training machine learning models.
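The core loop is short enough to show directly. The two-parameter loss L(a, b) = (a - 3)**2 + (b + 1)**2 below is a hand-picked assumption with a known minimum at (3, -1), so convergence is easy to check by eye.

```python
# Minimal gradient descent: repeatedly step each parameter in the
# direction opposite its gradient. Loss L(a, b) = (a-3)**2 + (b+1)**2
# is illustrative; its minimum is at a = 3, b = -1.

def grad(a, b):
    return 2 * (a - 3), 2 * (b + 1)

a, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    da, db = grad(a, b)
    a, b = a - lr * da, b - lr * db  # step against the gradient

print(round(a, 3), round(b, 3))  # prints 3.0 -1.0
```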