GRU
Gated Recurrent Unit — a simplified variant of the LSTM that uses fewer gates and parameters while achieving similar performance on many sequence tasks, and is typically faster to train.
Why It Matters
GRUs offer a practical alternative to LSTMs with simpler architecture and faster training, making them a good default choice for sequence modeling when transformers are overkill.
Example
A GRU processing stock price sequences to predict next-day movements, using its update and reset gates to determine what historical information is relevant.
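The update/reset mechanics described above can be sketched as a single GRU step in NumPy. This is a minimal illustration, not a production implementation: the weight layout, the `gru_step` name, and the random toy sequence are all assumptions for the example, and note that sign conventions for the update gate vary between papers and libraries.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step. params holds input weights W*, recurrent weights U*,
    and biases b* for the update (z), reset (r), and candidate gates."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate: how much to overwrite
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate: how much history to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate new state
    return (1 - z) * h_prev + z * h_tilde                # blend old state with candidate

# Toy usage: random weights, 3-dim inputs, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = (rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h),
          rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h),
          rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h))
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):   # a length-5 input sequence
    h = gru_step(x, h, params)
print(h.shape)  # (4,)
```

Because the candidate passes through tanh and the final state is a convex blend of old and new, the hidden state stays bounded in (-1, 1), which is part of why gated units train more stably than plain RNNs.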
Think of it like...
Like a streamlined filing system — where an LSTM keeps three separate bins (forgetting, adding, and sharing information), a GRU merges those duties into two gates, trading a little flexibility for efficiency.
Related Terms
Recurrent Neural Network
A type of neural network designed for sequential data where the output at each step depends on previous steps. RNNs have a form of memory that allows them to process sequences like text, time series, and audio.
Sequence-to-Sequence
A model architecture that transforms one sequence into another, where the input and output can be different lengths. It uses an encoder to process input and a decoder to generate output.
Vanishing Gradient Problem
A training difficulty in deep networks where gradients become exponentially smaller as they are propagated back through many layers, making it nearly impossible for early layers to learn.
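The exponential shrinkage described above can be seen numerically: backpropagating through many tanh steps multiplies the gradient by a per-step factor below one. This is a hedged sketch with made-up values (a fixed recurrent weight `w = 0.9` and pre-activation `a = 0.5`), not a real training run.

```python
import numpy as np

# Each backward step through a tanh unit multiplies the gradient by
# tanh'(a) * w. With tanh' <= 1 and |w| < 1, the product decays
# toward zero over many time steps — the vanishing gradient problem.
w = 0.9      # assumed recurrent weight
a = 0.5      # assumed pre-activation at each step
grad = 1.0
for t in range(50):
    grad *= (1 - np.tanh(a) ** 2) * w   # chain-rule factor per step
print(grad)  # many orders of magnitude below 1
```

Gating (as in GRUs and LSTMs) mitigates this by letting the state pass through largely unchanged when the update gate is near saturation, giving gradients a more direct path back through time.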