Recurrent Neural Network
A type of neural network designed for sequential data, where the output at each step depends on previous steps. RNNs maintain a hidden state that acts as memory, letting them process sequences such as text, time series, and audio.
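The memory mechanism can be sketched in a few lines of Python. This is a toy scalar version under assumed weights (w_x and w_h are illustrative, not from any trained model): each step folds the current input into a hidden state carried over from the previous step.

```python
import math

# Toy scalar RNN step (illustrative weights, not a trained model):
# the new hidden state mixes the current input with the previous state.
def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    return math.tanh(w_x * x + w_h * h_prev)

def run_rnn(sequence):
    h = 0.0  # memory starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h)  # each step depends on all earlier steps via h
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0])
# Later states stay nonzero even though later inputs are zero:
# the first input is "remembered" through the recurrent connection.
```

Note that the remembered signal also shrinks at every step here, which previews the vanishing gradient problem discussed under Related Terms.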
Why It Matters
RNNs were pivotal in advancing NLP and time-series analysis before transformers. Understanding them provides context for why transformers were such a breakthrough.
Example
A language model using an RNN to predict the next word in a sentence by considering all the words that came before it.
Think of it like...
Like reading a book where your understanding of each sentence depends on remembering what you read in previous sentences — context builds over time.
Related Terms
GRU
Gated Recurrent Unit: a simplified alternative to the LSTM that uses two gates (update and reset) instead of three, so it has fewer parameters and trains faster while matching LSTM performance on many sequence tasks.
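A scalar sketch of the gating idea, in one common formulation (the toy shared weights w and u are illustrative; real GRUs learn separate weight matrices per gate, and some references swap the roles of z and 1 - z):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Toy scalar GRU step: the update gate z decides how much old state to keep,
# the reset gate r decides how much old state feeds the candidate.
# (Shared toy weights for brevity; real GRUs use separate matrices per gate.)
def gru_step(x, h_prev, w=0.5, u=0.8):
    z = sigmoid(w * x + u * h_prev)               # update gate
    r = sigmoid(w * x + u * h_prev)               # reset gate
    h_cand = math.tanh(w * x + u * (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_cand        # blend old state and candidate

h = gru_step(1.0, 0.0)
```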
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.
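The parallelism can be illustrated with a toy version of dot-product self-attention over scalar "embeddings" (a sketch of the mechanism only, not an actual transformer layer): every position attends to every other position at once, rather than waiting for earlier steps as an RNN must.

```python
import math

# Toy self-attention over scalar values (illustrative, not a real layer):
# each output is a softmax-weighted average of ALL positions, computed
# independently per position, so the whole sequence can be done in parallel.
def self_attention(values):
    out = []
    for q in values:
        scores = [q * k for k in values]           # similarity of query to each key
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out

mixed = self_attention([1.0, 2.0, 3.0])
```

Because each output is a weighted average of the inputs, every result lies between the smallest and largest input values.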
Sequence-to-Sequence
A model architecture that transforms one sequence into another, where the input and output can be different lengths. It uses an encoder to process input and a decoder to generate output.
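A toy sketch of the encoder/decoder split (the recurrences here are illustrative placeholders, not a real model): the encoder compresses a length-4 input into a single context value, and the decoder unrolls that context into an output of a different length.

```python
# Toy sequence-to-sequence sketch (illustrative only).
def encode(tokens):
    context = 0.0
    for t in tokens:
        context = 0.5 * context + t  # crude recurrent summary of the input
    return context

def decode(context, out_len):
    outputs = []
    state = context
    for _ in range(out_len):
        state *= 0.9          # decoder evolves its own state step by step
        outputs.append(state)
    return outputs

translated = decode(encode([1.0, 2.0, 3.0, 4.0]), out_len=2)  # 4 tokens in, 2 out
```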
Vanishing Gradient Problem
A training difficulty in deep networks where gradients become exponentially smaller as they are propagated back through many layers, making it nearly impossible for early layers to learn.
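The exponential shrinkage is easy to demonstrate numerically. Assuming a per-step gradient factor of 0.8 (an illustrative value; in practice the factor comes from activation derivatives and recurrent weights):

```python
# Numerical sketch of the vanishing gradient problem: backpropagating
# through T steps multiplies the gradient by a per-step factor. If that
# factor is below 1 (tanh derivatives are at most 1), the gradient
# shrinks exponentially with depth.
def gradient_after(steps, per_step_factor=0.8):
    grad = 1.0
    for _ in range(steps):
        grad *= per_step_factor
    return grad

short = gradient_after(5)    # ~0.33: still a usable learning signal
long = gradient_after(100)   # ~2e-10: effectively zero, early steps can't learn
```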