Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.
Why It Matters
The transformer architecture revolutionized NLP and is now expanding to vision, audio, and multimodal AI. It underpins nearly every state-of-the-art model in modern AI.
Example
GPT-4, Claude, BERT, and virtually every other modern language model are built on the transformer architecture.
Think of it like...
Like a speed reader who can look at an entire page at once and understand how every word relates to every other word, instead of reading one word at a time left to right.
Related Terms
Attention Mechanism
A component in neural networks that allows the model to focus on the most relevant parts of the input when producing each part of the output. It assigns different weights to different input elements based on their relevance.
Self-Attention
A mechanism where each element in a sequence attends to all other elements to compute a representation, determining how much focus to place on each part of the input. It is the core innovation of the transformer.
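The two entries above can be made concrete with a small sketch. This is a minimal, illustrative implementation of scaled dot-product self-attention in NumPy (the function name, dimensions, and random weights are invented for the example, not from any particular library): every token is projected into queries, keys, and values, scores between all token pairs are softmax-normalized into attention weights, and the output is a weighted mix of values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len): every token scored against every token
    weights = softmax(scores, axis=-1)         # each row sums to 1: how much this token attends to each other token
    return weights @ V, weights                # output is a weighted mix of value vectors

# Toy example: 3 tokens, model dimension 4 (all numbers are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

Note that all pairwise scores are computed in one matrix product, which is what lets transformers process the whole sequence in parallel rather than token by token.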
Encoder-Decoder
An architecture where an encoder maps the input into an intermediate representation and a decoder generates the output from that representation. This structure is used in translation, summarization, and image captioning.
Positional Encoding
A technique used in transformers to inject information about the position of each token in a sequence. Since transformers process all tokens in parallel, they need explicit position information.
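One common scheme from the original transformer paper uses fixed sine and cosine waves of different frequencies, so each position gets a unique signature that is added to its token embedding. A minimal NumPy sketch (function name and dimensions are chosen for illustration; this assumes an even d_model):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encodings; each row is one position's signature."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1) token positions
    dims = np.arange(0, d_model, 2)[None, :]           # even embedding dimensions
    angles = positions / (10000 ** (dims / d_model))   # lower dims oscillate faster
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                       # cosine on odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
# In a transformer, pe would be added element-wise to the token embeddings
```

Because the encodings are deterministic functions of position, the model can generalize to positions and sequence lengths beyond those seen during training; many modern models instead learn positional embeddings or use relative-position schemes.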
BERT
Bidirectional Encoder Representations from Transformers — a language model developed by Google that reads text in both directions simultaneously. BERT excels at understanding language rather than generating it.
GPT
Generative Pre-trained Transformer — a family of large language models developed by OpenAI. GPT models are trained to predict the next token in a sequence and can generate coherent, contextually relevant text across many tasks.