Positional Encoding
A technique used in transformers to inject information about the position of each token in a sequence. Because transformers process all tokens in parallel rather than one at a time, they have no inherent notion of word order and need explicit position information.
Why It Matters
Without positional encoding, a transformer would treat 'the dog bit the man' and 'the man bit the dog' identically. Position information is essential for language understanding.
Example
Adding sinusoidal position values to token embeddings so that the model knows that 'cat' is the 3rd word in one sentence but the 7th word in another.
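The sinusoidal scheme can be sketched in a few lines of NumPy. This is a minimal illustration of the formulation from the original transformer paper; the function name and the toy dimensions (10 tokens, 16-dimensional embeddings) are illustrative choices, not a fixed API.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal position matrix:
    pos_enc[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    pos_enc[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pos_enc = np.zeros((seq_len, d_model))
    pos_enc[:, 0::2] = np.sin(angles)                 # even dimensions
    pos_enc[:, 1::2] = np.cos(angles)                 # odd dimensions
    return pos_enc

# Position information is injected by simple element-wise addition:
embeddings = np.random.randn(10, 16)                  # 10 tokens, 16-dim embeddings
encoded = embeddings + sinusoidal_positions(10, 16)
```

Because every position gets a distinct pattern of sine and cosine values, the same word receives a different encoded vector depending on where it appears in the sequence.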
Think of it like...
Like page numbers in a book — the words alone do not tell you what order they are in, so you need an explicit numbering system to maintain sequence.
Related Terms
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.
Self-Attention
A mechanism where each element in a sequence attends to all other elements to compute a representation, determining how much focus to place on each part of the input. It is the core innovation of the transformer.
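The description above can be sketched as scaled dot-product self-attention in NumPy. This is a simplified single-head version under toy dimensions; the weight matrices here are random placeholders, whereas a real transformer learns them during training.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Each token's query is compared against every token's key:
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax turns scores into attention weights; each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all tokens' value vectors.
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                        # 4 tokens, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)             # (4, 8): one updated vector per token
```

Note that every token attends to every other token in one matrix multiplication, which is what lets transformers process the whole sequence in parallel.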
Embedding
A numerical representation of data (text, images, etc.) as a vector of numbers in a high-dimensional space. Similar items are placed closer together in this space, enabling machines to understand semantic relationships.
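The "closer together" idea is usually measured with cosine similarity. A toy sketch, assuming hand-made 3-dimensional vectors for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: near 1.0 = same direction (similar meaning)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: 'cat' and 'kitten' point in similar directions,
# 'car' points elsewhere in the space.
cat = np.array([0.9, 0.8, 0.1])
kitten = np.array([0.85, 0.75, 0.2])
car = np.array([0.1, 0.2, 0.9])

sim_related = cosine_similarity(cat, kitten)    # high: semantically similar
sim_unrelated = cosine_similarity(cat, car)     # much lower
```

This is how semantic search works in practice: embed the query, embed the documents, and rank by similarity.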
Attention Mechanism
A component in neural networks that allows the model to focus on the most relevant parts of the input when producing each part of the output. It assigns different weights to different input elements based on their relevance.