Embedding Drift
A change over time in the distribution of embedding vectors, caused by shifts in the underlying data, vocabulary, or user behavior. Drift degrades retrieval quality in RAG and search systems.
Why It Matters
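Embedding drift raises no errors: queries still return results, only less relevant ones, so RAG quality degrades unnoticed. Maintaining search relevance requires monitoring embedding distributions regularly and re-embedding content when they shift.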
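One simple way to monitor for drift is to compare the centroid (mean vector) of a baseline batch of embeddings against the centroid of a recent batch. The sketch below is a minimal illustration, not a standard method: the function names and the DRIFT_THRESHOLD value are assumptions chosen for the example, and a real threshold would need tuning against your own data.

```python
import math
from typing import Sequence

Vector = Sequence[float]

# Hypothetical cutoff for illustration only; tune it on your own data.
DRIFT_THRESHOLD = 0.05


def centroid(vectors: Sequence[Vector]) -> list[float]:
    """Component-wise mean of a non-empty batch of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def centroid_drift(baseline: Sequence[Vector], current: Sequence[Vector]) -> float:
    """Cosine distance between the centroids of two embedding batches."""
    return cosine_distance(centroid(baseline), centroid(current))


def needs_reembedding(
    baseline: Sequence[Vector],
    current: Sequence[Vector],
    threshold: float = DRIFT_THRESHOLD,
) -> bool:
    """Flag drift when the centroid distance exceeds the chosen threshold."""
    return centroid_drift(baseline, current) > threshold
```

A centroid comparison is cheap but coarse: it can miss drift that changes the spread or clustering of the vectors without moving their mean, so it is best treated as a first alarm rather than a complete check.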
Example
Product descriptions are updated over six months, but their pre-computed embeddings are not regenerated. The stale embeddings no longer match the current text, so new searches return increasingly irrelevant results.
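Staleness of this kind can be caught by storing a hash of each text at the time it is embedded and comparing it with a hash of the current text. The sketch below assumes a hypothetical layout in which both the current texts and the stored hashes are dictionaries keyed by item ID; the function names are illustrative and do not come from any particular library.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the text, used as a change fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_ids(
    current_texts: dict[str, str],
    embedded_hashes: dict[str, str],
) -> list[str]:
    """Return the IDs whose text changed since embedding, or was never embedded.

    current_texts maps item ID to the text as it exists now.
    embedded_hashes maps item ID to the hash recorded when it was embedded.
    """
    stale = []
    for item_id, text in current_texts.items():
        # A missing hash (None) never equals a digest, so new items count as stale.
        if embedded_hashes.get(item_id) != content_hash(text):
            stale.append(item_id)
    return sorted(stale)
```

Only the IDs returned need to be re-embedded, which avoids recomputing vectors for descriptions that have not changed.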
Think of it like...
Like a GPS map that has not been updated: the roads (data) have changed, but the map (embeddings) still shows the old layout.
Related Terms
Data Drift
A change in the statistical properties of the input data over time compared to the data the model was trained on. When data drifts, model predictions become less reliable.
Embedding
A numerical representation of data (text, images, etc.) as a vector of numbers in a high-dimensional space. Similar items are placed closer together in this space, enabling machines to understand semantic relationships.
Vector Database
A specialized database designed to store, index, and search high-dimensional vector embeddings efficiently. It enables fast similarity searches across millions or billions of vectors.
Model Monitoring
The practice of continuously tracking an ML model's performance, predictions, and input data in production to detect degradation, drift, or anomalies after deployment.
Retrieval-Augmented Generation
A technique that enhances LLM outputs by first retrieving relevant information from external knowledge sources and then using that information as context for generation. RAG combines the power of search with the fluency of language models.