AI Glossary

The definitive dictionary for AI, Machine Learning, and Governance terminology. From Flash Attention to RAG — look up any term.

D

Data Annotation Pipeline

An end-to-end workflow for producing labeled training data, from task design through annotator training, quality assurance, and delivery of labeled datasets.

Data Science

Data Augmentation

Techniques for artificially expanding a training dataset by creating modified versions of existing data. This helps models generalize better, especially when training data is limited.
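As a rough illustration, the sketch below augments a tiny grayscale image stored as a list of rows with pixel values in [0, 1]. The function names (`horizontal_flip`, `add_noise`, `augment`) and the noise scale are assumptions made for this example, not part of any particular library.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def add_noise(image, scale=0.05, seed=None):
    """Add Gaussian noise to each pixel, clamping the result to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0, scale))) for px in row]
        for row in image
    ]


def augment(image, seed=0):
    """Return the original image plus two modified copies."""
    return [image, horizontal_flip(image), add_noise(image, seed=seed)]


image = [[0.0, 0.5], [1.0, 0.25]]
variants = augment(image)  # one example becomes three training examples
```

Each variant keeps the original label, so one labeled example yields several training examples.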

Data Science

Data Drift

A change over time in the statistical properties of a model's input data relative to the data it was trained on. As drift grows, predictions can become less reliable because the model is being applied to inputs unlike those it learned from.
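One common way to quantify drift in a single numeric feature is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distributions of the training data and the recent data. The sketch below is a minimal standard-library version; the function name and the 0.3 alert threshold are assumptions for illustration, and a real threshold would need tuning per feature.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    cur = sorted(current)
    # The maximum gap is always reached at one of the observed values.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


DRIFT_THRESHOLD = 0.3  # hypothetical alert level, not a standard value

training_values = [1.0, 2.0, 3.0, 4.0]
recent_values = [3.5, 4.5, 5.5, 6.5]
drifted = ks_statistic(training_values, recent_values) > DRIFT_THRESHOLD
```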

Data Science

Data Engineering

The practice of designing, building, and maintaining the systems and infrastructure that collect, store, and prepare data for analysis and machine learning.

Data Science

Data Labeling

The process of assigning meaningful tags or annotations to raw data so it can be used for supervised learning. Labels tell the model what the correct answer should be for each training example.

Data Science

Data Lake

A centralized repository that stores vast amounts of raw data in its native format until needed. Data lakes accept structured, semi-structured, and unstructured data at any scale.

Data Science

Data Lineage

The tracking of data's origins, transformations, and movements throughout its lifecycle. Data lineage answers the question 'Where did this data come from and what happened to it?'
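A minimal sketch of the idea, assuming lineage is kept as a list of step names carried alongside the data. The `TrackedDataset` class and the `sensors.csv` source name are hypothetical; production lineage tools capture far more detail, such as timestamps, code versions, and column-level mappings.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDataset:
    rows: list
    source: str                                   # where the data came from
    history: list = field(default_factory=list)   # what happened to it

    def apply(self, step_name, func):
        """Run a transformation and return a new dataset with the step recorded."""
        return TrackedDataset(
            rows=func(self.rows),
            source=self.source,
            history=self.history + [step_name],
        )


raw = TrackedDataset(rows=[1, None, 3], source="sensors.csv")
cleaned = (
    raw.apply("drop_missing", lambda rows: [r for r in rows if r is not None])
       .apply("double", lambda rows: [r * 2 for r in rows])
)
# cleaned.source and cleaned.history together answer both halves of the question.
```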

Data Science

Data Mesh

A decentralized approach to data architecture where domain teams own and manage their own data as products, rather than centralizing all data in a single warehouse or lake.

Data Science

Data Pipeline

An automated workflow that extracts data from sources, transforms it through processing steps, and loads it into a destination for use. In ML, data pipelines ensure consistent data flow from raw sources to model training.
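The extract, transform, and load stages can be sketched as three small functions. This is an illustrative example only: the inline CSV text, the column names, and the in-memory list used as the destination are all assumptions standing in for real sources and storage.

```python
import csv
import io

RAW_CSV = "id,temp_f\n1,68\n2,\n3,212\n"  # hypothetical raw source


def extract(text):
    """Read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with no reading and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue
        temp_c = (float(row["temp_f"]) - 32) * 5 / 9
        cleaned.append({"id": int(row["id"]), "temp_c": round(temp_c, 1)})
    return cleaned


def load(rows, destination):
    """Append the processed rows to the destination and return it."""
    destination.extend(rows)
    return destination


def run_pipeline(text):
    return load(transform(extract(text)), [])


training_rows = run_pipeline(RAW_CSV)
```

Because every run applies the same steps in the same order, the data reaching model training is processed consistently.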

Data Science

Data Preprocessing

The process of cleaning, transforming, and organizing raw data into a format suitable for machine learning. Typical steps include handling missing values, encoding categorical variables, scaling numeric features, and detecting or removing outliers.
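Three of these steps can be sketched with the standard library alone. The function names are assumptions for this example, and mean imputation, min-max scaling, and one-hot encoding are only one possible choice for each step.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted set of categories."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


ages = min_max_scale(impute_mean([1.0, None, 3.0]))
colors = one_hot(["b", "a", "b"])
```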

Data Science

Data Quality

The degree to which data is accurate, complete, consistent, timely, and fit for its intended use. Data quality directly impacts the reliability and performance of AI models.
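Two of these dimensions, completeness and consistency of keys, are easy to measure directly. The sketch below assumes records are dictionaries; the `quality_report` function and its field names are hypothetical and cover only a small part of what a full data quality check would include.

```python
def quality_report(rows, required, key):
    """Summarize completeness of required fields and duplication of a key field."""
    total = len(rows)
    complete = sum(
        all(row.get(name) not in (None, "") for name in required)
        for row in rows
    )
    keys = [row.get(key) for row in rows]
    return {
        "rows": total,
        "completeness": complete / total if total else 0.0,
        "duplicate_keys": len(keys) - len(set(keys)),
    }


records = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": ""},   # missing name and repeated id
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
report = quality_report(records, required=["id", "name"], key="id")
```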

Data Science

Data Warehouse

A structured repository of cleaned and processed data optimized for analysis and reporting. Unlike data lakes, which keep raw data in its native format, data warehouses store data in predefined schemas.

Data Science