AI Glossary

The definitive dictionary for AI, Machine Learning, and Governance terminology. From Flash Attention to RAG — look up any term.

A

Accuracy

The percentage of correct predictions out of all predictions made by a model. While intuitive, accuracy can be misleading for imbalanced datasets.

Machine Learning

Activation Function

A mathematical function applied to the output of each neuron in a neural network that introduces non-linearity. Without activation functions, a neural network would just be a series of linear transformations.

Machine Learning

Active Learning

A training strategy where the model identifies the most informative unlabeled examples and requests human labels only for those. This minimizes labeling effort by focusing on the examples that matter most.

Machine Learning

Adam Optimizer

An adaptive optimization algorithm that combines momentum and adaptive learning rates for each parameter. Adam maintains running averages of both gradients and squared gradients.

Machine Learning
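
A minimal single-parameter sketch of the Adam update (standard beta values; the learning rate is deliberately large and the objective f(x) = x^2 is a toy, both purely illustrative):

```python
# Adam for one parameter: running averages of the gradient (m) and the
# squared gradient (v), with bias correction before the update.
def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # momentum-like average of gradients
    v = b2 * v + (1 - b2) * grad ** 2      # average of squared gradients
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Minimize f(x) = x^2, whose gradient is 2x.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(round(theta, 4))  # close to the minimum at 0
```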

Adversarial Training

A defense technique where adversarial examples are included in the training data to make the model more robust against attacks. The model learns to handle both normal and adversarial inputs.

Machine Learning

Anomaly Detection

Techniques for identifying data points, events, or observations that deviate significantly from expected patterns. Anomalies can indicate fraud, equipment failure, security breaches, or other important events.

Machine Learning

Autoencoder

A neural network that learns to compress data into a lower-dimensional representation (encoding) and then reconstruct it back (decoding). It learns what features are most important for faithful reconstruction.

Machine Learning

AutoML

Automated Machine Learning — tools and techniques that automate the end-to-end process of applying machine learning, including feature engineering, model selection, and hyperparameter tuning.

Machine Learning

C

Catastrophic Forgetting

The tendency of neural networks to completely forget previously learned information when trained on new data or tasks. New learning overwrites old knowledge.

Machine Learning

Catastrophic Interference

When learning new information in a neural network severely disrupts previously learned knowledge. It is the underlying mechanism behind catastrophic forgetting.

Machine Learning

CatBoost

A gradient boosting library by Yandex that handles categorical features natively without requiring manual encoding. CatBoost also addresses prediction shift and target leakage.

Machine Learning

Causal Inference

Statistical methods for determining cause-and-effect relationships from data, going beyond correlation to understand whether X actually causes Y.

Machine Learning

Causal Language Model

A training approach where the model predicts the next token given only the preceding tokens (left-to-right). This is how GPT models are trained and is the basis for text generation.

Machine Learning

Classification

A type of supervised learning task where the model predicts which category or class an input belongs to. The output is a discrete label rather than a continuous value.

Machine Learning

Clustering

An unsupervised learning technique that groups similar data points together based on their characteristics, without predefined labels. The algorithm discovers natural groupings in the data.

Machine Learning

Cold Start Problem

The challenge of making recommendations for new users (who have no history) or new items (which have no ratings). Cold start is a fundamental difficulty in recommendation systems.

Machine Learning

Collaborative Filtering

A recommendation technique that predicts a user's interests based on the preferences of similar users. It assumes people who agreed in the past will agree again in the future.

Machine Learning

Concept Bottleneck

A model architecture that forces predictions through a set of human-interpretable concepts. The model first predicts concepts, then uses those concepts to make the final prediction.

Machine Learning

Confusion Matrix

A table that summarizes the performance of a classification model by showing true positives, true negatives, false positives, and false negatives. It reveals the types of errors a model makes.

Machine Learning
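
A small sketch computing the four cells from toy binary labels:

```python
# Count the four confusion-matrix cells for binary labels (1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```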

Confusion Matrix Metrics

Performance metrics derived from the four cells of the confusion matrix (true positives, true negatives, false positives, and false negatives), such as accuracy, precision, recall, specificity, and F1 score.

Machine Learning

Content-Based Filtering

A recommendation technique that suggests items similar to those a user has previously liked, based on the items' features and attributes rather than other users' behavior.

Machine Learning

Context Distillation

A technique where the behavior of a model prompted with detailed instructions is distilled into a model that exhibits the same behavior without the instructions.

Machine Learning

Contextual Bandits

An extension of multi-armed bandits where the agent observes context (features) before making a decision, enabling personalized choices based on the current situation.

Machine Learning

Continual Learning

Training a model on new data or tasks over time without forgetting previously learned knowledge. Also called lifelong learning or incremental learning.

Machine Learning

Continual Pre-Training

Extending a pre-trained model's training on new domain-specific data without starting from scratch. It adapts the model to a new domain while preserving general capabilities.

Machine Learning

Contrastive Learning

A self-supervised technique where the model learns by comparing similar (positive) and dissimilar (negative) pairs of examples. It learns representations where similar items are close and different items are far apart.

Machine Learning

Convolutional Neural Network

A type of neural network specifically designed for processing grid-like data such as images. CNNs use convolutional layers that apply filters to detect patterns like edges, textures, and shapes at different scales.

Machine Learning

Cosine Similarity

A metric that measures the similarity between two vectors by calculating the cosine of the angle between them. Values range from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal.

Machine Learning
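
A plain-Python sketch with toy vectors:

```python
import math

# Cosine similarity: dot product divided by the product of the vector norms.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 2], [2, 4]))   # same direction -> approximately 1.0
```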

Cross-Encoder

A model that takes two texts as input simultaneously and outputs a relevance or similarity score. Unlike bi-encoders, cross-encoders consider the full interaction between both texts.

Machine Learning

Cross-Entropy

A loss function commonly used in classification tasks that measures the difference between the predicted probability distribution and the actual distribution. Lower cross-entropy means better predictions.

Machine Learning
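
For a one-hot true label, cross-entropy reduces to the negative log of the probability assigned to the correct class; a sketch with made-up probabilities:

```python
import math

# Cross-entropy between a true distribution and predicted probabilities:
# -sum(p_true * log(p_pred)) over the classes with nonzero true probability.
def cross_entropy(p_true, p_pred):
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred) if t > 0)

confident = cross_entropy([0, 1, 0], [0.05, 0.90, 0.05])
unsure    = cross_entropy([0, 1, 0], [0.30, 0.40, 0.30])
print(confident, unsure)  # the confident correct prediction has lower loss
```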

Cross-Validation

A model evaluation technique that splits data into multiple folds, trains on some folds and tests on the held-out fold, repeating so every fold serves as the test set. It provides a robust estimate of model performance.

Machine Learning
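
A minimal k-fold split sketch (round-robin fold assignment, illustrative only):

```python
# Plain k-fold cross-validation split: indices are dealt round-robin into
# k folds, and each fold serves once as the held-out test set.
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(k_fold_indices(10, 5))
print(len(splits))  # 5 train/test splits
print(sorted(splits[0][0] + splits[0][1]))  # together they cover every index
```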

Curriculum Learning

A training strategy inspired by human education where the model is exposed to training examples in a meaningful order — starting with easier examples and gradually increasing difficulty.

Machine Learning

D

Data Parallelism

A distributed training approach where the training data is split across multiple GPUs, each holding a complete copy of the model. Gradients are averaged across GPUs after each batch.

Machine Learning

Decision Tree

A supervised learning algorithm that makes predictions by learning a series of if-then-else decision rules from the data. It creates a tree-like structure where each internal node tests a feature and each leaf provides a prediction.

Machine Learning

Deep Learning

A specialized subset of machine learning that uses artificial neural networks with multiple layers (hence 'deep') to learn complex patterns in data. Deep learning excels at tasks like image recognition, speech processing, and natural language understanding.

Machine Learning

Dimensionality Reduction

Techniques that reduce the number of features (dimensions) in a dataset while preserving the most important information. This makes data easier to visualize, speeds up training, and can improve model performance.

Machine Learning

Distributed Training

Splitting model training across multiple GPUs or machines to handle larger models or datasets and reduce training time. Techniques include data parallelism and model parallelism.

Machine Learning

DPO

Direct Preference Optimization — a simpler alternative to RLHF that directly optimizes a language model from human preference data without needing a separate reward model. It is more stable and easier to implement.

Machine Learning

Dropout

A regularization technique where random neurons are temporarily disabled (dropped out) during each training step. This forces the network to not rely too heavily on any single neuron and builds redundancy.

Machine Learning
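
A sketch of inverted dropout, the common variant that scales surviving activations by 1/(1-p) at training time so the expected activation is unchanged (seeded here for reproducibility):

```python
import random

# Inverted dropout: zero each activation with probability p and scale the
# survivors by 1/(1 - p), so no rescaling is needed at inference time.
def dropout(activations, p, rng):
    keep = 1.0 - p
    return [0.0 if rng.random() < p else a / keep for a in activations]

rng = random.Random(0)  # fixed seed, illustrative only
out = dropout([1.0] * 10, p=0.5, rng=rng)
print(out)  # roughly half the units zeroed; the survivors become 2.0
```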

E

Early Stopping

A regularization technique where training is halted when the model's performance on validation data stops improving, even if training loss continues to decrease. It prevents overfitting by finding the optimal training duration.

Machine Learning
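
A patience-based sketch with made-up validation losses:

```python
# Stop when validation loss has not improved for `patience` consecutive
# epochs; returns the epoch at which training halts.
def early_stop_epoch(val_losses, patience=2):
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # halt training here
    return len(val_losses) - 1

losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.74]
print(early_stop_epoch(losses))  # stops at epoch 4; the best loss was at epoch 2
```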

Elastic Weight Consolidation

A technique for continual learning that identifies which weights are important for previously learned tasks and penalizes changes to those weights during new learning.

Machine Learning

Embedding Fine-Tuning

Adapting a pre-trained embedding model to a specific domain or task by further training it on domain-specific data, improving retrieval quality for specialized applications.

Machine Learning

Ensemble Learning

A strategy that combines multiple models to produce better predictions than any single model alone. Ensemble methods leverage the diversity of different models to reduce errors.

Machine Learning

Epoch

One complete pass through the entire training dataset during model training. Models typically require multiple epochs to learn effectively, with each pass refining the model's understanding.

Machine Learning

Exploding Gradient

A training problem where gradients become extremely large during backpropagation, causing weight updates to be so drastic that the model becomes unstable and training diverges.

Machine Learning

Exploration vs Exploitation

The fundamental tradeoff in reinforcement learning between trying new actions (exploration) to discover potentially better strategies and using known good actions (exploitation) to maximize current reward.

Machine Learning

G

Generalization

A model's ability to perform well on new, unseen data that was not part of its training set. Generalization is the ultimate goal of machine learning — learning patterns, not memorizing examples.

Machine Learning

Gradient Accumulation

A technique that simulates larger batch sizes by accumulating gradients over multiple forward passes before performing a single weight update. This enables large effective batch sizes on limited hardware.

Machine Learning
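
A toy sketch: gradients from two micro-batches are averaged into a single weight update, mimicking one large batch. The model y = w * x and its data are illustrative only:

```python
# One optimizer step built from several micro-batches of a toy linear
# model y = w * x with mean-squared-error loss.
def accumulated_step(w, micro_batches, lr=0.1):
    grad = 0.0
    for batch in micro_batches:
        grad += sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    grad /= len(micro_batches)  # average, as if one big batch
    return w - lr * grad

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # exactly y = 2x
w = 0.0
for _ in range(50):
    w = accumulated_step(w, [data[:2], data[2:]])  # two micro-batches per step
print(round(w, 3))  # approaches the true slope 2.0
```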

Gradient Boosting

An ensemble technique that builds models sequentially, where each new model focuses on correcting the errors made by previous models. It combines many weak learners into a single strong learner.

Machine Learning

Gradient Clipping

A technique that caps gradient values at a maximum threshold during training to prevent exploding gradients. If a gradient exceeds the threshold, it is scaled down.

Machine Learning
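
A sketch of clipping by global L2 norm, a common variant; rescaling preserves the gradient's direction:

```python
import math

# If the gradient vector's L2 norm exceeds max_norm, rescale it so the
# norm equals max_norm; otherwise leave it untouched.
def clip_by_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # norm was 5.0
print(clipped)  # approximately [0.6, 0.8]; the norm is now 1.0
```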

Gradient Descent

An optimization algorithm used to minimize the error (loss) of a model by iteratively adjusting parameters in the direction that reduces the loss most quickly. It is the primary method for training machine learning models.

Machine Learning
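
A minimal sketch minimizing the toy objective f(w) = (w - 3)^2, whose gradient is 2(w - 3):

```python
# Repeatedly step opposite the gradient until w settles near the minimum.
def minimize(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)
        w = w - lr * grad  # move against the gradient
    return w

print(round(minimize(0.0), 4))  # converges to the minimum at w = 3
```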

Graph Neural Network

A type of neural network designed to operate on graph-structured data (nodes and edges). GNNs learn representations of nodes, edges, or entire graphs by aggregating information from neighbors.

Machine Learning

GRU

Gated Recurrent Unit — a simplified version of LSTM that uses fewer gates and parameters while achieving similar performance on many sequence tasks. It is faster to train than LSTM.

Machine Learning

L

Latent Space

A compressed, lower-dimensional representation of data learned by a model. Points in latent space capture the essential features of the data, and nearby points represent similar data items.

Machine Learning

Layer Normalization

A normalization technique that normalizes the inputs across the features for each individual example (rather than across the batch). It stabilizes training in transformers and RNNs.

Machine Learning

Learning Rate

A hyperparameter that controls how much the model's weights are adjusted in response to errors during each training step. It determines the size of the steps taken during gradient descent optimization.

Machine Learning

LightGBM

Light Gradient Boosting Machine — Microsoft's gradient boosting framework optimized for speed and efficiency. LightGBM uses histogram-based splitting and leaf-wise growth for faster training.

Machine Learning

LIME

Local Interpretable Model-agnostic Explanations — a technique that explains individual predictions by approximating the complex model locally with a simple, interpretable model.

Machine Learning

Linear Regression

The simplest regression algorithm that models the relationship between input features and a continuous output as a straight line (or hyperplane in multiple dimensions). It minimizes the sum of squared errors.

Machine Learning
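
For a single feature, ordinary least squares has a closed form: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x). A sketch with data that lies exactly on y = 2x + 1:

```python
# Closed-form least squares for one input feature.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)         # 2.0 1.0
```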

Logistic Regression

A classification algorithm that uses the sigmoid function to predict the probability of a binary outcome. Despite its name containing 'regression,' it is used for classification tasks.

Machine Learning

Long Short-Term Memory

A type of recurrent neural network designed to learn long-term dependencies through special gating mechanisms that control information flow. LSTMs address the vanishing gradient problem of standard RNNs.

Machine Learning

LoRA

Low-Rank Adaptation — a parameter-efficient fine-tuning technique that freezes the original model weights and adds small trainable matrices to each layer. It dramatically reduces the compute and memory needed for fine-tuning.

Machine Learning

Loss Function

A mathematical function that measures how far a model's predictions are from the actual correct values. The goal of training is to minimize this loss function, making predictions as accurate as possible.

Machine Learning

M

Machine Learning

A subset of AI where systems learn patterns from data and improve their performance over time without being explicitly programmed for every scenario. ML algorithms build mathematical models from training data to make predictions or decisions.

Machine Learning

Masked Language Model

A training approach where random tokens in the input are replaced with a special [MASK] token and the model learns to predict the original tokens from context. This is how BERT was pre-trained.

Machine Learning

Meta-Learning

An approach where models are trained across many tasks so they can quickly adapt to new tasks with minimal data. Also called 'learning to learn.'

Machine Learning

Mixed Precision Training

Training neural networks using a combination of 16-bit and 32-bit floating-point numbers to speed up computation and reduce memory usage while maintaining model accuracy.

Machine Learning

Model Distillation Pipeline

An end-to-end workflow for transferring knowledge from a large teacher model to a smaller student model, including data generation, training, evaluation, and deployment.

Machine Learning

Model Merging

Combining the weights of multiple fine-tuned models into a single model that inherits capabilities from all source models, without additional training.

Machine Learning

Model Parallelism

A distributed training approach where the model itself is split across multiple GPUs, with each GPU holding and computing a different portion of the model.

Machine Learning

Momentum

An optimization technique that accelerates gradient descent by accumulating a velocity vector in the direction of persistent gradients, helping overcome local minima and noisy gradients.

Machine Learning

Multi-Armed Bandit

A simplified reinforcement learning problem where an agent must choose between multiple options (arms) with unknown payoffs, balancing exploration of new options with exploitation of known good ones.

Machine Learning

P

Parameter

Any learnable value in a machine learning model that is adjusted during training. Parameters include weights and biases in neural networks. Model size is often described by parameter count.

Machine Learning

Perceptron

The simplest form of a neural network — a single neuron that takes weighted inputs, sums them, and applies an activation function to produce an output. It is the fundamental building block of neural networks.

Machine Learning

Perplexity

A metric that measures how well a language model predicts text. Lower perplexity indicates the model is less 'surprised' by the text, meaning it can predict the next token more accurately.

Machine Learning
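
A sketch computing perplexity as the exponential of the average negative log-probability the model assigned to each true token (made-up probabilities):

```python
import math

# Perplexity = exp(mean negative log-probability of the observed tokens).
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])  # the model mostly 'expected' the text
uncertain = perplexity([0.2, 0.1, 0.3])
print(confident < uncertain)  # True: lower perplexity means better prediction
```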

Pre-training

The initial phase of training a model on a large, general-purpose dataset before specializing it for specific tasks. Pre-training gives the model broad knowledge and capabilities.

Machine Learning

Precision

Of all the items the model predicted as positive, the proportion that were actually positive. Precision measures how trustworthy the model's positive predictions are.

Machine Learning

Preference Optimization

Training techniques that directly optimize models based on human preference data, where humans indicate which of two model outputs they prefer.

Machine Learning

Principal Component Analysis

A dimensionality reduction technique that transforms data into a new coordinate system where the first axis captures the most variance, the second axis the next most, and so on.

Machine Learning
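
A sketch via eigendecomposition of the covariance matrix, assuming NumPy is available (toy data):

```python
import numpy as np

# PCA: center the data, then project onto the top-k eigenvectors of the
# covariance matrix (the directions of greatest variance).
def pca(X, k):
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top

X = np.array([[2.0, 4.1], [1.0, 2.0], [3.0, 6.1], [4.0, 7.9]])
reduced = pca(X, k=1)
print(reduced.shape)  # (4, 1): two correlated features compressed to one axis
```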

Prompt Tuning

A parameter-efficient fine-tuning technique that prepends learnable 'soft prompt' tokens to the input while keeping the main model weights frozen. Only the soft prompt parameters are trained.

Machine Learning

Pruning

A model compression technique that removes unnecessary or redundant weights, neurons, or layers from a trained neural network. Like pruning a plant, it removes parts that are not contributing to overall health.

Machine Learning

R

Random Forest

An ensemble learning method that builds multiple decision trees during training and outputs the majority vote (classification) or average prediction (regression) of all the trees. The 'forest' of diverse trees is more robust than any single tree.

Machine Learning

Recall

Of all the actually positive items in the dataset, the proportion that the model correctly identified. Recall measures how completely the model finds all relevant items.

Machine Learning
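
A sketch computing recall alongside precision from the same toy predictions, since the two metrics trade off against each other:

```python
# Precision looks at predicted positives; recall looks at actual positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)  # 2/3: how trustworthy the positive predictions are
recall = tp / (tp + fn)     # 2/4: how many real positives were found
print(precision, recall)
```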

Recurrent Neural Network

A type of neural network designed for sequential data where the output at each step depends on previous steps. RNNs have a form of memory that allows them to process sequences like text, time series, and audio.

Machine Learning

Regression

A type of supervised learning task where the model predicts a continuous numerical value rather than a discrete category. The output can be any number within a range.

Machine Learning

Regularization

Techniques used to prevent overfitting by adding constraints or penalties to the model during training. Regularization discourages the model from becoming too complex or fitting noise in the training data.

Machine Learning

Reinforcement Learning

A type of machine learning where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties. The agent aims to maximize cumulative reward over time through trial and error.

Machine Learning

Reinforcement Learning from AI Feedback

A variant of RLHF where AI models (instead of humans) provide the feedback used to train reward models and align language models. RLAIF reduces the cost and scalability constraints of human feedback.

Machine Learning

ReLU

Rectified Linear Unit — the most commonly used activation function in deep learning. It outputs the input directly if positive, and zero otherwise: f(x) = max(0, x).

Machine Learning

Representation Learning

The process of automatically discovering useful features or representations from raw data, rather than manually engineering them. Deep learning excels at learning hierarchical representations.

Machine Learning

Residual Connection

A shortcut that allows the input to a layer to bypass one or more layers and be added directly to the output. This enables training of much deeper networks by ensuring gradient flow.

Machine Learning

Retraining

The process of training a model again on updated data to restore or improve its performance. Retraining addresses model drift and incorporates new patterns the original model did not learn.

Machine Learning

Retrieval-Augmented Fine-Tuning

Combining fine-tuning with retrieval capabilities, training a model to effectively use retrieved context. RAFT teaches the model when and how to leverage external knowledge.

Machine Learning

Reward Model

A model trained to predict how good a response is based on human preferences. In RLHF, the reward model scores outputs to guide the language model toward responses humans prefer.

Machine Learning

Reward Modeling

Training a separate model to predict human preferences, which then serves as the reward signal for reinforcement learning. The reward model learns what humans consider 'good' responses.

Machine Learning

Reward Shaping

The practice of designing intermediate rewards to guide a reinforcement learning agent toward desired behavior, rather than only providing reward at the final goal state.

Machine Learning

RLHF

Reinforcement Learning from Human Feedback — a technique used to align language models with human preferences. Human raters rank model outputs, and this feedback trains a reward model that guides further training.

Machine Learning

S

Self-Supervised Learning

A training approach where the model generates its own labels from the data, typically by masking or hiding parts of the input and learning to predict them. No human-annotated labels are needed.

Machine Learning

Sentence Transformers

A framework for computing dense vector representations (embeddings) for sentences and paragraphs. Built on top of transformer models and optimized for semantic similarity tasks.

Machine Learning

SHAP

SHapley Additive exPlanations — a method based on game theory that explains individual predictions by calculating each feature's contribution to the prediction. SHAP values are additive and consistent.

Machine Learning

Sigmoid

An activation function that squashes input values into a range between 0 and 1, creating an S-shaped curve. It is commonly used for binary classification outputs and in certain neural network architectures.

Machine Learning

Softmax

A function that converts a vector of numbers into a probability distribution, where each value is between 0 and 1 and all values sum to 1. It is typically used as the final layer in classification models.

Machine Learning
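
A numerically stable sketch; subtracting the maximum logit before exponentiating avoids overflow without changing the result:

```python
import math

# Softmax: exponentiate the (shifted) logits, then normalize to sum to 1.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))               # very close to 1.0: a probability distribution
print(probs.index(max(probs)))  # 0: the largest logit gets the most mass
```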

Stochastic

Involving randomness or probability. In ML, stochastic processes include random weight initialization, stochastic gradient descent, and probabilistic sampling during text generation.

Machine Learning

Stochastic Gradient Descent

A variant of gradient descent that updates model parameters using a single random training example (or small batch) at each step instead of the entire dataset. It is faster and can escape local minima.

Machine Learning

Supervised Learning

A type of machine learning where the model is trained on labeled data — input-output pairs where the correct answer is provided. The model learns to map inputs to outputs and can then predict outputs for new, unseen inputs.

Machine Learning

Support Vector Machine

A classification algorithm that finds the optimal hyperplane (decision boundary) that maximizes the margin between different classes. SVMs are effective in high-dimensional spaces.

Machine Learning