AI Glossary
The definitive dictionary for AI, Machine Learning, and Governance terminology. From Flash Attention to RAG — look up any term.
M
Machine Learning
A subset of AI where systems learn patterns from data and improve their performance over time without being explicitly programmed for every scenario. ML algorithms build mathematical models from training data to make predictions or decisions.
Machine Translation
The use of AI to automatically translate text or speech from one language to another. Modern neural machine translation uses transformer models and approaches human quality for many well-resourced language pairs, though quality drops for languages with little training data.
Masked Language Model
A training approach where a random subset of the input tokens (about 15% in BERT) is hidden, typically by replacing each one with a special [MASK] token, and the model learns to predict the original tokens from the context on both sides. This is how BERT was pre-trained.
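As a rough illustration of the masking step, the sketch below hides tokens at random and records the originals as prediction targets. The function name `mask_tokens` and the `mask_prob` and `seed` parameters are made up for this example. BERT's actual recipe is slightly more involved: some selected tokens are left unchanged or swapped for a random token instead of [MASK].

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None elsewhere,
    so a model would only be scored on the positions it had to fill in.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # remember what it should predict
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```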
Meta-Learning
An approach where models 'learn to learn': they are trained across many tasks so they can quickly adapt to new tasks with minimal data.
Minimum Viable AI
The simplest AI solution that delivers enough value to validate a use case. It prioritizes fast learning over comprehensive features, following lean startup principles.
Mistral
A French AI company and its family of efficient, high-performance open-weight language models. Mistral models are known for strong performance relative to their size.
Mixed Precision Training
Training neural networks using a combination of 16-bit and 32-bit floating-point numbers to speed up computation and reduce memory usage while maintaining model accuracy.
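One reason the 32-bit copy is kept can be shown with plain arithmetic. The sketch below uses Python's `struct` module to round values to 16-bit floats. Python's own 64-bit floats stand in for the higher-precision master weights, and the update size is an arbitrary illustrative value. It is a toy demonstration of the rounding problem, not a training loop.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

update = 1e-4  # a small weight update (illustrative value)

# Weight kept only in 16-bit: the update is smaller than the gap between
# neighbouring fp16 values near 1.0, so every addition rounds back to 1.0.
w16_only = to_fp16(1.0)
for _ in range(100):
    w16_only = to_fp16(w16_only + update)

# Higher-precision master copy: updates accumulate, and the weight is cast
# down to 16-bit only when it is needed for fast computation.
master = 1.0
for _ in range(100):
    master += update
w16_from_master = to_fp16(master)

print(w16_only)         # the updates were lost
print(w16_from_master)  # close to 1.01
```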
Mixture of Agents
An architecture where multiple different AI models collaborate on a task, with each model contributing its strengths. A routing or aggregation layer combines their outputs.
Mixture of Depths
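A transformer architecture where a learned router decides, layer by layer, which tokens are processed by that layer and which skip it through the residual connection. Different tokens therefore pass through different numbers of layers, so the model spends more computation on complex tokens and less on simple ones.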
Mixture of Experts
An architecture where a model consists of multiple specialized sub-networks (experts) and a gating mechanism that routes each input (in language models, usually each token) to only the most relevant experts. Only a fraction of the total parameters is active per input, so compute per input grows more slowly than total model size.
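A minimal sketch of top-k routing, under heavy simplification: the experts are plain scalar functions and the gate scores are hard-coded, whereas in a real model both the experts and the router are learned networks. The names `moe_forward` and `gate_logits` are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_logits, k=2):
    """Send x to the k highest-scoring experts and mix their outputs.

    Only the selected experts are evaluated; the rest are skipped entirely.
    """
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over the chosen experts
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy experts; each one is just a scalar function here.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]  # hypothetical router scores for this input

y, used = moe_forward(3.0, experts, gate_logits, k=2)
print(used, y)
```

With `k=2` only two of the four experts run for this input, which is the sense in which only a fraction of the parameters is active.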
Mixture of Modalities
AI architectures that natively process and generate multiple data types within a single unified model, rather than using separate models connected together.
MLOps
Machine Learning Operations — the set of practices that combine ML, DevOps, and data engineering to deploy and maintain ML models in production reliably and efficiently.
Model Card
A standardized document that accompanies a machine learning model, describing its intended use, performance metrics, limitations, training data, ethical considerations, and potential biases.
Model Collapse
A phenomenon where AI models trained on AI-generated content progressively lose quality and diversity, eventually producing repetitive, low-quality outputs. Each generation of model degrades further.
Model Context Protocol
An open protocol that standardizes how AI models connect to external tools, data sources, and services. MCP provides a universal interface for LLMs to access context from any compatible system.
Model Distillation Pipeline
An end-to-end workflow for transferring knowledge from a large teacher model to a smaller student model, including data generation, training, evaluation, and deployment.
Model Drift
The gradual degradation of a model's predictive performance over time as the real-world environment changes. Model drift can be caused by data drift, concept drift, or both.
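One common way to check for data drift is to compare the binned distribution of a feature at training time with what is seen in production. The sketch below computes the Population Stability Index (PSI) for that purpose. The bin proportions are invented, and any alert threshold (values around 0.1 to 0.25 are a frequently quoted rule of thumb) is a convention rather than something this glossary defines.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both arguments are lists of bin proportions that each sum to 1.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total

training_bins = [0.25, 0.25, 0.25, 0.25]    # feature distribution at training time (made up)
production_bins = [0.10, 0.20, 0.30, 0.40]  # same feature observed in production (made up)

print(psi(training_bins, training_bins))    # identical distributions give 0.0
print(psi(training_bins, production_bins))  # larger value signals a shift
```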
Model Evaluation Pipeline
An automated system that runs a comprehensive suite of evaluations on AI models, generating reports on accuracy, safety, bias, robustness, and other quality dimensions.
Model Governance
The policies, processes, and tools for managing AI models throughout their lifecycle — from development through deployment to retirement. It ensures models remain compliant, fair, and performant.
Model Hub
A platform for hosting, discovering, and sharing pre-trained AI models. Model hubs provide standardized access to thousands of models across different tasks and architectures.
Model Interpretability Tool
Software that helps explain how an ML model arrives at its predictions, using techniques such as feature importance, attention visualization, counterfactual explanations, and decision path analysis.
Model Merging
Combining the weights of multiple fine-tuned models into a single model that inherits capabilities from all source models, without additional training.
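The simplest form of merging is a weighted average of parameters that share the same name and shape, sketched below with plain dictionaries of lists standing in for real weight tensors. The function name `merge_weights` and the parameter names are hypothetical, and more elaborate merging methods exist that this sketch does not cover.

```python
def merge_weights(state_dicts, coeffs=None):
    """Linearly combine parameters with matching names across models.

    state_dicts: list of {parameter_name: list_of_floats}
    coeffs: one mixing coefficient per model (defaults to a plain average)
    """
    n = len(state_dicts)
    coeffs = coeffs or [1.0 / n] * n
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts):
        raise ValueError("all models must share the same parameter names")
    merged = {}
    for name in keys:
        # Walk the models' values for this parameter position by position.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(c * v for c, v in zip(coeffs, col)) for col in columns]
    return merged

model_a = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}  # toy fine-tuned model A
model_b = {"layer.w": [3.0, 4.0], "layer.b": [1.0]}  # toy fine-tuned model B

merged = merge_weights([model_a, model_b])
print(merged)
```

Averaging only makes sense when the models share an architecture and, in practice, a common base model, since otherwise parameters with the same name do not play the same role.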
Model Monitoring
The practice of continuously tracking an ML model's performance, predictions, and input data in production to detect degradation, drift, or anomalies after deployment.
Model Parallelism
A distributed training and inference approach where the model itself is split across multiple GPUs, with each GPU holding and computing a different portion of the model. It is used when a model is too large to fit in a single GPU's memory.
Model Registry
A centralized repository for storing, versioning, and managing trained ML models along with their metadata (metrics, parameters, lineage). It serves as the system of record for models.
Model Serving
The infrastructure and process of deploying trained ML models to production where they can receive requests and return predictions in real time. It includes scaling, load balancing, and version management.
Model Size
The number of parameters in a model, typically expressed in millions (M) or billions (B). Model size correlates loosely with capability but also determines compute and memory requirements.
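The link between parameter count and memory is simple arithmetic: parameters times bytes per parameter. The sketch below estimates the memory needed just to hold the weights. It ignores activations, optimizer state, and inference caches, so real requirements are higher; the helper name `weight_memory_gb` is made up for this example.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# A 7B-parameter model at different numeric precisions.
for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(7e9, dtype), "GB")
```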
Model Weights
The collection of all learned parameter values in a neural network. Model weights are what you download when you get a pre-trained model — they encode everything the model learned.
Momentum
An optimization technique that accelerates gradient descent by accumulating a velocity vector in the direction of persistent gradients. This dampens oscillations caused by noisy gradients and can help the optimizer move through flat regions and shallow local minima.
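A minimal sketch of the update rule on a one-dimensional problem: the velocity `v` is a decaying sum of past gradients, and the parameter moves along the velocity instead of the raw gradient. The learning rate, the decay factor `beta`, and the test function are arbitrary choices for illustration.

```python
def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum on a single scalar parameter."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # accumulate velocity from past gradients
        x = x - lr * v          # step along the velocity
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
x_min = sgd_momentum(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Setting `beta=0` recovers plain gradient descent, which makes the two easy to compare on the same function.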
Multi-Agent System
An architecture where multiple AI agents collaborate, each with specialized roles or capabilities, to accomplish complex tasks that would be difficult for a single agent to handle alone.
Multi-Armed Bandit
A simplified reinforcement learning problem where an agent must choose between multiple options (arms) with unknown payoffs, balancing exploration of new options with exploitation of known good ones.
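The exploration and exploitation trade-off can be sketched with an epsilon-greedy strategy, one of several standard approaches: with a small probability the agent tries a random arm, and otherwise it pulls the arm with the best average reward so far. The payoff probabilities, `epsilon`, and step count below are invented for the example.

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on arms that pay 1 with the given probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # how often each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return counts, values

# The agent does not see these payoff probabilities; it only observes rewards.
counts, values = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)
print([round(v, 2) for v in values])
```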
Multi-Head Attention
An extension of attention where multiple attention mechanisms (heads) run in parallel, each learning to focus on different types of relationships in the data. The outputs are then combined.
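The sketch below spells out the mechanics with plain Python lists: each head projects the input into its own query, key, and value spaces, runs scaled dot-product attention, and the head outputs are concatenated and passed through an output projection. The projection matrices here are fixed by hand purely for illustration; in a real model they are learned, and the helper names are not from any library.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, heads, w_o):
    """heads: list of (w_q, w_k, w_v) projection matrices, one triple per head."""
    outputs = [attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
               for w_q, w_k, w_v in heads]
    # Concatenate the head outputs along the feature dimension, then project.
    concat = [sum((head[i] for head in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

# Two heads over a 4-dimensional input: head 0 looks at features 0-1, head 1 at features 2-3.
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [(first_half,) * 3, (second_half,) * 3]
w_o = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity

x = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # 3 tokens
out = multi_head_attention(x, heads, w_o)
print(len(out), len(out[0]))
```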
Multilingual AI
AI models capable of understanding and generating text in multiple languages. Modern LLMs often support 50-100+ languages, though performance varies significantly across languages.
Multimodal AI
AI systems that can process and generate multiple types of data — text, images, audio, video — within a single model. Multimodal models understand the relationships between different data types.
Multimodal Embedding
Embeddings that map different data types (text, images, audio) into the same vector space, enabling cross-modal search and comparison.
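Once text and images live in the same vector space, cross-modal search reduces to a nearest-neighbour lookup. The sketch below ranks images against a text query by cosine similarity. All vectors and file names are invented; a real system would obtain them from text and image encoders trained to share one space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical image embeddings (placeholders, not output of a real encoder).
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
}
text_query = [0.8, 0.2, 0.1]  # pretend embedding of the text "a photo of a puppy"

best = max(image_embeddings, key=lambda name: cosine(text_query, image_embeddings[name]))
print(best)
```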
Multimodal RAG
Retrieval-augmented generation that works across multiple data types — retrieving and reasoning over text, images, tables, and charts to answer questions that require multimodal understanding.
Multimodal Search
Search systems that can query across different data types — finding images with text, videos with audio descriptions, or documents that contain specific visual elements.