Safety Evaluation
Systematic testing of AI models for harmful outputs, dangerous capabilities, and vulnerability to misuse. Safety evaluations assess risks before deployment.
Why It Matters
Safety evaluation is becoming mandatory for frontier models. The EU AI Act and voluntary commitments require comprehensive safety testing before release.
Example
Testing whether a model can be manipulated into providing instructions for dangerous activities, generating harmful content, or leaking private training data.
Think of it like...
Like crash testing cars before they go on sale — you need to know how the system behaves in worst-case scenarios before putting it in users' hands.
Related Terms
Red Teaming
The practice of systematically testing AI systems by attempting to find failures, vulnerabilities, and harmful behaviors before deployment. Red teamers actively try to break the system.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Evaluation
The systematic process of measuring an AI model's performance, safety, and reliability using various metrics, benchmarks, and testing methodologies.
Guardrails
Safety mechanisms and constraints built into AI systems to prevent harmful, inappropriate, or off-topic outputs. Guardrails can operate at the prompt, model, or output level.
Responsible AI
An approach to developing and deploying AI that prioritizes ethical considerations, fairness, transparency, accountability, and societal benefit throughout the entire AI lifecycle.