Accuracy
The percentage of correct predictions out of all predictions made by a model. While intuitive, accuracy can be misleading for imbalanced datasets.
Why It Matters
Accuracy is the most commonly reported metric but can hide poor performance. A model that always predicts 'not fraud' achieves 99.9% accuracy on a dataset with 0.1% fraud, yet is useless for the task it was built for.
Example
A spam classifier that correctly labels 950 out of 1,000 emails has 95% accuracy; the remaining 50 emails were misclassified, either legitimate mail flagged as spam or spam let through.
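Both the spam example and the fraud pitfall reduce to simple count arithmetic. A minimal Python sketch (the fraud totals below are illustrative, chosen to match the 0.1% rate in the text):

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of all predictions that were correct."""
    return correct / total

# Spam example: 950 of 1,000 emails classified correctly.
print(accuracy(950, 1000))  # 0.95

# Imbalance pitfall: always predicting "not fraud" on data with 0.1% fraud.
total = 100_000
fraud = 100                            # 0.1% of transactions
print(accuracy(total - fraud, total))  # 0.999
```

The second call shows why a single accuracy number is not enough: the "model" never detects a single fraudulent transaction, yet scores 99.9%.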
Think of it like...
Like a weather forecaster who predicts 'no rain' every day in a desert: right roughly 95% of the time, yet useless for predicting the rare rainstorms that actually matter.
Related Terms
Precision
Of all the items the model predicted as positive, the proportion that were actually positive. Precision measures how trustworthy the model's positive predictions are.
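In code, precision is just true positives divided by all predicted positives. A minimal sketch with hypothetical counts (the numbers are made up for illustration):

```python
def precision(tp: int, fp: int) -> float:
    """Of the model's positive predictions, the fraction that were truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical: 40 true positives, 10 false positives -> 40 of 50 predictions correct.
print(precision(40, 10))  # 0.8
```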
Recall
Of all the actually positive items in the dataset, the proportion that the model correctly identified. Recall measures how completely the model finds all relevant items.
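Recall mirrors precision but divides by all actual positives instead of all predicted positives. A minimal sketch with hypothetical counts:

```python
def recall(tp: int, fn: int) -> float:
    """Of the actual positives, the fraction the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical: 40 found, 60 missed -> the model recovers 40 of 100 positives.
print(recall(40, 60))  # 0.4
```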
F1 Score
The harmonic mean of precision and recall, providing a single metric that balances both. F1 scores range from 0 to 1, with 1 being perfect precision and recall.
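The harmonic mean can be computed directly from the two scores. A sketch using the hypothetical precision and recall values from the snippets above (0.8 and 0.4, illustrative only):

```python
def f1_score(p: float, r: float) -> float:
    """F1 = 2PR / (P + R); defined as 0 when both inputs are 0."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

print(round(f1_score(0.8, 0.4), 3))  # 0.533
```

Note that the harmonic mean (0.533) sits below the arithmetic mean (0.6): F1 punishes a large gap between precision and recall, so a model cannot score well by excelling at only one of them.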
Confusion Matrix
A table that summarizes the performance of a classification model by showing true positives, true negatives, false positives, and false negatives. It reveals the types of errors a model makes.
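The four cells can be tallied directly from (actual, predicted) label pairs. A minimal Python sketch with made-up binary labels (1 = positive, 0 = negative):

```python
from collections import Counter

# Illustrative labels only; any real evaluation would use held-out data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

counts = Counter(zip(y_true, y_pred))
tp = counts[(1, 1)]  # actual positive, predicted positive
fn = counts[(1, 0)]  # actual positive, predicted negative (miss)
fp = counts[(0, 1)]  # actual negative, predicted positive (false alarm)
tn = counts[(0, 0)]  # actual negative, predicted negative
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```

From these four counts, every metric above follows: accuracy is (TP+TN)/total, precision is TP/(TP+FP), and recall is TP/(TP+FN), which is why the confusion matrix is usually the first thing to inspect.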
Evaluation
The systematic process of measuring an AI model's performance, safety, and reliability using various metrics, benchmarks, and testing methodologies.