Test Data
A separate portion of data held back from training that is used to evaluate a model's performance on unseen examples. As long as it is used only for the final evaluation, and not for tuning or model selection, test data provides an unbiased estimate of how well the model will perform on similar real-world data.
Why It Matters
Test data is your reality check — it reveals whether your model actually learned generalizable patterns or just memorized training examples.
Example
After training a spam detector on 80% of your emails, you test it on the remaining 20% to see how accurately it classifies emails it has never seen.
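The 80/20 split in this example can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the helper names `train_test_split` and `accuracy`, the fixed seed, and the toy `emails` list are all assumptions made for the sketch.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and hold back `test_fraction` of it as test data."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predict, labeled_examples):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    correct = sum(1 for text, label in labeled_examples if predict(text) == label)
    return correct / len(labeled_examples)

# Toy data, invented for illustration: (email text, is_spam) pairs.
emails = [(f"email {i}", i % 3 == 0) for i in range(100)]
train, test = train_test_split(emails)
```

A model would be fit on `train` only, then scored once with `accuracy(model_predict, test)`, where `model_predict` stands for whatever prediction function the trained model exposes. Shuffling before splitting matters when the data is ordered, for example by date or by label.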
Think of it like...
Like a final exam that covers material the student studied but uses different questions — it tests understanding, not memorization.
Related Terms
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Validation Data
A subset of data used during training to tune hyperparameters and monitor model performance without touching the test set. It acts as an intermediate checkpoint between training and final evaluation.
Cross-Validation
A model evaluation technique that splits data into multiple folds, trains on all but one fold and tests on the held-out fold, repeating so every fold serves as the test set exactly once. Averaging the scores across folds gives a more stable estimate of model performance than a single train/test split.
Overfitting
When a model learns the training data too well — including its noise and random fluctuations — and performs poorly on new, unseen data. The model essentially memorizes rather than generalizes.
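The fold rotation described under Cross-Validation can be sketched in plain Python. This is a minimal illustration under stated assumptions: `k_fold_indices` is a hypothetical helper, it works on positions rather than on the data itself, and it does not shuffle, so ordered data would need shuffling first.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) pairs; each index is tested exactly once."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and n_examples")
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
```

Each pair in `folds` gives the positions to train on and the positions to test on for one round. A large gap between the training score and the held-out score across rounds is a common sign of overfitting.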