Validation Data
A subset of data used during training to tune hyperparameters and monitor model performance without touching the test set. It acts as an intermediate checkpoint between training and final evaluation.
Why It Matters
Validation data prevents you from accidentally overfitting to the test set by giving you a separate dataset to make design decisions against.
Example
Using 60% of data for training, 20% for validation (to pick the best model configuration), and 20% for final testing.
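The 60/20/20 split above can be sketched with a small stdlib-only helper. The function name and the fixed seed are illustrative choices, not a standard API; in practice you would shuffle before splitting so each portion is representative.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data, then carve off test and validation portions;
    the remainder (here 60%) becomes the training set."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Model configurations are then compared on `val`, and `test` is consulted only once, for the final evaluation.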
Think of it like...
Like practice exams before the real test — they help you gauge readiness and adjust your study strategy without using up the actual exam.
Related Terms
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Test Data
A separate portion of data held back from training that is used to evaluate a model's performance on unseen examples. Test data provides an unbiased estimate of how well the model will perform in the real world.
Cross-Validation
A model evaluation technique that splits data into multiple folds, trains on some folds and tests on the held-out fold, repeating so every fold serves as the test set. It provides a robust estimate of model performance.
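The fold rotation described above can be written as a short index generator. This is a minimal sketch of k-fold splitting (the function name is illustrative); libraries such as scikit-learn provide production versions with shuffling and stratification.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs so that every example appears
    in exactly one held-out fold across the k iterations."""
    indices = list(range(n))
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(test_idx))
```

Averaging the per-fold scores gives the robust performance estimate mentioned above, since every example is tested exactly once.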
Hyperparameter Tuning
The process of systematically searching for the best combination of hyperparameters for a model. Since hyperparameters are set before training, finding optimal values requires experimentation.
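One simple form of that search is an exhaustive grid over candidate values, scored on the validation set. The sketch below assumes a caller-supplied `score_fn` (a hypothetical callback that trains a model with the given hyperparameters and returns its validation score); the grid keys and the toy scoring function in the usage example are illustrative.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every hyperparameter combination in `grid` and return
    the one with the highest validation score."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = score_fn(params)  # e.g. train + evaluate on validation data
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy usage: pretend the best configuration is lr=0.01, depth=3.
grid = {"lr": [0.1, 0.01], "depth": [2, 3]}
toy_score = lambda p: -abs(p["lr"] - 0.01) - abs(p["depth"] - 3)
best, score = grid_search(toy_score, grid)
print(best)
```

Grid search is the simplest strategy; random search and Bayesian optimization explore the same space more efficiently when there are many hyperparameters.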