Machine Learning

Dimensionality Reduction

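Techniques that reduce the number of features (dimensions) in a dataset while preserving as much of the important structure (such as variance or neighborhood relationships) as possible. This makes data easier to visualize, speeds up training, and can improve model performance by filtering out noise and redundant features.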

Why It Matters

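High-dimensional data suffers from the 'curse of dimensionality': as the number of features grows, data points become sparse, distances between them become less informative, and models need far more training examples to generalize. Dimensionality reduction makes ML practical for datasets with thousands of features.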
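One symptom of the curse of dimensionality is distance concentration: as the number of features grows, a point's nearest and farthest neighbors end up almost equally far away, so distance-based methods lose their discriminating power. The minimal sketch below illustrates this on uniformly random synthetic points (an assumption chosen for simplicity, not a real dataset); `relative_contrast` is a hypothetical helper name.

```python
import numpy as np

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from one random query point
    to n_points random points drawn uniformly from the unit hypercube [0, 1]^dim.
    Synthetic data only; this is an illustration, not a benchmark."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# The contrast shrinks as the dimension grows: "near" and "far" blur together.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  relative contrast={relative_contrast(500, dim):.3f}")
```

Reducing the number of dimensions before computing distances (for clustering or nearest-neighbor search) is one way to mitigate this effect.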

Example

Reducing a dataset with 1,000 gene expression features down to 50 principal components that capture 95% of the variation, making it feasible to cluster patient groups.
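The "keep enough components to capture 95% of the variation" rule can be sketched with NumPy as below. This is a minimal illustration under stated assumptions: the data is a synthetic stand-in (1,000 features generated from 20 hidden factors plus noise), not real gene-expression data, and `n_components_for_variance` is a hypothetical helper name. On real data, the number of components needed depends entirely on how correlated the features are.

```python
import numpy as np

def n_components_for_variance(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered features
    # Squared singular values of the centered data are proportional to the
    # variance captured by each principal component (largest first).
    s = np.linalg.svd(X_centered, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the threshold, +1 for a count;
    # capped at len(s) in case rounding keeps cumulative[-1] just below 1.0.
    return min(int(np.searchsorted(cumulative, threshold) + 1), len(s))

# Synthetic stand-in: 200 samples x 1,000 features driven by 20 hidden factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 20))
loadings = rng.normal(size=(20, 1000))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 1000))

k = n_components_for_variance(X, 0.95)
print(f"{k} components capture at least 95% of the variance")
```

Because this toy data has only 20 underlying factors, at most about 20 components are needed despite the 1,000 raw features.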

Think of it like...

Like summarizing a 500-page book into a 10-page overview — you lose some detail but keep the essential information that matters most.

Related Terms

Principal Component Analysis

A dimensionality reduction technique that transforms data into a new coordinate system where the first axis captures the most variance, the second axis the next most, and so on.
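A minimal PCA sketch, assuming NumPy and the classic covariance-eigendecomposition formulation (libraries often compute the same result via SVD instead). The helper `pca` and the toy 2-D data are illustrative assumptions. The eigenvectors of the covariance matrix are the new axes, and sorting them by eigenvalue gives the "most variance first" ordering described above.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns (scores, components, variances): `components` holds one
    unit-length axis per row, ordered by decreasing variance.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)   # n_features x n_features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components].T
    scores = X_centered @ components.T       # coordinates in the new system
    return scores, components, eigvals[:n_components]

# Correlated 2-D toy data: most of the spread lies along the (1, 1) diagonal.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t + 0.1 * rng.normal(size=500),
                     t + 0.1 * rng.normal(size=500)])

scores, components, variances = pca(X, n_components=2)
# The first axis should lie roughly along the (1, 1) diagonal (sign is arbitrary).
print("first axis:", np.round(components[0], 2))
print("variance per axis:", np.round(variances, 3))
```

Keeping only the first row of `components` (and the first column of `scores`) would reduce this 2-D data to 1-D while retaining nearly all of its variance.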
