Structured Data
Data organized in a predefined format with clear rows and columns, like spreadsheets and relational databases. Each field has a defined type and meaning.
Why It Matters
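Structured data is usually the easiest type for ML to consume, because each column already maps to a candidate feature. Traditional ML algorithms such as gradient-boosted trees (e.g. XGBoost) and random forests operate on tabular data with little transformation beyond encoding non-numeric columns.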
Example
A customer database with columns: customer_id (integer), name (string), email (string), signup_date (date), total_purchases (float) — each field clearly defined.
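As a minimal sketch of that schema, the snippet below uses Python's built-in sqlite3 module to declare the five columns with explicit types and then query them. The table name `customers` and the sample rows are hypothetical, chosen only for illustration; SQLite has no native date type, so the sketch assumes signup_date is stored as an ISO 8601 string.

```python
import sqlite3

# In-memory database; the table name and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id     INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        email           TEXT NOT NULL,
        signup_date     TEXT NOT NULL,  -- ISO 8601 string (assumption: no native DATE in SQLite)
        total_purchases REAL NOT NULL
    )
    """
)

rows = [
    (1, "Ada Example", "ada@example.com", "2023-01-15", 120.50),
    (2, "Bo Sample", "bo@example.com", "2023-06-02", 35.00),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)

# Because every field has a known name and type, filtering is a one-line query.
big_spenders = conn.execute(
    "SELECT name, total_purchases FROM customers "
    "WHERE total_purchases > ? ORDER BY customer_id",
    (100,),
).fetchall()
print(big_spenders)  # [('Ada Example', 120.5)]
```

The same query against free-form text or images would first require extracting these fields, which is the practical difference the definition above describes.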
Think of it like...
Like a well-organized filing cabinet with labeled folders and standardized forms — everything has its place and you can find anything quickly.
Related Terms
Unstructured Data
Data without a predefined format or organization, such as text documents, images, videos, audio, and social media posts. Industry estimates commonly put the unstructured share of enterprise data at around 80% or more.
Semi-Structured Data
Data that has some organizational structure but does not conform to a rigid schema like a relational database. Examples include JSON, XML, and HTML.
Data Preprocessing
The process of cleaning, transforming, and organizing raw data into a format suitable for machine learning. This includes handling missing values, encoding categories, scaling features, and removing outliers.
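Three of the preprocessing steps named above (handling missing values, encoding categories, scaling features) can be sketched in plain Python as follows. The column names `plan` and `total_purchases` and the sample rows are hypothetical, and median imputation, one-hot encoding, and min-max scaling are just one reasonable choice for each step; outlier removal is left out to keep the sketch short.

```python
from statistics import median

# Hypothetical raw rows: (plan, total_purchases); None marks a missing value.
raw_rows = [("basic", 20.0), ("pro", None), ("basic", 40.0), ("pro", 100.0)]

# 1. Handle missing values: impute with the median of the observed values.
observed = [value for _, value in raw_rows if value is not None]
fill_value = median(observed)
filled = [(plan, fill_value if value is None else value) for plan, value in raw_rows]

# 2. Encode categories: one-hot encode the plan column in a fixed (sorted) order.
plans = sorted({plan for plan, _ in filled})

# 3. Scale features: min-max scale purchases to the range [0, 1].
#    Assumes the column is not constant, otherwise the denominator would be zero.
low = min(value for _, value in filled)
high = max(value for _, value in filled)

features = [
    [1.0 if plan == p else 0.0 for p in plans] + [(value - low) / (high - low)]
    for plan, value in filled
]

for row in features:
    print(row)
```

Each output row is a purely numeric vector (one-hot plan columns followed by the scaled purchase amount), which is the form most tabular ML libraries expect.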