Data Science

Data Pipeline

An automated workflow that extracts data from sources, transforms it through processing steps, and loads it into a destination for use. In ML, data pipelines ensure consistent data flow from raw sources to model training.

Why It Matters

Data pipelines are the plumbing of AI systems. A broken pipeline means no fresh data, stale models, and degraded performance — often without anyone noticing.

Example

An ETL pipeline that extracts customer data from a CRM every hour, joins it with transaction records from a database, cleans the result, and loads it into a feature store for model training.
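The extract-transform-load steps above can be sketched as plain functions. This is a minimal illustration with in-memory stub data and invented field names (`customer_id`, `total_spend`, and so on); a real pipeline would pull from a CRM API and a transactions database and write to an actual feature store.

```python
def extract():
    # Extract: pull raw records from sources (stubbed in-memory here;
    # a real pipeline would call the CRM API and query the database).
    customers = [{"id": 1, "name": "Ada ", "segment": "pro"},
                 {"id": 2, "name": "Bob", "segment": None}]
    transactions = [{"customer_id": 1, "amount": 30.0},
                    {"customer_id": 1, "amount": 12.5},
                    {"customer_id": 2, "amount": 7.0}]
    return customers, transactions

def transform(customers, transactions):
    # Join: aggregate transaction amounts per customer.
    totals = {}
    for t in transactions:
        totals[t["customer_id"]] = totals.get(t["customer_id"], 0.0) + t["amount"]
    # Clean: trim stray whitespace, fill missing segments with a default.
    return [{
        "customer_id": c["id"],
        "name": c["name"].strip(),
        "segment": c["segment"] or "unknown",
        "total_spend": totals.get(c["id"], 0.0),
    } for c in customers]

def load(rows, feature_store):
    # Load: write feature rows into the store, keyed by customer id.
    for row in rows:
        feature_store[row["customer_id"]] = row

feature_store = {}
load(transform(*extract()), feature_store)
print(feature_store[1]["total_spend"])  # 42.5
print(feature_store[2]["segment"])      # unknown
```

In production, each stage would typically be a separate task orchestrated by a scheduler (e.g. run hourly), so a failure in one stage can be retried without re-running the whole pipeline.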

Think of it like...

Like a factory assembly line — raw materials enter one end, pass through processing stations, and finished products emerge at the other end, all running automatically.

Related Terms