Machine Learning

Model Parallelism

A distributed training approach where the model itself is split across multiple GPUs, with each GPU holding and computing a different portion of the model.

Why It Matters

Model parallelism enables training models too large to fit in a single GPU's memory. A 175B-parameter model in 16-bit precision needs roughly 350 GB for its weights alone, far beyond the memory of any single accelerator; without splitting the model across devices, trillion-parameter models could not be trained at all.

Example

Splitting a 175B-parameter model across 8 GPUs leaves each GPU holding ~22B parameters. Whole layers can run on different devices (pipeline parallelism), or individual layers can be split across devices (tensor parallelism).
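
As a sketch of how such a split works, the toy code below partitions a model's layers into contiguous blocks, one block per device, and runs a forward pass in which the activation hops from block to block. The names (Layer, partition_layers, forward) are illustrative, and plain Python objects stand in for GPUs; in a real system each block would live on its own device and the activation would be copied between devices:

```python
# Toy sketch of layer-wise (pipeline-style) model parallelism.
# Plain Python objects simulate devices; no GPUs are required.

class Layer:
    """A toy layer: multiplies its input by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return x * self.weight

def partition_layers(layers, num_devices):
    """Assign contiguous blocks of layers to devices."""
    per_device = -(-len(layers) // num_devices)  # ceiling division
    return [layers[i:i + per_device]
            for i in range(0, len(layers), per_device)]

def forward(partitions, x):
    """Each device computes its block, then the activation is
    'transferred' to the next device (a real copy on real hardware)."""
    for device_id, block in enumerate(partitions):
        # In practice: x = x.to(f"cuda:{device_id}") or similar.
        for layer in block:
            x = layer.forward(x)
    return x

layers = [Layer(w) for w in (2, 3, 5, 7)]        # a 4-layer "model"
partitions = partition_layers(layers, num_devices=2)
print(len(partitions))         # 2 devices, 2 layers each
print(forward(partitions, 1))  # 1*2*3*5*7 = 210
```

Note that in this layer-wise scheme only one device is busy at a time; production systems pipeline multiple micro-batches through the blocks so that all devices stay utilized.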

Think of it like...

Like a factory assembly line where different workers handle different stages of production — the product (data) moves through workers (GPUs), each performing their specialized step.

Related Terms