Machine Learning

LoRA

Low-Rank Adaptation — a parameter-efficient fine-tuning technique that freezes the original model weights and learns a small low-rank update for each targeted layer: the weight change is factored into two small matrices, B·A, whose rank r is far smaller than the layer dimensions. Only A and B are trained, which dramatically reduces the compute and memory needed for fine-tuning.
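The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not a training loop: the layer size (512×512) and rank (8) are arbitrary, and B is zero-initialized so the adapted layer starts out identical to the frozen one, as in the LoRA paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 8               # hypothetical layer size and LoRA rank
alpha = 16                                  # scaling hyperparameter

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def forward(x):
    # Frozen path plus scaled low-rank trainable path: W x + (alpha/r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Before any training, B is all zeros, so the adapted layer matches the original.
assert np.allclose(forward(x), W @ x)

full_ft = W.size          # parameters updated by full fine-tuning: 262144
lora_ft = A.size + B.size # parameters updated by LoRA: 8192 (about 3%)
```

Because W is never updated, no gradients or optimizer states are stored for it; those are kept only for the small A and B matrices, which is where the memory savings come from.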

Why It Matters

LoRA makes fine-tuning large models practical on modest hardware. Combined with quantization of the frozen base weights, you can customize a 70B-parameter model on a single high-end GPU instead of a multi-node cluster.

Example

Fully fine-tuning Llama 2 70B requires hundreds of gigabytes of GPU memory for weights, gradients, and optimizer states. With LoRA, gradients and optimizer states exist only for the small adapter matrices, and when the frozen base weights are also quantized to 4-bit (the QLoRA approach), the job fits on a single ~48 GB GPU, putting it within reach of individual developers and small teams.
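A back-of-envelope calculation shows why so little of the model is trained. The numbers below assume a Llama-2-7B-style configuration (32 layers, hidden size 4096) with rank-8 adapters on just the query and value projections; these choices are illustrative, not prescribed by LoRA itself.

```python
# Illustrative trainable-parameter count for rank-8 LoRA adapters
# on the q_proj and v_proj matrices of a 7B-parameter transformer.
layers, hidden, rank = 32, 4096, 8

per_proj = 2 * rank * hidden      # A (r x d) plus B (d x r) per projection
per_layer = 2 * per_proj          # two adapted projections (query and value)
trainable = layers * per_layer    # adapter parameters across all layers
total = 7_000_000_000             # rough full model size

print(trainable)                  # 4194304 adapter parameters
print(f"{trainable / total:.2%}") # well under 0.1% of the full model
```

Scaling the same arithmetic to a 70B model still leaves the trainable fraction tiny, which is why only the (possibly quantized) frozen weights dominate the memory bill.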

Think of it like...

Like altering a suit instead of making a new one from scratch — small, targeted changes to key areas give you a custom fit without rebuilding everything.

Related Terms