Machine Learning

QLoRA

Quantized Low-Rank Adaptation — combines LoRA with quantization to further reduce the memory required for fine-tuning. It quantizes the frozen base model weights to 4-bit precision (the NF4 data type in the original paper) while training LoRA adapters in higher precision, so gradients flow only through the small adapter matrices.
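The mechanism can be sketched in NumPy. This is a toy illustration, not the real implementation: it uses simple symmetric absmax int4 quantization in place of QLoRA's per-block NF4, and the layer sizes, rank, and alpha are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny layer standing in for one frozen weight matrix.
d_out, d_in, r = 8, 8, 2                # r = LoRA rank (assumed values)
W = rng.normal(size=(d_out, d_in)).astype(np.float32)

# 4-bit absmax quantization of the frozen base weight (illustrative;
# real QLoRA uses per-block NF4, not plain int4).
scale = np.abs(W).max() / 7.0           # symmetric int4 range -7..7
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)  # stored as 4-bit

def dequant(W_q, scale):
    # Weights are dequantized on the fly for each forward pass.
    return W_q.astype(np.float32) * scale

# LoRA adapters kept in higher precision (fp32 here); only these train.
A = rng.normal(scale=0.01, size=(r, d_in)).astype(np.float32)
B = np.zeros((d_out, r), dtype=np.float32)  # B starts at zero in LoRA
alpha = 16.0
scaling = alpha / r

def forward(x):
    # Frozen quantized base path plus the trainable low-rank update.
    return dequant(W_q, scale) @ x + scaling * (B @ (A @ x))

x = rng.normal(size=(d_in,)).astype(np.float32)
y = forward(x)
print(y.shape)  # (8,)
```

Because B is initialized to zero, the adapted layer starts out computing exactly the (dequantized) base model's output; training then moves only A and B.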

Why It Matters

QLoRA enables fine-tuning of massive models on a single consumer GPU, democratizing access to custom LLMs for individuals and small organizations.

Example

Fine-tuning a 65B-parameter model on a single 48 GB GPU, using 4-bit quantization for the frozen base model combined with LoRA adapters for the trainable parameters.
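A back-of-envelope check shows why 4-bit quantization makes this fit (weights only; activations, optimizer state, and quantization-constant overhead are ignored):

```python
# Approximate memory for just the base-model weights of a 65B model.
params = 65e9
gib = 2**30
bytes_fp16 = params * 2 / gib    # 2 bytes/param: far over a 48 GB card
bytes_4bit = params * 0.5 / gib  # 0.5 bytes/param: fits with headroom
print(round(bytes_fp16), round(bytes_4bit))  # 121 30
```

Roughly 121 GiB in fp16 versus about 30 GiB at 4 bits, which leaves room on a 48 GB GPU for the adapters and activations.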

Think of it like...

Like compressing a huge reference library into pocket-sized summaries and only keeping full-size versions of the chapters you are actively editing.

Related Terms