Machine Learning

Preference Optimization

Training techniques that directly optimize models based on human preference data, where humans indicate which of two model outputs they prefer.

Why It Matters

Preference optimization is how models learn to be helpful, honest, and harmless. It translates subjective human judgment into mathematical optimization.

Example

Showing raters two model responses to the same question, collecting their preference, then training the model to produce outputs more like the preferred ones.

Think of it like...

Like a cooking competition where judges taste two dishes and pick the better one — over many rounds, the chef learns to cook what judges prefer.

Related Terms