Machine Learning

Multi-Armed Bandit

A simplified reinforcement learning problem in which an agent repeatedly chooses among several options (arms) with unknown payoffs. At each step it must balance exploration (trying options to learn their payoffs) against exploitation (choosing the option that currently looks best).
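The trade-off can be illustrated with epsilon-greedy, one simple bandit policy among many, chosen here only as an example. On each round the agent picks a random arm with probability `epsilon` (explore) and otherwise picks the arm with the highest estimated payoff so far (exploit). The function name, the win/lose (Bernoulli) payoffs, and the payout probabilities are hypothetical and exist only to simulate arms whose payoffs the agent cannot see.

```python
import random

def epsilon_greedy(true_probs, n_rounds, epsilon=0.1, seed=0):
    """Hypothetical helper: play a simulated Bernoulli bandit with epsilon-greedy.

    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k      # how many times each arm was pulled
    values = [0.0] * k    # running estimate of each arm's mean reward
    total = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore: any arm at random
        else:
            arm = max(range(k), key=lambda a: values[a])    # exploit: best estimate so far
        # Simulated payoff; the agent never reads true_probs directly.
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, values, total

# Three arms with hypothetical payout probabilities 20%, 50%, 70%.
counts, values, total = epsilon_greedy([0.2, 0.5, 0.7], n_rounds=5000)
print(counts, [round(v, 2) for v in values], total)
```

With a fixed `epsilon`, about 10% of rounds are spent exploring forever; decaying `epsilon` over time, or using strategies such as UCB or Thompson sampling, are common refinements.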

Why It Matters

Multi-armed bandit algorithms are used in adaptive A/B testing, ad placement, and content recommendation: any scenario where you need to learn which option is best while still earning as much reward as possible during the learning process.

Example

An ad platform deciding which of 10 ad variants to show each user — starting with equal rotation, then gradually showing the best performers more often.
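The shift from equal rotation toward the best performers can be sketched with Thompson sampling, one common bandit strategy; this is an illustration, not a claim about how any particular ad platform works. Each variant's click-through rate gets a Beta posterior (starting from a uniform Beta(1, 1) prior, an assumption), and each user is shown the variant whose sampled rate is highest. Early on the posteriors are wide, so variants rotate roughly evenly; as evidence accumulates, the stronger variants win more often. The function name and the ten click-through rates are hypothetical values for simulation only.

```python
import random

# Hypothetical true click-through rates for 10 ad variants (unknown to the agent).
CTR = [0.02, 0.03, 0.04, 0.03, 0.02, 0.05, 0.03, 0.09, 0.04, 0.02]

def thompson_ads(ctr, n_users, seed=0):
    """Hypothetical helper: Beta-Bernoulli Thompson sampling over ad variants.

    Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    k = len(ctr)
    clicks = [0] * k   # observed clicks per variant
    skips = [0] * k    # observed non-clicks per variant
    shown = [0] * k
    for _ in range(n_users):
        # Draw a plausible CTR for each variant from its Beta(1 + clicks, 1 + skips) posterior.
        samples = [rng.betavariate(1 + clicks[a], 1 + skips[a]) for a in range(k)]
        ad = max(range(k), key=lambda a: samples[a])
        shown[ad] += 1
        if rng.random() < ctr[ad]:   # simulated user response
            clicks[ad] += 1
        else:
            skips[ad] += 1
    return shown

shown = thompson_ads(CTR, n_users=20000)
print(shown)  # the 0.09 variant (index 7) should receive most impressions
```

Sampling from the posterior, rather than always showing the variant with the highest observed rate, is what keeps uncertain variants in rotation until the data rules them out.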

Think of it like...

Like deciding which slot machine to play in a casino: each has a different, unknown payout rate, and you want to find the best one without wasting too many coins on the others along the way.

Related Terms