Multi-Armed Bandit
A simplified reinforcement learning problem in which an agent repeatedly chooses among several options (arms) whose payoffs are unknown, balancing exploration of less-tried options against exploitation of the ones that have paid off so far.
Why It Matters
Multi-armed bandits underpin adaptive A/B testing, ad placement, and content recommendation: any scenario where you need to learn which option is best while still collecting as much reward as possible along the way.
Example
An ad platform deciding which of 10 ad variants to show each user: it starts with equal rotation, then gradually shows the best performers more often.
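The ad example can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any real ad platform works: it assumes each ad either gets a click (reward 1) or not (reward 0), and it uses epsilon-greedy, one common bandit strategy among several. The function name `run_epsilon_greedy`, the parameters `epsilon` and `warmup_per_arm`, and the click rates are all hypothetical.

```python
import random


def run_epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1,
                       warmup_per_arm=20, seed=0):
    """Simulate an epsilon-greedy bandit over arms with 0/1 rewards.

    true_rates: hypothetical click probability of each arm, hidden from
    the agent and used only to simulate user behavior.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms        # how many times each arm was shown
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0
    warmup_rounds = warmup_per_arm * n_arms

    for t in range(n_rounds):
        if t < warmup_rounds:
            # Equal rotation at the start, as in the example above.
            arm = t % n_arms
        elif rng.random() < epsilon:
            # Explore: show a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: show the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated user response: click with the arm's true probability.
        reward = 1 if rng.random() < true_rates[arm] else 0

        pulls[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    # Ten made-up ad variants with made-up click rates.
    rates = [0.02, 0.03, 0.05, 0.04, 0.10, 0.03, 0.02, 0.06, 0.01, 0.04]
    pulls, estimates, total = run_epsilon_greedy(rates)
    for i, (n, est) in enumerate(zip(pulls, estimates)):
        print(f"ad {i}: shown {n} times, estimated click rate {est:.3f}")
    print("total clicks:", total)
```

A larger `epsilon` spends more traffic on exploration, while a smaller one commits sooner to whichever ad currently looks best and risks settling on the wrong one.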
Think of it like...
Deciding which slot machine to play in a casino: each has a different, unknown payout rate, and you want to find the best one and stick with it.
Related Terms
Exploration vs Exploitation
The fundamental tradeoff in reinforcement learning between trying new actions (exploration) to discover potentially better strategies and using known good actions (exploitation) to maximize current reward.
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties. The agent aims to maximize cumulative reward over time through trial and error.
Recommendation System
An AI system that predicts and suggests items a user might be interested in based on their behavior, preferences, and similarities to other users.