Exploration vs Exploitation
The fundamental tradeoff in reinforcement learning between trying new actions (exploration) to discover potentially better strategies and using known good actions (exploitation) to maximize current reward.
Why It Matters
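Balancing exploration and exploitation is key to RL success. An agent that explores too much keeps spending time on actions it already knows to be poor; an agent that exploits too much can settle on a suboptimal action and never discover a better one.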
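One common way to make this balance explicit is an epsilon-greedy rule, where a single parameter sets how often the agent explores. The sketch below is illustrative only: the function name `epsilon_greedy`, the `estimates` list, and the `epsilon` parameter are assumptions made for this example, not part of any particular library.

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick an action index given the agent's current reward estimates.

    estimates: list of estimated rewards, one per action (hypothetical input).
    epsilon:   probability of exploring, between 0.0 and 1.0.
    rng:       any object with random() and randrange(), e.g. random.Random(0).
    """
    if rng.random() < epsilon:
        # Exploration: ignore the estimates and try a uniformly random action.
        return rng.randrange(len(estimates))
    # Exploitation: take the action with the highest current estimate.
    return max(range(len(estimates)), key=lambda a: estimates[a])
```

With `epsilon=0.0` the agent always exploits its current best guess, and with `epsilon=1.0` it always picks at random; values in between trade one behavior against the other.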
Example
A recommendation system deciding between showing a user content it knows they will like (exploitation) versus showing something new to discover untapped interests (exploration).
Think of it like...
Choosing restaurants: always going to your favorite (exploitation) versus trying new places (exploration). A mix of both leads to the best overall dining experience.
Related Terms
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties. The agent aims to maximize cumulative reward over time through trial and error.
Multi-Armed Bandit
A simplified reinforcement learning problem with no changing state, where an agent must repeatedly choose between multiple options (arms) with unknown payoffs, balancing exploration of less-tried options with exploitation of known good ones.
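As a rough illustration of a bandit in code, the sketch below simulates arms that pay out 1 with a fixed hidden probability and selects arms with UCB1, one well-known bandit strategy (chosen here for illustration; it is not the only option). The function name `run_ucb1` and the payout probabilities are assumptions made up for this example.

```python
import math
import random

def run_ucb1(true_probs, steps, seed=0):
    """Simulate UCB1 on a Bernoulli bandit with hypothetical payout probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # how many times each arm has been pulled
    means = [0.0] * n_arms   # running average reward observed per arm
    total = 0

    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Estimated mean (exploitation) plus a bonus that shrinks as an
            # arm is pulled more often (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward

    return counts, means, total

# Assumed example values: three arms, the last one pays out most often.
counts, means, total = run_ucb1([0.2, 0.5, 0.8], steps=5000)
print(counts)
```

The pull counts should concentrate on the arm with the highest payout probability, while the weaker arms still receive occasional pulls so their estimates keep being checked.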