Concept Bottleneck
A model architecture that forces predictions through a set of human-interpretable concepts. The model first predicts concepts, then uses those concepts to make the final prediction.
Why It Matters
Concept bottleneck models are inherently interpretable — you can see exactly which concepts drove the prediction and intervene by correcting wrong concepts.
Example
A bird species classifier that first predicts interpretable concepts (wing_color=red, beak_shape=curved, size=small), then uses only those concepts to predict the species.
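The two-stage structure can be sketched in plain Python. Everything here is a hypothetical stand-in: the concept names, the stub feature values, and the rule-based stage 2 (which a real concept bottleneck model would learn as a classifier over the concepts). The key property is that the final prediction reads only the concepts, so a human can inspect or correct them before stage 2 runs.

```python
# Minimal concept bottleneck sketch (hypothetical concepts and species).

def predict_concepts(image_features):
    # Stage 1: map raw input to human-interpretable concepts.
    # A stub that returns fixed concepts; a real model would learn this mapping.
    return {"wing_color": "red", "beak_shape": "curved", "size": "small"}

def predict_species(concepts):
    # Stage 2: the final prediction depends ONLY on the concepts (the bottleneck).
    # Hand-written rules stand in for a learned concept-to-label classifier.
    if concepts["wing_color"] == "red" and concepts["size"] == "small":
        return "scarlet finch"  # hypothetical species label
    return "unknown"

concepts = predict_concepts(image_features=[0.2, 0.7])
prediction = predict_species(concepts)  # prediction driven entirely by concepts

# Intervention: a human corrects a wrong concept, then only stage 2 is re-run.
concepts["wing_color"] = "brown"
revised = predict_species(concepts)  # prediction updates from the corrected concept
```

Because stage 2 never sees the raw input, correcting a concept and re-running only the second stage is all an intervention requires, which is what makes these models editable at test time.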
Think of it like...
Like a doctor who first identifies symptoms (fever, cough, fatigue) and then diagnoses the disease — the intermediate concepts make the reasoning transparent.
Related Terms
Interpretability
The degree to which a human can understand the internal mechanisms and reasoning process of a machine learning model. More interpretable models allow deeper inspection of how they work.
Explainability
The ability to understand and articulate how an AI model reaches its decisions or predictions. Explainable AI (XAI) makes the decision-making process transparent and comprehensible to humans.
Classification
A type of supervised learning task where the model predicts which category or class an input belongs to. The output is a discrete label rather than a continuous value.
Neural Network
A computing system inspired by the biological neural networks in the human brain. It consists of interconnected nodes (neurons) organized in layers that process information and learn to recognize patterns.