Top-p Sampling
A text generation method (also called nucleus sampling) where the model considers only the smallest set of tokens whose cumulative probability exceeds the threshold p. This balances diversity and quality.
Why It Matters
Top-p sampling produces more natural and coherent text than sampling from the full distribution, while avoiding the repetitiveness of greedy decoding.
Example
With top-p = 0.9, if the top 5 token probabilities sum to 0.92, the model randomly selects from only those 5 tokens, ignoring the long tail of unlikely options.
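The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the probability values mirror the hypothetical example, where the top 5 tokens sum to 0.92:

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Sample a token index via top-p (nucleus) sampling.

    probs: list of token probabilities summing to 1.
    p: cumulative-probability threshold.
    """
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches p.
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= p:
            break
    # Renormalize within the nucleus and sample from it only.
    weights = [probs[i] / total for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Mirrors the example above: the top 5 probabilities sum to 0.92 >= 0.9,
# so only those 5 tokens are candidates; the long tail is ignored.
probs = [0.30, 0.22, 0.18, 0.12, 0.10, 0.04, 0.03, 0.01]
token = top_p_sample(probs, p=0.9)
```

Note that the nucleus size changes with the shape of the distribution: a confident model may need only one or two tokens to reach p, while an uncertain one keeps many.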
Think of it like...
Like a restaurant that only puts dishes on the specials menu if they are popular enough — you get variety, but only from options that actually make sense.
Related Terms
Temperature
A parameter that controls the randomness or creativity of an LLM's output. Lower temperatures (closer to 0) make outputs more deterministic and focused; higher temperatures increase randomness and creativity.
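Temperature works by dividing the logits before the softmax. A minimal sketch, assuming a plain softmax over raw logit scores (the logit values here are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by T before softmax: T < 1 sharpens the
    # distribution toward the top token; T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # near one-hot
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```

In practice temperature is often combined with top-p: the logits are temperature-scaled first, then the nucleus cutoff is applied.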
Beam Search
A search algorithm used in text generation that explores multiple possible output sequences simultaneously, keeping the top-scoring candidates at each step. It often finds higher-scoring outputs than greedy decoding, at the cost of extra computation.
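To make the contrast with sampling concrete, here is a toy beam search over a hypothetical bigram model (the transition table and function names are invented for illustration):

```python
import heapq
import math

def beam_search(next_probs, start, steps, beam_width=2):
    """Toy beam search.

    next_probs: function mapping a sequence to a dict {token: prob}.
    Keeps the beam_width sequences with the highest log-probability
    at every step.
    """
    beams = [(0.0, [start])]  # (cumulative log-prob, sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            # Expand each surviving beam with every possible next token.
            for tok, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # Prune back down to the best beam_width candidates.
        beams = heapq.nlargest(beam_width, candidates)
    return beams

# Hypothetical bigram transition probabilities, for illustration only.
TABLE = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"c": 0.9, "a": 0.1},
    "c": {"a": 0.5, "b": 0.5},
}
beams = beam_search(lambda seq: TABLE[seq[-1]], "a", steps=2)
```

Unlike top-p sampling, beam search is deterministic: it always returns the same top candidates for the same model, which suits tasks like translation more than open-ended generation.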