Streaming
Delivering LLM output token-by-token as it is generated rather than waiting for the complete response. Streaming dramatically improves perceived latency and user experience.
Why It Matters
Streaming cuts perceived wait time from full-response latency (often several seconds) down to time-to-first-token (often a few hundred milliseconds). Users see text appearing immediately rather than staring at a loading spinner.
Example
ChatGPT showing words appearing one at a time as they are generated, letting users start reading within 200ms rather than waiting 5 seconds for the complete response.
Think of it like...
Like reading a news ticker as it scrolls versus waiting for the entire news broadcast to finish before seeing anything.
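The idea above can be sketched in a few lines of Python. This is a minimal simulation, not a real LLM client: `generate_tokens` is a hypothetical stand-in for a streaming API, and the per-token delay is an assumed placeholder. The consumer measures time-to-first-token (what the user perceives) separately from total latency (what a non-streaming UI would make them wait for).

```python
import time

def generate_tokens(text, delay=0.01):
    """Simulate an LLM emitting tokens one at a time.

    Hypothetical stand-in for a real streaming API (e.g. one delivering
    chunks over server-sent events); the delay models per-token generation.
    """
    for token in text.split():
        time.sleep(delay)
        yield token + " "

def stream_response(token_iter):
    """Consume tokens as they arrive, tracking time-to-first-token (TTFT)
    and total latency -- streaming improves the first, not the second."""
    start = time.monotonic()
    first_token_at = None
    chunks = []
    for token in token_iter:
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        chunks.append(token)  # in a real UI, render the token here
    total = time.monotonic() - start
    return "".join(chunks), first_token_at, total

text, ttft, total = stream_response(
    generate_tokens("Streaming improves perceived latency"))
```

Running this, `ttft` is roughly one token-delay while `total` is the sum of all of them; the gap between the two numbers is exactly the perceived-latency win that streaming delivers.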
Related Terms
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
Latency
The time delay between sending a request to an AI model and receiving the response. In ML systems, latency includes data preprocessing, model inference, and network transmission time.
API
Application Programming Interface — a set of rules and protocols that allow different software applications to communicate with each other. In AI, APIs let developers integrate AI capabilities into their applications.
Model Serving
The infrastructure and process of deploying trained ML models to production where they can receive requests and return predictions in real time. It includes scaling, load balancing, and version management.