Artificial Intelligence

CLIP

Contrastive Language-Image Pre-training, an OpenAI model trained on hundreds of millions of image-text pairs to map images and text into a shared embedding space. Because matching reduces to comparing embeddings, CLIP can match images to text descriptions without ever being trained on those specific image categories (zero-shot).
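A minimal sketch of the zero-shot matching idea: the image and each candidate text prompt are embedded into the same space, and the prompt with the highest cosine similarity wins. The embedding vectors below are illustrative mock values standing in for CLIP's real image and text encoders.

```python
import numpy as np

# Mock embeddings; with real CLIP, a vision encoder produces the image
# vector and a text encoder produces one vector per label prompt.
image_embedding = np.array([0.9, 0.1, 0.2])
label_prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embeddings = np.array([
    [0.8, 0.2, 0.1],  # illustrative "dog" direction
    [0.1, 0.9, 0.3],  # illustrative "cat" direction
    [0.2, 0.1, 0.9],  # illustrative "car" direction
])

def cosine_similarities(image_vec, text_vecs):
    # CLIP compares L2-normalized embeddings, so similarity is a dot product.
    image_vec = image_vec / np.linalg.norm(image_vec)
    text_vecs = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    return text_vecs @ image_vec

scores = cosine_similarities(image_embedding, text_embeddings)
best_label = label_prompts[int(np.argmax(scores))]
print(best_label)  # the prompt whose embedding is closest to the image's
```

No "dog" category ever appears as a training label here; classification falls out of comparing the image embedding against arbitrary text prompts.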

Why It Matters

CLIP bridged the gap between vision and language, enabling zero-shot image classification, text-based image search, and the text conditioning behind text-to-image generators such as DALL-E and Stable Diffusion.

Example

Searching a photo library by typing 'sunset over mountains' — CLIP matches the text description to images with similar semantic content without needing labeled training data.
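The search example above can be sketched the same way: embed the query text once, then rank stored photo embeddings by similarity. The vectors and filenames below are illustrative mock values; in practice each photo is encoded once by CLIP's image encoder and the vectors are kept in an index.

```python
import numpy as np

# Mock photo-library embeddings (one precomputed vector per photo).
library = {
    "beach.jpg":     np.array([0.1, 0.9, 0.1]),
    "mountains.jpg": np.array([0.9, 0.2, 0.1]),
    "city.jpg":      np.array([0.2, 0.1, 0.9]),
}
# Mock text embedding for the query "sunset over mountains".
query = np.array([0.8, 0.3, 0.1])

def normalize(v):
    return v / np.linalg.norm(v)

# Rank photos by cosine similarity to the query, best match first.
ranked = sorted(
    library,
    key=lambda name: float(normalize(library[name]) @ normalize(query)),
    reverse=True,
)
print(ranked[0])  # the photo whose embedding best matches the query
```

The photos need no labels or captions at all; the only "annotation" is the embedding CLIP computes from the pixels themselves.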

Think of it like...

Like a bilingual person who can describe any photo in words and find any photo from a description — they understand both languages and the connections between them.

Related Terms