Instruction Hierarchy
A framework for prioritizing different levels of instructions when they conflict — system prompts typically override user prompts, which override context from retrieved documents.
Why It Matters
Instruction hierarchy prevents prompt injection attacks by establishing clear priority rules for whose instructions the model follows.
Example
System prompt (highest priority) → developer instructions → user instructions → content from retrieved documents (lowest priority) — so injected instructions in documents cannot override safety rules.
Think of it like...
Like a chain of command in the military — orders from higher ranks override conflicting orders from lower ranks, maintaining organizational control.
Related Terms
System Prompt
Hidden instructions provided to an LLM that define its behavior, personality, constraints, and capabilities for a conversation. System prompts set the rules of engagement before the user interacts.
Prompt Injection
A security vulnerability where malicious input is crafted to override or manipulate an LLM's system prompt or instructions, causing it to behave in unintended ways.
Guardrails
Safety mechanisms and constraints built into AI systems to prevent harmful, inappropriate, or off-topic outputs. Guardrails can operate at the prompt, model, or output level.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Prompt Engineering
The practice of designing and optimizing input prompts to get the best possible output from AI models. It involves crafting instructions, providing examples, and structuring queries to guide the model toward desired responses.