LLM Architecture Series – Lesson 9 of 20. Multi-head attention relies on three sets of vectors called queries, keys, and values.
These vectors control how positions compare to each other and how information flows across the sequence.

Visualization from bbycroft.net/llm augmented by Nano Banana.

The QKV Framework
The Query-Key-Value framework is borrowed from information retrieval. Think of it like a database lookup (a toy version in code follows this list):
- Query: Your search term
- Key: Index entries in the database
- Value: The actual content to retrieve
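To make the analogy concrete, here is a toy "soft lookup" in NumPy (all arrays and numbers below are illustrative assumptions, not from the original): instead of returning the single best match, attention returns a blend of all values, weighted by how well each key matches the query.

```python
import numpy as np

# A "soft" database lookup: every stored value contributes to the result,
# weighted by how well its key matches the query.
# (All numbers here are toy values chosen purely for illustration.)
keys = np.array([[1.0, 0.0],   # key for entry 0
                 [0.0, 1.0],   # key for entry 1
                 [1.0, 1.0]])  # key for entry 2
values = np.array([[10.0],     # value stored under key 0
                   [20.0],     # value stored under key 1
                   [30.0]])    # value stored under key 2
query = np.array([1.0, 0.2])   # the "search term"

scores = keys @ query                            # how well the query matches each key
weights = np.exp(scores) / np.exp(scores).sum()  # softmax: turn matches into weights
result = weights @ values                        # weighted blend of all the values

print(weights)  # largest weight goes to the best-matching keys
print(result)   # a blend of 10, 20, 30 dominated by the closest matches
```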
Linear Projections
Each of Q, K, and V is created by multiplying the input X (the n×d_model matrix of token embeddings) by its own learned weight matrix:
Q = XW_Q ∈ ℝ^(n×d_k)
K = XW_K ∈ ℝ^(n×d_k)
V = XW_V ∈ ℝ^(n×d_v)
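A minimal sketch of these projections in NumPy, assuming toy shapes and random weights purely for illustration (in a real model, W_Q, W_K, and W_V are learned during training):

```python
import numpy as np

n, d_model, d_k, d_v = 4, 8, 8, 8   # toy sizes, chosen only for illustration
rng = np.random.default_rng(0)

X = rng.normal(size=(n, d_model))       # input: n token embeddings
W_Q = rng.normal(size=(d_model, d_k))   # learned in practice; random here
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_v))

Q = X @ W_Q   # queries, shape (n, d_k): one per position
K = X @ W_K   # keys,    shape (n, d_k): one per position
V = X @ W_V   # values,  shape (n, d_v): one per position
```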
The Attention Computation
The complete scaled dot-product attention (a code sketch follows the list below):
Attention(Q, K, V) = softmax(QK^T / √d_k) V
- QK^T: Compute all pairwise similarity scores between positions
- / √d_k: Scale the scores so that large dot products do not saturate the softmax
- softmax: Convert each row of scores into a probability distribution over positions
- × V: Take the weighted sum of the value vectors
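Putting the four steps together, here is a compact NumPy sketch of scaled dot-product attention; it assumes the Q, K, V matrices from the projection sketch above and uses an inline softmax helper for clarity:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # QK^T / sqrt(d_k): scaled pairwise similarities
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V                  # weighted sum of the value vectors

# With the Q, K, V from the projection sketch above:
# out = scaled_dot_product_attention(Q, K, V)   # shape (n, d_v)
```

Production implementations typically fuse these steps and add masking (covered in the next lesson) and dropout, but the core computation is just these few lines.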
Why Separate Q, K, V?
Having separate projections allows:
- Asymmetric relationships: A can attend to B differently than B attends to A (demonstrated in the sketch after this list)
- Flexible feature extraction for different purposes
- Richer representational capacity
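One way to see the asymmetry point concretely: because W_Q and W_K are different matrices, the score matrix QK^T is generally not symmetric, whereas a shared projection would force it to be. A small NumPy check, using the same kind of toy setup as the earlier sketches (shapes and random weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_k = 3, 4, 4               # toy sizes for illustration
X = rng.normal(size=(n, d_model))
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))

# Separate projections: the score matrix QK^T is generally not symmetric,
# so position A can weight B differently than B weights A.
scores = (X @ W_Q) @ (X @ W_K).T / np.sqrt(d_k)
print(np.allclose(scores, scores.T))    # almost surely False with distinct W_Q, W_K

# If queries and keys shared a single projection, the scores would be symmetric.
shared = (X @ W_Q) @ (X @ W_Q).T / np.sqrt(d_k)
print(np.allclose(shared, shared.T))    # True
```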
Series Navigation
Previous: Multi-Head Attention
Next: Causal Masking – Preventing Future Leakage
This article is part of the LLM Architecture Series. Interactive visualizations from bbycroft.net/llm.
Analogy and intuition
You can think of keys as entries in a library catalog, queries as search requests, and values as the actual book contents that are returned.
By adjusting the query and key projections, the model learns flexible, context-dependent rules about which tokens should attend strongly to one another.
Looking ahead
Next we introduce causal masking, which prevents the model from looking into the future when it predicts the next token.
