The Complete Transformer Block

LLM Architecture Series – Lesson 15 of 20. Now we bring all the familiar components together into the standard transformer block.

Each block contains attention, an MLP, layer norms, and residual paths, wired together in a specific order.

Transformer block annotated: visualization from bbycroft.net/llm, augmented by Nano Banana.

Putting It All Together

A transformer block combines all the components we have discussed into a single unit. GPT models stack many such blocks, identical in structure but each with its own weights, to create deep networks.

The Block Architecture

Each transformer block contains (in order):

  1. Layer Norm (Pre-normalization)
  2. Multi-Head Self-Attention
  3. Residual Connection
  4. Layer Norm
  5. Feed-Forward Network (MLP)
  6. Residual Connection

The Complete Formula

h = x + MultiHeadAttention(LayerNorm(x))
output = h + FFN(LayerNorm(h))
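
To make the wiring concrete, here is a minimal sketch of such a block in PyTorch. It is an illustrative sketch, not the exact nanoGPT code: the default sizes follow the nano-gpt example used in this series, and dropout and the causal attention mask are omitted for brevity.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    # Pre-LN transformer block: LayerNorm -> attention -> residual add,
    # then LayerNorm -> feed-forward MLP -> residual add.
    def __init__(self, d_model=48, n_heads=3):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # expand by 4x
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),  # project back to d_model
        )

    def forward(self, x):
        # h = x + MultiHeadAttention(LayerNorm(x))
        normed = self.ln1(x)
        attn_out, _ = self.attn(normed, normed, normed, need_weights=False)
        h = x + attn_out
        # output = h + FFN(LayerNorm(h))
        return h + self.ffn(self.ln2(h))

A quick shape check: a tensor of shape (batch, sequence, 48) goes in and a tensor of the same shape comes out, which is exactly what allows blocks to be stacked.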

Pre-LN vs Post-LN

GPT uses Pre-LN (LayerNorm applied before each sublayer rather than after it; see the sketch after this list), which:

  • Makes training more stable
  • Allows larger learning rates
  • Enables training very deep networks
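
In pseudocode (reusing the ln1, ln2, attn, and ffn sublayers from the sketch above), the two orderings differ only in where the normalization sits:

# Pre-LN (GPT style): normalize the input to each sublayer;
# the residual path itself is a plain sum and is never normalized.
h = x + attn(ln1(x))
out = h + ffn(ln2(h))

# Post-LN (original Transformer): add the residual first,
# then normalize the result, so every gradient passes through a LayerNorm.
h = ln1(x + attn(x))
out = ln2(h + ffn(h))

Because the Pre-LN residual path is a clean sum, gradients flow from the output straight back to the embeddings, which is what makes very deep stacks trainable at larger learning rates.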

Parameters Per Block

For nano-gpt (d = 48, 3 heads, 4x FFN expansion), counting weight matrices only:

  • Attention: 4 × 48 × 48 = 9,216 parameters
  • FFN: 2 × 48 × 192 = 18,432 parameters
  • LayerNorm: 2 × 2 × 48 = 192 parameters
  • Total per block: 27,840 parameters, roughly 28K (see the check below)
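
The arithmetic is easy to verify directly; this sketch counts only the weight matrices (bias terms add a few hundred more parameters per block but do not change the picture):

d, expansion = 48, 4
attn_params = 4 * d * d                # Wq, Wk, Wv, Wo: 9,216
ffn_params = 2 * d * (expansion * d)   # 48x192 up-projection + 192x48 down-projection: 18,432
ln_params = 2 * 2 * d                  # two LayerNorms, scale and shift each: 192
print(attn_params + ffn_params + ln_params)  # 27,840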

Series Navigation

Previous: Residual Connections

Next: Stacking Transformer Layers


This article is part of the LLM Architecture Series. Interactive visualizations from bbycroft.net/llm.

Analogy and intuition

You can think of a transformer block as one full reasoning step. Information flows in, attends to other positions, is processed by an MLP, and then passed on along residual paths.

Stacking many such blocks lets the model refine its understanding over multiple passes, similar to revisiting the same text with deeper insight each time.

Looking ahead

Next we will see what happens when we stack many transformer blocks on top of each other.
