Introduction to Large Language Models

LLM Architecture Series – Lesson 1 of 20. This article gives you the big picture of a modern language model before we zoom into each part.

You can think of a large language model as a very advanced autocomplete engine: given everything it has seen so far, it predicts the most likely next token.
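To make the autocomplete analogy concrete, here is a deliberately tiny sketch (not how a real LLM works internally): a bigram model that counts, in a toy corpus, which token most often follows the current one, then "generates" text by repeatedly picking the most frequent successor. An actual LLM replaces these raw counts with a neural network conditioned on the whole context, but the predict-one-token-then-repeat loop is the same.

```python
from collections import Counter, defaultdict

# Toy corpus: a handful of tokens to count successor frequencies from.
corpus = "the cat sat on the mat the cat ran".split()

# next_counts[w] maps each token to how often it follows w in the corpus.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent token seen after `token`, or None if unseen."""
    followers = next_counts[token]
    return followers.most_common(1)[0][0] if followers else None

# Greedily "autocomplete" three tokens from a starting token.
token = "the"
generated = [token]
for _ in range(3):
    token = predict_next(token)
    generated.append(token)

print(" ".join(generated))
```

Each article in the series examines one piece of the machinery (embeddings, attention, softmax, and so on) that lets a transformer make this same prediction far more intelligently than a frequency table can.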


LLM Architecture Series – Complete Guide

LLM Architecture Overview – visualisation from bbycroft.net/llm, annotated with Nano Banana

Welcome to the LLM Architecture Series

This comprehensive 20-part series takes you from the fundamentals to advanced concepts in Large Language Model architecture. Using interactive visualisations from Brendan Bycroft’s excellent LLM Visualisation, we explore every component of a GPT-style transformer.

Series Overview

Part 1: Foundations (Articles 1-5)

  1. Introduction to Large Language Models – What LLMs are and how they work
  2. Tokenization Basics – Converting text to tokens
  3. Token Embeddings – Converting tokens to vectors
  4. Position Embeddings – Encoding word order
  5. Combined Input Embedding – Putting it together

Part 2: The Transformer Block (Articles 6-14)

  6. Layer Normalisation – Stabilising the network
  7. Self-Attention Part 1 – The core innovation
  8. Self-Attention Part 2 – Multi-head attention
  9. Query, Key, Value – The attention framework
  10. Causal Masking – Preventing future leakage
  11. Attention Softmax – Computing attention weights
  12. Projection Layer – Combining attention outputs
  13. Feed-Forward Networks – The MLP component
  14. Residual Connections – Skip connections for depth

Part 3: The Complete Model (Articles 15-20)

  15. Complete Transformer Block – All components together
  16. Stacking Layers – Building depth
  17. Output Layer – The language model head
  18. Output Softmax – From logits to probabilities
  19. Scaling LLMs – From nano-GPT to GPT-3
  20. Complete Pipeline – The full picture

About This Series

Each article includes:

  • Interactive visualisations from bbycroft.net/llm
  • Mathematical equations explaining each component
  • Intuitive explanations of why each part matters
  • Navigation links to previous and next articles

Start Learning

Begin with: Introduction to Large Language Models


Interactive visualisations courtesy of bbycroft.net/llm by Brendan Bycroft. Annotated images created with Nano Banana.