Concept Vectors and Analogy in LLMs

LLM Architecture Series – Bonus Lesson. In earlier lessons you saw how tokens become vectors. This article goes deeper into what those vectors mean and how simple arithmetic on them can reveal structure in concepts.
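
The analogy arithmetic can be demonstrated with toy vectors. Below is a minimal sketch using made-up four-dimensional "concept vectors" (not real embeddings) and cosine similarity:

```python
import numpy as np

# Toy 4-dimensional "concept vectors" (illustrative values, not real embeddings).
king  = np.array([0.9, 0.8, 0.1, 0.3])
man   = np.array([0.1, 0.8, 0.1, 0.2])
woman = np.array([0.1, 0.1, 0.9, 0.2])
queen = np.array([0.9, 0.1, 0.9, 0.3])

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
analogy = king - man + woman
print(cosine(analogy, queen))  # ~1.0 for these toy vectors
```

In real embedding spaces the result is rarely exact, but the nearest neighbor of the analogy vector is often the expected word.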

Figure: Concept vectors in embedding space (generated with Nano Banana).

The Complete LLM Pipeline – Putting It All Together

LLM Architecture Series – Lesson 20 of 20. We have visited every component of the architecture. This lesson ties them together into a single mental model.

By walking through a full end-to-end example, you can see how tokenization, embeddings, attention, MLPs, and the output layer cooperate to produce text.
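
As a rough sketch of that cooperation, here is a toy generation step with random stand-in weights; every name, size, and weight here is illustrative, not the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 10, 8  # toy vocabulary and hidden size

# Random stand-ins for each pipeline stage (shapes only, no trained weights).
W_embed = rng.normal(size=(vocab, d)) * 0.1
W_block = rng.normal(size=(d, d)) * 0.1
W_head  = rng.normal(size=(d, vocab)) * 0.1

def generate_step(token_ids):
    x = W_embed[token_ids]                 # 1. token ids -> embedding vectors
    for _ in range(4):                     # 2. a stack of (toy) blocks
        x = x + np.tanh(x @ W_block)       #    residual transformation
    logits = x[-1] @ W_head                # 3. LM head: last position -> scores
    e = np.exp(logits - logits.max())
    probs = e / e.sum()                    # 4. softmax -> probabilities
    return int(probs.argmax())             # 5. greedy decoding: pick the best

print(generate_step(np.array([1, 4, 2])))  # a token id in [0, vocab)
```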

Scaling LLMs – nano-GPT to GPT-3

LLM Architecture Series – Lesson 19 of 20. You now understand the core architecture. Scaling is about what happens when we make models wider, deeper, and train them on more data.

Surprisingly, performance often follows smooth scaling laws, which let practitioners predict how much a larger model will help before training it.
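
As an illustration of such a power law, here is a sketch of the form loss(N) = (N_c / N)^α. The constants are in the ballpark of published scaling-law fits, but treat them as illustrative rather than authoritative:

```python
import numpy as np

# Hypothetical power-law fit for loss versus parameter count N.
# Constants roughly follow published fits; use them as illustration only.
N_c, alpha = 8.8e13, 0.076

def predicted_loss(n_params):
    return (N_c / n_params) ** alpha

# Loss falls smoothly and predictably as the model grows.
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```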

From Logits to Probabilities – Softmax Output

LLM Architecture Series – Lesson 18 of 20. The output layer produces one logit per token in the vocabulary. Softmax converts these logits into a proper probability distribution.

These probabilities drive sampling strategies such as greedy decoding, top-k sampling, and nucleus (top-p) sampling.
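
A minimal NumPy sketch of softmax plus top-k sampling, on toy logits (no claims about any particular model's decoder):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def top_k_sample(logits, k, rng):
    # Keep only the k highest-probability tokens, renormalize, then sample.
    probs = softmax(logits)
    top = np.argsort(probs)[-k:]
    p = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=p))

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(softmax(logits).round(3))                           # sums to 1
print(top_k_sample(logits, k=2, rng=np.random.default_rng(0)))
```

Greedy decoding is the special case `k=1`; nucleus sampling keeps the smallest set of tokens whose probabilities sum past a threshold instead of a fixed count.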

The Output Layer and Language Model Head

LLM Architecture Series – Lesson 17 of 20. After many transformer layers, we have a final hidden vector for each position. The output layer turns this into raw scores for every token in the vocabulary.

This linear layer is often called the language model head; in small models with large vocabularies, it can hold a substantial share of the parameters.
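
The head itself is a single matrix multiply. A toy-sized sketch (sizes invented for illustration):

```python
import numpy as np

d_model, vocab_size = 8, 50  # tiny toy sizes
rng = np.random.default_rng(0)

# The LM head: one learned linear map from hidden space to vocabulary logits.
W_head = rng.normal(size=(d_model, vocab_size)) * 0.02

hidden = rng.normal(size=(d_model,))   # final hidden vector for one position
logits = hidden @ W_head               # one raw score per vocabulary token
print(logits.shape)                    # (50,)
```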

Stacking Transformer Layers

LLM Architecture Series – Lesson 16 of 20. A single transformer block is powerful, but modern LLMs use many of them in sequence.

Each additional layer can capture longer range patterns and refine the representations produced by earlier layers.
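
Mechanically, the stack is just repeated application. A toy sketch with a trivial stand-in block (real blocks contain attention and an MLP):

```python
import numpy as np

def toy_block(x, W):
    # Stand-in for a full transformer block: a residual transformation.
    return x + np.tanh(x @ W)

rng = np.random.default_rng(0)
d, n_layers = 8, 12
weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers)]

x = rng.normal(size=(5, d))  # 5 positions
for W in weights:            # the whole "stack" is repeated application
    x = toy_block(x, W)
print(x.shape)               # (5, 8): shape is preserved layer to layer
```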

The Complete Transformer Block

LLM Architecture Series – Lesson 15 of 20. Now we bring all the familiar components together into the standard transformer block.

Each block contains attention, MLP, layer norms, and residual paths wired in a specific order.
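
A minimal pre-norm sketch of that wiring, with single-head attention and a ReLU MLP; real implementations differ in details (multiple heads, biases, GELU), so treat this as a shape-level illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def block(x, p):
    # Pre-norm wiring: normalize, transform, then add back the residual.
    h = layer_norm(x)
    scores = (h @ p["Wq"]) @ (h @ p["Wk"]).T / np.sqrt(h.shape[-1])
    attn = softmax(scores) @ (h @ p["Wv"])
    x = x + attn @ p["Wo"]                        # residual around attention
    h = layer_norm(x)
    x = x + np.maximum(h @ p["W1"], 0) @ p["W2"]  # residual around the MLP
    return x

rng = np.random.default_rng(0)
d = 8
p = {k: rng.normal(size=(d, d)) * 0.1 for k in ["Wq", "Wk", "Wv", "Wo"]}
p["W1"] = rng.normal(size=(d, 4 * d)) * 0.1
p["W2"] = rng.normal(size=(4 * d, d)) * 0.1

x = rng.normal(size=(5, d))  # 5 positions
print(block(x, p).shape)     # (5, 8)
```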

Residual Connections

LLM Architecture Series – Lesson 14 of 20. At this point attention and MLP layers are doing heavy work. Residual connections make sure information can flow easily through many layers.

By adding the input of a block back to its output, residual paths help gradients move during training and preserve useful signals.
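
The idea in one line: the output is the input plus the sublayer's contribution. A tiny sketch:

```python
import numpy as np

def residual(sublayer, x):
    # The output is the input plus whatever the sublayer computes.
    return x + sublayer(x)

x = np.array([1.0, 2.0, 3.0])

# If a sublayer has nothing useful to add, the input passes through unchanged.
print(residual(lambda v: np.zeros_like(v), x))  # [1. 2. 3.]
```

Because the identity path is always present, gradients can flow straight through the addition even when the sublayer itself is hard to train.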

Feed-Forward Networks (MLP)

LLM Architecture Series – Lesson 13 of 20. After attention and projection, we pass each position through a feed-forward network.

This multilayer perceptron (MLP) applies the same small neural network to every position independently, adding powerful non-linear transformations.
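
A sketch of the standard two-layer MLP with the GELU non-linearity; the 4x expansion factor is the common convention, and all weights here are random toy values:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU, as used in GPT-2.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, W1, b1, W2, b2):
    # Expand to 4x the hidden size, apply the non-linearity, project back.
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d = 8
W1, b1 = rng.normal(size=(d, 4 * d)) * 0.02, np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)) * 0.02, np.zeros(d)

x = rng.normal(size=(3, d))           # 3 positions, processed independently
print(mlp(x, W1, b1, W2, b2).shape)   # (3, 8)
```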

The Projection Layer

LLM Architecture Series – Lesson 12 of 20. The attention heads produce outputs that must be merged and projected back into the model's hidden space.

This is done by a learned linear projection that mixes information from all heads into a single vector per position.
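
A toy sketch of the merge-and-project step for a single position; head counts and sizes are invented for illustration:

```python
import numpy as np

n_heads, d_head = 4, 16
d_model = n_heads * d_head
rng = np.random.default_rng(0)

# Each head's output for one position (toy values).
head_outputs = [rng.normal(size=(d_head,)) for _ in range(n_heads)]

# Concatenate the heads, then mix them with a learned projection W_O.
concat = np.concatenate(head_outputs)  # shape (64,)
W_O = rng.normal(size=(d_model, d_model)) * 0.02
merged = concat @ W_O                  # back in the model's hidden space
print(merged.shape)                    # (64,)
```

Without this projection, each head's output would stay in its own slice of the vector; W_O lets information from every head contribute to every hidden dimension.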
