
LSTM: Solving the Memory Problem
Long Short-Term Memory networks were designed specifically to learn the long-term dependencies that vanilla RNNs cannot capture. Introduced by Hochreiter and Schmidhuber in 1997, LSTMs remained the go-to architecture for sequence modeling until Transformers emerged. Their gating mechanism largely mitigates the vanishing gradient problem that cripples vanilla RNNs on long sequences.
LSTMs use three gates to control information flow. The forget gate decides what to discard from the cell state. The input gate determines what new information to store. The output gate controls how much of the cell state is exposed as the hidden state at each timestep. Each gate is a sigmoid-activated layer with its own parameters learned during training.
The key innovation is the cell state: a highway running through the entire sequence with only minor linear interactions. At each timestep the cell state is changed by just two elementwise operations, multiplication by the forget gate and addition of gated candidate values. Because gradients flowing back along this additive path do not have to pass repeatedly through squashing nonlinearities, the network can maintain and access information over hundreds of timesteps with far less degradation than a vanilla RNN.
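To make the gate mechanics concrete, here is a minimal NumPy sketch of a single LSTM timestep following the standard formulation. The function name lstm_step, the weight names (W_f, W_i, W_o, W_c), and the toy dimensions are illustrative assumptions, not the API of any particular library.

```python
import numpy as np

def sigmoid(x):
    # logistic squashing used by all three gates
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    """One LSTM timestep; every weight matrix acts on the concatenated [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(W_f @ z + b_f)        # forget gate: what to discard from the cell state
    i_t = sigmoid(W_i @ z + b_i)        # input gate: what new information to store
    o_t = sigmoid(W_o @ z + b_o)        # output gate: how much cell state to expose
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate values to add to the cell state

    c_t = f_t * c_prev + i_t * c_tilde  # cell state update: elementwise forget, then add
    h_t = o_t * np.tanh(c_t)            # hidden state / output at this timestep
    return h_t, c_t

# toy usage: random shared weights, zero initial state, unrolled over 20 timesteps
hidden, inputs = 8, 4
rng = np.random.default_rng(0)
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(hidden) for _ in range(4))
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(20, inputs)):
    h, c = lstm_step(x_t, h, c, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c)
```

The line computing c_t is the "highway" described above: the previous cell state is scaled and added to, but never pushed through a tanh or sigmoid on its way to the next timestep.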
GRUs (Gated Recurrent Units) simplify the LSTM by merging the forget and input gates into a single update gate and folding the cell state into the hidden state, reducing parameters while achieving comparable performance on many tasks. In practice, the choice between LSTM and GRU usually comes down to experimentation rather than theoretical superiority.
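For comparison, here is a sketch of a single GRU timestep in the same toy style. The names gru_step, W_z, W_r, and W_h are again illustrative, and conventions differ between papers and libraries on whether z_t or (1 - z_t) multiplies the old state; this version uses one common formulation.

```python
import numpy as np

def sigmoid(x):
    # same helper as in the LSTM sketch
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU timestep: two gates, no separate cell state."""
    zr_in = np.concatenate([h_prev, x_t])

    z_t = sigmoid(W_z @ zr_in + b_z)    # update gate: plays the role of forget + input gates
    r_t = sigmoid(W_r @ zr_in + b_r)    # reset gate: how much past state feeds the candidate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # candidate state

    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde  # interpolate between old state and candidate
    return h_t
```

Note that the GRU step needs three weight matrices where the LSTM step needed four, which is where the parameter saving comes from.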
