Recurrent Neural Networks (RNNs)

This interactive demo shows how Recurrent Neural Networks process sequential data step-by-step. RNNs have a "memory" that allows them to use information from previous timesteps when processing current inputs.

You'll explore how the hidden state evolves over time and see how different weights and activation functions affect the network's behavior on various sequence patterns.
RNN Architecture Components:

Hidden State: h_t - Memory that carries information from previous timesteps
Input Weight: W_input - How much the current input affects the hidden state
Hidden Weight: W_hidden - How much the previous hidden state influences the current one
Output Weight: W_output - Projects the hidden state into prediction space
Biases: Constant terms added to the computation
Activation Function: Non-linear transformation (tanh, ReLU, sigmoid)

RNN Formulas:
h_t = f(W_input · x_t + W_hidden · h_{t-1} + b_hidden)
y_t = W_output · h_t + b_output

Where f is the activation function, x_t is the input at time t, h_{t-1} is the previous hidden state, and y_t is the output prediction.

Note: The output layer separates internal memory (h_t) from task-specific predictions (y_t).
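To make the two formulas concrete, here is a minimal Python sketch of the forward pass with a single scalar hidden unit, as used in this demo. The function and parameter names (rnn_step, w_input, and so on) and the weight values are illustrative, not taken from the demo's source.

```python
import math

def rnn_step(x_t, h_prev, w_input, w_hidden, w_output, b_hidden, b_output, f=math.tanh):
    """One RNN timestep with scalar weights, mirroring the two formulas above."""
    h_t = f(w_input * x_t + w_hidden * h_prev + b_hidden)  # new hidden state (memory)
    y_t = w_output * h_t + b_output                        # prediction from the hidden state
    return h_t, y_t

def rnn_forward(xs, params, h0=0.0):
    """Process a whole sequence by carrying the hidden state forward."""
    h, outputs = h0, []
    for x in xs:
        h, y = rnn_step(x, h, **params)
        outputs.append(y)
    return outputs, h

# Example with illustrative weight values
params = {"w_input": 1.0, "w_hidden": 0.5, "w_output": 1.0, "b_hidden": 0.0, "b_output": 0.0}
ys, _ = rnn_forward([1, 1, 2, 3], params)
```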
How to Use:

Select a sequence to see different patterns (Fibonacci, linear, etc.)
Adjust weights to see how they affect hidden state and output evolution
Try different activation functions to understand their impact
Use "Step Forward" to manually process each timestep
Click "Train (BPTT)" to train using gradient-based backpropagation through time (the real algorithm)
Click "Random Search" for comparison baseline (tries random weights without gradients)
Toggle "Raw/Normalized Inputs" to scale sequences to [-1, 1] range

Sequence Types:
Fibonacci: Each number is the sum of the previous two (requires a richer state)
Linear: Simple counting sequence
Alternating: Pattern that switches between values
Exponential: Powers of 2 sequence
Sine Pattern: Smooth oscillating values (illustrative generators for all five patterns are sketched below)
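The generators below are illustrative reconstructions of these five patterns; the demo's exact lengths and starting values may differ.

```python
import math

def make_sequence(kind, n=8):
    """Generate one of the demo's sequence patterns (illustrative versions)."""
    if kind == "fibonacci":                     # each number is the sum of the previous two
        seq = [1, 1]
        while len(seq) < n:
            seq.append(seq[-1] + seq[-2])
        return seq[:n]
    if kind == "linear":                        # simple counting sequence: 1, 2, 3, ...
        return list(range(1, n + 1))
    if kind == "alternating":                   # switches between two values
        return [1 if i % 2 == 0 else -1 for i in range(n)]
    if kind == "exponential":                   # powers of 2: 1, 2, 4, 8, ...
        return [2 ** i for i in range(n)]
    if kind == "sine":                          # smooth oscillating values
        return [math.sin(i * math.pi / 4) for i in range(n)]
    raise ValueError(f"unknown sequence type: {kind}")
```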
Understanding RNN Behavior:

Input Weight (W_input): Higher values make the current input more influential
Hidden Weight (W_hidden): Controls memory retention. |W_hidden| > 1 can cause the hidden state to explode; |W_hidden| < 1 causes it to decay (see the sketch after this list)
Output Layer: Separates internal memory (h_t) from predictions (y_t). Essential for flexible modeling
Activation Functions:
- Tanh: Outputs between -1 and 1, standard for RNN hidden states
- ReLU: Only positive outputs, can cause exploding gradients in the recurrence
- Sigmoid: Outputs between 0 and 1, prone to vanishing gradients
- Linear: No saturation, but unstable without careful weight tuning
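The decay/explosion behavior of W_hidden and the saturating effect of bounded activations can be seen by iterating the recurrence with zero input, as in this sketch (names and values are illustrative):

```python
import math

activations = {
    "tanh":    math.tanh,
    "relu":    lambda z: max(0.0, z),
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "linear":  lambda z: z,
}

def free_run(w_hidden, f, h0=1.0, steps=10):
    """With zero input, the recurrence reduces to h_t = f(w_hidden * h_{t-1})."""
    h, trace = h0, [h0]
    for _ in range(steps):
        h = f(w_hidden * h)
        trace.append(h)
    return trace

print(free_run(1.5, activations["linear"]))  # grows ~1.5x per step (explosion)
print(free_run(0.5, activations["linear"]))  # halves each step (decay toward 0)
print(free_run(1.5, activations["tanh"]))    # tanh saturates near a fixed point instead of exploding
```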

Training Insights:
BPTT vs Random Search: "Train (BPTT)" uses gradient descent (the real algorithm used in practice). "Random Search" tries random weight combinations as a baseline for comparison. BPTT is vastly more efficient and shows why gradient-based optimization revolutionized neural networks.
BPTT Algorithm: Backpropagation through time unfolds the network across all timesteps and computes gradients with the chain rule, then updates each weight via gradient descent: w ← w − lr · ∂L/∂w (see the sketch after this list).
Gradient Clipping: The demo clips gradients to [-5, 5] to prevent explosion, a critical technique for RNN training
Learning Rate: Controls step size during gradient descent. Too high causes instability, too low causes slow convergence
Normalization: Helps prevent activation saturation with bounded functions
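To make "Train (BPTT)" concrete, here is a hedged Python sketch of one gradient step for the scalar-hidden-unit RNN, assuming a mean-squared-error loss and tanh activation; the parameter names, clipping range, and learning rate are illustrative rather than taken from the demo's source.

```python
import math

def bptt_step(xs, targets, p, lr=0.01, clip=5.0):
    """One BPTT update for a scalar tanh RNN. p holds w_in, w_h, w_out, b_h, b_out."""
    T = len(xs)
    # Forward pass, storing hidden states for the backward pass
    hs, ys = [0.0], []                        # hs[0] is the initial hidden state
    for t in range(T):
        h = math.tanh(p["w_in"] * xs[t] + p["w_h"] * hs[-1] + p["b_h"])
        hs.append(h)
        ys.append(p["w_out"] * h + p["b_out"])
    loss = sum((y - tgt) ** 2 for y, tgt in zip(ys, targets)) / T

    # Backward pass: unroll the chain rule over all timesteps
    g = {k: 0.0 for k in p}
    dh_next = 0.0                             # gradient flowing back from step t+1 into h_t
    for t in reversed(range(T)):
        dy = 2.0 * (ys[t] - targets[t]) / T
        g["w_out"] += dy * hs[t + 1]
        g["b_out"] += dy
        dh = dy * p["w_out"] + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)      # tanh'(a) = 1 - tanh(a)^2
        g["w_in"] += da * xs[t]
        g["w_h"]  += da * hs[t]
        g["b_h"]  += da
        dh_next = da * p["w_h"]               # pass gradient back to the previous hidden state

    # Gradient clipping, then the update w = w - lr * gradient
    for k in p:
        g[k] = max(-clip, min(clip, g[k]))
        p[k] -= lr * g[k]
    grad_norm = math.sqrt(sum(v * v for v in g.values()))
    return loss, grad_norm

# Illustrative usage: learn to predict the next value of a short normalized sequence
p = {"w_in": 0.5, "w_h": 0.5, "w_out": 0.5, "b_h": 0.0, "b_out": 0.0}
for _ in range(200):
    loss, grad_norm = bptt_step([0.2, 0.4, 0.6, 0.8], [0.4, 0.6, 0.8, 1.0], p, lr=0.05)
```

Each call performs one forward pass, one backward pass, and one clipped gradient-descent update, which is why BPTT closes in on good weights far faster than trying random combinations.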

Limitations:
• This demo uses a single scalar hidden unit. Real RNNs use vector hidden states for richer memory
• Fibonacci requires memory of the two previous values - a single scalar h_t is insufficient (see the sketch below)
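As an illustration of why a richer state fixes this, a 2-dimensional hidden state with a linear recurrence can carry both previous Fibonacci values exactly (a standalone sketch, not part of the demo):

```python
# h_t = A · h_{t-1} with the "Fibonacci matrix" A keeps (F_t, F_{t-1}) in the hidden state.
A = [[1, 1],
     [1, 0]]

def step(h):
    return [A[0][0] * h[0] + A[0][1] * h[1],
            A[1][0] * h[0] + A[1][1] * h[1]]

h = [1, 1]                    # (F_2, F_1) = (1, 1)
seq = [h[1], h[0]]
for _ in range(6):
    h = step(h)
    seq.append(h[0])
print(seq)                    # [1, 1, 2, 3, 5, 8, 13, 21]
```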

[Interactive demo layout: an Input Sequence (x_t) panel, the RNN Cell (displaying Loss and Grad Norm), an Output Sequence (y_t) panel, controls for the activation function, input scaling, learning rate, weights, and biases, and a Sequence Processing History table showing, per step, the input x_t, the expected next value, the actual output y_t, the error Δ, the expected Δ output, and the actual Δ output.]