Backpropagation Algorithm
Backpropagation is the key algorithm that enables neural networks to learn from data. It efficiently computes gradients by propagating errors backward through the network, allowing us to update weights to minimize prediction errors.
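
To make this concrete, here is a tiny scalar sketch of the idea (illustrative code, not the demo's implementation): the gradient of the loss with respect to an early weight is built by multiplying local derivatives backward along the chain, and a finite-difference check confirms the result.

```python
# Tiny scalar "network": y = w2 * relu(w1 * x), loss L = 1/2 * (y - target)^2.
# Toy values; these are not the demo's weights.
x, target = 1.0, 2.0
w1, w2 = 0.5, -0.3

relu = lambda z: max(0.0, z)

def loss(w1, w2):
    return 0.5 * (w2 * relu(w1 * x) - target) ** 2

# Forward pass, keeping the intermediate values
z = w1 * x
h = relu(z)
y = w2 * h

# Backward pass: the chain rule applied from the output back toward the input
dL_dy = y - target
dL_dh = dL_dy * w2
dL_dz = dL_dh * (1.0 if z > 0 else 0.0)   # derivative of ReLU
dL_dw1 = dL_dz * x

# Numerical check with a small perturbation of w1
eps = 1e-6
numeric = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
print(round(dL_dw1, 6), round(numeric, 6))   # the two estimates agree
```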

This demo shows how gradients flow backward through layers and how weights are updated to reduce prediction error.

How to Use This Demo:
• Set input values and target output to define the learning task
• Adjust learning rate to control the step size for weight updates
• Click "Step Optimization" to perform one iteration of backpropagation
• Use "Reset Weights" to start with new random weights
• Hover over table entries to highlight corresponding network connections
• Watch the loss decrease over multiple optimization steps

Network Configuration:
• Architecture: 2 inputs → 3 hidden (ReLU) → 1 output (linear); a forward-pass sketch follows after this list
• Loss function: Mean Squared Error (MSE)
• Optimization: Gradient descent with adjustable learning rate
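
As a rough sketch of what one forward pass through this configuration computes (illustrative NumPy code, not the demo's implementation), the snippet below uses the initial weights listed in the weights table further down; the displayed loss of 1.6171 is consistent with a 1/2 · (target − prediction)² convention.

```python
import numpy as np

# 2 inputs -> 3 hidden (ReLU) -> 1 linear output, no biases.
# Initial weights as shown in the demo's weights table.
W1 = np.array([[-0.758, -0.270,  0.835],   # weights from I1 to H1, H2, H3
               [-0.448,  0.579, -0.857]])  # weights from I2 to H1, H2, H3
W2 = np.array([-0.717, -0.204, 0.505])     # weights from H1, H2, H3 to the output

x = np.array([1.0, 0.5])   # inputs I1, I2
target = 2.0               # desired output

# Forward pass
z_hidden = x @ W1                 # hidden pre-activations
h = np.maximum(0.0, z_hidden)     # ReLU
y_pred = h @ W2                   # linear output

loss = 0.5 * (target - y_pred) ** 2
print(h.round(2), float(y_pred), float(loss))
# hidden ≈ [0.00, 0.02, 0.41], prediction ≈ 0.20, loss ≈ 1.62
# (small differences from the demo's 0.202 / 1.6171 come from the rounded weight values)
```
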
Network Visualization:
Node values: Show current activations after forward pass
Connection thickness: Represents weight magnitudes

Parameter Tables:
Weights table: Current weight values for all connections
Hover highlighting: Shows which connection each table entry represents

Loss History Graph:
• Tracks MSE loss over optimization steps
• Shows learning progress as loss decreases
• Illustrates effect of learning rate on convergence

Learning Rate Guidelines:
• Start with learning rates around 0.01-0.1
• Too high: Loss may oscillate or diverge
• Too low: Very slow convergence
• Observe the loss curve to assess whether the learning rate is appropriate (see the sketch after this list)
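
The sketch below makes these guidelines concrete on a deliberately simple one-dimensional quadratic rather than the demo's network; the update rule is the same plain gradient-descent step, w ← w − learning_rate · dL/dw.

```python
# Gradient descent on a toy 1-D loss L(w) = (w - 3)^2,
# showing how the learning rate changes the behaviour of w <- w - lr * dL/dw.
def step(w, lr):
    grad = 2.0 * (w - 3.0)     # dL/dw
    return w - lr * grad

for lr in (0.01, 0.1, 1.1):    # cautious, reasonable, and too-large learning rates
    w = 0.0
    for _ in range(20):
        w = step(w, lr)
    print(lr, round(w, 3))
# lr = 0.01 moves toward the minimum at w = 3, but slowly; lr = 0.1 gets close;
# lr = 1.1 overshoots further on every step and diverges
```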

Training Observations:
• Watch how gradients propagate backward through layers
• Notice that output-layer gradients are typically larger; gradients reaching the hidden layer are scaled by the output weights and zeroed for inactive ReLU units on the way back
• Hidden layer gradients depend on both forward activations and backward error signals
• ReLU sets gradients to zero whenever a unit's pre-activation is negative ("dead" neurons receive no update); see the sketch after this list
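
Continuing the forward-pass sketch above (and again assuming a 1/2 · (target − prediction)² loss), a minimal backward pass and weight update for this network could look like the following; it reuses x, target, W1, W2, z_hidden, h and y_pred from that snippet.

```python
# Backward pass: propagate the error from the output back to the input weights.
dL_dy = y_pred - target              # output error signal, about -1.80 here
grad_W2 = dL_dy * h                  # gradients of the hidden -> output weights
dL_dh = dL_dy * W2                   # error pushed back onto the hidden activations
dL_dz = dL_dh * (z_hidden > 0)       # ReLU derivative: 1 where the pre-activation is positive, else 0
grad_W1 = np.outer(x, dL_dz)         # gradients of the input -> hidden weights

# H1's pre-activation is negative, so every gradient through H1 is zero on this step
# (a "dead" unit for this input). One gradient-descent update with the demo's learning rate:
lr = 0.01
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```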

Practical Insights:
• Multiple optimization steps are usually needed to reach a good solution (see the training-loop sketch after this list)
• Different input-target pairs will produce different gradient patterns
• Weight initialization affects convergence speed and final solution
• Real applications use mini-batches and advanced optimizers (Adam, RMSprop)
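
Putting the pieces together, the self-contained sketch below (same assumptions as above, with freshly randomized weights as "Reset Weights" would produce) repeats the forward/backward step many times and records the loss, mirroring repeated clicks of "Step Optimization".

```python
import numpy as np

# Single-example training loop for the 2 -> 3 (ReLU) -> 1 network, no biases.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=3)        # hidden -> output weights

x = np.array([1.0, 0.5])
target = 2.0
lr = 0.05

loss_history = []
for step in range(200):
    # forward pass
    z = x @ W1
    h = np.maximum(0.0, z)
    y = h @ W2
    # backward pass (gradients of 1/2 * (y - target)^2)
    err = y - target
    grad_W2 = err * h
    grad_W1 = np.outer(x, err * W2 * (z > 0))
    # plain gradient-descent update
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    loss_history.append(0.5 * err ** 2)

print(round(loss_history[0], 4), round(loss_history[-1], 6))
# With most initializations the loss drops sharply; if all hidden units start dead
# for this input, nothing changes (an extreme case of the dead-neuron issue above).
```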

Example Snapshot (Step 0):
• Training values: inputs I1 = 1.00, I2 = 0.50; target = 2.0; learning rate = 0.01
• Hidden activations after the forward pass (ReLU): H1 = 0.00, H2 = 0.02, H3 = 0.41
• Prediction: 0.202; loss: 1.6171

Network Parameters (initial weights):
       From I1    From I2    To Output
H1     -0.758     -0.448     -0.717
H2     -0.270      0.579     -0.204
H3      0.835     -0.857      0.505
Note: For simplicity, this network does not have any biases.

The Loss History graph and the Weight Updates table are empty at step 0; they populate after the first "Step Optimization" click.