Backpropagation Algorithm
Backpropagation is the key algorithm that enables neural networks to learn from data. It efficiently computes gradients by propagating errors backward through the network, allowing us to update weights to minimize prediction errors.
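
To make this concrete, here is a tiny scalar sketch of the idea (illustrative code, not the demo's implementation): the gradient of the loss with respect to an early weight is built by multiplying local derivatives backward along the chain, and a finite-difference check confirms the result.

```python
# Tiny scalar "network": y = w2 * relu(w1 * x), loss L = 1/2 * (y - target)^2.
# Toy values; these are not the demo's weights.
x, target = 1.0, 2.0
w1, w2 = 0.5, -0.3

relu = lambda z: max(0.0, z)

def loss(w1, w2):
    return 0.5 * (w2 * relu(w1 * x) - target) ** 2

# Forward pass, keeping the intermediate values
z = w1 * x
h = relu(z)
y = w2 * h

# Backward pass: the chain rule applied from the output back toward the input
dL_dy = y - target
dL_dh = dL_dy * w2
dL_dz = dL_dh * (1.0 if z > 0 else 0.0)   # derivative of ReLU
dL_dw1 = dL_dz * x

# Numerical check with a small perturbation of w1
eps = 1e-6
numeric = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
print(round(dL_dw1, 6), round(numeric, 6))   # the two estimates agree
```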

This demo shows how gradients flow backward through layers and how weights are updated to reduce prediction error.

How to Use This Demo:
• Set input values and target output to define the learning task
• Adjust learning rate to control the step size for weight updates
• Click "Step Optimization" to perform one iteration of backpropagation
• Use "Reset Weights" to start with new random weights
• Hover over table entries to highlight corresponding network connections
• Watch the loss decrease over multiple optimization steps

Network Configuration:
• Architecture: 2 inputs → 3 hidden (ReLU) → 1 output (linear); a forward-pass sketch follows after this list
• Loss function: Mean Squared Error (MSE)
• Optimization: Gradient descent with adjustable learning rate
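
As a rough sketch of what one forward pass through this configuration computes (illustrative NumPy code, not the demo's implementation), the snippet below uses the initial weights listed in the weights table further down; the displayed loss of 1.6171 is consistent with a 1/2 · (target − prediction)² convention.

```python
import numpy as np

# 2 inputs -> 3 hidden (ReLU) -> 1 linear output, no biases.
# Initial weights as shown in the demo's weights table.
W1 = np.array([[-0.758, -0.270,  0.835],   # weights from I1 to H1, H2, H3
               [-0.448,  0.579, -0.857]])  # weights from I2 to H1, H2, H3
W2 = np.array([-0.717, -0.204, 0.505])     # weights from H1, H2, H3 to the output

x = np.array([1.0, 0.5])   # inputs I1, I2
target = 2.0               # desired output

# Forward pass
z_hidden = x @ W1                 # hidden pre-activations
h = np.maximum(0.0, z_hidden)     # ReLU
y_pred = h @ W2                   # linear output

loss = 0.5 * (target - y_pred) ** 2
print(h.round(2), float(y_pred), float(loss))
# hidden ≈ [0.00, 0.02, 0.41], prediction ≈ 0.20, loss ≈ 1.62
# (small differences from the demo's 0.202 / 1.6171 come from the rounded weight values)
```
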
Network Visualization:
Node values: Show current activations after forward pass
Connection thickness: Represents weight magnitudes

Parameter Tables:
Weights table: Current weight values for all connections
Hover highlighting: Shows which connection each table entry represents

Loss History Graph:
• Tracks MSE loss over optimization steps
• Shows learning progress as loss decreases
• Illustrates effect of learning rate on convergence

Learning Rate Guidelines:
• Start with learning rates around 0.01-0.1
• Too high: Loss may oscillate or diverge
• Too low: Very slow convergence
• Observe the loss curve to assess whether the learning rate is appropriate (see the sketch after this list)
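
The sketch below makes these guidelines concrete on a deliberately simple one-dimensional quadratic rather than the demo's network; the update rule is the same plain gradient-descent step, w ← w − learning_rate · dL/dw.

```python
# Gradient descent on a toy 1-D loss L(w) = (w - 3)^2,
# showing how the learning rate changes the behaviour of w <- w - lr * dL/dw.
def step(w, lr):
    grad = 2.0 * (w - 3.0)     # dL/dw
    return w - lr * grad

for lr in (0.01, 0.1, 1.1):    # cautious, reasonable, and too-large learning rates
    w = 0.0
    for _ in range(20):
        w = step(w, lr)
    print(lr, round(w, 3))
# lr = 0.01 moves toward the minimum at w = 3, but slowly; lr = 0.1 gets close;
# lr = 1.1 overshoots further on every step and diverges
```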

Training Observations:
• Watch how gradients propagate backward through layers
• Notice that output-layer gradients are typically larger; gradients reaching the hidden layer are scaled by the output weights and zeroed for inactive ReLU units on the way back
• Hidden layer gradients depend on both forward activations and backward error signals
• ReLU sets gradients to zero whenever a unit's pre-activation is negative ("dead" neurons receive no update); see the sketch after this list
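
Continuing the forward-pass sketch above (and again assuming a 1/2 · (target − prediction)² loss), a minimal backward pass and weight update for this network could look like the following; it reuses x, target, W1, W2, z_hidden, h and y_pred from that snippet.

```python
# Backward pass: propagate the error from the output back to the input weights.
dL_dy = y_pred - target              # output error signal, about -1.80 here
grad_W2 = dL_dy * h                  # gradients of the hidden -> output weights
dL_dh = dL_dy * W2                   # error pushed back onto the hidden activations
dL_dz = dL_dh * (z_hidden > 0)       # ReLU derivative: 1 where the pre-activation is positive, else 0
grad_W1 = np.outer(x, dL_dz)         # gradients of the input -> hidden weights

# H1's pre-activation is negative, so every gradient through H1 is zero on this step
# (a "dead" unit for this input). One gradient-descent update with the demo's learning rate:
lr = 0.01
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```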

Practical Insights:
• Multiple optimization steps are usually needed to reach a good solution (see the training-loop sketch after this list)
• Different input-target pairs will produce different gradient patterns
• Weight initialization affects convergence speed and final solution
• Real applications use mini-batches and advanced optimizers (Adam, RMSprop)
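
Putting the pieces together, the self-contained sketch below (same assumptions as above, with freshly randomized weights as "Reset Weights" would produce) repeats the forward/backward step many times and records the loss, mirroring repeated clicks of "Step Optimization".

```python
import numpy as np

# Single-example training loop for the 2 -> 3 (ReLU) -> 1 network, no biases.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=3)        # hidden -> output weights

x = np.array([1.0, 0.5])
target = 2.0
lr = 0.05

loss_history = []
for step in range(200):
    # forward pass
    z = x @ W1
    h = np.maximum(0.0, z)
    y = h @ W2
    # backward pass (gradients of 1/2 * (y - target)^2)
    err = y - target
    grad_W2 = err * h
    grad_W1 = np.outer(x, err * W2 * (z > 0))
    # plain gradient-descent update
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    loss_history.append(0.5 * err ** 2)

print(round(loss_history[0], 4), round(loss_history[-1], 6))
# With most initializations the loss drops sharply; if all hidden units start dead
# for this input, nothing changes (an extreme case of the dead-neuron issue above).
```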

Example Snapshot (Step 0):
• Training values: inputs I1 = 1.00, I2 = 0.50; target = 2.0; learning rate = 0.01
• Hidden activations after the forward pass (ReLU): H1 = 0.00, H2 = 0.02, H3 = 0.41
• Prediction: 0.202; loss: 1.6171

Network Parameters (initial weights):
       From I1    From I2    To Output
H1     -0.758     -0.448     -0.717
H2     -0.270      0.579     -0.204
H3      0.835     -0.857      0.505
Note: For simplicity, this network does not have any biases.

The Loss History graph and the Weight Updates table are empty at step 0; they populate after the first "Step Optimization" click.