Backpropagation Algorithm
Backpropagation is the key algorithm that enables neural networks to learn from data. It computes the gradient of the loss with respect to every weight by propagating error signals backward through the network, layer by layer, so that gradient descent can update the weights to reduce prediction error.
This demo shows how gradients flow backward through layers and how weights are updated to reduce prediction error.
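For intuition, here is a minimal stand-alone sketch (not the demo's own code) of what "computing a gradient" means for a single weight: the analytic gradient from the chain rule matches a finite-difference estimate. All names and numbers are made up for the example.

    # One-weight model: prediction = w * x, loss = (w * x - target)**2
    def analytic_grad(w, x, target):
        error = w * x - target            # forward pass: signed prediction error
        return 2.0 * error * x            # backward pass: chain rule, d(loss)/dw

    def numerical_grad(w, x, target, eps=1e-6):
        loss = lambda w_: (w_ * x - target) ** 2
        return (loss(w + eps) - loss(w - eps)) / (2 * eps)   # finite differences

    w, x, target = 0.5, 2.0, 3.0
    print(analytic_grad(w, x, target))    # -8.0
    print(numerical_grad(w, x, target))   # ~ -8.0, agrees with the chain-rule result

Backpropagation applies this same chain-rule bookkeeping to every weight in the network at once, reusing intermediate results instead of differentiating each weight separately.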
How to Use This Demo:
• Set input values and target output to define the learning task
• Adjust the learning rate to control the step size of weight updates
• Click "Step Optimization" to perform one iteration of backpropagation
• Use "Reset Weights" to start with new random weights
• Hover over table entries to highlight corresponding network connections
• Watch the loss decrease over multiple optimization steps
Network Configuration:
• Architecture: 2 inputs → 3 hidden (ReLU) → 1 output (linear)
• Loss function: Mean Squared Error (MSE)
• Optimization: Gradient descent with an adjustable learning rate (a code sketch of one step follows below)
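The demo's source is not shown here; the NumPy sketch below is one way the configuration above could be implemented. The shapes, random initialization, and the step() helper are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # 2 inputs -> 3 hidden (ReLU) -> 1 output (linear)
    W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)   # hidden-layer weights and biases
    W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)   # output-layer weights and biases

    def step(x, target, lr):
        """One forward pass, backward pass, and gradient-descent update."""
        global W1, b1, W2, b2
        # Forward pass
        z1 = W1 @ x + b1                    # hidden pre-activations
        h = np.maximum(z1, 0.0)             # ReLU activations
        y = W2 @ h + b2                     # linear output
        loss = np.mean((y - target) ** 2)   # MSE

        # Backward pass (chain rule, output layer first)
        dy  = 2.0 * (y - target) / y.size   # dLoss/dy
        dW2 = np.outer(dy, h)
        db2 = dy
        dh  = W2.T @ dy                     # error signal passed back to the hidden layer
        dz1 = dh * (z1 > 0)                 # ReLU gradient: zero where the pre-activation is negative
        dW1 = np.outer(dz1, x)
        db1 = dz1

        # Gradient-descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
        return loss

    print(step(np.array([1.0, -0.5]), np.array([0.3]), lr=0.05))   # MSE for this step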
Network Visualization:
• Node values: Show current activations after forward pass
• Connection thickness: Represents weight magnitudes
Parameter Tables:
• Weights table: Current weight values for all connections
• Hover highlighting: Shows which connection each table entry represents
Loss History Graph:
• Tracks MSE loss over optimization steps (see the toy sketch after this list)
• Shows learning progress as loss decreases
• Illustrates effect of learning rate on convergence
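To see where such a curve comes from, here is a stand-alone toy example (the same one-weight model as the first sketch, with made-up values) that records the loss at every gradient-descent step:

    # Record the MSE at each step; plotting loss_history gives a curve like the demo's graph.
    w, x, target, lr = 0.5, 2.0, 3.0, 0.1
    loss_history = []
    for _ in range(30):
        error = w * x - target
        loss_history.append(error ** 2)       # current MSE
        w -= lr * 2.0 * error * x             # gradient-descent update
    print(loss_history[0], loss_history[-1])  # starts at 4.0, shrinks toward 0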
Learning Rate Guidelines:
• Start with learning rates around 0.01-0.1
• Too high: Loss may oscillate or diverge
• Too low: Very slow convergence
• Observe the loss curve to assess whether the rate is appropriate (the toy example below shows all three regimes)
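A purely illustrative example of these regimes, using a one-parameter quadratic rather than the demo's network:

    # Gradient descent on loss(w) = (w - 3)**2; the gradient is 2 * (w - 3).
    def run(lr, steps=20, w=0.0):
        for _ in range(steps):
            w -= lr * 2.0 * (w - 3.0)
        return w

    print(run(0.05))   # too low: after 20 steps w is still noticeably short of 3
    print(run(0.5))    # well chosen: lands on 3
    print(run(1.1))    # too high: each update overshoots, |w - 3| grows (divergence)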
Training Observations:
• Watch how gradients propagate backward through layers
• Notice that gradients at the output layer are typically larger, since they pass back through fewer multiplicative factors
• Hidden layer gradients depend on both forward activations and backward error signals
• ReLU passes zero gradient wherever its pre-activation is negative, so a unit that stays negative stops learning (a "dead neuron"); see the snippet below
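A tiny stand-alone illustration of that last point (all values are made up):

    import numpy as np

    z1 = np.array([0.8, -1.2, 0.3])         # hypothetical hidden pre-activations
    upstream = np.array([0.5, 0.5, 0.5])    # hypothetical error signal arriving from the output layer
    print(upstream * (z1 > 0))              # [0.5 0.  0.5] -- the unit with a negative pre-activation
                                            # receives no gradient, so its incoming weights are not updated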
Practical Insights:
• Multiple steps are usually needed to reach a good solution
• Different input-target pairs will produce different gradient patterns
• Weight initialization affects convergence speed and final solution
• Real applications train on mini-batches and use adaptive optimizers such as Adam or RMSprop (sketched briefly below)
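For context, a minimal stand-alone sketch of the mini-batch idea on synthetic data, using plain gradient descent; an optimizer such as Adam would additionally rescale each step per parameter:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(32, 2))                               # mini-batch of 32 examples, 2 features
    t = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=32)  # synthetic targets
    w = np.zeros(2)

    for _ in range(100):
        grad = 2.0 * X.T @ (X @ w - t) / len(X)   # MSE gradient averaged over the batch
        w -= 0.1 * grad                           # one update per batch, not per example
    print(w)                                      # close to [1.0, -2.0]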