This interactive demo shows how gradient descent optimizes linear regression parameters. Unlike the closed-form analytical solution, gradient descent iteratively improves the parameters by following the negative gradient of the cost function.

You'll see the algorithm step-by-step, watching how the regression line evolves and tracking the optimization path on the cost surface. This is the foundation for training machine learning models.
• Click "Step" to perform one gradient descent iteration. Watch the regression line evolve (left panel) and the optimization path on the cost surface (right panel)
• Use "Re-initialize" to start from new random parameters, or load different datasets to explore various optimization landscapes
• Adjust the learning rate η to see different convergence behaviors—too small is slow, too large causes oscillation

What learning rate gives the fastest convergence without overshooting? Try values from 0.01 to 0.5.
• Data & Regression Line (top-left): normalized data points, the current regression line, and residuals (dashed red lines showing prediction errors)
• Cost Surface (top-right): cost-surface contours, the current position (red dot), and the optimization path (yellow line)
• Cost Improvement (bottom): bar chart showing |ΔJ| (the absolute cost reduction) for each iteration; watch how the improvements diminish as the algorithm converges to the optimal solution
• Readouts: current iteration, normalized parameters α and β, cost J, and gradient norm ||∇J||
• Caption: describes what happened in each step (gradient computation, parameter updates, and ΔJ)
• As convergence approaches, both the gradient norm and |ΔJ| decrease toward zero; a minimal programmatic check is sketched below
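
For reference, here is a minimal Python sketch of that stopping condition. The function name and tolerance values are illustrative assumptions, not part of the demo's code:

```python
import numpy as np

def has_converged(grad: np.ndarray, delta_J: float,
                  grad_tol: float = 1e-6, cost_tol: float = 1e-9) -> bool:
    """Illustrative stopping rule (thresholds are assumptions): stop once
    the gradient norm ||∇J|| and the per-step cost change |ΔJ| are both
    near zero."""
    return bool(np.linalg.norm(grad) < grad_tol and abs(delta_J) < cost_tol)
```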
[Interactive figure: panels "Data & Regression Line", "Cost Surface", "Cost Improvement per Step (ΔJ)", and "Optimization History"; initial readouts: Iteration 0, α = −1.300, β = −0.900, Cost J = 2.265.]

Mathematical Foundations

Gradient descent is an iterative optimization algorithm that finds parameter values minimizing the cost function $J(\alpha,\beta) = \frac{1}{2n}\sum_{i=1}^{n}\left(\alpha + \beta x_i - y_i\right)^2$ for our linear regression model $\mu(x) = \alpha + \beta x$. Rather than solving analytically, gradient descent starts with an initial guess for the parameters and repeatedly adjusts them in the direction that most decreases the cost.
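
As a concrete reference, the cost can be computed with a few lines of NumPy. This is a minimal sketch, not the demo's actual implementation; the function name and arguments are illustrative:

```python
import numpy as np

def cost(alpha: float, beta: float, x: np.ndarray, y: np.ndarray) -> float:
    """J(alpha, beta) = (1/2n) * sum_i (alpha + beta*x_i - y_i)^2."""
    residuals = alpha + beta * x - y   # prediction errors mu(x_i) - y_i
    return float(np.sum(residuals ** 2) / (2 * len(x)))
```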

At each iteration, we update the parameters according to $\alpha \leftarrow \alpha - \eta \frac{\partial J}{\partial \alpha}$ and $\beta \leftarrow \beta - \eta \frac{\partial J}{\partial \beta}$, where the gradients are $\frac{\partial J}{\partial \alpha} = \frac{1}{n}\sum_{i=1}^{n}(\alpha + \beta x_i - y_i)$ and $\frac{\partial J}{\partial \beta} = \frac{1}{n}\sum_{i=1}^{n}(\alpha + \beta x_i - y_i)\,x_i$.
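
Continuing the sketch above, one iteration of this update rule could look like the following (again illustrative, not the demo's code):

```python
def gradient_step(alpha: float, beta: float, x: np.ndarray, y: np.ndarray,
                  eta: float) -> tuple[float, float]:
    """Perform one gradient descent update on (alpha, beta)."""
    n = len(x)
    residuals = alpha + beta * x - y       # alpha + beta*x_i - y_i
    grad_alpha = residuals.sum() / n       # dJ/dalpha
    grad_beta = (residuals * x).sum() / n  # dJ/dbeta
    return alpha - eta * grad_alpha, beta - eta * grad_beta
```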

The learning rate η is crucial: too small and the algorithm takes tiny steps requiring many iterations; too large and it may overshoot the minimum and diverge. A moderate η leads to steady decrease in J over iterations. As we approach the minimum, the gradient magnitude decreases, effectively reducing step size even with constant η.
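
To see these regimes numerically, the sketch below runs the update loop for several learning rates, reusing the cost and gradient_step functions above. The dataset, random seed, and η values are illustrative assumptions; only the starting point α = −1.3, β = −0.9 matches the demo's initial readouts:

```python
import numpy as np

# Synthetic, roughly normalized data (an assumption, not the demo's dataset).
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = 0.5 + 0.8 * x + 0.1 * rng.standard_normal(50)

for eta in (0.01, 0.1, 0.5, 2.5):   # 2.5 is deliberately too large
    alpha, beta = -1.3, -0.9        # the demo's initial parameters
    for _ in range(100):
        alpha, beta = gradient_step(alpha, beta, x, y, eta)
    print(f"eta = {eta:4.2f} -> J after 100 steps: {cost(alpha, beta, x, y):.4g}")
```

For this data, the smallest rate leaves J well above its minimum after 100 steps, the moderate rates converge, and η = 2.5 overshoots so badly that the iterates grow without bound.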

Developed by Kevin Yu & Panagiotis Angeloudis