| Iteration | 0 |
|---|---|
| $\alpha$ | 0.000 |
| $\beta$ | 0.000 |
| Cost $\mathcal{J}$ | 0.000 |
| $\Delta\mathcal{J}$ | — |
Gradient descent is an iterative optimization algorithm that finds the parameter values minimizing the cost function $\mathcal{J}(\alpha,\beta) = \frac{1}{2n}\sum_{i=1}^{n}(\alpha + \beta x_i - y_i)^2$ for our linear regression model $\mu(x) = \alpha + \beta x$. Rather than solving for the optimal parameters analytically, gradient descent starts from an initial guess and repeatedly adjusts the parameters in the direction that most decreases the cost.
At each iteration, we update the parameters according to $\alpha \leftarrow \alpha - \eta \frac{\partial \mathcal{J}}{\partial \alpha}$ and $\beta \leftarrow \beta - \eta \frac{\partial \mathcal{J}}{\partial \beta}$, where the gradients are $\frac{\partial \mathcal{J}}{\partial \alpha} = \frac{1}{n}\sum_{i=1}^{n}(\alpha + \beta x_i - y_i)$ and $\frac{\partial \mathcal{J}}{\partial \beta} = \frac{1}{n}\sum_{i=1}^{n}(\alpha + \beta x_i - y_i) \cdot x_i$.
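The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's own implementation; the function name, the initial guess $\alpha = \beta = 0$ (matching the table above), and the test line $y = 2 + 3x$ are all choices made for the example:

```python
import numpy as np

def gradient_descent(x, y, eta=0.5, n_iters=5000):
    """Minimise J(alpha, beta) = 1/(2n) * sum((alpha + beta*x_i - y_i)^2)."""
    alpha, beta = 0.0, 0.0                 # initial guess, as in the table above
    for _ in range(n_iters):
        residuals = alpha + beta * x - y   # (alpha + beta*x_i - y_i) for all i
        grad_alpha = residuals.mean()      # dJ/dalpha = (1/n) * sum(residuals)
        grad_beta = (residuals * x).mean() # dJ/dbeta  = (1/n) * sum(residuals * x_i)
        alpha -= eta * grad_alpha          # step against the gradient
        beta -= eta * grad_beta
    return alpha, beta

# Noise-free line y = 2 + 3x: gradient descent should recover both coefficients.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x
alpha, beta = gradient_descent(x, y)
```

Note that both gradients reuse the same residual vector, so each iteration costs only a single pass over the data.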
The learning rate $\eta$ is crucial: if it is too small, the algorithm takes tiny steps and requires many iterations; if it is too large, each update may overshoot the minimum and the cost can diverge. A moderate $\eta$ produces a steady decrease in $\mathcal{J}$ over iterations. As we approach the minimum, the gradient magnitude shrinks, so the effective step size decreases even though $\eta$ stays constant.
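These three regimes can be demonstrated by tracking the cost over iterations for different values of $\eta$. The specific values 0.01, 0.5, and 4.0 and the line $y = 2 + 3x$ are assumptions chosen for this sketch, not taken from the original:

```python
import numpy as np

def cost(alpha, beta, x, y):
    # J = 1/(2n) * sum((alpha + beta*x_i - y_i)^2)
    return ((alpha + beta * x - y) ** 2).mean() / 2

def run(eta, x, y, n_iters=50):
    """Run gradient descent, recording the cost after each iteration."""
    alpha, beta = 0.0, 0.0
    costs = []
    for _ in range(n_iters):
        r = alpha + beta * x - y
        alpha -= eta * r.mean()
        beta -= eta * (r * x).mean()
        costs.append(cost(alpha, beta, x, y))
    return costs

x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x
small = run(0.01, x, y)     # tiny steps: cost barely moves in 50 iterations
moderate = run(0.5, x, y)   # steady decrease toward the minimum
large = run(4.0, x, y)      # overshoots every step: cost grows without bound
```

Plotting the three cost curves on a log scale makes the contrast vivid: the small-$\eta$ curve is nearly flat, the moderate one decays smoothly, and the large one explodes.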
Developed by Kevin Yu & Panagiotis Angeloudis