Gradient descent moves downhill on the loss surface, but regularisation reshapes that surface to keep coefficients in check. L2 (ridge) adds circular penalty contours that gently pull (β1,β2) toward the origin, while L1 (lasso) forms diamond-shaped contours with sharp corners on the axes. These distinct geometries change the path the optimiser follows: ridge shrinks parameters smoothly, whereas lasso can snap one coefficient to zero when the trajectory hits a diamond corner. In civil engineering models, this geometric perspective clarifies why regularisation stabilises predictions: by restraining the coefficients, it stops the fit from chasing noise in the measurements.
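Written out in this page's notation (and assuming a mean-squared-error training loss, as described below), the two objectives differ only in the penalty term:

    J_ridge(β1, β2) = MSE(β1, β2) + λ (β1² + β2²)
    J_lasso(β1, β2) = MSE(β1, β2) + λ (|β1| + |β2|)

The grey circles and diamonds in the plot are level sets of these two penalty terms.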
Use the tabs below to switch between ridge (L2) and lasso (L1) regularisation. Adjust the λ slider to set penalty strength—values near zero mimic unregularised training, while larger values tighten the constraint. Press Train to animate gradient descent from the initial point. Watch the contour plot: coloured contours show training loss, grey shapes mark the regularisation penalty, arrows trace each optimisation step, and the parameter readout updates after every iteration. Compare the train and total loss displays to see how the penalty term contributes to the overall objective.
Legend
• Loss contours: green → yellow → red indicate increasing training loss.
• Regularisation contours: grey circles for L2, grey diamonds for L1.
• Trail dots: small magenta markers for each gradient step; large blue dot is the starting point, large red dot is the final position.
• Arrows: magenta arrows connect successive steps and reveal the descent direction.
• Optimal markers: the green point marks the unregularised (λ = 0) optimum, shown for reference.
Experiment with λ to see how much regularisation you need. λ = 0 recovers pure least squares and the trajectory heads straight for the MSE minimum. Small λ values nudge coefficients toward the origin without drastically changing the path. Large λ values dominate the update: ridge pulls both coefficients inward together, while lasso may zero one coefficient entirely, producing sparse solutions. Compare the final (β1,β2) values between tabs to understand when an L1 or L2 penalty is preferable for civil engineering feature sets.
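For readers who want to reproduce this behaviour away from the widget, the sketch below (not the code behind this demo) runs gradient descent with an L2 penalty and a proximal soft-thresholding treatment of the L1 penalty on a toy two-feature least-squares problem. The data, starting point, learning rate, and λ values are illustrative assumptions; the point is simply that the lasso run can drive the weak coefficient exactly to zero while ridge only shrinks it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-feature regression problem (illustrative, not the demo's data).
X = rng.normal(size=(100, 2))
true_beta = np.array([2.0, 0.3])             # second feature is deliberately weak
y = X @ true_beta + 0.5 * rng.normal(size=100)

def mse_grad(beta):
    """Gradient of the training loss MSE(beta) = (1/n) * ||y - X beta||^2."""
    return -2.0 / len(y) * X.T @ (y - X @ beta)

def train(lam, penalty, eta=0.05, steps=500):
    beta = np.array([2.0, 2.0])              # starting point, as in the plot
    for _ in range(steps):
        grad = mse_grad(beta)
        if penalty == "ridge":
            grad = grad + 2.0 * lam * beta   # gradient of lam * (b1^2 + b2^2)
        beta = beta - eta * grad
        if penalty == "lasso":
            # Proximal step for lam * (|b1| + |b2|): soft-thresholding.
            # Coefficients whose magnitude falls below eta*lam snap to exactly zero.
            beta = np.sign(beta) * np.maximum(np.abs(beta) - eta * lam, 0.0)
    return beta

for lam in (0.0, 0.5, 2.0):
    print(f"lambda={lam}:",
          "ridge", np.round(train(lam, "ridge"), 3),
          "| lasso", np.round(train(lam, "lasso"), 3))
```

With λ = 0 both runs land on the same least-squares solution; as λ grows, the ridge coefficients shrink together while the lasso run tends to zero out the weak second coefficient, mirroring the behaviour in the contour plot.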
[Interactive readout (ridge and lasso tabs): fitted weights β1, β2, current training loss, and total loss.]

Geometry of Regularisation

Regularisation reshapes the feasible set that gradient descent explores. The unregularised loss forms elliptical contours because the mean squared error depends quadratically on β=(β1,β2). Adding an L2 penalty introduces concentric circular constraints, so the optimum appears where an ellipse first touches a circle, producing smooth shrinkage toward the origin. L1 regularisation instead adds diamond-shaped constraints with sharp corners on the axes; the optimum often lands on a corner, creating sparse solutions with one coefficient exactly zero. By watching the contour plot you can see how the intersection of loss ellipses and penalty shapes determines the convergence point as λ varies.
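One algebraic way to see why the corner contact yields exact zeros (a sketch assuming a proximal-gradient style update, which need not be the exact update this demo animates): for a single coefficient, take a plain gradient step on the MSE to get an intermediate value z, then account for the L1 penalty exactly. The result is the soft-thresholding update

    β ← sign(z) · max(|z| − ηλ, 0)

where η is the learning rate. Any coefficient whose intermediate value has magnitude below ηλ is set exactly to zero. The corresponding ridge update, β ← z / (1 + 2ηλ), only rescales the coefficient and never reaches zero, which is why L2 shrinks smoothly while L1 produces sparse solutions.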

Developed by Kevin Yu & Panagiotis Angeloudis