The regularized cost function adds a penalty term to the standard mean squared error:
$$J(\boldsymbol{\beta}) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\boldsymbol{\beta}}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \lambda \cdot \text{penalty}(\boldsymbol{\beta})$$
where $\lambda \geq 0$ is the regularization parameter controlling penalty strength. When $\lambda = 0$, we recover ordinary least squares (no regularization). As $\lambda$ increases, we constrain parameter magnitudes, favoring simpler models. By convention, the intercept $\beta_0$ is excluded from the penalty, since shrinking it would bias predictions rather than simplify the model.
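As a minimal sketch, the cost function above translates directly into NumPy. The function name `ridge_cost` and the synthetic inputs are illustrative; this version uses the L2 (squared) penalty and skips the intercept, matching the convention noted above.

```python
import numpy as np

def ridge_cost(beta, X, y, lam):
    """Regularized cost: mean-squared-error term plus an L2 penalty.

    Assumes X includes a leading column of ones for the intercept beta[0],
    which is excluded from the penalty by convention.
    """
    m = len(y)
    residuals = X @ beta - y
    mse_term = (residuals @ residuals) / (2 * m)
    penalty = lam * np.sum(beta[1:] ** 2)  # intercept not penalized
    return mse_term + penalty
```

With `lam=0` this reduces to the ordinary least-squares cost; increasing `lam` adds a growing price on coefficient magnitude.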
L2 Regularization (Ridge) adds the sum of squared parameter values: $\lambda \sum_{j=1}^{n} \beta_j^2$. Ridge regression shrinks all coefficients towards zero but never sets any of them exactly to zero, so no features are eliminated entirely. This is particularly effective when many features are relevant but should have modest contributions. Ridge also tends to perform better when features are correlated, as it distributes weight among correlated features rather than arbitrarily selecting one.
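Ridge has a closed-form solution, $\boldsymbol{\beta} = (\mathbf{X}^\top\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$, which makes the shrinkage easy to see directly. The sketch below fits this closed form on synthetic data (the data and `ridge_fit` helper are illustrative, not from the text); the coefficient norm shrinks monotonically as $\lambda$ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam*I) beta = X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Synthetic regression problem with three relevant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

for lam in [0.0, 10.0, 100.0]:
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:6.1f}  beta={np.round(beta, 3)}")
```

All three coefficients move towards zero as $\lambda$ increases, but none reach exactly zero.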
L1 Regularization (Lasso) adds the sum of absolute parameter values: $\lambda \sum_{j=1}^{n} |\beta_j|$. The L1 penalty drives some parameters to exactly zero, performing automatic feature selection. Lasso is preferred when we believe only a sparse subset of features is truly relevant, producing interpretable models by eliminating irrelevant features. However, when many features are correlated, Lasso tends to arbitrarily select one and zero out the others.
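The sparsity claim is easy to demonstrate. In this sketch (synthetic data; the `alpha` value is an illustrative choice, and scikit-learn's `alpha` plays the role of $\lambda$ here), only 2 of 10 features drive the target, and the L1 penalty should zero out most of the remaining coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

# 10 features, but only the first two actually influence y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
n_zero = int(np.sum(lasso.coef_ == 0.0))
print("coefficients:", np.round(lasso.coef_, 2))
print("features zeroed out:", n_zero)
```

Unlike ridge, the irrelevant coefficients land at exactly zero, not merely near it, which is what makes the surviving model interpretable.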
Practical guidance: Use Ridge when most features are relevant and should contribute modestly. Use Lasso when you need automatic feature selection and interpretable models with sparse coefficients. Both methods require tuning the regularization strength $\lambda$ (typically via cross-validation). Feature scaling is essential before applying regularization to ensure penalties apply fairly across all features.
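The practical workflow above (scale, then tune $\lambda$ by cross-validation) can be sketched with scikit-learn's pipeline utilities. The data here is synthetic, with deliberately mismatched feature scales; note that scikit-learn names the regularization strength `alpha` rather than $\lambda$.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature scales differ by orders of magnitude, so standardize before
# penalizing -- otherwise the penalty hits large-scale features unfairly.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4)) * np.array([1.0, 100.0, 0.01, 10.0])
y = X @ np.array([1.5, 0.02, 50.0, -0.3]) + 0.5 * rng.normal(size=150)

# Cross-validation selects the regularization strength automatically.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5))
ridge.fit(X, y)
lasso.fit(X, y)

print("ridge selected alpha:", ridge[-1].alpha_)
print("lasso selected alpha:", lasso[-1].alpha_)
```

Keeping the scaler inside the pipeline also ensures cross-validation re-fits the scaling on each training fold, avoiding leakage from the validation fold.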
Developed by Kevin Yu & Panagiotis Angeloudis