Polynomial Regression models the relationship between temperature and the quantity being predicted by fitting a polynomial of a chosen degree. Increasing the degree makes the model more flexible: it can bend to follow more complex patterns, but it also becomes more sensitive to noise in the training data.
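As a minimal sketch (assuming scikit-learn and synthetic temperature data, not any particular course dataset), the snippet below fits polynomials of increasing degree and shows the training fit improving as the degree grows:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical data: temperature (deg C) vs. a noisy quadratic response
rng = np.random.default_rng(0)
temperature = np.sort(rng.uniform(0, 35, 50)).reshape(-1, 1)
response = (10 + 2.5 * temperature.ravel()
            - 0.04 * temperature.ravel() ** 2
            + rng.normal(0, 3, 50))

# Fit polynomial regressions of increasing degree
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(temperature, response)
    # Training R^2 keeps rising with degree, even once the model starts fitting noise
    print(f"degree={degree:2d}  training R^2={model.score(temperature, response):.3f}")
```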
Bias-Variance Decomposition breaks prediction error into three components:
Bias² measures error from wrong model assumptions. High-bias models (low-degree polynomials) are too simple to capture the true pattern; they systematically underpredict or overpredict. As model complexity increases, bias decreases.
Variance measures sensitivity to the specific training data. High-variance models (high-degree polynomials) fit training noise and vary wildly when trained on different samples. As model complexity increases, variance increases.
Irreducible Error is noise inherent in the data that no model can eliminate (weather variations, individual preferences, measurement errors).
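For squared-error loss, these three components are commonly summarized (under standard assumptions, with $f$ the true function, $\hat{f}$ the fitted model, and $\sigma^2$ the noise variance) as:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$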
The Tradeoff: Optimal model complexity balances bias and variance. Too simple = high bias (underfitting). Too complex = high variance (overfitting). The sweet spot minimizes total error on unseen test data.
Training vs Test Error: Training error always decreases with complexity (more parameters mean a better fit to the training data). Test error follows a U-shaped curve: it decreases at first as bias falls, reaches its minimum at the optimal complexity, then rises as variance dominates. The gap between training and test error indicates overfitting: a larger gap means the model has memorized training-specific patterns.
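As an illustrative sketch (again assuming scikit-learn and synthetic data), the loop below traces training and test MSE across polynomial degrees; the printed test error typically follows the U-shape described above:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic temperature data with a noisy quadratic ground truth
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 35, 80)).reshape(-1, 1)
y = 10 + 2.5 * x.ravel() - 0.04 * x.ravel() ** 2 + rng.normal(0, 3, 80)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    # Training MSE keeps falling; test MSE falls, bottoms out, then rises again
    print(f"degree={degree:2d}  train MSE={train_mse:6.2f}  test MSE={test_mse:6.2f}")
```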
Developed by Kevin Yu & Panagiotis Angeloudis