The simplest regression model assumes a linear relationship between a single input feature and the output, taking the form $\hat{y} = \alpha + \beta x$, where $\hat{y}$ is our predicted value, $x$ is the input feature, $\alpha$ is the bias or intercept (the predicted value when $x=0$), and $\beta$ is the weight or slope (how much the prediction changes per unit of $x$).
Training the model means finding the best-fitting $\alpha$ and $\beta$ from our training data. Visually, we want a line that passes close to all the points; numerically, "best fit" usually means the line that minimizes the prediction errors.
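For a single feature, the least-squares estimates of $\alpha$ and $\beta$ have a well-known closed form: $\beta$ is the covariance of $x$ and $y$ divided by the variance of $x$, and $\alpha = \bar{y} - \beta\bar{x}$. A minimal sketch (the data points below are made up for illustration):

```python
import numpy as np

# Toy training data (hypothetical values, chosen only for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form ordinary least squares for a single feature:
# beta = Cov(x, y) / Var(x), alpha = mean(y) - beta * mean(x)
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

y_hat = alpha + beta * x  # predictions on the training points
print(alpha, beta)
```

With these toy points the fitted slope is close to 2 and the intercept close to 0, matching the roughly linear pattern in the data.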
We evaluate model performance using three complementary metrics, each highlighting different aspects of prediction quality:
1. Mean Absolute Error (MAE) = $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$ measures the average absolute difference between actual and predicted values. MAE is more robust to outliers than squared-error metrics because each error contributes in proportion to its size rather than its square. Use it when you want an intuitive measure in the same units as your data (e.g., if predicting bridge deflection in mm, MAE is also in mm). Lower values indicate better fit.
2. Mean Squared Error (MSE) = $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ measures the average squared difference between actual and predicted values. MSE heavily penalizes large errors due to squaring, making it sensitive to outliers. This is the most commonly used metric for training regression models because squaring makes the function differentiable everywhere and convex, enabling efficient optimization. The cost function $J(\alpha,\beta)$ used in training is essentially $\frac{1}{2}$MSE. Lower values indicate better fit.
3. R-squared (R²) = $1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ represents the proportion of variance in the data explained by the model, where $\bar{y}$ is the mean of observed values. R² is at most 1: a value of 0 means the model predicts no better than the mean, and negative values (possible for very poor fits) mean it predicts worse. As a rough rule of thumb: 0.0–0.3 (poor fit), 0.3–0.7 (moderate), 0.7+ (good fit). Unlike MAE and MSE, R² is scale-independent, making it ideal for comparing models across different datasets or units. It is widely used in statistics and the sciences to communicate model quality to non-technical audiences.
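The three metrics above can be computed directly from their formulas. A small sketch, using made-up observed and predicted values for illustration:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0])   # observed values (hypothetical)
y_hat = np.array([2.5, 5.5, 6.5, 9.5])   # model predictions (hypothetical)

mae = np.mean(np.abs(y - y_hat))          # average error, same units as y
mse = np.mean((y - y_hat) ** 2)           # penalizes large errors via squaring
r2  = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(mae, mse, r2)
```

Here every error has magnitude 0.5, so MAE = 0.5 and MSE = 0.25; the residual sum of squares (1.0) is small relative to the total variation around the mean (20.0), giving R² = 0.95.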
Practical guidance: Use MSE for training (optimization-friendly), MAE for interpretable error reporting (same units as data), and R² for overall model assessment (proportion of variance explained). Confidence intervals show the uncertainty in predictions; wider intervals indicate less certainty.
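A confidence interval for the regression line's mean prediction at a new point $x_0$ can be sketched using the standard textbook formula $\hat{y}_0 \pm t_{n-2}\, s\sqrt{\tfrac{1}{n} + \tfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}}$, where $s$ is the residual standard error. A hedged sketch with made-up data (the points and the choice of $x_0 = 3.5$ are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Toy data (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Fit by closed-form ordinary least squares
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

resid = y - (alpha + beta * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error

x0 = 3.5                                    # point to predict at (assumed)
se = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t = stats.t.ppf(0.975, df=n - 2)            # two-sided 95% critical value
y0 = alpha + beta * x0
print(y0 - t * se, y0 + t * se)             # the 95% confidence interval
```

The interval is narrowest near $\bar{x}$ and widens as $x_0$ moves away from the bulk of the data, matching the intuition that predictions far from the training data are less certain.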
Developed by Kevin Yu & Panagiotis Angeloudis