The simplest regression model assumes a linear relationship between a single input feature and the output, taking the form $\hat{y} = \alpha + \beta x$, where $\hat{y}$ is our predicted value, $x$ is the input feature, $\alpha$ is the bias or intercept (the predicted value when $x=0$), and $\beta$ is the weight or slope (how much the prediction changes per unit of $x$).
Training the model means finding the best-fitting $\alpha$ and $\beta$ from our training data. Visually, we want a line that passes close to all the points; numerically, "best fit" usually means the line that minimizes the prediction errors.
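For a single feature, the least-squares estimates of $\alpha$ and $\beta$ have a well-known closed form: $\beta$ is the covariance of $x$ and $y$ divided by the variance of $x$, and $\alpha = \bar{y} - \beta\bar{x}$. A minimal sketch (the data points below are made up for illustration):

```python
import numpy as np

# Toy training data (hypothetical values, chosen only for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form ordinary least squares for a single feature:
# beta = Cov(x, y) / Var(x), alpha = mean(y) - beta * mean(x)
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

y_hat = alpha + beta * x  # predictions on the training points
print(alpha, beta)
```

With these toy points the fitted slope is close to 2 and the intercept close to 0, matching the roughly linear pattern in the data.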
We evaluate model performance using three complementary metrics, each highlighting different aspects of prediction quality:
1. Mean Absolute Error (MAE) = $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$ measures the average absolute difference between actual and predicted values. MAE is more robust to outliers than squared-error metrics because each error contributes in proportion to its size rather than its square. Use it when you want an intuitive measure in the same units as your data (e.g., if predicting bridge deflection in mm, MAE is also in mm). Lower values indicate better fit.
2. Mean Squared Error (MSE) = $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ measures the average squared difference between actual and predicted values. MSE heavily penalizes large errors due to squaring, making it sensitive to outliers. This is the most commonly used metric for training regression models because squaring makes the function differentiable everywhere and convex, enabling efficient optimization. The cost function $J(\alpha,\beta)$ used in training is essentially $\frac{1}{2}$MSE. Lower values indicate better fit.
3. R-squared (R²) = $1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ represents the proportion of variance in the data explained by the model, where $\bar{y}$ is the mean of observed values. R² is at most 1: a value of 0 means the model predicts no better than the mean, and negative values (possible for very poor fits) mean it predicts worse. As a rough rule of thumb: 0.0–0.3 (poor fit), 0.3–0.7 (moderate), 0.7+ (good fit). Unlike MAE and MSE, R² is scale-independent, making it ideal for comparing models across different datasets or units. It is widely used in statistics and the sciences to communicate model quality to non-technical audiences.
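The three metrics above can be computed directly from their formulas. A small sketch, using made-up observed and predicted values for illustration:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0])   # observed values (hypothetical)
y_hat = np.array([2.5, 5.5, 6.5, 9.5])   # model predictions (hypothetical)

mae = np.mean(np.abs(y - y_hat))          # average error, same units as y
mse = np.mean((y - y_hat) ** 2)           # penalizes large errors via squaring
r2  = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(mae, mse, r2)
```

Here every error has magnitude 0.5, so MAE = 0.5 and MSE = 0.25; the residual sum of squares (1.0) is small relative to the total variation around the mean (20.0), giving R² = 0.95.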
Practical guidance: Use MSE for training (optimization-friendly), MAE for interpretable error reporting (same units as data), and R² for overall model assessment (proportion of variance explained). Confidence intervals show the uncertainty in predictions; wider intervals indicate less certainty.
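A confidence interval for the regression line's mean prediction at a new point $x_0$ can be sketched using the standard textbook formula $\hat{y}_0 \pm t_{n-2}\, s\sqrt{\tfrac{1}{n} + \tfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}}$, where $s$ is the residual standard error. A hedged sketch with made-up data (the points and the choice of $x_0 = 3.5$ are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Toy data (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Fit by closed-form ordinary least squares
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

resid = y - (alpha + beta * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error

x0 = 3.5                                    # point to predict at (assumed)
se = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t = stats.t.ppf(0.975, df=n - 2)            # two-sided 95% critical value
y0 = alpha + beta * x0
print(y0 - t * se, y0 + t * se)             # the 95% confidence interval
```

The interval is narrowest near $\bar{x}$ and widens as $x_0$ moves away from the bulk of the data, matching the intuition that predictions far from the training data are less certain.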
Developed by Kevin Yu & Panagiotis Angeloudis