| ID | $x$ | True $y$ | Pred $y$ | Correct | Confidence |
|---|---|---|---|---|---|
The Sigmoid (Logistic) Function maps any real value to a probability between 0 and 1:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid has several crucial properties: its range is bounded to $(0, 1)$, making outputs interpretable as probabilities; it is smooth and differentiable everywhere, enabling gradient-based optimization; it is monotonic, with large positive values yielding $\sigma(z) \approx 1$ and large negative values yielding $\sigma(z) \approx 0$; and it is symmetric around $z=0$ where $\sigma(0) = 0.5$.
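These properties can be checked directly with a minimal implementation (a sketch, not a library function):

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5 exactly: the point of symmetry
print(sigmoid(10))   # very close to 1 for large positive z
print(sigmoid(-10))  # very close to 0 for large negative z

# Symmetry around z = 0: sigmoid(-z) = 1 - sigmoid(z)
print(sigmoid(3) + sigmoid(-3))  # 1.0
```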
Logistic Regression Model: The model computes a weighted combination of inputs and passes it through the sigmoid to obtain a probability:
$$h(x) = P(y=1 \mid x) = \sigma(\alpha x + \beta) = \frac{1}{1 + e^{-(\alpha x + \beta)}}$$
To classify, we use a threshold (typically 0.5): predict class 1 if $h(x) \geq 0.5$, otherwise predict class 0. The decision boundary occurs where $\alpha x + \beta = 0$ (where the sigmoid equals 0.5).
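A short sketch of the thresholding rule, using hypothetical parameters $\alpha = 2$, $\beta = -4$ (chosen for illustration, so the decision boundary sits at $x = 2$):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters for illustration: boundary where 2x - 4 = 0, i.e. x = 2
ALPHA, BETA = 2.0, -4.0

def predict_proba(x: float) -> float:
    """P(y=1 | x) under the logistic model."""
    return sigmoid(ALPHA * x + BETA)

def predict(x: float, threshold: float = 0.5) -> int:
    """Class 1 if the probability clears the threshold, else class 0."""
    return 1 if predict_proba(x) >= threshold else 0

print(predict_proba(2.0))  # 0.5: exactly on the decision boundary
print(predict(3.0))        # 1, since alpha*x + beta = 2 > 0
print(predict(1.0))        # 0, since alpha*x + beta = -2 < 0
```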
Loss Functions: Unlike regression, we cannot use mean squared error (MSE) for classification because combining MSE with the sigmoid output creates a non-convex cost surface with multiple local minima. Instead, we use cross-entropy loss (also called log loss or negative log-likelihood), which is designed for probabilistic classification and remains convex:
$$J(\alpha, \beta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h(x^{(i)})) + (1-y^{(i)})\log(1-h(x^{(i)}))\right]$$
Cross-entropy heavily penalizes confident wrong predictions: predicting probability 0.9 for class 1 when the true label is 0 incurs much larger loss than predicting 0.6. This encourages the model to be calibrated and honest about uncertainty.
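The penalty asymmetry, and the full cost $J(\alpha, \beta)$, can be sketched as follows (the dataset and parameters are hypothetical, purely for illustration):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(y_true: int, p: float, eps: float = 1e-12) -> float:
    """Per-example log loss; eps guards against log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Confident wrong prediction (true label 0, predicted P(y=1) = 0.9)
# vs a mildly wrong one (predicted 0.6):
print(cross_entropy(0, 0.9))  # ~2.30: heavily penalized
print(cross_entropy(0, 0.6))  # ~0.92: much smaller loss

# Mean cost J over a toy dataset, with hypothetical alpha = 2, beta = -4
xs, ys = [1.0, 2.5, 3.0, 0.5], [0, 1, 1, 0]
J = sum(cross_entropy(y, sigmoid(2.0 * x - 4.0)) for x, y in zip(xs, ys)) / len(xs)
print(J)
```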
Practical guidance: Use sigmoid activation for classification to obtain probabilistic outputs and smooth decision boundaries. The sigmoid ensures predictions stay in the valid probability range $(0, 1)$ and provides confidence estimates. Cross-entropy loss is the standard choice for training classification models because it properly handles probabilistic predictions and enables efficient gradient-based optimization.
Developed by Kevin Yu & Panagiotis Angeloudis