Logistic Regression in 2D uses the sigmoid function to map the output of a decision function to a probability between 0 and 1:
$$P(y=1 \mid \mathbf{x}) = \sigma(f(\mathbf{x})) = \frac{1}{1 + e^{-f(\mathbf{x})}}$$
where $f(\mathbf{x})$ is the decision function that depends on the boundary type.
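The sigmoid mapping can be sketched in a few lines of Python (the helper name `sigmoid` is illustrative, not from the demo):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued decision score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive scores map near 1, large negative scores near 0,
# and a score of exactly 0 maps to 0.5 (the decision boundary).
print(sigmoid(0.0))    # 0.5
print(sigmoid(5.0))    # close to 1
print(sigmoid(-5.0))   # close to 0
```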
Linear Decision Boundary: The simplest case uses a linear decision function:
$$z = f(\mathbf{x}) = \theta_0 + \theta_1x_1 + \theta_2x_2$$
The decision boundary occurs where $z = 0$ (where probability equals 0.5), which forms a straight line in 2D space. Points on one side of this line are classified as class 1, points on the other side as class 0. The parameters $\theta_1$ and $\theta_2$ control the orientation (slope) of the boundary, while the bias $\theta_0$ controls its position (intercept).
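A minimal sketch of this classification rule (the parameter values below are hypothetical, chosen so the boundary is the line $x_1 + x_2 = 3$):

```python
def linear_decision(x1, x2, theta0, theta1, theta2):
    """Linear decision function; the boundary is the set of points with z = 0."""
    return theta0 + theta1 * x1 + theta2 * x2

# Illustrative parameters: boundary is the line x1 + x2 = 3
theta0, theta1, theta2 = -3.0, 1.0, 1.0

z = linear_decision(2.0, 2.0, theta0, theta1, theta2)  # 1.0 -> above the line
label = 1 if z > 0 else 0                              # class 1

z0 = linear_decision(0.0, 0.0, theta0, theta1, theta2) # -3.0 -> below the line
label0 = 1 if z0 > 0 else 0                            # class 0
```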
Quadratic Decision Boundary: For non-linearly separable data, we can use quadratic terms to create curved boundaries:
$$z = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_{11}x_1^2 + \theta_{22}x_2^2 + \theta_{12}x_1x_2$$
This allows the decision boundary to form ellipses, parabolas, or hyperbolas depending on the learned parameters. The quadratic terms $\theta_{11}x_1^2$ and $\theta_{22}x_2^2$ create curvature along each axis, while the interaction term $\theta_{12}x_1x_2$ allows for rotated or skewed boundaries.
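As an illustration, setting $\theta_0 = -1$, $\theta_{11} = \theta_{22} = 1$, and the remaining parameters to zero yields a circular boundary $x_1^2 + x_2^2 = 1$ (these values are chosen for the example, not learned):

```python
def quadratic_decision(x1, x2, t0, t1, t2, t11, t22, t12):
    """Quadratic decision function: linear, squared, and interaction terms."""
    return t0 + t1*x1 + t2*x2 + t11*x1**2 + t22*x2**2 + t12*x1*x2

# Circular boundary x1^2 + x2^2 = 1: points inside the unit circle
# get z < 0 (class 0), points outside get z > 0 (class 1).
z_in = quadratic_decision(0.5, 0.0, -1, 0, 0, 1, 1, 0)   # -0.75 -> class 0
z_out = quadratic_decision(2.0, 0.0, -1, 0, 0, 1, 1, 0)  #  3.0  -> class 1
```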
Training and Optimization: As with 1D classification, we train using cross-entropy loss, where $h(\mathbf{x}) = \sigma(f(\mathbf{x}))$ denotes the model's predicted probability:
$$J(\boldsymbol{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h(\mathbf{x}^{(i)})) + (1-y^{(i)})\log(1-h(\mathbf{x}^{(i)}))\right]$$
and optimize using gradient descent. The "Find Optimal Solution" button in this demo runs this optimization for the selected boundary type.
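The training loop can be sketched as batch gradient descent on the cross-entropy loss. This is a minimal illustration, not the demo's actual implementation; for a quadratic boundary you would append $x_1^2$, $x_2^2$, and $x_1x_2$ columns to the feature matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Batch gradient descent on cross-entropy loss.
    X: (m, n) feature matrix, with a leading column of 1s for the bias theta_0."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # predicted probabilities h(x)
        grad = X.T @ (h - y) / m      # gradient of the cross-entropy loss
        theta -= lr * grad
    return theta

# Toy linearly separable data in 2D
X_raw = np.array([[0., 0.], [0., 1.], [1., 0.], [2., 2.], [3., 2.], [2., 3.]])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.hstack([np.ones((len(X_raw), 1)), X_raw])   # prepend bias column
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(int)     # all training points classified correctly
```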
Practical guidance: Start with linear boundaries for linearly separable data—they are simpler, more interpretable, and less prone to overfitting. Use quadratic (or higher-order) boundaries when data exhibits curved separation patterns that cannot be captured by a straight line. Always validate on separate test data to ensure the model generalizes beyond the training set. For engineering applications where interpretability matters, prefer simpler (linear) boundaries when performance is comparable.
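This linear-versus-quadratic comparison on held-out data can be sketched with scikit-learn (synthetic data and helper choices below are assumptions for illustration): a class-0 blob surrounded by a class-1 ring is not linearly separable, so the quadratic model should clearly win on the test split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Class 0: blob at the origin; class 1: ring of radius 2 around it.
X0 = rng.normal(0.0, 0.5, size=(100, 2))
angles = rng.uniform(0, 2 * np.pi, 100)
X1 = np.c_[2 * np.cos(angles), 2 * np.sin(angles)] + rng.normal(0.0, 0.2, size=(100, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(100), np.ones(100)]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LogisticRegression().fit(X_tr, y_tr)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression()).fit(X_tr, y_tr)

# Test accuracy: the quadratic boundary can wrap around the blob;
# a straight line cannot, so it stays near chance level.
print(linear.score(X_te, y_te), quadratic.score(X_te, y_te))
```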
Developed by Kevin Yu & Panagiotis Angeloudis