Geometric PCA Intuition

Principal Component Analysis (PCA) is a fundamental dimensionality reduction technique that identifies the directions of maximum variance in your data. This interactive demo visualizes how PCA discovers these principal directions in 2D space.

Watch as PCA behaves like a smart compass, automatically aligning PC1 and PC2 with the data's natural spread regardless of correlation, noise, or position. The projections show how each point maps onto the principal component axes.

Mathematical Foundation

PCA works by computing the eigenvectors and eigenvalues of the data covariance matrix. The eigenvector with the largest eigenvalue becomes PC1, capturing the direction of maximum variance. The second eigenvector (orthogonal to the first) becomes PC2.

The covariance matrix structure:

Σ = | Var(x)    Cov(x,y) |
    | Cov(x,y)  Var(y)   |

In 2D, PC1 and PC2 together always account for 100% of the variance. The variance explained by each component tells you how much information is retained when projecting onto that axis. High correlation creates elongated clouds where PC1 captures most of the variance.
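
The eigendecomposition described above can be sketched in a few lines of NumPy (the correlation value and sample size here are illustrative choices, not values taken from the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample a correlated 2D Gaussian cloud (rho = 0.8 is an illustrative choice).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# Center the data, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)          # returned in ascending order

# PC1 is the eigenvector with the largest eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# In 2D the two components always split 100% of the variance.
explained = eigvals / eigvals.sum()
print(explained)   # PC1's share grows as rho approaches ±1
```

With a high correlation like this, PC1's share is close to (1 + ρ)/2 of the total variance, which is why the elongated clouds in the demo show a lopsided variance badge.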

Hover over any point to see its position in the original space and its projections onto PC1 and PC2 simultaneously.

Using the Demo

Controls:

  • Correlation (ρ) – drag from −1 to +1 to control how elongated the point cloud becomes. High correlation means PC1 captures most variance.
  • Noise level – add isotropic (uniform in all directions) noise to blur the distinction between PCs.
  • Number of points – increase sample size to reduce sampling noise and get more stable PC estimates.
  • Mean offset – shift the cloud away from the origin to verify that PCA is translation-invariant.
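
A minimal sketch of how such a dataset might be generated, assuming the sampler is a unit-variance Gaussian driven by the four controls above (the function name and defaults here are hypothetical, not the demo's actual code):

```python
import numpy as np

def generate_dataset(rho=0.0, noise=0.2, n_points=160, mean=(0.0, 0.0), seed=None):
    """Hypothetical sketch of the demo's sampler: a unit-variance 2D Gaussian
    with correlation rho, plus isotropic noise, shifted by a mean offset."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n_points)
    X += rng.normal(scale=noise, size=X.shape)   # isotropic noise blurs the PCs
    return X + np.asarray(mean)                  # mean offset moves the cloud

X = generate_dataset(rho=0.9, noise=0.1, n_points=400, mean=(2.0, -1.0), seed=1)
```

Because the added noise is isotropic, it inflates both eigenvalues equally, which is exactly why turning up the noise slider evens out the variance split between PC1 and PC2.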

Buttons:

  • Reset Defaults – restore original settings.
  • Generate Dataset – resample new random points with current parameter settings.

Interactive highlighting: Hover over any point in the scatter plot or projection plots to see corresponding highlights across all views.

Quick Tips

  • Start neutral: set ρ ≈ 0 and low noise to see a circular cloud where PC1 and PC2 can point anywhere—perfect reminder that PCA cares only about variance.
  • Pump up correlation: slide ρ toward ±1 and watch PC1 snap to the long axis while the variance badge shows it claiming most of the information.
  • Add noise sparingly: increasing the noise slider instantly shortens the PC arrows; use it to illustrate how messy measurements dilute variance.
  • Play with sample size: drag the point-count slider down to 40 to observe noisy eigenvectors, then back up to confirm they stabilise with more data.
  • Shift the cloud: change the mean offsets to prove PCA recentres the data before analysis; the arrows move with the cloud but keep their directions.
  • Use projections as a checklist: if you’re unsure what a slider change did, glance at the projection histograms; balanced bars mean the variance is split evenly.
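
The translation-invariance tip can be checked numerically. This sketch (with an arbitrary correlation and offset) shows that shifting the cloud leaves the leading eigenvector unchanged:

```python
import numpy as np

rng = np.random.default_rng(7)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=300)

def pc1(X):
    # Leading eigenvector of the sample covariance (PCA centers the data first).
    S = np.cov(X - X.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    v = vecs[:, np.argmax(vals)]
    return v if v[0] >= 0 else -v    # fix the sign so directions compare cleanly

shifted = X + np.array([5.0, -3.0])  # a mean offset, as the slider would apply
print(pc1(X), pc1(shifted))          # same direction both times
```

The sign flip in `pc1` matters because eigenvectors are only defined up to sign; without it, two runs could report opposite but equivalent directions.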

Point Cloud & PCA Axes

PC arrows originate at the data mean; lengths reflect variance captured.

Example readout: Samples 160 · Sample Corr −0.12 · Mean (0.03, 0.03)

Projection onto PC1

Projection onto PC2

Variance Explained

PC1 56.6%
PC2 43.4%

PC1 captures 56.6% of the variance; PC2 accounts for the remaining 43.4%.

Covariance Matrix

       x      y
x   1.12  −0.12
y  −0.12   0.99
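
Feeding the displayed matrix into an eigendecomposition reproduces the variance split shown above, up to the two-decimal rounding of the matrix entries:

```python
import numpy as np

# The covariance matrix as displayed (values rounded to two decimals).
S = np.array([[ 1.12, -0.12],
              [-0.12,  0.99]])

eigvals = np.linalg.eigvalsh(S)[::-1]     # sort descending
explained = eigvals / eigvals.sum()
print(np.round(100 * explained, 1))       # ≈ [56.5 43.5] from the rounded inputs
```

The small gap between this and the 56.6% / 43.4% badges comes from the rounding of the matrix entries, not from the method.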

Dataset Controls

Example settings: ρ = 0.00 · noise = 0.20 · points = 160

Use the controls in the sequence suggested in the classroom flow tab, or freestyle to answer “what if” questions from the cohort.