Hierarchical Clustering

Hierarchical clustering builds a tree of clusters, revealing data structure at multiple scales. Unlike k-means, it does not require specifying the number of clusters in advance and produces a dendrogram showing nested relationships.

This algorithm is particularly useful for:

  • Taxonomy creation - discovering natural hierarchies
  • Infrastructure portfolios - multi-scale asset grouping
  • Phylogenetic analysis - evolutionary relationships
  • Document clustering - organizing text by similarity

Agglomerative hierarchical clustering starts with each point as its own cluster, then repeatedly merges the closest pair of clusters until a single cluster remains.

Mathematical Foundation

Agglomerative Algorithm:

  1. Initialize: Start with m clusters, one per data point (m is the number of points)
  2. Compute: Calculate the distance between every pair of clusters using the chosen linkage criterion
  3. Merge: Combine the two closest clusters into a new cluster
  4. Record: Store merge height in dendrogram
  5. Repeat: Steps 2-4 until one cluster remains
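The five steps above can be sketched in a few lines of plain Python. This is a minimal, deliberately naive illustration that recomputes every cluster distance on each pass, not the code behind the demo. The function names `agglomerate` and `single_linkage`, the integer cluster ids, and the four sample points are all assumptions made for the example.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Smallest Euclidean distance between a point in a and a point in b."""
    return min(math.dist(x, y) for x in a for y in b)

def agglomerate(points, linkage=single_linkage):
    """Naive agglomerative clustering; returns (id_a, id_b, height) per merge."""
    # Step 1: one cluster per point, keyed by an integer id (hypothetical scheme).
    clusters = {i: [p] for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        # Step 2: evaluate the linkage distance for every pair of clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        height = linkage(clusters[a], clusters[b])
        # Step 3: merge the closest pair into a new cluster.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        # Step 4: record the merge height for the dendrogram.
        merges.append((a, b, height))
        next_id += 1  # Step 5: loop until one cluster remains.
    return merges

# Illustrative input: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
merges = agglomerate(points)
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height:.1f}")
```

With these points the two tight pairs merge first (heights 1.0 and 2.0), and the final merge joins them at height 5.0. Passing a different `linkage` function changes the merge heights and, on less tidy data, the merge order.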

Linkage Criteria: Different methods for computing cluster distances:

Single Linkage: $d(A,B) = \min_{x \in A,\, y \in B} \lVert x - y \rVert$

Minimum distance between any two points. Creates elongated "chain-like" clusters.

Complete Linkage: $d(A,B) = \max_{x \in A,\, y \in B} \lVert x - y \rVert$

Maximum distance between any two points. Creates compact, spherical clusters.

Average Linkage: $d(A,B) = \frac{1}{|A|\,|B|} \sum_{x \in A} \sum_{y \in B} \lVert x - y \rVert$

Mean distance between all pairs. Balanced compromise between single and complete.

Ward's Method: $\Delta(A,B) = \frac{|A|\,|B|}{|A| + |B|} \lVert \mu_A - \mu_B \rVert^2$

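Merges the pair whose union causes the smallest increase in within-cluster sum of squared distances, where μ_A and μ_B are the cluster centroids. This is the same quantity that k-means tries to minimize.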
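To make the four criteria concrete, the sketch below evaluates each one on two small clusters and checks numerically that Ward's Δ equals the increase in within-cluster sum of squares caused by the merge. The helper names (`centroid`, `sse`, `single`, `complete`, `average`, `ward`) and the sample clusters `A` and `B` are assumptions chosen for illustration.

```python
import math

def centroid(c):
    """Coordinate-wise mean of a cluster of points."""
    return tuple(sum(v) / len(c) for v in zip(*c))

def sse(c):
    """Sum of squared distances from each point to the cluster centroid."""
    mu = centroid(c)
    return sum(math.dist(x, mu) ** 2 for x in c)

def single(a, b):
    return min(math.dist(x, y) for x in a for y in b)

def complete(a, b):
    return max(math.dist(x, y) for x in a for y in b)

def average(a, b):
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

def ward(a, b):
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * math.dist(centroid(a), centroid(b)) ** 2

# Illustrative clusters (made up for this example).
A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 4.0)]

print("single  ", single(A, B))    # closest cross-cluster pair
print("complete", complete(A, B))  # farthest cross-cluster pair
print("average ", average(A, B))   # mean over all four cross-cluster pairs
print("ward    ", ward(A, B))
# Ward's delta should match the growth in within-cluster sum of squares.
print("sse gain", sse(A + B) - sse(A) - sse(B))
```

For these clusters single linkage gives 3, complete linkage gives 5, average linkage falls between them, and both Ward's Δ and the sum-of-squares gain come out to 10.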

Dendrogram Interpretation: The tree diagram shows merge order and distances. Cutting at different heights produces different numbers of clusters.
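Cutting the dendrogram can be sketched as replaying only the merges whose height is at or below the cut. The example below is one possible approach using a union-find structure. It assumes a hypothetical merge list of `(id_a, id_b, height)` tuples in which points are numbered 0 to n-1 and the k-th merge creates cluster id n+k, and it assumes merge heights never decrease (true for single, complete, average, and Ward linkage).

```python
def cut_tree(n_points, merges, height):
    """Return one cluster label per point after applying merges at or below height."""
    # Ids 0..n_points-1 are points; id n_points+k is the cluster made by merge k.
    parent = list(range(n_points + len(merges)))

    def find(i):
        # Follow parent links to the current root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for k, (a, b, h) in enumerate(merges):
        if h <= height:
            new_id = n_points + k
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_points)]

# Illustrative merge list for four points (made up for this example).
merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 5.0)]
for cut in (0.5, 1.5, 3.0, 5.0):
    print(cut, cut_tree(4, merges, cut))
```

Raising the cut from 0.5 to 5.0 takes this example from four singleton clusters to three, then two, then one.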

How to Use This Demo

This demo steps through agglomerative hierarchical clustering one merge at a time, updating the scatter plot and the dendrogram after each step.

Controls:

  • Linkage Method - Choose how cluster distances are measured (Single, Complete, Average, or Ward's)
  • Step Forward - Merge the two closest clusters. Watch the dendrogram grow and colors update
  • Reset - Return to initial state with each point as its own cluster (keeps current data)
  • New Data - Generate a fresh set of random data points
  • Add Points - Check the box and click on the scatter plot to add your own data points

What to Observe:

  • Color stability: When two clusters merge, only the points of the smaller cluster change color, taking the color of the larger one. All other clusters keep their colors.
  • Merge distances: Watch "Last Merge Distance" increase as you step through. Early merges join very similar points; later merges join distinct groups.
  • Dendrogram structure: The tree grows upward. Height shows merge distance. Large vertical gaps suggest natural cluster boundaries.
  • Linkage effects: Different methods produce different merge orders. Try Single vs Complete on the same data to see dramatic differences.
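The color-stability rule can be sketched as follows. This is a guess at the general idea, not the demo's actual code: the function name `merge_clusters`, the dictionaries `members` and `colors`, and the tie-breaking rule (the first argument wins when sizes are equal) are all assumptions.

```python
def merge_clusters(members, colors, a, b):
    """Merge clusters a and b in place; the larger cluster's id and color survive."""
    # Assumed tie-break: when sizes are equal, cluster a keeps its color.
    keep, absorb = (a, b) if len(members[a]) >= len(members[b]) else (b, a)
    members[keep] = members[keep] + members.pop(absorb)
    colors.pop(absorb)  # the absorbed cluster's color disappears from the plot
    return keep

# Illustrative state: three clusters holding point indices.
members = {"A": [0, 1, 2], "B": [3], "C": [4, 5]}
colors = {"A": "red", "B": "blue", "C": "green"}

survivor = merge_clusters(members, colors, "B", "C")
print(survivor, members, colors)
```

Here point 3 turns green because cluster C is larger than cluster B, while cluster A is untouched and stays red.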

Suggested Exploration:

  1. Generate new data and step through completely with Average linkage
  2. Reset and try Complete linkage - notice which clusters merge first
  3. Add outlier points and observe when they merge into clusters
  4. Look for large jumps in merge distance indicating natural groupings
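Spotting the largest jump in merge distance can also be automated. The sketch below is a simple heuristic, assuming the full list of merge heights is available in merge order and is non-decreasing. The function name `suggest_cut` and the sample heights are made up for illustration and are not values from the demo.

```python
def suggest_cut(heights):
    """Return (cut_height, n_clusters) at the largest jump between consecutive merges."""
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare heights")
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    k = max(range(len(gaps)), key=gaps.__getitem__)  # index of the widest gap
    cut_height = (heights[k] + heights[k + 1]) / 2   # midpoint of that gap
    n_points = len(heights) + 1                      # n points produce n - 1 merges
    n_clusters = n_points - (k + 1)                  # merges 0..k have been applied
    return cut_height, n_clusters

# Illustrative merge heights for six points (made up for this example).
heights = [0.4, 0.5, 0.7, 3.1, 3.4]
cut_height, n_clusters = suggest_cut(heights)
print(f"cut near {cut_height:.2f} leaves {n_clusters} clusters")
```

The widest gap in this list lies between 0.7 and 3.1, so the heuristic cuts near 1.9 and keeps three clusters. A large gap is only a hint; the choice of cut still depends on what the clusters are for.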

Quick Tips

  • Watch the dendrogram gap: pause when you see a tall vertical jump. Cutting the tree there usually reveals the clearest grouping.
  • Swap linkage modes: Single stretches through skinny shapes, Complete prefers compact blobs, Average is the safe middle, Ward behaves most like k-means.
  • Track the stats panel: the merge counter and "Last Merge Distance" give early warnings that you are about to fuse very different clusters.
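  • Add an outlier: drop a point far from the rest and compare when each linkage method merges it. Single linkage usually leaves it until the final merges because even its nearest neighbor is far away, while Ward's method may absorb it earlier because adding a single point increases variance relatively little.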
  • Use color memory: after a merge only one color changes, so you can retrace how a cluster evolved by following its color in the scatter plot.
  • Cut interactively: once the dendrogram finishes, imagine slicing it at different heights and count the branches to decide how many clusters you would keep.

Scatter Plot

Dendrogram
