Convolutional Neural Networks (CNNs)

This interactive demo shows how Convolutional Neural Networks process and classify handwritten digits from the MNIST dataset. You'll explore the internal workings of a real CNN, watching as data flows through convolutional layers, pooling layers, and fully connected layers.

The demo uses actual PyTorch-trained weights, letting you see how feature maps evolve from random noise to meaningful patterns as the network learns through training epochs.

CNN Architecture Components:

Convolutional Layers: Apply filters to detect features like edges and shapes
Pooling Layers: Reduce spatial dimensions while preserving important information
Feature Maps: Visual representations of what each layer detects
ReLU Activation: Introduces non-linearity by setting negative values to zero
Softmax Output: Converts final layer to probability distribution over 10 digits
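
To make these building blocks concrete, here is a minimal PyTorch sketch of the same operations on a dummy input. The hand-crafted edge filter and random scores are purely illustrative; in the demo the filter weights are learned during training.

```python
import torch
import torch.nn.functional as F

# A dummy 1x1x28x28 "image" (batch, channels, height, width).
x = torch.rand(1, 1, 28, 28)

# A hand-crafted 3x3 vertical-edge filter, for illustration only;
# the demo's filters are learned, not fixed.
edge_filter = torch.tensor([[[[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]]]])

feat = F.conv2d(x, edge_filter, padding=1)   # convolution: respond to vertical edges
feat = F.relu(feat)                          # ReLU: zero out negative responses
feat = F.max_pool2d(feat, kernel_size=2)     # pooling: halve spatial size (28 -> 14)

logits = torch.rand(1, 10)                   # stand-in for final-layer scores
probs = F.softmax(logits, dim=1)             # softmax: probabilities over the 10 digits

print(feat.shape)          # torch.Size([1, 1, 14, 14])
print(probs.sum().item())  # ~1.0 -- probabilities sum to one
```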

Key Concepts:
Hierarchical Learning: Early layers detect simple features, deeper layers combine them into complex patterns
Translation Invariance: CNNs can recognize patterns regardless of position in the image
Parameter Sharing: The same filter weights are reused at every position in the image, which keeps the parameter count small and reduces overfitting (see the quick count after this list)
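
Parameter sharing is easy to quantify: a 3×3 filter has the same nine weights no matter where it is applied, so a conv layer's size does not grow with the image. A quick count, with shapes loosely based on this demo's first conv layer and a hypothetical dense layer producing the same number of outputs:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)  # like the demo's Conv Layer 1
dense = nn.Linear(28 * 28, 4 * 26 * 26)                         # dense layer with the same output size

conv_params = sum(p.numel() for p in conv.parameters())
dense_params = sum(p.numel() for p in dense.parameters())
print(conv_params)   # 40      (4 filters * 3*3 weights + 4 biases)
print(dense_params)  # 2122640 (every output value gets its own weights)
```
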
How to Use:

• Use the numpad to select MNIST digits and see real CNN processing
• Click "Draw Your Own" to draw custom digits for classification
• Watch feature maps update as data flows through each layer
• Start at epoch 0 (random weights) and step through training
• Click "Train Epoch" to advance one training epoch at a time
• Observe how feature maps evolve as the network learns
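
"Train Epoch" corresponds to one full pass over the training set. A minimal sketch of what one such pass looks like in PyTorch; the loader, batch size, and optimizer here are generic assumptions, not the demo's exact code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_one_epoch(model, optimizer, device="cpu"):
    # Standard MNIST loader; the demo's own preprocessing may differ.
    loader = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=64, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    model.train()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # model is expected to return raw scores (logits)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# usage: train_one_epoch(model, torch.optim.Adam(model.parameters()))
```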

CNN Architecture:
Input Layer: 28×28 grayscale digit images
Conv Layer 1: 4 filters (3×3) → ReLU activation
Max Pool 1: 2×2 pooling → reduces spatial size
Conv Layer 2: 8 filters (3×3) → ReLU activation
Max Pool 2: 2×2 pooling → further size reduction
Flatten: Convert 2D feature maps to 1D vector
Dense Layer: 128 neurons → ReLU activation
Output Layer: 10 neurons → Softmax (digit probabilities)
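
As a rough guide, the layers listed above translate into a PyTorch module along these lines. Padding choices are an assumption; with no padding the spatial sizes run 28 → 26 (Conv1) → 13 (Pool1) → 11 (Conv2) → 5 (Pool2), so the flattened vector has 8 × 5 × 5 = 200 values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DemoCNN(nn.Module):
    """Sketch of the architecture described above; details such as padding
    are assumptions, not the demo's exact implementation."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)   # 28x28 -> 26x26, 4 feature maps
        self.conv2 = nn.Conv2d(4, 8, kernel_size=3)   # 13x13 -> 11x11, 8 feature maps
        self.fc1 = nn.Linear(8 * 5 * 5, 128)          # flattened maps -> 128 neurons
        self.fc2 = nn.Linear(128, 10)                 # 10 digit classes

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # Conv1 + ReLU + Pool1
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # Conv2 + ReLU + Pool2
        x = torch.flatten(x, 1)                       # 2D feature maps -> 1D vector
        x = F.relu(self.fc1(x))                       # Dense layer
        return self.fc2(x)                            # raw scores; softmax applied for display

model = DemoCNN()
probs = F.softmax(model(torch.rand(1, 1, 28, 28)), dim=1)
print(probs.shape)                                    # torch.Size([1, 10]), rows sum to 1
```
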
Understanding Feature Maps:

Early Layers (Conv1): Look for edge detection and basic shapes
Deeper Layers (Conv2): Watch for more complex patterns and digit components
Brightness = Activation: Brighter pixels indicate stronger feature activation
Training Progress: Feature maps become more structured as training progresses
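
One way to reproduce the feature-map panels outside the demo is to capture intermediate activations with forward hooks and plot them, brighter pixels meaning larger activations. A sketch using a stand-in for the demo's convolutional stack (4 then 8 filters, as listed above); with untrained weights the maps will look like the noisy epoch-0 panels:

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Stand-in for the demo's convolutional stack; weights here are random.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(4, 8, 3), nn.ReLU(), nn.MaxPool2d(2),
)

activations = {}

def save_activation(name):
    def hook(module, args, output):
        activations[name] = output.detach()
    return hook

model[0].register_forward_hook(save_activation("conv1"))
model[3].register_forward_hook(save_activation("conv2"))

model(torch.rand(1, 1, 28, 28))        # one forward pass records both layers' outputs

maps = activations["conv1"][0]         # 4 maps of 26x26 from the first conv layer
fig, axes = plt.subplots(1, len(maps), figsize=(8, 2))
for ax, fmap in zip(axes, maps):
    ax.imshow(fmap, cmap="gray")       # brighter pixel = stronger activation
    ax.axis("off")
plt.show()
```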

Observing Learning:
Epoch 0: Random weights produce noisy, meaningless feature maps
Early Epochs (1-5): Basic patterns start to emerge
Later Epochs (10+): Clear, structured features that respond to specific digit characteristics
Final Epochs: Highly specialized features for accurate digit classification
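
The status panel starts at a loss of about 2.303 and 10% accuracy, which is exactly the random-guessing baseline: with ten equally likely classes, accuracy is about 1 in 10 and the cross-entropy loss is -ln(1/10) = ln 10.

```python
import math
print(math.log(10))  # 2.302585... -- expected loss when every digit is predicted with probability 0.1
```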

Drawing Tips:
• Draw digits clearly in the center of the canvas
• Use bold strokes; thin lines may not be detected well
• Try different digit styles to see how the network responds
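
Centering and bold strokes matter because MNIST digits are size-normalized and centered in a 28×28 grid, and the network has only seen that distribution. A sketch of the kind of preprocessing a drawn canvas typically needs before classification; the resizing and scaling choices here are assumptions, not the demo's exact pipeline:

```python
import numpy as np
from PIL import Image
import torch

def preprocess(canvas: np.ndarray) -> torch.Tensor:
    """canvas: HxW array, white strokes on a black background, values in [0, 255]."""
    img = Image.fromarray(canvas.astype(np.uint8)).resize((28, 28))  # match MNIST resolution
    x = torch.from_numpy(np.array(img, dtype=np.float32)) / 255.0    # scale to [0, 1], as ToTensor() does
    return x.unsqueeze(0).unsqueeze(0)                               # shape (1, 1, 28, 28) for the CNN
```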

[Interactive demo panels: the input image and current prediction, a CNN architecture diagram, feature maps after Conv1 (4 filters), Pool1, Conv2 (8 filters), and Pool2, the FC1 activations (128 units), output probabilities for digits 0-9, and a training status line (epoch 0/20, loss 2.303, accuracy 10.0% before training).]