Instructions:
• Select a sequence from the dropdown to see how attention processes different inputs
• Hover over input tokens to see their corresponding Q, K, V vectors highlighted
• Hover over output embeddings to trace how they were computed from attention weights and values
• Click "Randomize Weights" to see different random weight initializations
• All computations follow the scaled dot-product attention formula: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V (see the sketch below)
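For concreteness, here is a minimal NumPy sketch of that formula. The demo itself runs in the browser, so this Python version, including the function name, is illustrative rather than the demo's actual code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise over query positions."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # output embeddings O
```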
Educational Purpose:
• This demo shows untrained attention with random initialization
• Designed to build visual intuition for attention mechanics, not realistic outputs
• Uses causal masking (lower triangular) typical in language models; a sketch of the mask step follows this list
• Fixed, small vector dimensions for clear visualization
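A sketch of the causal masking step, assuming NumPy and the standard approach of setting blocked score entries to negative infinity before the softmax, so each position can only attend to itself and earlier tokens:

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax with a lower-triangular (causal) mask applied first."""
    n = scores.shape[0]
    allowed = np.tril(np.ones((n, n), dtype=bool))  # True at/below the diagonal
    masked = np.where(allowed, scores, -np.inf)     # block attention to future tokens
    masked -= masked.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(masked)                        # exp(-inf) -> 0 for masked cells
    return weights / weights.sum(axis=-1, keepdims=True)
```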
Visualization:
• Blue (Q): Query vectors - what each token "asks for"
• Green (K): Key vectors - what each token "offers"
• Red (V): Value vectors - information each token contains
• Attention Weights: Shows which tokens attend to which (causal mask applied)
• Light Blue (O): Output embeddings - final attention results (an end-to-end sketch follows below)
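Putting the pieces together, a self-contained sketch of the full pipeline the panels visualize. The projection-matrix names and the small sizes here are illustrative assumptions, since the demo's exact fixed dimensions aren't listed above:

```python
import numpy as np

rng = np.random.default_rng(0)                  # "Randomize Weights" re-draws these
seq_len, d_model, d_k = 4, 8, 4                 # illustrative sizes, not the demo's

X = rng.standard_normal((seq_len, d_model))     # input token embeddings
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v             # blue (Q), green (K), red (V)
scores = Q @ K.T / np.sqrt(d_k)
scores = np.where(np.tril(np.ones((seq_len, seq_len), dtype=bool)), scores, -np.inf)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # the attention-weights heatmap
O = weights @ V                                 # light blue (O): output embeddings
```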