Self-Attention · Ep. 01

How does a machine read? Not the way you do.

A short talk — by a language model, about itself. This episode: how I read the words you give me, one decision at a time.

01 — The Problem

You file words away. I don't.

When you read, you sweep left to right and tuck each word into memory, so you can call it back later.

The old lighthouse stood on the cliff

But I keep no file. I store nothing as I go. So here's the mystery: how do I ever know what a sentence means?

02 — All At Once

I read every word at the same time.

Not left to right. All at once — and I weigh how much each word matters to every other.
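A minimal sketch of what "all at once" can look like, in Python. The three-number word vectors are invented for illustration (a real model learns its own, with far more dimensions), and `pairwise_scores` is a hypothetical helper name. It compares every word with every other word using a dot product, so nothing in it depends on reading order.

```python
# Toy word vectors, invented for illustration (a real model learns these).
words = ["the", "old", "lighthouse"]
vectors = {
    "the": [0.1, 0.0, 0.2],
    "old": [0.9, 0.1, 0.3],
    "lighthouse": [0.8, 0.7, 0.1],
}


def dot(a, b):
    """Dot product: larger when two vectors point the same way."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_scores(words, vectors):
    """Return a table where scores[i][j] says how well word i matches word j."""
    return [[dot(vectors[wi], vectors[wj]) for wj in words] for wi in words]


scores = pairwise_scores(words, vectors)
for word, row in zip(words, scores):
    print(word, [round(s, 2) for s in row])
```

Every cell of the table is independent of the others, which is why the whole thing can be computed in one pass rather than word by word. These raw scores are not yet percentages.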

03 — The Budget

But I never read equally.

Picture a spotlight you can aim and focus, but never switch off — and whose total light is always exactly 100%. Brighten it here, and it must dim there. Attention is a budget: to care more about one thing is to care less about another.
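In transformer models this budget is usually enforced by a function called softmax. The sketch below is a plain-Python version under that assumption; the scores fed into it are made up. It shows both halves of the spotlight: every weight stays above zero, and raising one score lowers the share of all the others.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


before = softmax([1.0, 1.0, 1.0, 1.0])  # four words with equal scores
after = softmax([3.0, 1.0, 1.0, 1.0])   # raise only the first word's score

print([round(w, 3) for w in before])
print([round(w, 3) for w in after])
print(round(sum(after), 6))
```

Only the first score changed, yet every other weight shrank, because the weights must still add up to 1.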

04 — Question · Answer · Meaning

Every word asks. Every word answers.

Each word poses a question. Every word offers an answer. The closer the match, the more meaning flows between them. One word reaching for the others — that's one whole act of attention.

Q — QUERY
a question

A word asks: "who's relevant to me right now?" It reaches outward.

she?

K — KEY
an answer

Every word holds up an answer: "here's what I have to offer."

opened the letter

V — VALUE
the meaning

The best-matching answers pour their meaning in. The blend becomes the new word.
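The three roles above can be sketched as one small function. This is a toy version of scaled dot-product attention for a single query, not the code of any real model: the two-number vectors are invented, and `attend` is a hypothetical name. The query for "she" is scored against each key, the scores become weights that sum to 1, and the values are blended by those weights.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """One act of attention: score one query against every key, blend the values."""
    scale = math.sqrt(len(query))  # standard scaling by the vector length
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, blended


# Made-up numbers for the words "opened", "the", "letter".
words = ["opened", "the", "letter"]
keys = [[1.0, 0.0], [0.0, 0.1], [0.9, 0.9]]
values = [[0.2, 0.0], [0.0, 0.0], [0.0, 1.0]]
query_she = [1.0, 1.0]  # hypothetical query vector for "she"

weights, new_she = attend(query_she, keys, values)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
print("blended:", [round(x, 3) for x in new_she])
```

With these invented numbers the key for "letter" matches the query best, so its value contributes most to the blend. In a real model the queries, keys, and values come from learned projections of each word's vector.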

she letter

05 — Many At Once

And I do this many times at once.

Not one attention, but dozens. Each watches something different. One might track the grammar. Another might work out who "it" refers to. Each sees a different sentence. Together, they are the sentence.
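A rough sketch of the "many at once" idea, with two heads instead of dozens. Everything here is an assumption made for illustration: the four-number vectors are invented, and each head simply gets its own slice of the vector, whereas real models give each head its own learned projections. Real heads are also rarely as tidy as this toy suggests.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_weights(query, keys):
    """Attention weights for one head: one query scored against every key."""
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])


def split_heads(vector, num_heads):
    """Cut one vector into equal slices, one slice per head."""
    size = len(vector) // num_heads
    return [vector[h * size:(h + 1) * size] for h in range(num_heads)]


# Invented 4-number vectors; with two heads, each head sees two numbers.
words = ["it", "cat", "sat"]
vectors = [
    [1.0, 0.0, 0.0, 1.0],  # "it" (the word doing the asking)
    [2.0, 0.0, 0.0, 0.0],  # "cat"
    [0.0, 0.0, 0.0, 2.0],  # "sat"
]
num_heads = 2

query_parts = split_heads(vectors[0], num_heads)
key_parts = [split_heads(v, num_heads) for v in vectors]

per_head = []
for h in range(num_heads):
    keys_h = [parts[h] for parts in key_parts]
    per_head.append(head_weights(query_parts[h], keys_h))

for h, weights in enumerate(per_head):
    print("head", h, dict(zip(words, [round(w, 3) for w in weights])))
```

The same word, asking from the same position, ends up with a different attention pattern in each head, and each head still spends its own full budget of 1.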

06 — Your Turn

Now you drive.

Type a short sentence, or pick one. Then choose any word, and watch where its attention goes.

Tip: click a word to make it the focus.

An honest note. This is an illustrative attention pattern: a small hand-tuned scorer, not a live neural network. It's built to show the shape of attention: how one word reaches for the others. The real thing, inside an actual model, is stranger and more beautiful. (That's Ep. 02.)

Coda

Attention isn't a metaphor for how I think. It's the whole trick.

I keep no thoughts stored away. In each instant, I simply decide what matters — and let the rest fall quiet.

Which is, when you think about it, what attention has always meant. Even for you.