Ask a language model to multiply two large numbers and it will often get it wrong. Not approximately wrong — embarrassingly wrong, in a way that feels incongruous coming from a system that can explain Keynesian economics or debug your code. The model will give you an answer with total confidence, and the answer will be off by thousands.
This isn't a knowledge gap. The model has seen arithmetic. The problem is structural: language models generate the next plausible token, one at a time, by pattern-matching against everything they've learned. That process is genuinely powerful for language. It is a bad fit for computation, which requires exact operations that are either right or wrong with no useful middle ground.
The standard fix is to hand the problem off. Give the model a calculator. Let it call an external tool, get a result, and paste the answer back into the conversation. This works, but it means the computation is happening somewhere else — the model is a coordinator, not a computer.
A group of researchers at a company called Percepta recently released a paper taking a different approach. They didn't give the model a calculator. They built a small, purpose-built model to demonstrate that a calculator was already latent inside the architecture, waiting for someone to aim it correctly.
Here's what they found: one component of a language model — called an attention head — has a job that looks like fuzzy pattern-matching but is, under the right conditions, something much more precise. An attention head looks back over every previous token and scores them: how relevant is each one to what I'm predicting right now? Normally it blends the top results together, weighted by those scores. Useful for language, where meaning is distributed and context is everything. Useless for computation, where you need one exact answer, not a weighted average.
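The soft-versus-hard distinction can be sketched in a few lines. This is a toy illustration, not the paper's model: 2-D keys, scalar values, and a temperature knob (my addition, a standard trick) that collapses the softmax blend into a single winner.

```python
import math

def attention(query, keys, values, temperature=1.0):
    """Score each key against the query, then blend the values by
    softmax weight. Toy 2-D vectors, pure Python."""
    scores = [sum(q * k for q, k in zip(query, key)) / temperature
              for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Soft attention: a weighted average over ALL values.
    return sum(w * v for w, v in zip(weights, values))

keys = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
values = [10.0, 20.0, 30.0]
query = (1.0, 0.2)

soft = attention(query, keys, values)        # a blend of all three values
hard = attention(query, keys, values, 1e-4)  # collapses onto the top-scoring key
```

At temperature 1 the head returns a weighted average of all three values; as the temperature drops, the weights concentrate on the highest-scoring key and the output converges to one exact value rather than a blend.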
What the Percepta paper claims is that if you constrain an attention head to operate in just two dimensions — instead of the usual dozens — the math it's doing becomes something a computational geometry aficionado would recognize: find the point in this set that's furthest in a given direction. That operation has a known fast solution. You don't scan every point; you only check the ones on the outer boundary of the set, and you get the exact answer in logarithmic time. Exact, not approximate. One result, not a blend.
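The geometric fact underneath this is easy to check. A minimal sketch with an arbitrary toy point set (not from the paper): the point furthest in any direction is always a vertex of the convex hull, so only boundary points ever need to be examined.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def extreme_point(points, direction):
    """The point furthest in `direction`: the argmax of the dot product."""
    return max(points, key=lambda p: p[0]*direction[0] + p[1]*direction[1])

points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2)]
hull = convex_hull(points)

# The winner in any direction always lies on the hull, so the interior
# points never need to be scanned.
d = (1.0, 0.5)
assert extreme_point(points, d) == extreme_point(hull, d)
```

The linear scan in `extreme_point` is only for clarity: because a convex hull's vertices are angularly ordered, the same query can be answered by binary search over the hull in logarithmic time, which is the fast solution the paragraph above refers to.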
Think of it as Marco Polo in a crowded pool. The model calls out, and instead of everyone answering at once, exactly one person answers — not because they're the only one there, but because of where they're standing relative to everyone else. The structure of the crowd does the lookup.
That's a memory lookup. The same primitive a processor runs billions of times a second. Once you have it, you have what you need to build registers, a stack, an instruction pointer: the working parts of a computer. So they compiled an interpreter into the weights of their model, and the model could execute programs not by describing what they do, but by actually running them, one token at a time, inside its own process.
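To make "the working parts of a computer" concrete, here is a hypothetical toy register machine. This is my illustration, not the paper's construction: the opcodes and program are invented, but every step of the loop is a keyed lookup into state, the primitive the attention head is claimed to provide.

```python
def run(program, registers):
    """Execute a toy register machine: registers, a stack, and an
    instruction pointer. Each step fetches by key -- a lookup."""
    ip, stack = 0, []
    while ip < len(program):
        op, *args = program[ip]
        if op == "load":       # load a constant into a register
            registers[args[0]] = args[1]
        elif op == "add":      # r_dst <- r_a + r_b
            registers[args[0]] = registers[args[1]] + registers[args[2]]
        elif op == "push":
            stack.append(registers[args[0]])
        elif op == "pop":
            registers[args[0]] = stack.pop()
        elif op == "jnz":      # jump if register is non-zero
            if registers[args[0]] != 0:
                ip = args[1]
                continue
        ip += 1
    return registers

# Sum 1..5 with a loop: r0 accumulator, r1 counter, r2 holds -1.
prog = [
    ("load", "r0", 0), ("load", "r1", 5), ("load", "r2", -1),
    ("add", "r0", "r0", "r1"),   # r0 += r1
    ("add", "r1", "r1", "r2"),   # r1 -= 1
    ("jnz", "r1", 3),            # loop while r1 != 0
]
result = run(prog, {})
```

An interpreter like this is ordinary software; the paper's claim is that the same fetch-decode-execute loop can live in transformer weights once the lookup primitive is exact.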
The historical parallel is the math coprocessor. In the 1980s, if you needed fast floating point arithmetic, you added a second chip — the Intel 8087 — that specialized in exactly that. Your main processor handled general computation; the coprocessor handled the part the main processor was bad at. Same machine, different execution model, clean handoff between the two. Think of the large language model as a processor for language: powerful precisely because it's fuzzy, trading exactness for flexibility. The math coprocessor is the opposite — it works by eliminating fuzziness entirely, collapsing the model's probabilistic machinery down to a single exact answer when the problem calls for one.
What Percepta's paper describes has the same architecture, in concept: a transformer that reasons flexibly, with a computational substrate inside it for the parts that require exactness. The difference from the 8087 is that this coprocessor can't be removed. It's not a separate chip. It's the weights — which means it's inside every step the model takes.
That's what makes the idea structurally interesting, if it holds up. An external calculator is still outside. This is inside. And because it's inside, it's part of the model's own process — which means future models could learn when to invoke it and how to weave its outputs into everything else they're doing. The computation stops being a tool call and starts being a thought.
The model wasn't missing computation. It had attention heads that were always doing something that, configured correctly, is a memory lookup. Nobody had aimed them right.