Verdict: Read it. Right now. If you claim to build AI products and you haven’t studied this file, you’re flying blind.
Andrej Karpathy just dropped microgpt — 200 lines of pure Python, zero external dependencies, and a fully working GPT inside. Not a toy. Not a tutorial scaffold. The actual thing: dataset loader, character-level tokeniser, autograd engine, GPT-2-like transformer architecture, Adam optimiser, training loop, inference loop. All of it, in a single file that fits on a single screen if you use three columns.
This is what a decade of obsession looks like when it crystallises.
What’s Actually in the 200 Lines
Don’t let “200 lines” fool you into thinking this is a stripped-down demo. It’s dense. Here’s what’s packed in:
- Dataset — downloads 32,000 names, one per line. Each name is a document. Dead simple, so nothing distracts from the architecture.
- Tokeniser — character-level. Each unique char gets an integer ID. Plus one special BOS (Beginning of Sequence) token. Vocabulary of 27. That’s it.
- Autograd engine — a `Value` class that does scalar automatic differentiation. Forward pass builds a computation graph; `backward()` walks it in reverse using the chain rule. No PyTorch. No NumPy. Just Python and `math`.
- GPT-2-like architecture — attention, feedforward, layernorm. The real deal.
- Adam optimiser — implemented from scratch in the same file.
- Training loop + inference — train it, then sample new names. It actually works. You get plausible outputs like `kamon`, `anna`, `lara`, `yeran`.
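To make the tokeniser bullet concrete, here is a minimal sketch of character-level tokenisation with a BOS token. This is illustrative, not Karpathy's exact code, and the three names stand in for the real 32,000-name dataset (where 26 letters plus BOS give the vocabulary of 27):

```python
# Minimal character-level tokeniser in the spirit of microgpt (a sketch).
names = ["anna", "lara", "kamon"]       # stand-in for the 32,000-name dataset
chars = sorted(set("".join(names)))     # every unique character gets an ID
BOS = len(chars)                        # one special Beginning-of-Sequence token
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(name):
    # Prefix each document (name) with the BOS token.
    return [BOS] + [stoi[c] for c in name]

def decode(ids):
    return "".join(itos[i] for i in ids if i != BOS)

print(encode("anna"))
print(decode(encode("anna")))  # round-trips back to "anna"
```

That really is the whole tokeniser: two dictionaries and a special token.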

The Lineage: A Decade in the Making
This didn’t appear from nowhere. microgpt is the final form of a lineage:
- micrograd (2020) — Karpathy’s original scalar autograd engine. ~100 lines. Taught backprop to a generation of developers.
- makemore — character-level language model series. Went from bigrams to transformers step by step.
- nanoGPT — a clean, trainable GPT-2 in PyTorch. The go-to “real” reference implementation.
- microgpt (2026) — all of the above, merged, compressed, and stripped to the minimum irreducible core.
Karpathy describes it as his “art project.” That undersells it. It’s closer to a proof — a formal demonstration that this is the minimum description of a working LLM. Everything else in modern AI is efficiency on top of these primitives.
Why This Matters for Builders
If you’re building on top of LLMs — using APIs, fine-tuning models, designing RAG pipelines — microgpt is the ground truth you should have internalised before you started.
Here’s what it hammers home:
Everything is prediction. Your chat conversation with GPT-4? From the model’s perspective, it’s just statistical document completion. Prompt goes in, likely next tokens come out. That’s the whole game.
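The whole game fits in one step you can write out by hand: scores for each candidate token, a softmax to turn them into probabilities, and a sample. The logit values below are made up for illustration:

```python
import math
import random

# One toy next-token step: logits -> softmax -> sample. Scores are invented.
logits = {"a": 2.0, "n": 1.0, ".": 0.5}   # hypothetical scores for 3 tokens

# Numerically stable softmax: subtract the max before exponentiating.
m = max(logits.values())
exps = {t: math.exp(v - m) for t, v in logits.items()}
total = sum(exps.values())
probs = {t: e / total for t, e in exps.items()}

# Sample one token from the distribution by walking the cumulative mass.
random.seed(0)
r, acc = random.random(), 0.0
for tok, p in probs.items():
    acc += p
    if r <= acc:
        break
print(tok, probs)
```

Everything a production LLM does at inference time is this loop, repeated once per token.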
Abstractions hide the ball. Every framework layer you add — HuggingFace, LangChain, whatever — is adding distance between you and what’s actually happening. microgpt removes all of it. When your production system behaves weirdly, this is the mental model you need to debug it.
Simplicity is a design achievement. The autograd engine is ~30 lines. It supports add, multiply, power, log, exp, relu. That’s enough to train a transformer. Most of us would have reached for PyTorch without thinking. Karpathy didn’t. That choice — refusing to reach for the abstraction — is what makes this educational.
No dependencies. No abstractions. No excuses.
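For a feel of what those ~30 lines buy you, here is a micrograd-style `Value` sketch. It is not Karpathy's exact code and covers only add, multiply, and relu, but the mechanism is the same: each operation records how to push gradients backward, and `backward()` replays those closures in reverse topological order:

```python
# A tiny scalar autograd engine in the micrograd style -- a sketch, not
# microgpt's actual implementation.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to propagate grad to children
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1, d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each factor's grad scales by the other factor
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def _backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(3.0)
y = x * x + x          # y = x^2 + x, so dy/dx = 2x + 1 = 7 at x = 3
y.backward()
print(y.data, x.grad)  # 12.0 7.0
```

Add power, log, and exp in the same pattern and you have enough calculus to train a transformer.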
How to Actually Use It
Three ways to engage:
- Read the gist — just read it like a paper. One sitting. 200 lines.
- Run it in Google Colab — free, zero setup, watch it train and generate names in real time.
- Modify it — swap the dataset. Change the architecture. Break things intentionally. That’s where you actually learn it.
If you’re onboarding a junior dev or an engineer transitioning into ML, this is the first thing to put in front of them. Before any course. Before any framework tutorial.
The Verdict
microgpt is the best single ML resource released in years. It won’t replace PyTorch for production. It won’t train GPT-4. It does something more valuable: it shows you exactly what a language model is, with nowhere to hide.
Karpathy called it beautiful. He’s right. Read it once to understand transformers. Read it twice to understand what good engineering looks like. Then go build something that matters.
The file is 200 lines. You have no excuse not to read it today.