llm.c: The genius of Andrej Karpathy
What’s awesome about Andrej Karpathy’s llm.c isn’t just that it’s a bare-metal, from-scratch implementation of GPT-2 (safety wink definitely required!).
If you take a step back, you’ll see he’s also educating us on how one of the very best in the world hones their craft. He’s stripped away the intermediate layer of libraries - there’s no PyTorch here. Instead, we’re taken back to the basics: an attempt to implement a simple C and CUDA version of GPT-2 in ~1000 lines with no dependencies or frameworks involved.
One thing I’ve noticed is that the very best engineers tend to set themselves this kind of training routine. I recall working with one who similarly sought to grasp React for the first time by re-implementing it in the fewest lines of vanilla JS possible (they managed it in 33). Another used to do speed runs of our take-home test to build the muscle memory needed to develop at pace.
Of course, at Karpathy’s level he’s not only training but coaching. He’s shipped a reference implementation with tests confirming that his implementation mirrors a PyTorch-based version of GPT-2 exactly. Moreover, he structures it all in a way that allows the rest of us to follow along at home.
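To make the idea of testing against a reference concrete, here is a minimal sketch in plain C. It is not taken from llm.c: the function name `tensors_match`, its signature, and the tolerance parameter are my own assumptions for illustration. Checks like this usually compare floating-point outputs within a small tolerance rather than bit for bit, since two correct implementations can differ in rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from llm.c): returns 1 if every element of
   `actual` is within `tol` of the matching element of `expected`,
   and 0 otherwise. Each mismatch is printed to help debugging. */
int tensors_match(const float *actual, const float *expected, size_t n, float tol) {
    assert(actual != NULL && expected != NULL);
    int ok = 1;
    for (size_t i = 0; i < n; i++) {
        /* Absolute difference, computed by hand to avoid needing libm. */
        float diff = actual[i] - expected[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        /* Negated comparison so that a NaN difference counts as a mismatch. */
        if (!(diff <= tol)) {
            printf("mismatch at %zu: got %f, expected %f\n", i, actual[i], expected[i]);
            ok = 0;
        }
    }
    return ok;
}
```

You would call it with the output of your own implementation as `actual` and values saved from the reference model as `expected`, for example `tensors_match(my_logits, ref_logits, n, 1e-2f)`, where both array names and the tolerance are placeholders.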
That’s what being world-class at your craft looks like. Andrej’s genius is that he’s showing us all how to get there.