Dec 17, 2024
WebDev Arena builds on the Chatbot Arena concept but is a coding-specific benchmark, giving you an extremely fast and cheap way to evaluate the vibes of the different models out there.
Given a prompt and two anonymised LLMs, the arena builds two React/TypeScript/Tailwind apps and serves them side by side in an E2B sandbox for you to evaluate.
I suspect that as the frontier keeps moving, it’s worth refining the prompt you use to test models (spend a bit of time making it hard). Then, each time a new model lands on the leaderboard, come back and get a feel for how your own personal benchmark has shifted.
It’s perhaps a nice way to quickly cut through the hype, get a bit closer to the edge, and unlock some useful productivity gains in a tiny amount of time (minutes).
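If you want to keep a record of that personal benchmark between visits, a rough sketch might look like the following. Everything here is an assumption for illustration: the prompt text, the `Vote` record, the `win_rates` helper, the placeholder model names and the dates are all hypothetical, and none of it talks to the arena itself. It simply assumes you jot down which model you preferred after each side-by-side comparison.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical fixed prompt: keep it the same across visits so results stay comparable.
PROMPT = "Build a kanban board with drag-and-drop columns and local persistence"


@dataclass(frozen=True)
class Vote:
    """One side-by-side comparison, noted by hand (hypothetical record format)."""
    date: str     # ISO date of the comparison
    model_a: str
    model_b: str
    winner: str   # model_a, model_b, or "tie"


def win_rates(votes):
    """Return each model's share of wins, counting a tie as half a win for both."""
    wins = defaultdict(float)
    games = defaultdict(int)
    for v in votes:
        if v.winner not in (v.model_a, v.model_b, "tie"):
            raise ValueError(f"winner {v.winner!r} did not take part in this comparison")
        for model in (v.model_a, v.model_b):
            games[model] += 1
        if v.winner == "tie":
            wins[v.model_a] += 0.5
            wins[v.model_b] += 0.5
        else:
            wins[v.winner] += 1.0
    return {model: wins[model] / games[model] for model in games}


# Placeholder model names and dates, purely illustrative.
votes = [
    Vote("2024-12-17", "model-x", "model-y", "model-x"),
    Vote("2024-12-18", "model-x", "model-z", "tie"),
    Vote("2024-12-19", "model-y", "model-z", "model-z"),
]

rates = win_rates(votes)
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.2f}")
```

A plain win rate over a handful of votes is noisy, so treat it as a memory aid for your own impressions and not as a ranking.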
WebDev Leaderboard