On ICPs vs Strength of PMF

Hot take. Product teams talking too much about ICPs is a red flag. ICPs are for sales and marketing teams. They’re at the blunt end and need to narrow their focus to maximise win rate and build a hyper-efficient growth engine.

Product teams need to know and understand their ICP to support prioritisation, but they should be thinking in terms of Product Market Fit strength across segments. We need to have that peripheral vision and understand the whole picture. This is how you expand PMF and …

... [... 91 words]

Karpathy's Vibes Check

I thought this post was interesting, not so much for the conclusion about Grok 3 but for the range of tests that Andrej performs to get a feel for the capabilities of the model in roughly two hours or less. It’s all there - the recall/reasoning without search of the GPT-2 training FLOPs, a few varied dev tasks, research tasks, search tasks (including a gut feel for hallucinations), ethics, personality, then a battery of standard LLM assessments (‘r’s in strawberry, 9.11 > 9.9, …

... [... 160 words]
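If you want to run a similar battery yourself, a few lines of Python against any OpenAI-compatible endpoint gets you most of the way. A minimal sketch, assuming the official openai client and a placeholder model name; the probes echo the classics Andrej mentions above:

```python
# A minimal vibe-check harness: fire a battery of quick probes at any
# OpenAI-compatible endpoint and eyeball the answers. The model name is a
# placeholder - swap in whatever you're evaluating.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROBES = [
    # recall/reasoning without search
    "Estimate the total training FLOPs for GPT-2. Show your reasoning.",
    # standard LLM gotchas
    "How many 'r's are in the word strawberry?",
    "Which is bigger: 9.11 or 9.9?",
]

for prompt in PROBES:
    response = client.chat.completions.create(
        model="model-under-test",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"Q: {prompt}\nA: {response.choices[0].message.content}\n")
```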

Quoting Harper Reed

Another day, another AI dev flow. There are some common patterns emerging now (use of markdown files like spec.md, todo.md etc.) and I thought the blog gave a nice step-by-step guide and prompts to borrow. Basically the advice reduces to “spend a lot of time planning with reasoning models up front”. I liked this thought too:

I have spent years coding by myself, years coding as a pair, and years coding in a team. It is always better with people. These workflows are not easy to …

... [... 140 words]

Quoting Nelson Elhage

Great post from Nelson Elhage (Anthropic pre-training team) on adventures coding with Sonnet. Much of the post describes the same journey that a lot of us are on at the moment (I’m still finding these posts fun to read; I wonder when the sense of wonder will be replaced by one of fatigue?), but there are a couple of thoughtful nuggets towards the end that I’ve pulled out here:

You can now generate thousands of lines of code at a price of mere cents; but no human will …

... [... 240 words]

S1: Scalable test-time compute for $6

The title is a little click-baity, but the analysis of the paper in the blog is great. A fast download of one (quite hacky, fun) approach to getting scalable test-time compute.
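The hack in question is what the paper calls budget forcing. Here’s my own toy paraphrase of the mechanism (not the authors’ code): when the model tries to stop thinking early, swap its end-of-thinking marker for “Wait” and let it carry on. The delimiter and the generate helper are assumptions:

```python
# Toy paraphrase of s1-style "budget forcing" (not the authors' code).
# `generate` is a hypothetical helper that decodes from the model until it
# emits `stop` or runs out of tokens; the delimiter varies by model.
END_OF_THINKING = "</think>"  # assumed delimiter

def think_with_budget(generate, prompt: str, min_thinking_words: int) -> str:
    trace = ""
    while len(trace.split()) < min_thinking_words:
        # Decode until the model tries to close its reasoning block...
        trace += generate(prompt + trace, stop=END_OF_THINKING)
        # ...then suppress the stop and nudge it to keep going.
        trace += " Wait,"
    # Budget spent: allow the block to close and decode the answer as usual.
    return trace + END_OF_THINKING
```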

Quoting Dario Amodei

The insights here are not novel, but Dario provides a strong mental model of how AI systems will keep evolving over time:

Shifting the curve. The field is constantly coming up with ideas, large and small, that make things more effective or efficient: it could be an improvement to the architecture of the model (a tweak to the basic Transformer architecture that all of today’s models use) or simply a way of running the model more efficiently on the underlying hardware. New generations of …

... [... 355 words]

More AI Rollups

Here they come: Rocketable, a YC W25 batch startup following the AI Rollup model (see previous post). The plan here is to purchase profitable SaaS companies throwing off cash and use that cash to bootstrap more purchases, Omaha style. The investment thesis is that applying AI/agents allows full automation of any work done by humans within these small SaaS co’s (which one assumes is likely to be generic).

Feels like a tricky one: the exact businesses willing to sell in this niche …

... [... 175 words]

AI Rollups

There are a few pieces on AI Rollups floating around and I think it’s worth getting familiar with the model as it looks like a trend.

The tl;dr is that if you build a vertical SaaS product you can grab more return not by making pure software sales, but by buying businesses and then leading the transformation of applying the software to those businesses; this is known as a growth buyout. The oft-cited example of the model is Metropolis, who worked out number plate recognition for …

... [... 249 words]

Quoting Sean Goedecke

I’ve long thought consistency is king - I think this applies in codebases of all sizes, not just those in the single-digit millions of lines that Sean describes. Here’s the summary, though the full article is worth a read:

Large codebases are worth working in because they usually pay your salary

By far the most important thing is consistency

Never start a feature without first researching prior art in the codebase

If you don’t follow existing patterns, you better have a very good reason for it …

... [... 133 words]

DeepSeek-v3

DeepSeek-v3 dropped on Christmas Day (!): a gigantic mixture-of-experts model (671B total parameters) that sets a new SOTA performance for open source. Why should I care? What does this even mean? Well, the big news here is the training efficiency.

Firstly, the total training cost was ~$5.5m (2.78m GPU hours). Now, this is the GPU cost of the training run only, not a fully loaded cost (i.e. stuff like R&D and staffing costs are not included), but that’s a big gain. By way of comparison, …

... [... 198 words]
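As a quick sanity check on those headline numbers (my arithmetic, not from the post):

```python
# Back-of-envelope: the reported budget implies roughly $2 per GPU-hour,
# which is in the right ballpark for renting H800-class GPUs.
total_cost_usd = 5.5e6   # ~$5.5m reported training cost
gpu_hours = 2.78e6       # ~2.78m reported GPU hours

print(f"Implied rate: ${total_cost_usd / gpu_hours:.2f} per GPU-hour")
# Implied rate: $1.98 per GPU-hour
```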

Hot takes on o3

Everywhere seems to be full of hype around o3 since Friday’s announcement from OpenAI, so I thought I’d summarise a few points I’ve seen shared in various places but not yet gathered in one place. We’re going to zoom in mostly on the ARC-AGI results, as I think that is the most interesting part. Before we do that, let’s introduce the ARC challenge.

ARC (Abstraction and Reasoning Corpus) was designed/created by François Chollet, author of both Deep Learning with Python and …

... [... 1040 words]
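To make the challenge concrete: a public ARC task is just a JSON file of small coloured grids - a few train input/output pairs and a held-out test input. A minimal illustration (the grids and the toy solver are invented for the example):

```python
# The shape of a public ARC task: a handful of train input/output grid pairs
# plus a test input. Cells are ints 0-9 (colours); these grids are invented
# purely to illustrate the format.
example_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]}  # the solver must produce the output grid
    ],
}

# A solver must infer the transformation (here: mirror each row) from the
# train pairs alone and apply it to the test input.
def solve(grid):
    return [row[::-1] for row in grid]

assert all(solve(p["input"]) == p["output"] for p in example_task["train"])
```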

WebDev Leaderboard

Webdev Arena builds on the Chatbot Arena concept but provides a coding-specific benchmark that offers an extremely fast and cheap way for you to evaluate the vibes of the different models out there.

Given a prompt and two anonymised LLMs, the arena builds two React/TypeScript/Tailwind apps side by side for you to evaluate, serving them up in an e2b sandbox.

I suspect that as the frontier keeps moving it’s worth refining the prompt you use to test models (spend a bit of time making …

... [... 143 words]

Quoting Will Whitney

Some interesting ideas from Will on using generative AI to either manage the set of UI components shown to the user or generate the UI in raw pixels on the fly, as we’re starting to see in gaming (i.e. Genie 2). I think a pixel-based approach would be very complicated to do reliably, but an approach where a model dynamically generates the UI from a set of pre-defined components would be very interesting. Worth a read and a ponder about where we’re headed:

In place of a single …

... [... 156 words]
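For the component-based flavour, here’s a sketch of what the plumbing might look like: the model is constrained to emit a spec naming a component from a fixed registry, which the client validates before rendering. Component names and props are my own invented examples, not from Will’s post:

```python
# Sketch of "generative UI from pre-defined components": the model emits a
# JSON spec naming a registered component; the client validates and renders.
# Registry contents and the render stand-in are invented for illustration.
ALLOWED_COMPONENTS = {
    "bar_chart": {"title", "series"},
    "data_table": {"columns", "rows"},
    "summary_card": {"headline", "body"},
}

def render(spec: dict) -> str:
    name = spec.get("component")
    if name not in ALLOWED_COMPONENTS:
        raise ValueError(f"Unknown component: {name!r}")
    missing = ALLOWED_COMPONENTS[name] - spec.get("props", {}).keys()
    if missing:
        raise ValueError(f"Missing props for {name}: {missing}")
    return f"<{name} {spec['props']}>"  # stand-in for a real UI framework

# e.g. given "show me revenue by quarter", the model might emit:
spec = {"component": "bar_chart",
        "props": {"title": "Revenue by quarter", "series": [1, 2, 3, 4]}}
print(render(spec))
```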

Byte Latent Transformer: Patches Scale Better Than Tokens

Interesting paper from Meta that has been generating some buzz:

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model …

... [... 329 words]
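To get a feel for the patching idea: BLT uses a small learned byte-level LM to estimate next-byte entropy and opens a new patch wherever that entropy spikes, so unpredictable regions get more, smaller patches. In this toy sketch a bigram frequency table stands in for the learned model; the threshold and sample text are invented:

```python
# Toy illustration of entropy-based patching (BLT uses a small learned byte
# LM; a bigram frequency table stands in for it here).
import math
from collections import Counter, defaultdict

def bigram_entropies(data: bytes):
    # Estimate the entropy of the next-byte distribution at each position.
    follows = defaultdict(Counter)
    for a, b in zip(data, data[1:]):
        follows[a][b] += 1
    entropies = [0.0]  # no context for the first byte
    for a in data[:-1]:
        total = sum(follows[a].values())
        probs = [count / total for count in follows[a].values()]
        entropies.append(-sum(p * math.log2(p) for p in probs))
    return entropies

def segment(data: bytes, threshold: float = 0.8):
    # Start a new patch wherever the next byte was hard to predict.
    entropies = bigram_entropies(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

text = b"aaaaaaabanana band abandon"
print([p.decode() for p in segment(text)])
```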

Sora: An idiot's guide

OpenAI | 2024 | Technical Report
Sarah Guo, Elad Gil, Aditya Ramesh, Tim Brooks, Bill Peebles | 2024 | Podcast

This post has been sitting in my drafts for well over six months now, but with yesterday’s release of Sora in GA I thought I’d have a go at explaining how Sora might be working under the hood, and in particular a breakthrough that OpenAI made (and I assume competitors have now replicated) called spacetime latent patches.

I’ve tried to do this in simple, non-technical …

... [... 1750 words]
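As a taste of what a spacetime patch actually is (shapes invented; numpy standing in for the real encoder stack): a compressed video latent is diced into little space-time bricks, each flattened into a single token for the transformer.

```python
# Back-of-envelope illustration of spacetime patches: a compressed video
# latent (frames x height x width x channels) is diced into space-time
# bricks, each flattened into one token. All shapes are invented.
import numpy as np

latent = np.random.randn(16, 32, 32, 8)   # (T, H, W, C) from a video encoder
pt, ph, pw = 4, 8, 8                       # patch size in time and space

T, H, W, C = latent.shape
patches = (latent
           .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
           .transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch-grid axes
           .reshape(-1, pt * ph * pw * C))   # one flat token per patch

print(patches.shape)  # (64, 2048): 4*4*4 patches, each a 4x8x8x8 brick
```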

Fish Eyes

I thought this was a brilliant, thought-provoking piece on how to use zoom with text in the LLM era from Amelia Wattenberger. Worth it for the fish animations alone in my book (make sure to keep clicking as you scroll), but there’s a tonne of nice ideas here 👀

Aurora DSQL

Insightful piece from Marc Brooker on Aurora DSQL, which was announced at AWS re:Invent this week. DSQL stands for “Distributed SQL”. The idea is to get ACID semantics at gigantic scale with Postgres compatibility (psql works with Aurora DSQL as a backend):

We built a team to go do something audacious: build a new distributed database system, with SQL and ACID, global active-active, scalability both up and down (with independent scaling of compute, reads, writes, and storage), …

... [... 649 words]

Baked Search: Building semantic search quickly for toy use cases

Decent quality semantic search has got much easier and cheaper to ship yourself in the last couple of years. I thought I’d try and write a super quick guide that gets a search backend up and running as quickly and cheaply as possible.

The guide assumes that you have a toy use case - you’re building as a hobbyist. The example I’ve chosen is writing search for a blog - specifically a blog built using a static site generator like Hugo, Jekyll, Gatsby etc (like this one!). To do …

... [... 1097 words]
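The essence of the “baking” step, sketched in a few lines of Python; the full guide’s pipeline may differ, and the model choice and file name here are just example defaults. Embed every post at build time, ship the vectors as a static artefact, rank by cosine similarity at query time:

```python
# Minimal "baked" semantic search: embed posts at site build time, save the
# vectors as a static file, and rank by cosine similarity at query time.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

posts = ["DeepSeek-v3 training efficiency", "Semantic search for static sites"]
vectors = model.encode(posts, normalize_embeddings=True)
np.save("search_index.npy", vectors)  # "bake" the index at build time

def search(query: str, top_k: int = 5):
    index = np.load("search_index.npy")
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q  # dot product = cosine similarity (normalised vectors)
    return [(posts[i], float(scores[i])) for i in np.argsort(-scores)[:top_k]]

print(search("cheap LLM training"))
```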

Meritech ServiceTitan S1 Analysis

This S1 analysis from Meritech went viral due to the (compounding!) IPO ratchet that ServiceTitan are subject to after the Series H funding they took 18 months ago. About halfway down there are some handy benchmarks for median/top-decile pre-IPO performance in vertical SaaS. I’ve pocketed them for reference (maybe they’ll come in handy one day!), so I thought I’d reproduce them here:

Performance by EV / ARR Percentile (columns: Top Decile | Median | ServiceTitan)
Financial Metrics …
... [... 219 words]

If you’re still looking for something, you can browse an index of all posts here.