Mixture of Experts

Insightful post from J Betker on the MoE architecture. Here are a few grabs:

The fact that MoE has great scaling properties indicates that something deeper is amiss with this architectural construct. This turns out to be sparsity itself – it is a new free parameter to the scaling laws for which sparsity=1 is suboptimal. Put another way – Chinchilla scaling laws focus on the relationship between data and compute, but MoEs give us another lever: the number of parameters in a neural network. …
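For anyone who wants the sparsity lever made concrete, here is a minimal sketch of top-k expert routing, the usual way an MoE layer activates only a fraction of its parameters per token. It is not taken from the post: the function names, the toy experts and the `active_fraction` ratio are my own illustrative assumptions, and the excerpt does not show how the post itself defines its sparsity parameter.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights.

    Hypothetical helper for illustration; real MoE layers do this per token
    on tensors, usually with an auxiliary load-balancing loss.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of only the k selected experts."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


def active_fraction(num_experts, k):
    """Share of expert parameters touched per token (assumes equal-size experts)."""
    return k / num_experts


# Toy experts: each one just scales its input.
experts = [lambda x: 1 * x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
output = moe_forward(1.0, experts, router_logits=[0.0, 1.0, 0.0, 1.0], k=2)
```

With four experts and k=2, only half of the expert parameters run for a given input, even though all four count towards the total parameter budget. That gap between total and active parameters is the extra lever the quote is pointing at.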

... [... 274 words]

Claude Code: Best practices for agentic coding

This is really good, well worth the investment of your time. There is a lot of novel insight here that will shortly become de rigueur. There’s a few bits worth calling out.

The models are now heavily tuned for tool use, as we all know. Use of the gh CLI is baked in:

Claude knows how to use the gh CLI to interact with GitHub for creating issues, opening pull requests, reading comments, and more. Without gh installed, Claude can still use the GitHub API or MCP server if you have those installed. …

... [... 438 words]

A Realistic AI Timeline

Another AI prediction, but I think this one pinpoints some of the blockers much more clearly. In summary:

Roughly: generalist scaling does not work or, at least, not well enough to make meaningful sense for material deployment. Instead, most development, including agentification, happens in the smaller size range with specialized, opinionated training. Any actual “general intelligence” has to take an entirely different direction — one that is almost discouraged by formal evaluation. …

... [... 491 words]

Quoting Kent Beck

Well if Kent Beck is doing it:

Been vibe coding like a fiend. Task breakdown is a highly leveraged human decision. Coding models are both non-deterministic & sensitive to initial conditions. You’ll get very different results having your agent implement Task1->Task2->Task3 or Task2->Task3->Task1.

I don’t have good heuristics yet, I just observe that when I try to implement “the same thing” I get quite different results.

Kent Beck

... [... 68 words]

Quoting Philip Tetlock

The master speaks on AI 2027 forecasts. The discussion of these forecasts has been rumbling on. Kokotajlo himself puts the probability of a supercoder on a 2027 timeline at around 50%.

I’m also impressed by Kokotajlo’s 2021 AI forecasts. It raises confidence in his Scenario 2027. But by how much? Tricky! In my earliest work on subjective-probability forecasting, 1984-85, few forecasters guessed how radical a reformer Gorbachev would be. But they were also the slowest to foresee the …

... [... 98 words]

Quoting Neil Mehta

Great piece on Neil Mehta that has been doing the rounds this week. Interesting throughout. Greenoaks is very focussed on the founder, which is normal at seed but typically gets less emphasis at A, B and onwards. There’s nothing unusual in what he’s saying; what is unusual is the level of conviction with which they pursue that one thing.

“This is controversial,” Mehta replied, when asked if the Greenoaks machine has identified an ideal type, “but I do believe there’s an …

... [... 282 words]

AI 2027

I definitely don’t agree with all the predictions here (why do AI nerds always get obsessed with making geopolitical predictions?) and after the end of ‘26 everything goes a bit crazy. However, I see a lot of weak, poorly specified AI predictions so when you see one this detailed I think it is worth paying attention to. As they note, after the end of ‘26 the confidence level drops off. I’d suggest stopping reading at that point to save yourself the time (it’s highly …

... [... 182 words]

As AI’s power grows, so does our workday

AI increases labour supply rather than reducing it, and watch out for those second-order effects on society at large:

Occupations more exposed to generative AI saw a rise in work hours immediately following the release of ChatGPT. Compared to workers less exposed to generative AI (such as tire builders, wellhead pumpers, and surgical assistants) those in high-exposure occupations (including computer systems analysts, credit counsellors, and logisticians) worked roughly 3.15 hours more per week …

... [... 425 words]

Jevons Paradox: A Personal Perspective

Great post from Tina He on the future of work in the era of AI. Firstly, we’ve been coming at things all wrong:

Traditional economics might predict that AI-boosted productivity would reduce working hours, a four-day weekend for tasks that once took five days. But reality has different plans. We’re witnessing what I call the “labor rebound effect”—productivity doesn’t eliminate work; it transforms it, multiplies it, elevates its complexity. The time saved becomes …

... [... 315 words]

Quoting Ankit Maloo

Similar to “The Model is the Product” from a couple of weeks ago, the bitter lesson here is that brute-forcing problems with compute beats clever solutions. Scaling compute at inference time with RL is the latest application of the bitter lesson, and we’re already seeing it move the needle in production use cases (customer support and soon, coding). This has big ramifications in the AI application layer:

While many companies are focused on building wrappers around generic models, …

... [... 177 words]

Cursor rules, prompt injections, voice to text and Diane

Let’s join the dots between a few different themes this week.

First up, Cursor rules files are vulnerable to prompt injection attacks. It’s possible to embed prompts within the rules files and hide them using invisible characters.
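As a rough illustration of what a defensive check might look like, here is a minimal sketch that scans the text of a rules file for characters that render as nothing in most editors. It assumes the hiding relies on Unicode format characters (category `Cf`: zero-width spaces and joiners, bidirectional controls, tag characters); the function name is my own, and this is not the method from the original write-up.

```python
import unicodedata


def find_hidden_characters(text):
    """Return (line_no, column, name) for every Unicode format character.

    Hypothetical helper for illustration. Category "Cf" also covers some
    legitimate uses (e.g. zero-width joiners inside emoji sequences), so
    treat hits as things to review, not as proof of an attack.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, column, name))
    return findings


# Example: a zero-width space hidden after the word "Prefer" on line 2.
sample_rules = "Use tabs.\nPrefer\u200b small functions."
hits = find_hidden_characters(sample_rules)
```

Running something like this over rules files before they are committed or pulled in from a third party would at least surface the invisible payload for a human to look at.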

You can then use this poisoned rules file to redirect Cursor (or your agentic IDE of choice) towards malicious implementations. This is not a huge surprise: the point of rules files is to direct the LLM towards specific implementations. What’s changed …

... [... 501 words]

The Model is the Product

I think this is a strong take on the consequences of the recent RL breakthroughs from Alexander Doria:

I think it’s time to call it: the model is the product.

All current factors in research and market development push in this direction.

Generalist scaling is stalling. This was the whole message behind the release of GPT-4.5: capacities are growing linearly while compute cost are on a geometric curve. Even with all the efficiency gains in training and infrastructure of the past two …

... [... 536 words]

Claude 3.7 Sonnet

Lots to digest here. A few pull quotes from the press release. Coding use cases are the focus of the upgraded model:

Claude 3.7 Sonnet shows particularly strong improvements in coding and front-end web development. Along with the model, we’re also introducing a command line tool for agentic coding, Claude Code. Claude Code is available as a limited research preview, and enables developers to delegate substantial engineering tasks to Claude directly from their terminal.

It’s a drop in …

... [... 267 words]

On ICPs vs Strength of PMF

Hot take. Product teams talking too much about ICPs is a red flag. ICPs are for sales and marketing teams. They’re at the blunt end and need to narrow their focus to maximise win rate and build a hyper efficient growth engine.

Product teams need to know and understand their ICP to support prioritisation, but they should be thinking in terms of Product Market Fit strength across segments. We need to have that peripheral vision and understand the whole picture. This is how you expand PMF and …

... [... 91 words]

Karpathy's Vibes Check

I thought this post was interesting, not so much for its conclusion about Grok 3 but for the range of tests that Andrej performs to get a feel for the capabilities of the model in <=~2 hours. It’s all there - the recall/reasoning without search of the GPT-2 training FLOPs, a few varied dev tasks, research tasks, search tasks (including a gut feel for hallucinations), ethics, personality, then a battery of standard LLM assessments (‘r’s in strawberry, 9.11 > 9.9, …

... [... 160 words]

Quoting Harper Reed

Another day, another AI dev flow. There are some common patterns emerging now (use of markdown files like spec.md, todo.md etc.) and I thought the blog gave a nice step-by-step guide and prompts to borrow. Basically the advice reduces to “spend a lot of time planning with reasoning models up front”. I liked this thought too:

I have spent years coding by myself, years coding as a pair, and years coding in a team. It is always better with people. These workflows are not easy to …

... [... 140 words]

Quoting Nelson Elhage

Great post from Nelson Elhage (Anthropic pre-training team) on adventures coding with Sonnet. Much of the post just describes the same journey that a lot of us are on at the moment (I’m still finding these posts fun to read, I wonder when the sense of wonder will be replaced by one of fatigue?), but there’s a couple of thoughtful nuggets towards the end that I’ve pulled out here:

You can now generate thousands of lines of code at a price of mere cents; but no human will …

... [... 240 words]

S1: Scalable test-time compute for $6

The title is a little click-baity, but the analysis of the paper in the blog is great. A fast download of one (quite hacky, fun) approach to getting scalable test-time compute.

Quoting Dario Amodei

The insights here are not novel, but Dario provides a strong mental model of how the AI system will keep evolving over time:

Shifting the curve. The field is constantly coming up with ideas, large and small, that make things more effective or efficient: it could be an improvement to the architecture of the model (a tweak to the basic Transformer architecture that all of today’s models use) or simply a way of running the model more efficiently on the underlying hardware. New generations of …

... [... 355 words]

More AI Rollups

Here they come. Rocketable is a YC W25 batch startup following the AI Rollup model (see previous post). The plan here is to purchase profitable SaaS companies throwing off cash and use that cash to bootstrap more purchases, Omaha style. The investment thesis is that applying AI/agents allows full automation of any work done by humans within these small SaaS co’s (one assumes that work is likely to be generic).

Feels like a tricky one, the exact businesses willing to sell in this niche …

... [... 175 words]
