The AI Bubble and the US Economy

I can never really tell how useful economic analyses are. Often the fundamentals are staring you in the face and then the bull market runs for years. Timing is everything. However, I thought this was a solid summary that feels balanced and thorough, so worth sharing.

When Will Quantum Computing Work?

Tom McCarthy breaks down the current state of quantum computing. For me, what’s valuable here is not the predictions on potential commercial applications or the timeline, but the heuristic for tracking progress and the clear line in the sand for a commercially viable technology:

The key limitation is the size of the problem(s) that the QC can handle. Runtime, integration with real-time data, and performance vs classical optimization techniques also matter, but the main constraint is …

... [... 287 words]

Hacking with AI SASTs: An overview of 'AI Security Engineers' / 'LLM Security Scanners' for Penetration Testers and Security Teams

I enjoy posts like this deep dive from Joshua Rogers on “AI Security Engineers”: amidst so much noise, they show the value that agents are adding at the frontier. Josh finds the tools generally useful and gives a good teardown in the post. I’m not quite convinced the tools are ready for prime time; there are a few too many obvious gotchas outlined here (e.g. monorepo support, vulnerability to prompt injection). I have to admit though I’m cheering for this class of …

... [... 170 words]

Git Cheat Sheet

A very simple thing but this cheat sheet is great, even has simple diagrams for the different merge strategies built in, which is probably the most common area of debate (and confusion) when working with teams. Handy.

LLMs as Retrieval and Recommendation Engines

Nice deep dive on using LLMs for retrieval/recommendation. It’s a two-parter, and there’s also a great guide to building a retrieval engine using a constrained decoding approach with vLLM and an HF-hosted model. The whole thing is about 30 LoC.
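
To make the constrained decoding idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked guide and it does not use vLLM; the catalogue, the `END` marker and `toy_score` (a stand-in for model logits) are all hypothetical. The point it illustrates is that if the decoder may only pick tokens that continue some item in a prefix trie, the output is always a real catalogue entry.

```python
from typing import Callable, Dict, List, Sequence, Set

END = "<end>"  # hypothetical marker meaning "a complete catalogue item ends here"


def build_trie(items: Sequence[Sequence[str]]) -> Dict:
    """Build a prefix trie over tokenised catalogue items."""
    root: Dict = {}
    for tokens in items:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def constrained_greedy_decode(
    trie: Dict, score: Callable[[List[str], str], float]
) -> List[str]:
    """Greedy decoding where only tokens allowed by the trie can be chosen."""
    node = trie
    output: List[str] = []
    while True:
        allowed = list(node.keys())  # the "mask": everything else is forbidden
        best = max(allowed, key=lambda tok: score(output, tok))
        if best == END:
            return output
        output.append(best)
        node = node[best]


def toy_score(query_words: Set[str]) -> Callable[[List[str], str], float]:
    """Stand-in for model logits: prefer tokens that appear in the query."""
    def score(prefix: List[str], token: str) -> float:
        if token == END:
            return 0.5
        return 1.0 if token in query_words else 0.0
    return score


# Hypothetical catalogue, tokenised by whitespace for simplicity.
CATALOGUE = [["the", "matrix"], ["the", "godfather"], ["blade", "runner"]]

if __name__ == "__main__":
    trie = build_trie(CATALOGUE)
    print(constrained_greedy_decode(trie, toy_score({"godfather"})))
```

Even when the scorer is unhelpful, the result is still a valid item, which is the property that makes this approach attractive for retrieval. In a real setup the scores would come from the model and the trie would restrict which tokens the serving engine may sample; those details are left out here.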

Embrace the Red's month of AI Bugs

I’ve really enjoyed following along with the Embrace the Red prompt injection series over the summer. Pretty much every major, hyped tool has been compromised by the same fatal flaw - LLMs today mix data and instructions in the same channel (the prompt) and the model doesn’t know how to separate the two things. The series finale (an old school self-replicating virus) is a particular treat. There’s not really (yet) a great pattern for solving this problem, there’s been a …

... [... 168 words]

Quoting Jamie Tomalin

Some words on AI strategy from Jamie:

Perhaps, the AI-maxi strategy is building vertically integrated operating companies which wield strategic control to own the upside of AI, fundamentally transforming the economics of their business relative to incumbents, enabling them to counter-position and disrupt by selling directly to the end customer?

e.g., Paloma Health, Convictional, Candidly

Jamie Tomalin

... [... 57 words]

Armin Ronacher's Agentic Coding Recommendations

Armin Ronacher (creator of Flask) has a great piece on agentic coding patterns, insightful throughout but largely centered on the uplift you get from effective tool use:

Agentic coding’s inefficiency largely arises from inference cost and suboptimal tool usage. Let me reiterate: quick, clear tool responses are vital.

For this reason, he tends to avoid MCP:

The reason I barely use it is because Claude Code is very capable of just running regular tools. So MCP for me is really only needed …

... [... 850 words]

Quoting Elad Gil

Elad Gil on AI Rollups:

“It just seems so obvious,” said Gil over a Zoom call earlier this week. “This type of generative AI is very good at understanding language, manipulating language, manipulating text, producing text. And that’s audio, that’s video, that includes coding, sales outreach, and different back-office processes.”

If you can “effectively transform some of those repetitive tasks into software,” he said, “you can increase the margins dramatically and create very different types of …

... [... 177 words]

Quoting Shunyu Yao

Shunyu Yao, a researcher from OpenAI who worked on Deep Research, makes the case for fundamentally altering our approach to benchmarking now we’re in “the second half”:

Inertia is natural, but here is the problem. AI has beat world champions at chess and Go, surpassed most humans on SAT and bar exams, and reached gold medal level on IOI and IMO. But the world hasn’t changed much, at least judged by economics and GDP.

I call this the utility problem, and deem it the most …

... [... 126 words]

Cursor: Security

Simon’s blog is a gold mine. He just runs that bit further than everyone else and it shows time and again. Here he uses Cursor’s GDPR subprocessor disclosure to document their stack (the use of Fireworks and Turbopuffer is the interesting bit here). The killer bit is the disclosure at the end though:

When operating in privacy mode - which they say is enabled by 50% of their users - they are careful not to store any raw code on their servers for longer than the duration of a single …

... [... 282 words]

You should have private evals

I think this is a very good post. Taking the time to test for yourself and understand how each model generation is useful to you, in your context, is clearly going to be a big advantage. So much of the assessment of LLMs is vibes-based that your own vibes matter most, so spending some time defining what they are is important. This blog offers a framework, and examples, of how to do just that.
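
As a rough illustration of what “defining your vibes” can look like in practice, here is a minimal sketch of a private eval harness. It is not the framework from the linked post: `EvalCase`, `run_evals`, `pass_rate`, the two example cases and `stub_model` are all hypothetical names, and the stub stands in for whatever model client you actually call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the model and record pass/fail per case."""
    results: Dict[str, bool] = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(model(case.prompt)))
        except Exception:
            # A crashing model call or check counts as a failure.
            results[case.name] = False
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 if there are no cases)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical cases; real ones would come from your own day-to-day tasks.
CASES = [
    EvalCase(
        "arithmetic",
        "What is 17 * 3? Reply with the number only.",
        lambda out: out.strip() == "51",
    ),
    EvalCase(
        "json_output",
        "Return a JSON object with a key 'ok'.",
        lambda out: out.strip().startswith("{"),
    ),
]


def stub_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "51" if "17 * 3" in prompt else "not json"


if __name__ == "__main__":
    results = run_evals(stub_model, CASES)
    print(results, pass_rate(results))
```

Swapping `stub_model` for each new model release and comparing the per-case results over time is the whole idea; the cases stay private so they cannot leak into training data.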

Quoting Chris Paxton

Nice explainer that sets out the boundaries of the RL techniques now dominating progress in AI. The list quoted here neatly describes what the jagged edge of AI will look like for the next little while:

Reinforcement learning is a powerful tool. Right now, though, it’s best used when:

You have a verifiable problem: math, coding, robot grasping

You have a way to generate a ton of data in this domain, but can’t necessarily generate optimal or even good data

The exploration problem is locally …

... [... 120 words]

The Bull Case for an AI Native Investment Bank

YC’s call for startups for the summer ‘25 batch includes a section on Fullstack AI. I’ve written about AI Rollups a few times on this blog, but it looks like the model might now accelerate.

Coincidentally, on the same day, OffDeal (a YC company) published their blueprint for a rollup that takes on investment bank M&A. Somewhat unusually, there’s tonnes of detail in this strategy doc so I’ve pulled out a few interesting bits below.

First up, note how they’ve …

... [... 669 words]

A Short Note on Sycophants and Feedback Loops

This has been written about in a few places so I’ll keep it brief. It was interesting that one of the root causes (note, not the sole cause) of the ChatGPT sycophancy issues was the feedback loop from the thumbs up/down data on responses. From their blog post:

“We also teach our models how to apply these principles by incorporating user signals like thumbs-up / thumbs-down feedback on ChatGPT responses.”

What’s interesting here is that cohort age of user feedback makes a …

... [... 219 words]

The Leaderboard Illusion

Interesting paper from Cohere. I think this might cause a bit of a storm: basically it’s an investigation into biases towards closed source model companies (OpenAI, Meta, Google DeepMind are named) in Chatbot Arena.

There are three ways that the proprietary shops are favoured:

  1. Private testing practices mean these model providers are able to test multiple variants before public release, enabling selective disclosure of results.

  2. Proprietary closed models are …

... [... 222 words]
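
The first point above (testing several private variants and publishing only the best) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the paper’s analysis: every variant has the same true skill, the measured score is that skill plus Gaussian noise, and all numbers and function names are hypothetical.

```python
import random
import statistics


def simulate_reported_score(
    n_variants: int, rng: random.Random, true_skill: float = 0.0, noise: float = 1.0
) -> float:
    """Measure n equally skilled variants and report only the best-looking one."""
    return max(rng.gauss(true_skill, noise) for _ in range(n_variants))


def mean_reported(n_variants: int, trials: int = 2000, seed: int = 0) -> float:
    """Average reported score over many repetitions of the selection process."""
    rng = random.Random(seed)
    scores = [simulate_reported_score(n_variants, rng) for _ in range(trials)]
    return statistics.fmean(scores)


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, round(mean_reported(n), 2))
```

With one variant the average reported score sits near the true skill; with ten, the reported score is inflated purely by selection, even though no variant is actually better.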

Mixture of Experts

Insightful post from J Betker on the MoE architecture. Here are a few grabs:

The fact that MoE has great scaling properties indicates that something deeper is amiss with this architectural construct. This turns out to be sparsity itself – it is a new free parameter to the scaling laws for which sparsity=1 is suboptimal. Put another way – Chinchilla scaling laws focus on the relationship between data and compute, but MoEs give us another lever: the number of parameters in a neural network. …

... [... 274 words]

Claude Code: Best practices for agentic coding

This is really good, well worth the investment of your time. There is a lot of novel insight here that will shortly become de rigueur. There are a few bits worth calling out.

The models are now heavily tuned for tool use, as we all know. gh CLI use is baked in:

Claude knows how to use the gh CLI to interact with GitHub for creating issues, opening pull requests, reading comments, and more. Without gh installed, Claude can still use the GitHub API or MCP server if you have those installed. …

... [... 438 words]

A Realistic AI Timeline

Another AI prediction, but I think this one pinpoints some of the blockers much more clearly. In summary:

Roughly: generalist scaling does not work or, at least, not well enough to make meaningul sense for material deployment. Instead, most development, including agentification, happens in the smaller size range with specialized, opinionated training. Any actual “general intelligence” has to take an entirely different direction — one that is almost discouraged by formal evaluation. …

... [... 491 words]
