The AI Tourist Problem

Kyle Poyar’s writing at Growth Unhinged is normally solid and well researched, plus a handy source of benchmarks if you’re trying to evaluate startups, so it tends to be one I watch out for. This piece has some I treating stats on NRR for B2B SaaS/B2C SaaS and AI companies. The number of datapoints vary and we should take the results with a pinch of salt as they’re based on scraped data but they do point to an interesting trend, the data shows:

B2B SaaS is relatively sticky. …

... [... 366 words]

qwen3-vl-embedding

Very exciting to have an open source vision language model this capable. The queries described in the post are so varied (and work across different axis - semantic understanding, text understanding, object/spatial recognition), I think I this type of technology being cheaply/easily available is a big unlock for a lot of interesting product work.

Quoting Vicki Boykis

Vicki Boykis’ year in review is excellent throughout (as her writing always is) but I loved this line in particular:

The forking branches of a decision tree in a codebase, are likewise boundless, and the neat part is that there is no right answer. You are constrained by your business requirements, but the choice of implementation of those requirements is of an endless variety. It will depend on: the stack you already have, the budget for the rest of the stack, your own past experience and …

... [... 153 words]

Economics of Orbital vs Terrestrial Data Centers

Fun blog post from Andrew McCalip that attempts to build a model of the unit economics of orbital data centers. It looks like they’re just about feasible but really there’s only one player in town. This paragraph is key I think:

This isn’t about talent. It’s about integration. If you have to buy launch, buy buses, buy power hardware, buy deployment, and pay margin at every interface, you never get there. The margin stack and the mass tax eat you alive. Vertical …

... [... 130 words]

Regenerative Software

Chad Fowlers take on principles for how the craft of software changes in the AI era:

The metaphor I keep returning to is the phoenix: systems designed to burn and be reborn, continuously, without losing their identity.

A regenerative system has a few defining traits:

  • Clear, durable boundaries that outlive any implementation
  • Tests and evaluations that define correctness independently of code
  • Automation that assumes replacement is normal, not exceptional
  • Explicit acceptance that …
... [... 175 words]

Don't Build Agents, Build Skills Instead

A short talk from AI Engineer conference in which two Anthropic engineers (Barry Zhang and Mahesh Murag) make the case that you don’t need to build agents. Instead use a general purpose agent (Claude Code) and then write skills (skills are just pe-canned prompts expressed as markdown). The advantage of this approach being that skills are simple, versionable and composable. This last point seems the most important, no more wrangling graphs of actions.

MCP is not fully deprecated in this …

... [... 141 words]

Quoting Andrej Karpathy

Andrej nearly summarises where we are today:

In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It’s about to what extent an AI can “practice” something. The environment has to be resettable (you can start a new attempt), efficient (a lot attempts can be made), and rewardable (there …

... [... 147 words]

The AI Bubble and the US Economy

I can never really tell how useful economic analysis are. Often the fundamentals are staring you in the face and then the bull market runs for years. Timing is everything. However, I thought this was a solid summary that feels balanced and thorough, so worth sharing.

When Will Quantum Computing Work?

Tom McCarthy breaks out the current state of quantum computing. For me what’s valuable here is not predictions on potential commercial applications or the timeline but instead the heuristic to use to track progress and the clear line in the sand for a commercially viable technology:

The key limitation is the size of the problem(s) that the QC can handle. Runtime, integration with real-time data, and performance vs classical optimization techniques also matter, but the main constraint is …

... [... 287 words]

Hacking with AI SASTs: An overview of 'AI Security Engineers' / 'LLM Security Scanners' for Penetration Testers and Security Teams

I enjoy posts like this deep dive from Joshua Rogers on “AI Security Engineers” as amidst so much noise they show the value that agents are adding at the frontier. Josh finds the tools generally useful, giving a good tear down in the post. I’m not quite convinced the tools are ready for prime time, there’s a few too many obvious gotchas outlined here (e.g. monorepo support, vulnerability to prompt injection). I have to admit though I’m cheering for this class of …

... [... 170 words]

Git Cheat Sheet

A very simple thing but this cheat sheet is great, even has simple diagrams for the different merge strategies built in, which is probably the most common area of debate (and confusion) when working with teams. Handy.

LLMs as Retrieval and Recommendation Engines

Nice deep dive on using LLMs for retrieval/recommendation. It’s a two parter, and there’s also a great guide to building a retrieval engine using a constrained decoding approach with vLLM and a HF hosted model. The whole thing is about 30 LoC.

Embrace the Red's month of AI Bugs

I’ve really enjoyed following along with the Embrace the Red prompt injection series over the summer. Pretty much every major, hyped tool has been compromised by the same fatal flaw - LLMs today mix data and instructions in the same channel (the prompt) and the model doesn’t know how to separate the two things. The series finale (an old school self-replicating virus) is a particular treat. There’s not really (yet) a great pattern for solving this problem, there’s been a …

... [... 168 words]

Quoting Jamie Tomalin

Some words on AI strategy from Jamie:

Perhaps, the AI-maxi strategy is building vertically integrated operating companies which wield strategic control to own the upside of AI, fundamentally transforming the economics of their business relative to incumbents, enabling them to counter-position and disrupt by selling directly to the end customer?

e.g., Paloma Health, Convictional, Candidly

Jamie Tomalin

... [... 57 words]

Armin Ronacher's Agentic Coding Recommendations

Armin Ronacher (creator of Flask) has a great piece on agentic coding patterns, insightful throughout but largely centered on the uplift you get from effective tool use:

Agentic coding’s inefficiency largely arises from inference cost and suboptimal tool usage. Let me reiterate: quick, clear tool responses are vital.

For this reason, he tends to avoid MCP:

The reason I barely use it is because Claude Code is very capable of just running regular tools. So MCP for me is really only needed …

... [... 850 words]

Quoting Elad Gil

Elad Gil on AI Rollups:

“It just seems so obvious,” said Gil over a Zoom call earlier this week. “This type of generative AI is very good at understanding language, manipulating language, manipulating text, producing text. And that’s audio, that’s video, that includes coding, sales outreach, and different back-office processes.”

If you can “effectively transform some of those repetitive tasks into software,” he said, “you can increase the margins dramatically and create very different types of …

... [... 177 words]

Quoting Shunyu Yao

Shunyu Yao, a researcher from OpenAI who worked on Deep Research, makes the case for fundamentally altering our approach to benchmarking now we’re in “the second half”:

Inertia is natural, but here is the problem. AI has beat world champions at chess and Go, surpassed most humans on SAT and bar exams, and reached gold medal level on IOI and IMO. But the world hasn’t changed much, at least judged by economics and GDP.

I call this the utility problem, and deem it the most …

... [... 126 words]

Cursor: Security

Simon’s blog is a gold mine. He just runs that bit further than everyone else and it shows time and again. Here he uses Cursor’s GDPR subprocessor disclosure to document their stack (the use of Fireworks and Turbopuffer is the interesting bit here). The killer bit is the disclosure at the end though:

When operating in privacy mode - which they say is enabled by 50% of their users - they are careful not to store any raw code on their servers for longer than the duration of a single …

... [... 282 words]

You should have private evals

I think this is a very good post. Taking the time to test for yourself and understand how each model generation is useful to you, in your context is clearly going to be a big advantage. So much of the assessment of LLMs is vibes based that your own vibes matter most, so spending some time defining what they are is important. This blog offers a framework, and examples, of how to do just that.

If you're still looking for something, you can browse an index of all blogs here.