Home
  • About
  • |
LIGHTDARK

Sep 7, 2025

I’ve really enjoyed following along with the Embrace the Red prompt injection series over the summer. Pretty much every major, hyped tool has been compromised by the same fatal flaw - LLMs today mix data and instructions in the same channel (the prompt) and the model doesn’t know how to separate the two things. The series finale (an old school self-replicating virus) is a particular treat. There’s not really (yet) a great pattern for solving this problem, there’s been a couple of papers but it looks like it will create a fundamental roadblock to the type of public-internet roaming, self directed agents which were expecting to be released. Part of me wonders if the burden will need to be shifted to the site being browsed/the browser rather than the agent itself. The only real directive on most sites today is LLM.txt - clearly with a bit more thought we could do something better. You can see the gap, and it’ll be interesting to see how it gets solved.

Embrace the Red's month of AI Bugs
 
© 2025 Tom Hipwell. Built with Hugo.