
May 11, 2025

Simon’s blog is a gold mine. He just runs that bit further than everyone else and it shows time and again. Here he uses Cursor’s GDPR subprocessor disclosure to document their stack (the use of Fireworks and Turbopuffer is the interesting bit here). The killer bit is the disclosure at the end though:

When operating in privacy mode - which they say is enabled by 50% of their users - they are careful not to store any raw code on their servers for longer than the duration of a single request. This is why they store the embeddings and obfuscated file paths but not the code itself.

Reading this made me instantly think of the paper “Text Embeddings Reveal (Almost) As Much As Text”, about how vector embeddings can be reversed. The security documentation touches on that in the notes:

Embedding reversal: academic work has shown that reversing embeddings is possible in some cases. Current attacks rely on having access to the model and embedding short strings into big vectors, which makes us believe that the attack would be somewhat difficult to do here. That said, it is definitely possible for an adversary who breaks into our vector database to learn things about the indexed codebases.

Eeeeeek that’s massive and something I definitely hadn’t fully grokked before this point.
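
To make the risk a bit more concrete for myself, here is a minimal sketch of the weakest version of the attack: someone who has a stolen vector and access to the same embedding model can simply embed guesses and see which one lands closest. This is not the method from the paper (which trains a model to reconstruct text iteratively), and `toy_embed` is a hypothetical stand-in I made up (hashed character trigrams), not anything Cursor or a real embedding model uses. The snippets and candidate list are invented for illustration.

```python
import hashlib
import math

DIM = 256  # toy dimensionality, chosen arbitrarily for this sketch


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: hashed character trigrams."""
    vec = [0.0] * DIM
    for i in range(len(text) - 2):
        gram = text[i:i + 3]
        bucket = int(hashlib.sha256(gram.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    # L2-normalise so a dot product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def best_guess(stolen_vector, candidates):
    """Return the candidate whose embedding is closest to the stolen vector."""
    return max(candidates, key=lambda c: cosine(stolen_vector, toy_embed(c)))


# The "vector database" holds only the vector, never the code itself.
secret = "def charge_card(user, amount): return stripe.charge(user.token, amount)"
stolen = toy_embed(secret)

# The attacker's guesses (invented examples).
candidates = [
    "def send_email(user, body): return smtp.send(user.address, body)",
    "def charge_card(user, amount): return stripe.charge(user.token, amount)",
    "class Inventory: pass",
]

guess = best_guess(stolen, candidates)
print(guess == secret)
```

The attacker never sees the source, only the vector, yet can confirm a correct guess. Real embeddings are far richer than this toy, so near-miss guesses score highly too, which is what lets an attacker "learn things" without an exact match.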

Also makes you realise how heavily vendored Cursor is; there’s seemingly no IP moat from the IDE, the vector store, the model(s) or the inference engine used for hosting the open-source models under the hood, though there might be a strong data flywheel (I suspect the accept/reject signal on diffs is what is valuable to the major model vendors).

Food for thought.

Cursor: Security
 
© 2025 Tom Hipwell. Built with Hugo.