Baked Search: Building semantic search quickly for toy use cases

Decent quality semantic search has got much easier and cheaper to ship yourself in the last couple of years. I thought I’d try and write a super quick guide that gets a search backend up and running as quickly and cheaply as possible.

The guide assumes that you have a toy use case - you're building as a hobbyist. The example I've chosen is writing search for a blog - specifically a blog built using a static site generator like Hugo, Jekyll, Gatsby, etc. (like this one!). To do …

... [... 1097 words]
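The core of the baked approach can be sketched in a few lines: embeddings are computed once at build time, and queries are ranked by cosine similarity at lookup time. This is a minimal illustration under my own assumptions, not the post's actual code; the tiny three-dimensional vectors stand in for real embedding-model output, and `search`/`post-a` etc. are hypothetical names.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=3):
    # index: list of (post_id, embedding) pairs baked at build time.
    scored = [(cosine(query_vec, vec), post_id) for post_id, vec in index]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:top_k]]

# Toy "baked" index; in practice the embeddings would come from an
# embedding model run once over each post during the site build.
index = [
    ("post-a", [0.9, 0.1, 0.0]),
    ("post-b", [0.1, 0.9, 0.0]),
    ("post-c", [0.5, 0.5, 0.1]),
]
print(search([1.0, 0.0, 0.0], index, top_k=2))  # → ['post-a', 'post-c']
```

The nice property for static sites is that the index is just a data file shipped with the build; only the query embedding needs computing at search time.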

Meritech ServiceTitan S1 Analysis

This S1 analysis from Meritech went viral due to the (compounding!) IPO ratchet that ServiceTitan are subject to after the Series H funding they took 18 months ago. About halfway down there are some handy benchmarks for median/top-decile pre-IPO performance in vertical SaaS. I've pocketed them for reference (maybe they'll come in handy one day!), so I thought I'd reproduce them here:

Performance by EV / ARR Percentile: Top Decile | Median | ServiceTitan
Financial Metrics …
... [... 219 words]

Quoting Google Rules of Machine Learning

The often forgotten first rule of ML is that you might be able to get a good enough result without ML:

Rule #1: Don’t be afraid to launch a product without machine learning.

Machine learning is cool, but it requires data. Theoretically, you can take data from a different problem and then tweak the model for a new product, but this will likely underperform basic heuristics. If you think that machine learning will give you a 100% boost, then a heuristic will get you 50% of the way there.

For …

... [... 171 words]

uv Cheat Sheet

This cheat sheet works as a handy 30-second introduction to the Python package/project management tool if you've not met it already. I've been teetering on the brink; I'm boring and have stuck with pip and pip-tools for a long time, but uv is so incredibly fast that I am tempted to move over.

Quoting AWS Lambda PR/FAQ

Brilliant throughout, with lots of small, golden nuggets. PR/FAQs like this are, for me, a bit wordy, but you can see how effectively the technique is used here to explain the impact of not just a groundbreaking new technical approach but also a major shift in AWS's compute billing model to a wide audience:

When we launched Lambda, security was not negotiable – and we knew that there would be trade-offs. So, until Firecracker, we used single tenant EC2 instances. No two customers shared …

... [... 180 words]

OpenAI Email Archives (from Musk v. Altman)

There's been a tranche of emails released as part of the Musk vs Altman lawsuit around OpenAI, and they make for some interesting reading.

One of the big things that jumps out is how much focus there is on crafting the narrative and mission for OpenAI.

They’re obsessed with getting the best talent (cheaply it seems), using the mission as the motivator:

Sam Altman to Elon Musk - Jun 24, 2015

The mission would be to create the first general AI and use it for individual empowerment—ie, the …

... [... 935 words]

The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers

TL;DR: A study of ~5,000 engineers across Microsoft, Accenture, and a Fortune 100 company finds GitHub Copilot boosts weekly PRs by 26.08% (SE: 10.2%), but the effect varies widely, with a 95% confidence interval from 5.88% to 46.28%. Adoption patterns show junior and newly tenured engineers are more likely to use Copilot (up to 9.5% higher). 30-40% of engineers didn't use it at all.

This is a paper I saw posted about a bit during the summer that looks at the productivity impact of Github …

... [... 934 words]

Quoting Google Big Sleep team

Pattern matching with LLMs used to find security vulns in the wild:

A key motivating factor for Naptime and now for Big Sleep has been the continued in-the-wild discovery of exploits for variants of previously found and patched vulnerabilities. As this trend continues, it’s clear that fuzzing is not succeeding at catching such variants, and that for attackers, manual variant analysis is a cost-effective approach.

We also feel that this variant-analysis task is a better fit for current …

... [... 135 words]

Quoting Graham Paterson

Love a nice data-driven product feedback loop; the Jitty folk have found a nice pattern with natural language search:

Over the weekend we quietly released a highly requested feature on Jitty: search by travel time 🚌🚶🚗🚴‍♀️🚂

We’ve partnered with the good people of the aptly named TravelTime to let homebuyers search by time rather than just distance.

Since we launched natural language search, we can see what people search for. Loads of people were searching for “15 minutes cycle to …

... [... 94 words]

AI-Assisted Assessment of Coding Practices in Modern Code Review

Nice paper on AI assisted code review at Google. Three call outs that I thought were interesting (as I imagine that we’re about to be hit by a tidal wave of commercial applications of this idea):

(1) One of the issues is that the required training dataset varies by best practice - the currency of knowledge really matters. So for example the underlying model was trained on data prior to ‘22, but the canonical source of python type definitions has shifted about a fair bit from Python …

... [... 499 words]

AI, Ad Dollars

I liked Ethan Mollick’s post on ad dollars earlier this week, here it is if you missed it:

No one has figured out how you integrate advertising with LLM replies. If it is contextual ads around the LLM, then a good LLM answer should provide more guidance to the product you want than ads, making the ads useless. If ads are integrated into the prompt, with the instructions that the advertiser be recommended, that will lead to inaccurate, bad answers. This is sort of a big deal, given that …

... [... 1052 words]

AGI Predictions

I really enjoyed the nonint post on timelines to AGI. Obviously James Betker is better placed than me to make an informed prediction, and he has inside information (he works for OpenAI), but there are a couple of things that jump out at me if I read this prediction critically.

Firstly, given that transformers are great general approximators of behaviour, it’s very difficult to falsify any predictions about AGI without having a very specific and testable definition of what AGI is that …

... [... 501 words]

dspy unpacked: continuous prompt optimisation

Omar Khattab, Chris Potts, Matei Zaharia | 2023 | Paper | Github | Docs

A lot of work with LLMs today involves a loop where you break a problem into steps, write a prompt for each step, then put the whole thing together by adjusting each prompt to feed into the next one.

dspy simplifies this process. It gives you a framework to structure your pipeline - forcing you to architect the application so your program flow is split from the variable stuff - the prompts and model weights that …

... [... 990 words]
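As a rough illustration of that split (not dspy's actual API), here is a two-step pipeline where the program flow is fixed code while the prompts live in swappable data, which is what lets an optimiser tune them independently. `PROMPTS`, `fake_llm`, and the step names are all hypothetical stand-ins of my own.

```python
# Prompts are data: an optimiser (or a human) can rewrite these
# templates without touching the pipeline's control flow.
PROMPTS = {
    "extract": "List the key claims in: {text}",
    "answer": "Given these claims: {claims}\nAnswer: {question}",
}

def fake_llm(prompt):
    # Stand-in for a real LLM call; echoes part of the prompt.
    return f"<llm output for: {prompt[:30]}...>"

def pipeline(text, question, prompts=PROMPTS, llm=fake_llm):
    # Fixed flow: step 1 extracts claims, step 2 feeds them into
    # the answer prompt. Only the templates vary.
    claims = llm(prompts["extract"].format(text=text))
    return llm(prompts["answer"].format(claims=claims, question=question))
```

Swapping in a different `PROMPTS` dict (or different model weights) changes behaviour without restructuring the program, which is the separation the post describes.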

How many customer interviews are enough?

Counts of customer interviews seem to have become a bit of a vanity metric of late. A shorthand for product or decision quality, as if one automatically implies the other.

I appreciate your sacrifice at the temple of customer research, but I worry that you may have wasted your time.

Working out the right number of interviews, wireframe tests or customers in the alpha phase of your project is quite similar to an optimal stopping problem. You’re trying to work out how much learning you …

... [... 328 words]
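For flavour, the closest classical result here is the secretary problem's 1/e rule: observe the first n/e candidates without committing, then stop at the first one that beats everything seen so far. A minimal sketch, offered as an analogy only (the post is about interviews, not literally this algorithm):

```python
import math

def secretary_stop(scores):
    # Classic 1/e rule: skip the first n/e items to calibrate, then
    # take the first item that beats everything seen so far
    # (falling back to the last item if nothing does).
    n = len(scores)
    skip = int(n / math.e)
    best_seen = max(scores[:skip], default=float("-inf"))
    for score in scores[skip:]:
        if score > best_seen:
            return score
    return scores[-1]

print(secretary_stop([3, 1, 4, 1, 5, 9, 2, 6]))  # → 4
```

The analogy to interview counts is loose, but it captures the trade-off: early observations buy calibration, and past some point more of them stop paying for themselves.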

llm.c: The genius of Andrej Karpathy

What’s awesome about Andrej Karpathy’s llm.c isn’t just that it’s a bare-metal, from-scratch implementation of GPT-2 (safety wink definitely required!).

If you take a step back, you’ll see he’s also educating us on how one of the very best in the world hones their craft. He’s stripped away the intermediate layer of libraries - there’s no PyTorch here. Instead, we’re taken back to the basics: an attempt to implement a simple C and CUDA version …

... [... 227 words]

March '24 Roundup

March was the month we got Grok, OpenAI confirmed their strategy, and we no longer needed to run on vibes alone as GPT-4 was displaced at the top of the leaderboards. An experiment was also kicked off to learn about the pricing power of the major LLM providers.

One of the things I most enjoyed this month was the explosion of interest in LLM agents with the launch of Devin, the AI software engineer. So this month I’ve pulled out 4 papers which expand on agent based workflows and show how …

... [... 1456 words]

Hot takes on Devin, the AI software engineer

I thought Devin from Cognition looked super cool this week; the UX feels like a glimpse of a new era.

I wonder how deep the moat is though? 🤔

From staring a little bit too closely at the screenshots and videos I've seen so far, a hot take would be that it feels like most of the performance lift in the SWE benchmarks could come from a switch in prompting technique, i.e. the size of the performance lift in the benchmark looks similar to that of shifting from chain-of-thought to something …

... [... 296 words]
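For reference, the gap between prompting techniques can be surprisingly small in code terms; the sketch below is purely illustrative (`plain_prompt` and `cot_prompt` are my own hypothetical names, not Devin's internals) and just shows that chain-of-thought is plain prompting plus an added instruction.

```python
def plain_prompt(task):
    # Baseline: hand the task straight to the model.
    return f"Solve this issue:\n{task}"

def cot_prompt(task):
    # Chain-of-thought variant: same task, plus an instruction to
    # reason step by step before answering.
    return plain_prompt(task) + "\nThink step by step before writing the fix."
```

If a benchmark lift of this rough size can come from a one-line prompt change, it is worth asking how much of any agent's headline number is technique versus product.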

Grandmaster-Level Chess Without Search

Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid and Tim Genewein | Grandmaster Level Chess without Search | 2024 | Paper

Walter Isaacson | Elon Musk | 2023 | Book

Towards the end of Walter Isaacson’s biography of Elon Musk, there’s a description of a breakthrough with Tesla Autopilot:

For years, Tesla’s Autopilot system relied on a rules-based approach. It took visual data from a car’s cameras and identified such things as …

... [... 1982 words]

February '24 Roundup

February feels like it's gone in a blur. Hofy had a brilliant company retreat in Peniche, Portugal. Sora looks insane. Google returned to open source AI with the Gemma series while Mistral released a hosted, closed-source model. Here are a few other things that caught the eye:

Self-Discover, Google DeepMind

Can we improve LLM reasoning by adjusting the way in which we prompt? Google DeepMind demonstrate up to a 32% uplift in performance that transfers across LLMs (GPT-4, GPT-3.5, …

... [... 1227 words]

If you're still looking for something, you can browse an index of all blogs here.