Faster similar-document search in Elasticsearch 9.4

Ben Trent posted about a small thing today, and "small things" like it are worth noticing.

The change is query_vector_builder.lookup, landing in Elasticsearch 9.4. A common vector-search pattern is "find documents similar to this one" - you have a document, you want its near neighbours. Until now that took two requests: a GET to pull the document's vector out of Elasticsearch, then a kNN query to send that same vector straight back in. The vector - often a thousand-plus floats - made a full round trip out of the cluster and back, serialised and deserialised on each leg, only to be used by the cluster it started in.

The new lookup builder removes the detour. You hand the kNN query an index, a document id and a field; Elasticsearch fetches the vector internally and runs the search server-side. One request. The vector never leaves the cluster.
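
To make the difference concrete, here is a rough sketch of the two request bodies, built as plain Python dicts with no cluster involved. The index name, document id and field name are made up for illustration. The keys inside "lookup" (index, id, path) are my assumption, based only on the description above, so check them against the 9.4 documentation before relying on them.

```python
import json

# Hypothetical names, for illustration only.
INDEX = "articles"
DOC_ID = "doc-42"
VECTOR_FIELD = "embedding"


def two_step_knn_body(vector, k=10, num_candidates=100):
    """Old pattern, second request: the caller has already fetched `vector`
    with a GET and now sends every float back to the cluster."""
    return {
        "knn": {
            "field": VECTOR_FIELD,
            "query_vector": list(vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def lookup_knn_body(k=10, num_candidates=100):
    """New pattern: point at the stored vector instead of shipping it.
    The keys inside "lookup" are ASSUMED from the post's description
    (an index, a document id and a field), not taken from the docs."""
    return {
        "knn": {
            "field": VECTOR_FIELD,
            "query_vector_builder": {
                "lookup": {"index": INDEX, "id": DOC_ID, "path": VECTOR_FIELD}
            },
            "k": k,
            "num_candidates": num_candidates,
        }
    }


# Stand-in for a stored 1024-dimension embedding.
stored_vector = [0.001 * i for i in range(1024)]

old_body = json.dumps(two_step_knn_body(stored_vector))
new_body = json.dumps(lookup_knn_body())
print(len(old_body), "bytes in the old kNN request vs", len(new_body), "in the new one")
```

The old body grows with the vector's dimension, and the same floats have already crossed the wire once in the GET response. The lookup body stays the same size whatever the dimension.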

The numbers from the benchmark - 2 million documents, two GCP nodes - make you double-take for a change this small: p50 latency drops from 10.4ms to 3.1ms, p90 from 25.4ms to 5.9ms. That is better than 3x at the median and better than 4x at p90. Ben's own framing is characteristically modest: "While this is a simple feature, I hope it removes some unnecessary friction... and makes us that much more lovable."

It's worth being precise about where a speedup like that comes from, because it isn't the search getting faster. The kNN query does exactly the same work as before. What changed is everything around it: one network round trip and two serialisation cycles, gone. A round trip you don't take is latency you never pay.

And here's why that produces a multiple rather than a few percent. A kNN search over 2 million vectors is fast - single-digit milliseconds. When the core operation is that cheap, the fixed overhead of shuttling data to and from it stops being a rounding error and becomes most of the wall-clock time. Delete the overhead and you don't shave a margin, you get a multiple - because the overhead was the bulk of the total.
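
The arithmetic behind "the overhead was the bulk of the total" can be checked from the reported figures alone. This is a back-of-envelope sketch, not a reproduction of the benchmark: it treats the gap between the old and new latency at each percentile as the cost that was removed, which is only approximate, since percentiles don't subtract cleanly and the new path still fetches the vector internally.

```python
# Latencies reported in the post, in milliseconds.
before = {"p50": 10.4, "p90": 25.4}
after = {"p50": 3.1, "p90": 5.9}


def speedup(old_ms, new_ms):
    """How many times faster the new path is."""
    return old_ms / new_ms


def removed_fraction(old_ms, new_ms):
    """Share of the old latency that disappeared (rough overhead estimate)."""
    return (old_ms - new_ms) / old_ms


for pct in before:
    s = speedup(before[pct], after[pct])
    f = removed_fraction(before[pct], after[pct])
    print(f"{pct}: {s:.2f}x faster, {f:.0%} of the old latency removed")
```

On these numbers roughly 70% of the old median latency, and about 77% of the old p90, went away with the round trip - well over half of the total in both cases.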

That's the shape of every high-leverage small change I can think of. It's never "small change, big impact" by luck. It's a small change that removes a fixed cost which had quietly grown to dominate a cheap operation. The leverage is high precisely because vector search got good enough that the plumbing around it became the bottleneck.

There's a second multiplier, and it's what makes this land in 2026 specifically. Retrieval used to be a single round trip - a query in, a result set out. It increasingly isn't. Modern retrieval runs inside agentic loops: a question triggers a search, the results get read and reasoned over, another search goes out, and again. SID-1, the retrieval model I wrote about yesterday, makes many internal search calls to answer a single hard question. To be precise about what lookup changes here: it won't speed up all of them - most agentic searches run on query text the model just wrote, whose vector isn't a stored document. It collapses one specific case - the loop already has a document in hand and wants the ones most similar to it. But the broader point holds whichever step you optimise: once retrieval is a loop rather than a single hop, an internal round trip you delete is deleted many times per question, not once. The plumbing moved into the hot path, which is exactly where you want it tightened.

The pattern itself has no novelty, and that's the point. "Compute where the data already lives; don't ship the data out to the caller" is one of the foundational instincts of backend engineering - it's why we have server-side joins instead of N+1 client round trips, why SQL has subqueries, why predicate pushdown exists. query_vector_builder.lookup is that instinct applied to the one corner of the vector-search API that hadn't caught up yet.
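
For anyone who hasn't run into the N+1 version of this, here is a small self-contained sketch using Python's built-in sqlite3 module. The tables and rows are invented for illustration. An in-memory database has no network hop, so the point here is only the count of requests the caller makes, not timing.

```python
import sqlite3

# Toy schema and data, made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")


def titles_n_plus_one(conn):
    """Client-side join: one query for the posts, then one more per post."""
    queries = 1
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    rows = []
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((title, name))
    return rows, queries


def titles_joined(conn):
    """Server-side join: the database does the matching in a single query."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


slow_rows, slow_queries = titles_n_plus_one(conn)
fast_rows, fast_queries = titles_joined(conn)
print(slow_queries, "queries vs", fast_queries, "- same rows:", slow_rows == fast_rows)
```

Same answer either way. The only difference is how many times the caller had to go and ask, which is the cost the lookup builder removes on the vector side.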

A week or so ago, another post by Ben sent me down a similar path - the DiskBBQ filtering optimisation, a small cheap structure that tells the search where not to look. That piece was about putting something cheap in front of something expensive; this is its sibling, not paying for transport you don't need. Different move, same family - and the same thing I keep coming back to: the genuinely new components in modern search keep turning out to be governed by rules that are decades old.

This is where a great deal of real engineering craft lives nowadays - noticing that a cheap operation has come to be dominated by its own overhead, and quietly deleting the overhead. Ben and his teams are doing a lot of this. It's worth watching for, and it's a regular reminder that this kind of work is worth doing in our own codebases too.