Asymmetric query quantization in DiskBBQ
Ben Trent and Thomas Veasey published "Cutting Elasticsearch DiskBBQ query quantization time by 5x" today. It’s shipping in Elasticsearch Serverless now and in Elastic Stack 9.4.0 next.
The change in one sentence: instead of quantising the query against every document centroid it visits (symmetric quantisation), DiskBBQ now quantises the query once against a coarser parent centroid and reuses that quantisation across all the child document centroids underneath it (asymmetric quantisation). Query quantisation used to take ~20% of query time; it now takes ~4%. That’s a 5x reduction in quantisation’s share of query time with, in the authors’ words, "very little, if any" recall impact.
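To make the difference concrete, here is a minimal sketch of the two strategies. It is not Elastic’s implementation: `quantise` is a toy scalar quantiser standing in for the real one, and the centroid tables, `parent_of` mapping and function names are all made up for illustration. The only thing it tries to show is how many times the query gets quantised.

```python
def quantise(query, centroid, bits=4):
    """Toy stand-in for the real quantiser: centre the query on `centroid`,
    then bucket each component into 2**bits levels."""
    residual = [q - c for q, c in zip(query, centroid)]
    lo, hi = min(residual), max(residual)
    step = (hi - lo) / (2**bits - 1) or 1.0  # avoid dividing by zero on flat residuals
    return [round((r - lo) / step) for r in residual]


def symmetric_calls(query, visited, doc_centroids):
    """Old shape: quantise the query again for every document centroid visited."""
    calls = 0
    for cid in visited:
        quantise(query, doc_centroids[cid])
        calls += 1
    return calls


def asymmetric_calls(query, visited, parent_of, parent_centroids):
    """New shape: quantise once per parent and reuse it for every child."""
    cache = {}
    for cid in visited:
        pid = parent_of[cid]
        if pid not in cache:
            cache[pid] = quantise(query, parent_centroids[pid])
        # cache[pid] is what would be used to score the posting list of `cid`
    return len(cache)


# Hypothetical two-level hierarchy: six document centroids under two parents.
doc_centroids = {
    0: [0.0, 0.1], 1: [0.1, 0.0], 2: [0.1, 0.1],
    3: [5.0, 5.1], 4: [5.1, 5.0], 5: [5.1, 5.1],
}
parent_centroids = {"a": [0.07, 0.07], "b": [5.07, 5.07]}
parent_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

query = [0.2, 0.3]
visited = [0, 1, 2, 3, 4, 5]

n_symmetric = symmetric_calls(query, visited, doc_centroids)
n_asymmetric = asymmetric_calls(query, visited, parent_of, parent_centroids)
```

With this toy hierarchy the symmetric path quantises the query six times and the asymmetric path twice. The real ratio depends on how many children sit under each parent and how many of them a query actually visits.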
Why this is interesting beyond the speedup
DiskBBQ already had a two-tier centroid hierarchy - parent centroids on top of document centroids, originally introduced for index scaling. What’s new is using that hierarchy asymmetrically between the query path and the document path:
- Document centroids stay fine-grained. The document side needs the precision, because that’s what determines posting-list structure and scoring fidelity.
- Query centroids go up one level, to the parents. The query is one item, looked up once per request - it doesn’t need document-centroid-level precision.
The old mental model, as the authors put it, was "one centroid does everything for a posting list" - same precision for both sides of the comparison. The new model splits responsibilities. A query and a document don’t have the same lifetime, the same frequency, or the same reuse pattern; they were being quantised the same way mostly out of historical symmetry. Once you notice that, the cheaper query path becomes an obvious lever to pull.
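Why can a query centred on the parent still be compared with vectors stored relative to a child centroid? One way to see it is the plain dot-product expansion. To be loud about it: this is my own illustration of the kind of "simple mathematics" that makes the split workable, not the authors’ derivation, and the real scoring path adds quantisation and correction terms that are left out here. The names `q`, `d`, `c` and `p` are mine.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


random.seed(0)
dim = 8
q = [random.gauss(0, 1) for _ in range(dim)]  # query
d = [random.gauss(0, 1) for _ in range(dim)]  # document vector
c = [random.gauss(0, 1) for _ in range(dim)]  # document (child) centroid
p = [random.gauss(0, 1) for _ in range(dim)]  # parent centroid

q_res = sub(q, p)  # query centred on the parent: computed once per parent
d_res = sub(d, c)  # document centred on its own centroid: known at index time

exact = dot(q, d)
expanded = (
    dot(q_res, d_res)  # the only term that touches both residuals
    + dot(q_res, c)    # one extra dot product per visited child centroid
    + dot(p, d_res)    # involves only index-time data
    + dot(p, c)        # involves only the two centroids
)

assert abs(exact - expanded) < 1e-9
```

The two sides agree up to floating-point error, whichever centroid each vector happens to be centred on. Under that assumption, only the first term needs the query in its quantised form, which is why centring the query on a coarser centroid doesn’t have to change the document side at all.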
Where the cost goes next
The authors are honest about scope. They note: "the bulk of the cost is still just scoring the vectors in each cluster." Quantisation was the secondary cost, and even after this change scoring still dominates. The improvement isn’t a redesign of the scoring pipeline; it’s a sharp reduction in the format-conversion overhead around it. Going from ~20% to ~4% of query time works out to roughly 16% off overall query latency, since quantisation was never the dominant share.
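The "roughly 16%" figure is back-of-envelope arithmetic from the two quoted shares, and it depends slightly on which total the ~4% is measured against, something the quoted numbers alone don’t settle. A quick sketch of both readings, assuming all non-quantisation work is unchanged (the function and variable names are mine):

```python
def overall_saving(old_share, new_share):
    """Fraction of total query latency saved when one stage shrinks from
    `old_share` of the old total to `new_share` of the new total, with all
    other work held constant."""
    other = 1.0 - old_share               # non-quantisation work, in old-total units
    new_total = other / (1.0 - new_share)  # other work is (1 - new_share) of the new total
    return 1.0 - new_total


# Reading 1: the ~4% is a share of the new, shorter query time.
saving_new_base = overall_saving(0.20, 0.04)

# Reading 2: the ~4% is expressed against the old query time.
saving_old_base = 0.20 - 0.04
```

The first reading gives about 16.7% and the second exactly 16%, so "roughly 16%" holds either way.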
What it does do is shift the operating point. When quantisation was 20% of query time, "the cost of converting things into the right format" was worth caring about. At 4% it’s basically background noise, and the dominant cost is back to being the actual work - scoring vectors. That’s the right shape for an optimisation: the secondary cost falls away, leaving the primary cost visible for the next round.
It also pairs nicely with the previous DiskBBQ optimisation, the one that cut wasted centroid scans under restrictive filters. The two compound: you visit fewer centroids; the ones you do visit cost less to quantise against. Same engine, different cost.
Three in two weeks
Three Elastic search-labs posts from Ben Trent’s team in two weeks. The DiskBBQ filter optimisation was about not looking where you don’t need to. The 9.4 vector lookup was about not shipping the data out when the cluster can fetch its own. Today’s piece is about not treating the query and the document the same when they have different lifetimes.
Each one is a small change. Each one removes a cost that had quietly grown around something that should be cheap. Each one is, at the architectural level, the same move: find an assumption baked into the implementation that the workload didn’t actually demand. Spot it, name it, remove it. There’s a real architectural rhythm forming in DiskBBQ’s improvement cadence, and any search engineer would do well to keep an eye on it.
The authors close with characteristic understatement: "This was a nerdy one... It’s always fun to be able to tackle complex problems with simple mathematics." It’s also fun to watch a single product team find a fresh asymmetry to exploit, month after month.
