What to make of TurboQuant
A new quantisation method has been making the rounds: TurboQuant, out of Google Research. It’s a data-oblivious compression scheme for high-dimensional vectors. The core trick is actually quite small: apply a fast Hadamard rotation to a vector before you quantise it.
Hadamard rotation has been kicking around signal processing and randomised algorithms for ages. The reason it matters here is what it does to a vector’s distribution: it redistributes the energy of the vector evenly across coordinates. After the rotation, every coordinate looks similarly “typical”. Now when you quantise - drop each coordinate from 32 bits down to 4, 2, or even 1 - you’re throwing away a balanced approximation rather than crushing the few coordinates that happened to carry most of the magnitude.
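To make the energy-spreading concrete, here's a minimal pure-Python sketch of a normalised fast Walsh–Hadamard transform applied to a worst-case vector whose magnitude is concentrated in one coordinate. The function name and toy vector are mine, not from the paper:

```python
import math

def fwht(v):
    """Fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is an
    orthonormal rotation (norms and dot products are preserved).
    Length must be a power of two."""
    v = list(v)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]

# The worst case for naive per-coordinate quantisation:
# all of the vector's energy sits in a single coordinate.
spiky = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
rotated = fwht(spiky)
# After rotation, every coordinate carries the same magnitude (8/sqrt(8))
# and the squared norm is unchanged - the energy has been spread evenly.
```

Because the transform is orthonormal, applying the same rotation to the query vector leaves dot products intact; only the coordinate-wise distribution changes.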
That, in essence, is TurboQuant. Rotate, then quantise. The novelty isn’t the rotation nor the quantisation (everyone does that). It’s the theoretical packaging: TurboQuant ships with mathematical bounds on dot-product distortion, the rotation removes the need for dataset-specific calibration (“data-oblivious”), and the original paper demonstrates compelling numbers if you like reading whitepapers.
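The quantise half of "rotate, then quantise" is plain uniform scalar quantisation, one coordinate at a time. A minimal 4-bit sketch of the idea (function names are mine; real implementations choose their grids far more carefully):

```python
def quantise_4bit(v):
    """Map each coordinate to one of 16 levels on a uniform grid spanning
    [min(v), max(v)]. Costs 4 bits per coordinate plus two floats per vector."""
    lo, hi = min(v), max(v)
    step = (hi - lo) / 15 if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in v]  # integers in 0..15
    return codes, lo, step

def dequantise(codes, lo, step):
    """Reconstruct an approximation; per-coordinate error is at most step/2."""
    return [lo + c * step for c in codes]

v = [0.12, -0.7, 0.33, 0.9, -0.05, 0.41]
codes, lo, step = quantise_4bit(v)
v_hat = dequantise(codes, lo, step)
```

The rotation earns its keep here: after it, no single coordinate dominates the [lo, hi] range, so the uniform grid wastes far fewer of its 16 levels.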
Implementations are arriving
Two vector engines have shipped rotation-based scalar quantisation as a named feature, and they came at it from different directions.
Weaviate got there first, from what I can see, though under a different name. Their Rotational Quantization (RQ) landed in v1.32 (8-bit RQ for HNSW) and has been the default since v1.33. RQ applies a fast pseudorandom rotation via the Walsh–Hadamard Transform, then scalar-quantises each dimension - roughly 4x compression at 98–99% recall. Weaviate developed RQ taking inspiration from RaBitQ and simplifying it for use with HNSW.
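The "pseudorandom" part of a rotation like this is commonly a seeded diagonal of ±1 signs applied before the Walsh–Hadamard transform, and the seeding is what makes it data-oblivious: nothing per-dataset is learned or stored. A sketch of that usual construction (my assumption - Weaviate's exact recipe may differ):

```python
import random

def sign_diagonal(dim, seed=1234):
    """Deterministic +/-1 per coordinate. Because it is seeded, index build
    and every later query apply the identical rotation with no calibration
    data to store or refresh."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]

v = [0.3, -1.2, 0.5, 0.8]
signs = sign_diagonal(len(v))
flipped = [s * x for s, x in zip(signs, v)]
# The sign flip is an isometry - the norm is untouched - and the
# Walsh-Hadamard transform that follows sees no fixed sign structure.
```

The combination (random signs, then Hadamard) is the standard randomised Hadamard transform from the sketching literature, which is why the same trick keeps surfacing under different product names.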
Qdrant added TurboQuant as a first-class quantisation option in their 1.18 release earlier this week. They report “similar recall and speed” to scalar quantisation while achieving “twice the compression ratio”. 4-bit TurboQuant lands recall scores between 0.9169 and 0.9271 across their four test datasets - competitive with scalar quantisation’s 0.9014–0.9339 - and notably ahead of 1-bit binary quantisation in the high-compression regime.
Elasticsearch is sceptical
Last week, Thomas Veasey at Elastic published a respectfully critical analysis. The argument is sharper than “we don’t like it.” It’s three claims, in order of damage:
1. The Hadamard rotation does the work, not the rest of TurboQuant. When Elastic bolted the same Hadamard rotation onto their own Optimised Scalar Quantisation (OSQ) and re-ran the benchmark, “OSQ + Hadamard matches TurboQuant almost exactly at 1–2 bits.” The rotation is the insight; everything else around it is decoration.
2. Dot-product accuracy beats reconstruction error. TurboQuant has good MSE numbers, but search doesn’t rank by MSE - it ranks by dot product. After accounting for ranking-irrelevant bias, OSQ at 1-bit per document beats TurboQuant at 4-bit per document - “better ranking accuracy at over 5x less storage.”
3. CPUs aren’t H100s. TurboQuant’s reference implementation was designed around GPU-friendly kernels. On commodity CPUs - where the overwhelming majority of vector search actually runs - OSQ’s symmetric kernels are “10–40x faster than TurboQuant”, clocking “14 ns/doc versus TurboQuant’s 293 ns/doc using NEON intrinsics.”
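To see the second point concretely - that search cares about dot-product estimates, not reconstruction - here's a generic 1-bit sketch (deliberately neither Elastic's OSQ nor TurboQuant): documents are stored as signs plus a single scale, and the full-precision query is applied asymmetrically.

```python
def binarise(d):
    """1-bit code per coordinate plus one scalar (mean magnitude), so that
    dot products - not just Hamming distances - can be estimated."""
    scale = sum(abs(x) for x in d) / len(d)
    return [1.0 if x >= 0 else -1.0 for x in d], scale

def est_dot(q, bits, scale):
    """Asymmetric kernel: full-precision query against a 1-bit document."""
    return scale * sum(qi * b for qi, b in zip(q, bits))

q = [1.0, 0.5, -0.5, 1.0]
d_near = [0.9, 0.8, -0.7, 1.1]   # roughly aligned with q
d_far = [-0.9, -0.8, 0.7, -1.1]  # roughly opposed to q
# The 1-bit reconstructions are crude in the MSE sense, yet ranking by the
# estimated dot products agrees with ranking by the true ones.
```

The point of the toy: a scheme can lose badly on reconstruction error while still ordering candidates correctly, which is exactly the axis on which Elastic argues OSQ wins.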
Their conclusion is that TurboQuant is “theoretically elegant”, but for CPU-based vector search the empirical picture is clear. They’re sticking with BBQ/OSQ - the quantisation stack that feeds the DiskBBQ index I mentioned in my thoughts yesterday.
Qdrant is optimising for “good default with mathematical guarantees and no calibration step.” That’s a sane position when your users span a huge variety of embeddings, query distributions, and hardware. A data-oblivious method with provable bounds is the safer choice across that surface.
Elastic is optimising for “best possible CPU-side throughput on the specific workload our customers run.” That’s a sane position when you have years of accumulated knowledge about exactly what your customers’ vectors look like and what their hardware will pipeline.
Both can be right at the same time. The fact that they reached different conclusions on the same algorithm tells you about their deployment surfaces, not about who is wrong.
What I actually take from this
The most durable insight in both pieces is the same: the rotation matters more than the rest of the scheme. Hadamard transform → quantise is the move. Whether you wrap it in TurboQuant’s theoretical packaging or in OSQ’s anisotropic loss is a second-order question.
It’s also a pattern I keep returning to in recent posts: a small, cheap pre-processing step that lets the next component get away with much less work. Rotate first, quantise second. The rotation is the cheap thing in front of the expensive thing. Even when two vendors disagree about the implementation, they agree about the shape.
I find that genuinely reassuring. The difference in stance between Qdrant and Elastic isn’t a sign that vector search is in a confused state - it’s a sign that the field is healthy enough to host real, evidence-based disagreement. Both camps shipped benchmarks. Both made their reasoning visible. Both will move the field forward, regardless of who ends up “winning” the algorithm war.
If you’re picking quantisation for your own stack: read both posts. Look at where your embeddings come from, what your hardware looks like, and which trade-off you actually care about - calibration-free vs CPU-throughput vs storage. The answer is downstream of those questions, not upstream.
