Ben Trent and Thomas Veasey shipped another DiskBBQ optimisation today: quantising queries against coarser parent centroids instead of per-document ones. 5x off the quantisation stack with no meaningful recall loss. The insight underneath is that the query path and the document path don’t have to be treated symmetrically.
A new paper out this week shows the per-token hidden states of off-the-shelf single-vector embedders already carry the information needed for ColBERT-style MaxSim - and you can wire it in at inference time, without retraining. The late-interaction deployment barrier I most underestimated just dropped.
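For readers who haven't met it, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It is not the paper's method or code, and it does not show how per-token hidden states are pulled out of an embedder. The function names `normalize` and `maxsim` and the toy 2-d vectors are assumptions made for illustration. Each query token takes its best cosine match among the document tokens, and those maxima are summed.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: sum over query tokens of the max cosine
    similarity against any document token."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Toy per-token vectors (hypothetical, not real hidden states).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[2.0, 0.0], [0.0, 3.0]]   # covers both query tokens
doc_b = [[5.0, 0.0]]               # covers only the first query token

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

In this toy case `doc_a` outscores `doc_b` because it has a good match for every query token, which is the behaviour a single pooled vector tends to blur.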
Elasticsearch 9.4 adds query_vector_builder.lookup - a tiny API addition that collapses a two-request vector search into one and runs more than 3x faster. A small change with a big impact, and a look at where that ratio actually comes from.
SID AI’s SID-1 is the first retrieval model trained end-to-end with RL. Some observations through a search-and-IR lens: the middle of the retrieval pipeline collapses into one trained model, the NDCG reward gets deliberately bent toward recall, and the agentic-retrieval loop becomes a subagent you hand to a larger system.
Laurie Voss says applied-AI iteration has moved off the model and into "the harness". He’s right - and once you strip the new vocabulary, the harness is mostly a retrieval system.