Writing on software design, company building, and information retrieval.

Occasional longer-form thoughts on programming, leadership, product design, and more, collected in reverse chronological order.

A field guide to vector similarity measures

July 7, 2026

Dot product, cosine, Euclidean, Manhattan and Hamming - what each one actually measures, why most of them collapse into the same ranking once your vectors are normalised, and the handful of mistakes that bite in practice.
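
The "same ranking once normalised" claim can be checked in a few lines. This is a minimal sketch, not code from the post: the helper names, the random 8-dimensional vectors and the fixed seed are all assumptions made for illustration. It covers only dot product, cosine and Euclidean distance; Manhattan and Hamming are left out because they do not reduce to the dot product in the same way.

```python
import math
import random

def normalise(v):
    # Scale a vector to unit L2 length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, docs, score, reverse):
    # Return document indices ordered best-first under the given measure.
    return sorted(range(len(docs)), key=lambda i: score(query, docs[i]), reverse=reverse)

# Hypothetical data: 20 random unit vectors and one random unit query.
rng = random.Random(0)
docs = [normalise([rng.gauss(0, 1) for _ in range(8)]) for _ in range(20)]
query = normalise([rng.gauss(0, 1) for _ in range(8)])

by_dot = rank(query, docs, dot, reverse=True)               # higher is better
by_cosine = rank(query, docs, cosine, reverse=True)         # higher is better
by_euclidean = rank(query, docs, euclidean, reverse=False)  # lower is better

print(by_dot == by_cosine == by_euclidean)
```

The reason is that for unit vectors the squared Euclidean distance equals 2 - 2(a·b), so sorting by ascending distance is the same as sorting by descending dot product, and cosine is just the dot product divided by norms that are both 1.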

Near both, on neither: why single-vector search fails high-intent queries

July 5, 2026

A small, self-contained experiment on why single-vector retrieval breaks on compound, high-intent queries - and how late interaction keeps two facets intact where a dense embedding averages them away.

AI is changing consumer search

8-part series, June 2026

A series on how consumer search habits are shifting toward AI answers, and what that shift means for the businesses, publishers, and engineers who depend on being found.

Asymmetric query quantization in DiskBBQ

May 27, 2026

Ben Trent and Thomas Veasey shipped another DiskBBQ optimisation today: quantising queries against coarser parent centroids instead of per-document ones. 5x off the quantisation stack with no meaningful recall loss. The insight underneath is that the query path and the document path don’t have to be treated symmetrically.

SMART: late interaction without retraining

May 26, 2026

A new paper out this week shows the per-token hidden states of off-the-shelf single-vector embedders already carry the information needed for ColBERT-style MaxSim - and you can wire it in at inference time, without retraining. The late-interaction deployment barrier I most underestimated just dropped.

Faster similar-document search in Elasticsearch 9.4

May 21, 2026

Elasticsearch 9.4 adds query_vector_builder.lookup - a tiny API addition that collapses a two-request vector search into one and runs more than 3x faster. A small change with a big impact, and a look at where that ratio actually comes from.

SID-1: Train the loop, keep the index

May 20, 2026

SID AI’s SID-1 is the first retrieval model trained end-to-end with RL. Some observations through a search-and-IR lens: the middle of the retrieval pipeline collapses into one trained model, the NDCG reward gets deliberately bent toward recall, and the agentic-retrieval loop becomes a subagent you hand to a larger system.

The harness is mostly retrieval

May 18, 2026

Laurie Voss says applied-AI iteration has moved off the model and into "the harness". He’s right - and once you strip the new vocabulary, the harness is mostly a retrieval system.

xAI algorithm through a search lens

May 16, 2026

xAI open-sourced the For You feed algorithm today. Three observations through a search-and-IR lens: two-tower’s quiet dominance, the retrieve/rank split surviving the bitter lesson, and recsys converging with search.

Could TurboQuant Unlock Late Interaction Retrieval?

May 15, 2026

TurboQuant landed as a KV cache result, but the more interesting application might be ColBERT-style late interaction. Here’s the case, and the open questions.

What to make of TurboQuant

May 14, 2026

A new quantisation method out of Google Research is making the rounds. Qdrant shipped it. Elastic ran the benchmarks and politely declined. Both responses tell you something useful.

The pattern goes all the way down

May 13, 2026

DiskBBQ’s new filtered-search optimisation is the same architectural move I wrote about last week, applied one layer deeper. The pattern is fractal - and that’s what makes it useful.

Everything has changed in search. Nothing has changed in search.

May 5, 2026

Agentic search looks like a clean break from classical IR. Look closer and the architectural instincts are the ones backend engineers have used for decades - the components are new, the rules are not.

A primer on late interaction

May 2, 2025

How ColBERT-style token-level matching fits between single-vector dense retrieval and cross-encoders, why MaxSim is the clever bit, and what the storage tax actually looks like in practice.
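
For readers who have not met MaxSim before, here is a minimal sketch of the scoring rule, assuming each query and document is already a list of per-token embedding vectors. The function names and the toy 2-dimensional vectors are hypothetical; a real system would use learned, normalised token embeddings and an index, not nested Python loops.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    # For each query token, keep its best match among the document tokens,
    # then sum those per-token maxima into one relevance score.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy query with two "facets", one per axis.
query = [[1.0, 0.0], [0.0, 1.0]]

# doc_a has a token matching each facet exactly.
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# doc_b has a single token sitting between the two facets.
doc_b = [[0.7, 0.7]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a > score_b)
```

Because every query token picks its own best document token, a document that covers both facets separately outscores one that only sits between them. The storage cost follows from the same shape: one vector per document token has to be kept, not one per document.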

Today, Vimeo goes public

May 25, 2021

Vimeo spins out from IAC and begins trading on Nasdaq under the ticker VMEO.

FOSDEM 2015

February 2, 2015

Another year at FOSDEM — Vimeo's open source talk, the dedicated Open Source Search track, and a closing keynote from a Mars One astronaut candidate.

I've Joined Vimeo

April 7, 2014

Joining the Vimeo team to work on the search platform after an amazing run at DueDil.

DueDil raises further $17 million of funding to fuel growth and expansion

March 3, 2014

DueDil's Series B: a $17m round led by Oak Investment Partners, bringing total funding to $22m and accelerating expansion to new geographies.

Elasticsearch 1.0 launched: An overview

February 12, 2014

A run-down of the headline features in Elasticsearch 1.0 — Snapshot/Restore, the cat API, the redesigned percolator, and the new Aggregations framework.

FOSDEM 2014: a retrospective

February 2, 2014

A weekend in Brussels at FOSDEM — Elasticsearch 1.0 ahead of launch, plus PostgreSQL JSON, Redis, MongoDB, and YARN talks.

Elasticsearch Marvel: Monitor and Manage your Elasticsearch cluster

January 28, 2014

Elasticsearch's new Marvel monitoring dashboard — built on Kibana and Sense — surfaces cluster health and lets you query the REST API live.

Elasticsearch Snapshot Restore Overview

January 5, 2014

A walkthrough of the new Snapshot/Restore API arriving in Elasticsearch 1.0 — incremental backups for your cluster via a simple REST endpoint.

Elasticsearch Aggregations Overview

December 18, 2013

A look at the new Aggregations framework arriving in Elasticsearch 1.0 — multi-level, nested calculations that go far beyond what Facets could do.

London Elasticsearch User Group Presentation

September 10, 2013

A talk at the London Elasticsearch meetup on how DueDil uses Elasticsearch — bulk indexing, and using Facets to add depth to search.

Elasticsearch "Yellow" cluster status explained

April 20, 2013

What yellow status actually means, and how the Cluster Health API reports primary vs replica shard allocation.

DueDil completes Series A funding and announces $5m in new investment

April 11, 2013

DueDil's $5m Series A round, led by Notion Capital and Oak Investment Partners — funding new territories and additional data sets.

Using Elasticsearch on Amazon EC2

January 26, 2013

Setting up an Elasticsearch cluster on EC2: installing the AWS cloud plugin, configuring discovery, and watching nodes find each other.

DueDil: Trust the data

December 29, 2012

DueDil's intro video — a quick look at what we're building.
