Everything has changed in search. Nothing has changed in search.
For about 15 years, the way we built search systems stayed remarkably stable. Inverted indexes, TF-IDF, BM25, maybe some learning-to-rank on top if you were fancy. Elasticsearch or Solr in the middle, a query parser at the front, a relevance tuner crying quietly in the corner. The shape didn’t really change. We argued about analyzers and field boosts and whether to denormalise. Search teams focused on hyper-optimising a single step in an established flow before moving on to experiment with improvements to the next step. Sure - vector indexing arrived, and we started to work out hybrid search approaches, but we didn’t argue about the overall architecture, because there wasn’t much to argue about.
Then “agentic search” arrived, RAG went mainstream, and it felt like everything changed.
The new shape goes something like this: a user submits a query, and instead of one query hitting one index, you get a loop. The system decomposes the question. It picks a tool - maybe your enterprise index, maybe a web search, maybe a structured database or knowledge graph. It reads the results, decides whether it has enough, and if not, refines and goes again. Eventually it hands the gathered context to a language model, which synthesises an answer, or perhaps just ranks the documents. The “search” is no longer a single round trip. It’s an iterative, tool-using, decision-making process.
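In rough Python, one turn of that loop might look like the sketch below. Every name in it is a stand-in - the planner, the tool registry, the sufficiency check are assumptions about your stack, not any product’s actual API - but the shape is the point: many cheap retrieval round trips, then a single expensive synthesis call.

```python
# A sketch of the loop described above. decompose, pick_tool,
# is_sufficient, refine, and the tool registry are all stand-ins
# for whatever your planner and retrieval stack actually provide.
from dataclasses import dataclass, field

@dataclass
class Context:
    question: str
    evidence: list[str] = field(default_factory=list)

def agentic_search(question, planner, tools, synthesise, max_rounds=5):
    ctx = Context(question)
    for sub_query in planner.decompose(question):
        for _ in range(max_rounds):
            tool = planner.pick_tool(sub_query, ctx)    # enterprise index? web? graph?
            ctx.evidence += tools[tool](sub_query)      # one cheap retrieval round trip
            if planner.is_sufficient(sub_query, ctx):   # enough evidence to move on?
                break
            sub_query = planner.refine(sub_query, ctx)  # no: refine and go again
    return synthesise(ctx)                              # the single expensive LLM call
```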
That sounds like a complete break from how search used to work (and in some ways it is). The indexes are still there underneath, but they’re no longer the system - they’re one tool among several, called by something smarter sitting on top.
And yet...
When you look at what that “something smarter” actually does, it’s doing the same things engineers have been doing in backend systems forever. It splits cheap, high-volume, well-defined work from expensive, harder work, and puts a boundary between them. It caches. It batches. It decides what’s worth escalating and what can be handled locally. It treats the expensive resource - in this case, a frontier model - like we used to treat the database: something you don’t hit unless you have to.
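If that sounds abstract, it’s the same shape as the cache-aside lookup most of us have written a hundred times. A sketch, side by side, with purely illustrative names:

```python
# Old boundary, new components. Same move in both functions:
# a cheap local path first, the expensive resource only when needed.
# All names here are illustrative - nothing is a real API.

def lookup(key, cache, database):
    """Classic: don't hit the database unless you have to."""
    if key in cache:
        return cache[key]                 # cheap, local, high-volume path
    value = database.fetch(key)           # expensive path, taken reluctantly
    cache[key] = value
    return value

def respond(query, small_model, frontier_model):
    """Same shape: don't hit the frontier model unless you have to."""
    draft = small_model.handle(query)
    if draft.confident:                   # the specialist settled it locally
        return draft.text
    return frontier_model.handle(query)   # escalate only the hard cases
```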
Glean’s recently released Waldo is a clean example. Waldo is a small, specialised model that runs before the frontier model. It handles the search loop - decomposing the query, picking tools, deciding when there’s enough evidence - and then hands a tidy context window to the reasoner. The reported numbers: ~50% lower latency, ~25% fewer tokens, no drop in quality. The framing is “agentic search model,” but the move is older than that: put the cheap specialist in front of the expensive generalist. We’ve done this with caches in front of databases. We’ve done it with queue workers offloading from request handlers. We’ve done it with edge functions absorbing the trivial stuff before it ever hits the origin. Now we’re doing it with one model in front of another.
It’s tempting to think the design problem is solved by exposing everything as a tool - wrap your services in MCP, give the model a big toolbox, let it figure things out. But that just relocates the difficulty. Once you have ten tools, or fifty, the interesting question isn’t whether the model can call them - it’s which one, when, and how does it know it’s done? That’s a planning problem, not a plumbing problem, and it’s the same planning problem we’ve always had in distributed systems: routing, orchestration, knowing when to escalate. Waldo’s framing as a planner is what makes it interesting. It’s not “another model that can call tools.” It’s a model whose entire job is deciding which tool to call and when to stop. That’s a much harder, much more valuable thing.
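You can see why it’s a planning problem by writing the contract down. Here’s a hypothetical interface - not Waldo’s actual API, just the shape of the decision: at every step the planner either names a tool or declares itself done.

```python
# A hypothetical planner contract: at each step, either call a tool
# or stop with the gathered context. Not any real product's API.
from dataclasses import dataclass

@dataclass
class CallTool:
    name: str            # which tool
    args: dict           # with what arguments

@dataclass
class Done:
    context: list[str]   # the evidence to hand to the reasoner

def run(planner, tools, question, budget=8):
    evidence = []
    for _ in range(budget):                  # stopping is part of the contract
        decision = planner.step(question, evidence)
        if isinstance(decision, Done):
            return decision.context
        evidence += tools[decision.name](**decision.args)
    return evidence                          # budget spent: hand over what we have
```

Everything mechanical lives in run; everything hard lives in planner.step. That one function is the planning problem.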
This is the “nothing has changed” part. The architectural instinct - separation of concerns, put the cheap work on one side of a boundary and the expensive work on the other - is the same one that has shaped good backend systems for decades. We’re not inventing a new discipline. We’re applying an old one to a new kind of component.
What has changed is what counts as a component. For most of my career the unit of “thing that does work” was a service, a process, a thread, a function. Now it’s also a model. Models have latency profiles, cost profiles, failure modes, and capability boundaries, just like any other component, and the interesting design work is figuring out which model goes where and what sits between them.
That reframing matters, because a lot of the discourse around AI systems treats them as a fundamentally new kind of engineering - something where the old rules don’t apply and you need a whole new playbook. I don’t buy it. The rules apply fine. It’s the components that are new.
If you’ve spent years thinking about where to put a cache, when to introduce a queue, how to keep an expensive backend from being hammered by trivial work - you already have most of the instincts you need to design good agentic systems. The vocabulary is unfamiliar. The shape is not.
Everything has changed. Nothing has changed. Both are true, and the gap between them is where the interesting design work lives.
