Optimising for AI search

Start with a question that sounds simple and isn’t: how would you A/B test a ChatGPT answer?

You can’t. Not in the way you A/B test a web page. There’s no variant to serve, no traffic to split, no conversion event at the end. The "page" is generated fresh each time, by a model you don’t control, from a context you can’t see, for a user you can’t identify. The thing you want to influence - whether your brand shows up in the answer - happens inside a box with no instrumentation hooks.

That question is the whole problem in miniature. And it’s why the standard advice - do more SEO - is not so much wrong as aimed at the wrong target.

Why "do SEO harder" misses

Classical SEO is, at its core, the practice of optimising for a crawler’s signals. Inbound links. Clean markup and schema. Page speed. Crawl budget. Mobile rendering. Core Web Vitals. Two decades of accumulated craft, all of it pointed at one question: how do I make Google’s ranking system put my URL higher in a list of links?

That craft isn’t worthless now. But the surface it was built for - the ranked list of ten links - is the surface that’s disappearing. The new surface is a synthesised paragraph with a handful of citations, and it’s produced by a retrieval-and-generation pipeline that weighs different things.

So "do SEO harder" optimises a set of signals that still matter somewhat, for a results page that matters less every quarter. You can do all of it perfectly and still never appear in a single AI answer.

What the new signals probably are

I say "probably" because nobody outside the model providers has the ground truth, and anyone who tells you they’ve cracked the algorithm is selling something. But the pipeline I described earlier in this series tells you where the leverage has to be, because a citation can only happen if a document clears every stage of it.

Working through the stages, the signals that matter look roughly like this:

  • Corpus presence. Has the provider crawled you at all? Are you in the index that gets searched? Different providers crawl differently, refresh at different rates, and respect different rules. You can be invisible to one and prominent in another. This is the first gate and the most binary.
  • Retrievability. When a relevant question is asked, does your content come back in the candidate set? This part is classical retrieval: lexical and semantic matching, and the embedding neighbourhood your content sits in. If your page is the answer but it’s phrased in language no one searches with, it won’t be retrieved.
  • Extractability. Can a model lift a clean, quotable, factual sentence out of your page? Content that buries its claims in hedging, marketing language, or a wall of context is hard to cite. Content that states things plainly is easy to cite. This is new, and it cuts against a lot of how brands have been taught to write.
  • Authority, as the model perceives it. Not PageRank - the model’s own learned sense of which sources are trustworthy for which topics. Harder to influence, slower to move, and not something you can buy a backlink to fix.
  • Corroboration. Models lean toward claims that show up consistently across multiple independent sources. Being the lone voice making a claim is weaker than being one of many. This rewards genuine presence across the web, not a single optimised page.
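To make the retrievability point concrete, here is a toy sketch of the lexical half of matching only. It is not how any provider scores documents: real pipelines use learned embeddings that can bridge some vocabulary gaps, and the query, the two page snippets and the helper names below are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercase bag-of-words; a crude stand-in for a real tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example: one customer question, two pages that both answer it.
query = "how do I cancel my subscription"
plain_page = "To cancel your subscription, open Settings and choose Cancel subscription."
jargon_page = "Customers may initiate plan termination via the account lifecycle portal."

score_plain = cosine(tokens(query), tokens(plain_page))
score_jargon = cosine(tokens(query), tokens(jargon_page))

print(f"plain wording:  {score_plain:.2f}")
print(f"jargon wording: {score_jargon:.2f}")
```

The jargon page shares no terms with the question, so on a purely lexical measure it scores zero even though it contains the answer. A semantic matcher would narrow that gap, but probably not close it.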

Notice that some of these - retrievability, the embedding neighbourhood - are exactly the things the search-engineering world has been arguing about all year. The work on quantisation, on late interaction, on the architecture of retrieval at scale: that’s not abstract any more. It’s the machinery that decides whether a business shows up when a customer asks an AI a question. The niche got central.

The measurement problem is the hard part

Here’s the thing I most want to land. Even if you accept every signal above and act on all of them, you still can’t see whether it worked.

With classical SEO you had a rank tracker. You typed in your hundred important queries, the tool checked your position, and you watched the numbers move. The feedback loop was tight and the metric was legible.

There is no rank tracker for "are we in the answer". The answer is non-deterministic, varies by provider, varies by phrasing, varies by day, and often varies between two identical prompts thirty seconds apart. The only way to know your visibility is to measure it the hard way:

  • Maintain a set of prompts that represent the questions you want to be the answer to.
  • Run them, on a schedule, across every model and assistant your customers actually use - ChatGPT, Claude, Gemini, Perplexity, Google’s AI Overviews, whatever comes next.
  • Parse the responses: were you mentioned? Cited? In which sentence? Favourably? Alongside which competitors?
  • Do it often enough, and across enough phrasings, to see through the noise of non-determinism to an actual signal.
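The four steps above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not a finished tool: the `Asker` callables are hypothetical stand-ins for whatever API or browser automation reaches each assistant, the two fake assistants return canned text so the example runs offline, and the brand names are invented. Detection here is a simple whole-word match; deciding whether a mention is favourable, or in which sentence it falls, would need more than this.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: takes a prompt, returns the assistant's answer text.
Asker = Callable[[str], str]


@dataclass
class Observation:
    provider: str
    prompt: str
    mentioned: bool
    competitors: List[str]


def mentions(text: str, name: str) -> bool:
    """Case-insensitive whole-word match; assumes names start and end with word characters."""
    return re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE) is not None


def run_once(prompts: List[str], askers: Dict[str, Asker], brand: str,
             rivals: List[str], repeats: int = 1) -> List[Observation]:
    """Ask every prompt of every assistant `repeats` times and record who was named."""
    observations = []
    for provider, ask in askers.items():
        for prompt in prompts:
            for _ in range(repeats):
                answer = ask(prompt)
                observations.append(Observation(
                    provider=provider,
                    prompt=prompt,
                    mentioned=mentions(answer, brand),
                    competitors=[r for r in rivals if mentions(answer, r)],
                ))
    return observations


def mention_rate(observations: List[Observation]) -> Dict[str, float]:
    """Share of runs per provider in which the brand was mentioned."""
    hits: Dict[str, int] = defaultdict(int)
    runs: Dict[str, int] = defaultdict(int)
    for obs in observations:
        runs[obs.provider] += 1
        hits[obs.provider] += int(obs.mentioned)
    return {provider: hits[provider] / runs[provider] for provider in runs}


# Canned stand-ins so the sketch runs without network access.
def fake_assistant_a(prompt: str) -> str:
    return "Popular options include Acme and Globex."


def fake_assistant_b(prompt: str) -> str:
    return "Many teams choose Globex."


prompts = [
    "best project management tool for small teams",
    "which project tracker should a startup use",
]
askers = {"assistant_a": fake_assistant_a, "assistant_b": fake_assistant_b}
observations = run_once(prompts, askers, brand="Acme",
                        rivals=["Globex", "Initech"], repeats=3)
rates = mention_rate(observations)
print(rates)
```

In a real programme the observations would be stored with a timestamp on every run, so that mention rates can be compared across weeks, not just read off once.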

That is a real, ongoing, technically involved measurement programme. It is much closer to running a monitoring system than to running a marketing campaign. And almost nobody has one, because the discipline is about two years old and the tooling for it barely exists.
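One way to judge whether you have run a prompt "often enough" is to put an interval around the observed mention rate. The sketch below uses the Wilson score interval, a standard choice for proportions from small samples. That choice is mine, not something the providers or any existing tool prescribes, and it assumes runs are independent, which repeated calls to the same assistant may not strictly be. The counts are made up.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if runs <= 0 or not 0 <= hits <= runs:
        raise ValueError("need runs > 0 and 0 <= hits <= runs")
    p = hits / runs
    z2 = z * z
    denom = 1 + z2 / runs
    centre = (p + z2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z2 / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: the same 30% observed rate at two sample sizes.
small = wilson_interval(hits=3, runs=10)
large = wilson_interval(hits=30, runs=100)

print(f"3 of 10 runs:    {small[0]:.2f} to {small[1]:.2f}")
print(f"30 of 100 runs:  {large[0]:.2f} to {large[1]:.2f}")
```

The same observed rate is compatible with a far wider range of true rates at ten runs than at a hundred, which is why a handful of spot checks tells you very little about your visibility.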

A discipline, forming in real time

This has started to acquire a name - AI Engine Optimisation, AEO, the inevitable acronym - and like all new acronyms it’s being attached to a lot of things, some serious and some not. Strip the hype and the substance underneath is straightforward: the surface that consumer attention flows through has changed, the old optimisation discipline was built for the old surface, and a new discipline has to exist for the new one.

I find this genuinely energising, and I’ll be honest about why. It sits exactly on the seam between two things I’ve spent my career on - the engineering of retrieval systems, and the question of how people actually find things. For fifteen years those felt like a back-end concern and a product concern. The shift this series has been describing fuses them. Whether a business is visible in an AI answer is now a retrieval-engineering question with a commercial outcome attached, and the measurement layer that connects the two doesn’t properly exist yet.

That’s the kind of problem I can’t stop thinking about. Which is not, it turns out, an accident - and it’s what the next post is about.