How AI Search Engines Decide Which Business to Cite

Most businesses that ask “how do I get cited by AI search?” are looking for a single lever — one thing to add or change that puts them in the answer. There isn’t one. But there is a process, and it’s the same two-stage filter every major AI search engine uses.

Understanding the two stages tells you exactly where to focus.

Stage one: ranking

For Google AI Overviews, stage one is classic search ranking. AI Overviews retrieve from Google’s existing search index. A page that doesn’t rank in the top results for a query — roughly top 10 — is not a candidate for citation in that query’s AI Overview.

This means the foundational work is the same as it’s always been: well-structured pages, clear relevance signals, technical SEO that allows Google to crawl and index the page, internal linking, and authority built over time.

For ChatGPT’s web search mode, the retrieval is similar: live web search results feed the citation engine. Pages that rank well for the query are the pages ChatGPT is retrieving and reading.

The shortcut businesses look for — getting cited in AI answers without having to rank — doesn’t exist for Google or ChatGPT. AI engines are not bypassing search; they’re reading from the top of it.

Stage two: extractability

Ranking gets a page into the candidate pool. Extractability determines whether the AI engine pulls an answer from it.

AI engines apply a second filter after retrieval: they look for pages where the answer to the question is immediately and clearly accessible. The page doesn’t just need to be about the topic — it needs to structure the answer in a way that can be quoted.

Research from Princeton and Georgia Tech published in 2024 found that roughly 44% of AI citations come from the first 30% of a page. The implication is direct: if the answer to a question isn’t available near the top of the page, the page is less likely to be cited even if it ranks for the query.
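One rough way to check a page against that pattern is to measure how far down the answer first appears. The sketch below is illustrative only: the function name `answer_position`, the sample page text, and the 0.30 threshold (taken from the figure above) are assumptions, and real pages would need their HTML stripped first.

```python
from typing import Optional


def answer_position(page_text: str, answer: str) -> Optional[float]:
    """Return where `answer` first appears, as a fraction of the page.

    0.0 means the very top of the page; None means the answer is absent.
    """
    haystack = page_text.lower()
    index = haystack.find(answer.lower())
    if index == -1 or not haystack:
        return None
    return index / len(haystack)


# Hypothetical page: the answer sits in the second sentence,
# followed by a long run of supporting context.
page = (
    "How long does a brand strategy engagement take? "
    "Most engagements take six to eight weeks. "
    + "Background and context for the reader. " * 20
)

position = answer_position(page, "six to eight weeks")
in_top_30_percent = position is not None and position <= 0.30
print(in_top_30_percent)
```

A character offset is a crude proxy for "how far down the page", but it is enough to flag pages where the answer only shows up after several paragraphs of context.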

Pages that pass the extractability filter consistently share a few structural patterns:

Question-shaped headings. H2s phrased as full questions (“How long does a brand strategy engagement take?”) signal to AI retrieval that the section directly answers a specific query. Generic H2s (“Our process”) don’t.

Answer-first structure. The answer appears in the first one or two sentences of the section, before the explanation. AI engines can pull that sentence and attribute it. Pages that bury the answer in the third paragraph after context-setting are harder to cite.

Direct, declarative sentences. AI engines are pulling text to quote. Hedged, passive, or overly qualified sentences don’t quote well. “Most HVAC companies in Vancouver don’t have neighbourhood-specific pages” is citable. “There may be some variation in how local HVAC businesses approach their digital presence” is not.
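The heading pattern is the easiest of the three to audit automatically. The following is a minimal sketch using only Python's standard library; the `QUESTION_WORDS` list, the class and function names, and the sample HTML are assumptions made for illustration, not a description of how any AI engine scores headings.

```python
from html.parser import HTMLParser

# Assumed list of words that typically open a question-shaped heading.
QUESTION_WORDS = (
    "how", "what", "why", "when", "where", "which", "who",
    "can", "does", "do", "is", "are", "should",
)


class H2Collector(HTMLParser):
    """Collects the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def is_question_shaped(heading):
    """True if the heading ends with '?' and opens with a question word."""
    text = heading.strip()
    if not text.endswith("?"):
        return False
    return text.split()[0].lower() in QUESTION_WORDS


def audit_h2s(html):
    """Return (heading, is_question_shaped) pairs for every H2 in the HTML."""
    collector = H2Collector()
    collector.feed(html)
    return [(h.strip(), is_question_shaped(h)) for h in collector.headings]


SAMPLE_HTML = (
    "<h2>Our process</h2><p>Some context.</p>"
    "<h2>How long does a brand strategy engagement take?</h2>"
)

for heading, ok in audit_h2s(SAMPLE_HTML):
    print("OK     " if ok else "REWRITE", heading)
```

Headings flagged for rewriting are candidates, not verdicts: a generic H2 on a page that isn't a citation target can stay as it is.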

Where local businesses have an advantage

Local queries — “best HVAC company in Burnaby,” “law firm near North Vancouver,” “design studio for startups in Vancouver” — have a third factor that general queries don’t: entity grounding.

When AI engines answer questions recommending specific businesses, they need to know who the business is, where it’s located, and what it does. This comes from entity schema — specifically, LocalBusiness or ProfessionalService markup with a complete sameAs array linking to the Google Business Profile, LinkedIn, and other authoritative directories.

A business without entity schema can rank well and have excellent page structure and still not get cited by name — because the AI engine can’t confidently resolve which entity it’s recommending. Entity schema closes that gap. It tells every AI engine: this business is named X, it’s located at Y, it serves Z area, and here are the external sources that confirm it.

For local service businesses, entity schema is the single highest-leverage GEO (generative engine optimization) action, precisely because local queries require confident entity resolution to produce a business citation.
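As a rough illustration, entity markup of this kind is usually published as a JSON-LD block in the page's HTML. The sketch below builds one in Python. The business name, address, and example.com URLs are placeholders, and the helper names are hypothetical. The property names (`name`, `address`, `areaServed`, `sameAs`) are standard schema.org vocabulary, but which of them a given AI engine actually reads is not something this example can confirm.

```python
import json


def local_business_jsonld(name, street, city, region, area_served,
                          same_as, schema_type="LocalBusiness"):
    """Build a schema.org entity description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": schema_type,  # or "ProfessionalService"
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
        },
        "areaServed": area_served,
        # External profiles that confirm the entity's identity.
        "sameAs": list(same_as),
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script tag used to embed it in a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; substitute the real business details and
# the real profile URLs (Google Business Profile, LinkedIn, directories).
EXAMPLE = local_business_jsonld(
    name="Example HVAC Ltd.",
    street="123 Example Street",
    city="Burnaby",
    region="BC",
    area_served="Greater Vancouver",
    same_as=[
        "https://example.com/google-business-profile",
        "https://example.com/linkedin-company-page",
    ],
)

print(as_script_tag(EXAMPLE))
```

Generating the markup from one source of truth keeps the name, address, and service area identical everywhere it appears, which is the point of entity grounding.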

What doesn’t move the needle

Some tactics get attention in GEO discussions without meaningfully affecting citation rates.

llms.txt. Google’s Search team has been explicit: llms.txt is not used by Google for search or AI Overviews. The file may have value for agentic AI (AI agents completing tasks on your site), but it doesn’t affect whether AI search engines cite your pages. See our full breakdown of the llms.txt situation.

AI-specific rewrites. Google’s Search Central team listed “writing content specifically for AI” alongside llms.txt as an approach that’s not necessary for AI Overview visibility. Content written well for human readers — clear, specific, answer-first — is the same content that performs well for AI extraction.

Blocking training crawlers. Blocking GPTBot (OpenAI’s model training crawler) has no effect on ChatGPT citations. The live-search citation crawler is OAI-SearchBot, which is a separate agent. See why GPTBot and OAI-SearchBot are different.
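The separation between the two agents can be checked locally with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the robots.txt content and the example.com URL are made up for illustration, and Python's parser may resolve conflicting rules differently from each vendor's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: training crawler blocked, search crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def can_fetch(robots_txt, user_agent, url="https://example.com/services/"):
    """Check whether `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "OAI-SearchBot"):
    print(agent, "allowed" if can_fetch(ROBOTS_TXT, agent) else "blocked")
```

The same function can be pointed at a site's real robots.txt text and any other user-agent token to confirm that a block aimed at one crawler hasn't caught another.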

The practical sequence

For a Greater Vancouver service business starting from limited AI visibility, the order of operations is:

  1. Get indexed and rank in the top results for the queries where you want citations. AI Overviews and ChatGPT can’t cite what they can’t find.
  2. Implement LocalBusiness or ProfessionalService schema with a complete sameAs array. Entity grounding is the first GEO-specific action with meaningful leverage.
  3. Add FAQPage schema on service and informational pages. FAQPage is the schema type AI engines extract from most readily.
  4. Rewrite H2s as full questions on pages that are citation candidates. This is the structural change with the highest return per hour of effort.
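  5. Audit robots.txt to confirm citation crawlers aren’t blocked. OAI-SearchBot, ClaudeBot, PerplexityBot, and Googlebot should all have explicit access. Google-Extended is a separate robots.txt token that controls use of content for Gemini; it is not a crawler and does not affect Google Search.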
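For step 3, FAQPage markup pairs each question with its answer in a fixed structure. The sketch below generates it from a list of question and answer pairs. The helper name and the sample question and answer are hypothetical, while the `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` names are standard schema.org vocabulary.

```python
import json


def faq_page_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder content: the question mirrors the page's H2 and the
# answer mirrors the first sentence under it.
FAQ = faq_page_jsonld([
    (
        "How long does a brand strategy engagement take?",
        "Most engagements take six to eight weeks.",
    ),
])

print(json.dumps(FAQ, indent=2))
```

Reusing the on-page H2 as the question and the section's opening sentence as the answer keeps the markup consistent with the visible content, so steps 3 and 4 reinforce each other.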

The first item — ranking — is the foundation that makes items two through five effective. GEO is an optimisation layer on top of classic SEO, not a replacement for it.

GEO audits — entity schema, extractability review, robots.txt, and AI impression baseline — are part of every Arara SEO site review. If you’d like to know where your site stands in AI search visibility, the audit is free.
