You cannot optimise for prompts you have not seen. Google keyword volume tells you what people typed into a search box last year. It does not tell you what they ask ChatGPT, Perplexity, Claude or Gemini today — and on most B2B niches the gap is wide enough to matter.

We run prompt research on every new retainer. The method below is what survived four iterations. Not theory — the actual operating procedure, with the five sources we pull from, the dedupe rule, and the three-axis scoring model we use to pick the working universe.

Why Google data alone is not enough

Two empirical observations from our portfolio.

LLM prompts are longer. Median prompt length in our tracking sits at 11 to 14 words. Google queries with the same business intent have a median of 3 to 5 words. For example, “best AEO agency for B2B SaaS with 50-200 employees” is what someone asks ChatGPT; “AEO agency” is what they type into Google. The first one has no Google volume. It still drives real LLM traffic.

LLM prompt intent skews differently. Google still gets a lot of navigational and short-tail informational traffic. LLMs absorb a disproportionate share of commercial-intent and comparison prompts. So even when Google volume exists, the LLM volume on the same brand is concentrated in a different prompt shape — usually the longer, more decisional one.

The first time we measured this we found 35 to 55% of the prompt pool was invisible if you only used Google keyword data. That is the gap the method below is built to close.

Source 1 — GSC long-tail queries (7+ words)

The cheapest single source. If you have Google Search Console verified, you already have it.

Pull the Queries dimension for the last six months. Filter to anything 7 words or longer. These are the conversational queries Google already sees on your brand, and they have the same shape as the prompts LLMs receive. We have written about this thread in detail in prompt research vs keyword research, but the short version is that long-tail GSC is the highest-signal-per-hour source.

A typical pull on a 6-month-old retainer site returns 40 to 120 unique 7+ word queries. Most are noise. The 20 to 30 that align with real business intent are gold.
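
A minimal sketch of the 7+ word filter, assuming the GSC queries have already been exported to a plain list of strings. The function name `long_tail_queries` and the sample queries are illustrative, not part of any GSC tooling.

```python
from typing import Iterable

MIN_WORDS = 7  # threshold from the text: 7 words or longer


def long_tail_queries(queries: Iterable[str], min_words: int = MIN_WORDS) -> list[str]:
    """Return unique queries with at least `min_words` whitespace-separated words.

    Queries are lower-cased and whitespace-normalised before comparison,
    so case variants of the same query are kept only once.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for query in queries:
        normalised = " ".join(query.lower().split())
        if len(normalised.split()) >= min_words and normalised not in seen:
            seen.add(normalised)
            kept.append(normalised)
    return kept


# Hypothetical export rows, for illustration only.
sample = [
    "aeo agency",
    "best AEO agency for B2B SaaS with 50-200 employees",
    "Best aeo agency for b2b saas with 50-200 employees",
    "how to get cited by chatgpt",
]
print(long_tail_queries(sample))
```

The filter only removes short queries and exact duplicates. Separating the 20 to 30 business-relevant queries from the noise is still a manual read.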

Source 2 — AnswerThePublic-style scrapers

Tools like AnswerThePublic, AlsoAsked and Keyword Tool scrape Google autocomplete plus People-Also-Ask data and present it as a question tree. On their own they give you Google-only autocomplete data. They are useful as a starting point because they produce dozens of question-shaped prompts at once.

The catch — these tools are still Google-centric. They will miss the prompts that exist on Perplexity or ChatGPT but do not show up in Google’s PAA. Use as a generation source, not a complete source.

The value is in the seed. Pull 100 questions from AnswerThePublic, then run each through Sources 3 and 4 to see which ones LLMs treat as real queries.

Source 3 — engine-by-engine autocomplete and “people also ask” surfaces

Each LLM has its own version of autocomplete. Perplexity shows related questions below answers. ChatGPT (free) shows suggested follow-ups. Gemini surfaces related queries. Bing Copilot has its own PAA pull. These are first-party signals from the engine itself about what it considers connected to your seed query.

The process — pick 10 seed prompts that matter for the business. Run them through each engine. For each engine, log the related queries it surfaces. Repeat for two weeks. After 14 days you have a list of 200 to 400 engine-suggested prompts, with engine attribution.

This is laborious. We typically split it across two team members and a shared spreadsheet. The output is the most engine-faithful prompt source available without a paid tool.
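
One way to keep engine attribution intact when merging the shared spreadsheet is sketched below. It assumes each logged observation is an `(engine, seed_prompt, suggested_prompt)` row; that row shape, the function name `merge_suggestions` and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable


def merge_suggestions(observations: Iterable[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Map each suggested prompt to the sorted list of engines that surfaced it.

    Suggested prompts are lower-cased and whitespace-normalised so the same
    suggestion logged on different days or engines collapses to one key.
    """
    engines_by_prompt: dict[str, set[str]] = defaultdict(set)
    for engine, _seed, suggested in observations:
        key = " ".join(suggested.lower().split())
        engines_by_prompt[key].add(engine)
    return {prompt: sorted(engines) for prompt, engines in engines_by_prompt.items()}


# Hypothetical spreadsheet rows, for illustration only.
rows = [
    ("perplexity", "best aeo agency", "How much does an AEO agency cost"),
    ("gemini", "best aeo agency", "how much does an AEO agency cost"),
    ("perplexity", "best aeo agency", "AEO vs SEO differences"),
]
merged = merge_suggestions(rows)
print(merged)
```

Prompts surfaced by more than one engine are easy to spot in the merged output, since their engine list has more than one entry.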

Source 4 — Perplexity API logs (the cleanest signal)

Perplexity is the only engine that exposes a real API with the same retrieval stack as the product. Set up a small daily script that runs your seed prompts against the API and logs the returned citations plus the related queries Perplexity surfaces. We described the setup in depth in Perplexity citations tactics.

What this unlocks — the related-query pool is 2 to 3× larger than what the web UI shows over the same period, because the API does not aggressively dedupe across sessions. After 30 days of daily logging you have a strong picture of which prompts Perplexity treats as connected to your business.

Cost is low — about $30 a month for the prompts a typical retainer needs.
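
The logging half of such a daily script might look like the sketch below. It deliberately does not call the Perplexity API itself: the `fetch` argument is a placeholder for whatever client code you write against the official API, and the `citations` and `related_queries` keys are an assumed shape for what that client returns, not documented field names. The log is written as JSON Lines, one record per prompt per day.

```python
import json
from datetime import date
from pathlib import Path
from typing import Callable, Iterable, Optional


def log_daily_run(
    seed_prompts: Iterable[str],
    fetch: Callable[[str], dict],
    log_path: Path,
    run_date: Optional[date] = None,
) -> int:
    """Run each seed prompt through `fetch` and append one JSON line per prompt.

    `fetch` is a hypothetical callable: it takes a prompt and returns a dict
    with "citations" and "related_queries" lists (assumed keys).
    Returns the number of records written.
    """
    run_date = run_date or date.today()
    written = 0
    with log_path.open("a", encoding="utf-8") as handle:
        for prompt in seed_prompts:
            response = fetch(prompt)
            record = {
                "date": run_date.isoformat(),
                "prompt": prompt,
                "citations": response.get("citations", []),
                "related_queries": response.get("related_queries", []),
            }
            handle.write(json.dumps(record) + "\n")
            written += 1
    return written


def stub_fetch(prompt: str) -> dict:
    """Stand-in for a real API client, so the sketch runs offline."""
    return {"citations": ["https://example.com"], "related_queries": [prompt + " pricing"]}


if __name__ == "__main__":
    count = log_daily_run(["best aeo agency for saas"], stub_fetch, Path("perplexity_log.jsonl"))
    print(f"logged {count} prompt(s)")
```

Appending rather than overwriting keeps the full 30-day history in one file, which is what the related-query pool is built from.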

Source 5 — 1:1 buyer interviews

The single source most teams skip. It is also the most accurate signal of what buyers actually ask.

Method — pick 5 to 8 buyers who closed in the last 12 months. Schedule 30-minute calls. Ask one question — “if you had to find a company like ours today, with no prior knowledge, what would you type into ChatGPT or Perplexity.” Let them think and type out loud. Most will produce 4 to 6 distinct prompts in a session.

Across 6 interviews on a typical B2B SaaS, we usually harvest 25 to 40 unique prompts. About half overlap with Sources 1-4. The other half are genuinely new, and they are the prompts that most closely match the actual buying decision, not the generic top-of-funnel research.

If you only run one of the five sources, run this one.

Dedupe — by intent, not by string

After pulling from all five sources you will have 400 to 700 raw prompts. Most overlap. “Best AEO agency for SaaS” and “top AEO consultant for SaaS company” and “AEO services for SaaS companies” are one intent, not three.

The dedupe rule — collapse to unique intent, not unique string. Two prompts belong to the same cluster if a buyer would accept the same answer for both. The check is simple — if you wrote one answer page, would it serve both prompts. If yes, dedupe.

We see roughly 3:1 collapse on cross-engine pools. 600 raw becomes 200 unique intents. That number is still too big to act on. Score and prioritise.
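
The "same answer page" check is a human judgment, so the sketch below does not try to automate it. It assumes an analyst has already tagged each raw prompt with an intent label, then groups by that label and reports the collapse ratio. The labels, the fourth sample prompt and the function name `collapse_by_intent` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def collapse_by_intent(labelled_prompts: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (prompt, intent_label) pairs into one cluster per intent label.

    The intent label is assigned by a person applying the rule from the text:
    two prompts share a label if one answer page would serve both.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for prompt, intent in labelled_prompts:
        clusters[intent].append(prompt)
    return dict(clusters)


# The first three prompts come from the example above; the fourth is made up.
labelled = [
    ("Best AEO agency for SaaS", "aeo-agency-for-saas"),
    ("top AEO consultant for SaaS company", "aeo-agency-for-saas"),
    ("AEO services for SaaS companies", "aeo-agency-for-saas"),
    ("how do LLMs choose which sites to cite", "how-llm-citations-work"),
]
clusters = collapse_by_intent(labelled)
collapse_ratio = len(labelled) / len(clusters)
print(len(clusters), "unique intents, collapse ratio", collapse_ratio)
```

Keeping the raw strings inside each cluster is useful later: they show the phrasings a single answer page needs to cover.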

The three-axis scoring model

Each unique-intent prompt gets scored on three axes. Each is a 1-to-5 manual score; we have not found a reliable automated way to do this without losing accuracy.

Business fit (1-5). How well does the prompt match what you actually sell. A “best AEO agency” prompt scores 5 for an AEO agency, 1 for an unrelated SaaS. Most prompts in your pool will score 2-3 — informational adjacent, not directly commercial.

Current citation visibility (1-5). Are you cited on this prompt now. 5 = cited on every engine. 1 = invisible. The interesting bucket is 1-2 — these are the prompts you have not earned yet but could.

Competitive density (1-5, lower is better). How many distinct competitor domains the engines cite across a 7-day window. 1 = thin field, easy to enter. 5 = saturated.

The working universe — the 50 to 80 prompts where business fit is 4-5, current visibility is 1-3, and competitive density is 1-3. High-fit, low-current-visibility, low-competition. That is where the next quarter’s citation wins live.
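
The selection rule can be written down as a simple filter. This is a sketch under the thresholds stated above (fit 4-5, visibility 1-3, density 1-3); the `ScoredPrompt` type, its field names and the sample scores are assumptions made for illustration, and the scores themselves are still entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPrompt:
    prompt: str
    business_fit: int  # 1-5, higher is a closer match to what you sell
    visibility: int    # 1-5, 5 = cited on every engine, 1 = invisible
    density: int       # 1-5, lower is better (1 = thin field, 5 = saturated)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-to-5 scale used by all three axes.
        for name in ("business_fit", "visibility", "density"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def in_working_universe(p: ScoredPrompt) -> bool:
    """High fit, low current visibility, low competition."""
    return p.business_fit >= 4 and p.visibility <= 3 and p.density <= 3


# Hypothetical scores, for illustration only.
pool = [
    ScoredPrompt("best AEO agency for B2B SaaS", business_fit=5, visibility=2, density=3),
    ScoredPrompt("what is answer engine optimisation", business_fit=3, visibility=1, density=2),
    ScoredPrompt("top AEO agencies", business_fit=5, visibility=2, density=5),
    ScoredPrompt("AEO agency pricing", business_fit=4, visibility=5, density=2),
]
working_universe = [p for p in pool if in_working_universe(p)]
print([p.prompt for p in working_universe])
```

Prompts that fail the filter stay in the pool untouched, so they can be re-scored next quarter.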

For everything else — keep it logged, score it again next quarter. Prompt landscapes shift.

Operational rhythm

Set up the method once. Refresh quarterly.

Quarter 1. Full method, all five sources. 12-16 hours of analyst time. Outputs a working universe of 50-80 prompts, scored.

Subsequent quarters. Re-run Sources 1, 3, 4. Skip the AnswerThePublic and interview sources unless onboarding new buyer cohorts. 2-3 hours of analyst time. Updates the existing pool plus catches new prompts.

The output feeds directly into the refresh cadence: which prompts the weekly micro-edits target, which become the quarter’s new pillar bets, and which fall off the tracking list because they are no longer competitive.

Most teams pick prompts the way they pick keywords — by Google volume and gut feel. The pool you get from this method is smaller but truer. Eighty prompts you can defend beats four hundred you cannot.