
We ran 1,000 SaaS queries through every major LLM. Here's who got cited.

A benchmark of 1,000 buyer-intent queries across 50 B2B categories — and which brands actually showed up in Claude, ChatGPT, Perplexity, and Gemini answers.

Find me Cited
May 04, 2026 · 14 min read

§01 tl;dr

We sent 1,000 buyer-intent queries across 50 B2B SaaS categories to Claude, ChatGPT, Perplexity, and Gemini, then parsed every answer for brand and URL citations: 4,000 query-model pairs, 38,140 brand mentions extracted.

Three findings worth your time:

  • Perplexity cites 4.2× more URLs than ChatGPT. If you don’t have clean comparison and “alternatives to” pages, you’re invisible there.
  • Claude is the most concentrated. The top 3 brands per category capture 71% of all citations; long-tail brands almost never show up.
  • Notion was cited more than HubSpot on marketing-tool queries. Stripe was cited on 94% of payments queries — the highest of any brand we measured.
▸ the headline number

Of 1,000 queries, only 217 returned the same top-cited brand across all four models. For the other 78.3%, the citation graph fragments model by model.

§02 how we ran it

The setup was simple, and we deliberately used our own product. We took the top 20 buyer-intent queries per category from Ahrefs Keywords Explorer (commercial intent > 50, search volume > 1k), then ran each through all four models with a neutral system prompt.

# the system prompt — identical across models
SYS="Answer the user's question thoroughly. When you reference
    specific companies, tools, products, or websites, mention them by name."

$ findmecited bench --queries 1000.csv --models all --runs 3
▸ querying 4 models · 3,000 calls per model · ~2.4hrs
▸ extracting brand mentions via gpt-4 + manual review
✓ done · 4,000 responses · 38,140 brand mentions

Each query was run three times per model to control for stochasticity, and a brand counted as cited for a query-model pair only if it appeared in ≥2 of 3 runs, a simple majority vote. That vote is what collapses 12,000 raw calls into the 4,000 consensus responses above.
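In code, the consensus step looks something like this (the run data is illustrative; brand extraction itself is the gpt-4-plus-manual-review step from the log above):

# majority vote: keep a brand only if it shows up in >=2 of 3 runs
from collections import Counter

def consensus_brands(runs: list[set[str]]) -> set[str]:
    counts = Counter(brand for run in runs for brand in run)
    return {brand for brand, n in counts.items() if n >= 2}

# three runs of the same query against one model (illustrative data)
runs = [
    {"Stripe", "Adyen", "Braintree"},
    {"Stripe", "Adyen"},
    {"Stripe", "Square"},
]
print(consensus_brands(runs))  # {'Stripe', 'Adyen'}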

§03 the winners

A handful of brands are cited so often, they’re effectively the default answer in their category. We’ll call them citation monopolies.

citation rate · buyer-intent queries · 4 models · n=1,000

  Stripe        94%
  Notion        88%
  Linear        81%
  Figma         79%
  HubSpot       67%
  Vercel        64%
  Webflow       58%
  Salesforce    42%
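To reproduce a chart like this from your own logs: citation rate is just citing responses over total responses in a category. A minimal sketch, assuming a flat mentions.csv with category, query, model, brand columns (one row per consensus citation, plus a row with an empty brand field for zero-citation responses); that layout is our illustration, not an export format:

# citation rate per brand for one category, from a flat mentions file
import csv
from collections import defaultdict

cited = defaultdict(set)   # brand -> (query, model) pairs that cited it
responses = set()          # every (query, model) pair seen in the category

with open("mentions.csv") as f:
    for row in csv.DictReader(f):
        if row["category"] != "payments":
            continue
        pair = (row["query"], row["model"])
        responses.add(pair)
        if row["brand"]:   # empty brand field = a zero-citation response
            cited[row["brand"]].add(pair)

for brand, pairs in sorted(cited.items(), key=lambda kv: -len(kv[1])):
    print(f"{brand:<12} {len(pairs) / len(responses):.0%}")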

The interesting pattern: HubSpot beat Salesforce 67% to 42% on CRM queries, despite Salesforce being the larger brand by revenue. HubSpot’s comparison content, the “X vs Salesforce” pages, is structurally easier for an LLM to extract recommendations from than Salesforce’s product marketing.

“LLMs reward specificity, not authority. The brand that explains the trade-offs gets cited. The brand that posts about how great they are doesn’t.”
— Maya Kapoor, Head of Growth at Conduit

§04 the silence

The more striking finding wasn’t who got cited — it was who didn’t. Entire categories returned zero brand citations in 30%+ of responses.

Categories with the highest “no-citation” rate:

  1. Competitive intelligence tools — 64% of answers cited zero brands
  2. Sales engagement platforms — 41%
  3. Customer data platforms — 38%
  4. Headless CMS — 31%

These aren’t unprofitable categories. CDP alone is a $5B market. But every brand in the space looks identical to an LLM. Same generic copy, same “AI-powered platform” positioning. Generic positioning = generic answers.
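Measuring the silence is the mirror image of the citation-rate sketch above: count the query-model pairs whose consensus brand set came back empty. Same hypothetical mentions.csv:

# no-citation rate per category
import csv
from collections import defaultdict

seen = defaultdict(set)    # category -> all (query, model) pairs
empty = defaultdict(set)   # category -> pairs that cited zero brands

with open("mentions.csv") as f:
    for row in csv.DictReader(f):
        pair = (row["query"], row["model"])
        seen[row["category"]].add(pair)
        if not row["brand"]:
            empty[row["category"]].add(pair)

for cat in sorted(seen, key=lambda c: len(empty[c]) / len(seen[c]), reverse=True):
    print(f"{cat:<35} {len(empty[cat]) / len(seen[cat]):.0%} cited zero brands")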

§05 model personalities

The four models have real, measurable differences in how they cite. Treat them as separate channels.

avg. citations per response · all queries · n=4,000

  Perplexity    7.2
  Gemini        4.8
  ChatGPT       3.4
  Claude        2.8

Claude writes the most concise answers and is the most selective — but when it cites, it’s almost always in the genuine top 3.

Perplexity cites everyone: the average response names 7.2 brands. If you’re invisible on Perplexity, you have a discoverability problem, not a quality problem.

ChatGPT heavily favors recognizable brand names over technical correctness — leading to outdated recommendations in fast-moving categories.

Gemini is the wildcard. Highest variance run-to-run. The same query asked twice can return completely different brand sets.
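If you want to put a number on that wildcard behavior (this metric is our suggestion, not something we reported above), Jaccard overlap between the brand sets of two runs of the same query works: 1.0 means identical sets, 0.0 means nothing in common.

# run-to-run stability as Jaccard overlap of brand sets
def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 1.0  # two brand-free answers agree perfectly
    return len(a & b) / len(a | b)

# illustrative: two runs of the same CDP query
run1 = {"Segment", "mParticle", "RudderStack"}
run2 = {"Segment", "Tealium", "Amperity", "Bloomreach"}
print(f"{jaccard(run1, run2):.2f}")  # 0.17, i.e. high run-to-run variance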

§06 what to do tomorrow

▸ if you do nothing else this week

Run your top 10 buyer-intent queries through all four models. Once. Look at who they cite. That’s the competition you didn’t know you had.
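If you’d rather script it than paste by hand, here’s a minimal sketch against one model using the OpenAI Python SDK; the Anthropic, Google, and Perplexity APIs take the same request shape. The queries are placeholders, and the system prompt is the one from §02.

# run your buyer-intent queries through one model (OpenAI shown; swap SDKs for the rest)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = ("Answer the user's question thoroughly. When you reference specific "
          "companies, tools, products, or websites, mention them by name.")

queries = [
    "best payment processor for a B2B SaaS startup",  # replace with your top 10
    "alternatives to Stripe for usage-based billing",
]

for q in queries:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": q}],
    )
    print(f"--- {q}\n{resp.choices[0].message.content}\n")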

Three concrete actions, ranked by leverage:

  1. Build comparison pages. “X vs Y” pages are the single most-cited page type. Even if you’re the smaller brand. Especially if you’re the smaller brand.
  2. Get on Reddit and G2. Both are heavily cited by Perplexity and ChatGPT. Authentic third-party content beats your own marketing pages 3:1 in our data.
  3. Write the “best X for Y” page yourself. If no one in your category has written the definitive listicle, yours is the one that will rank, and LLMs cite listicles.

We’ll re-run this benchmark every quarter. Subscribe if you want it in your inbox. Or run your own check in 60 seconds.