How are Persipica GEO audits conducted?

A Persipica GEO audit is a structured, repeatable assessment of how AI platforms perceive, describe, and recommend a specific company. Each audit tests a company's visibility across up to six buyer journey stages (Discovery, Comparison, Brand, Buying Intent, Use Case, and Objection) using natural language prompts designed to mirror real buyer behaviour.

Audits are run manually with controlled conditions: each query is submitted as a fresh conversation with no prior context, using the default settings of each AI platform. This ensures that results reflect what a typical buyer would see, not a personalised or contextually primed response.

What is the difference between Persipica's research benchmark and client audit methodology?

Persipica uses two related but distinct methods. The published research benchmark is deliberately narrow and comparable: controlled query sets and documented model conditions. Client diagnostics use all six available buyer journey stages (Discovery, Comparison, Brand, Buying Intent, Use Case, and Objection), selecting the combination that best fits the commercial diagnostic needed.

The distinction matters. Research pages should be read as controlled benchmark evidence. Client audit pages should be read as the fuller diagnostic service that builds on the benchmark and turns the findings into an implementation roadmap. Persipica documents stage selection, model set, query depth, and source quality for each audit so results can be interpreted correctly. For the commercial scope, see what a Persipica audit includes.

Live audit tool scope

The live Persipica audit tool uses current frontier models from OpenAI, Anthropic, and Google. Those models may differ from the model versions used in a published study, because public studies are fixed to the methodology documented at the time of collection. Persipica keeps these separate so current client diagnostics can improve while published research remains auditable and comparable.

Which AI platforms does Persipica test?

The live audit tool tests three frontier models (ChatGPT, Claude, and Gemini) to capture meaningful differences in how each system represents companies. Retrieval-led systems such as Perplexity are important to GEO strategy, but they are not part of current benchmark coverage unless explicitly scoped in a client engagement.

Platform Model version Configuration
ChatGPT GPT-5.4 Mini Default settings, no browsing, fresh conversation per query
Claude Claude Sonnet 4.6 Default settings, no tool use, fresh conversation per query
Gemini Gemini 3.1 Pro Preview Default settings, fresh conversation per query

Model versions are documented at the time of each audit. As AI platforms update their models, audit results may shift, which is why ongoing tracking is a core part of the Persipica service. The live audit tool is designed to be a gold-standard GEO diagnostic: it continuously improves how it acquires source context, separates evidence types, and assesses AI responses, without rewriting the published study record.

How are audit queries designed?

Each audit runs a set of natural language queries across up to six buyer journey stages, designed to mirror the prompts real buyers submit when researching solutions. The total query count varies with the depth level and stages selected: typically 20 to 40 queries per company for a standard audit, with deeper engagements running higher volumes to improve statistical reliability.

Buyer journey stage Stage description Relative weight
Discovery Problem-awareness queries: unbranded, testing whether the company surfaces without being named 1×
Comparison Vendor comparison queries, testing competitive positioning and share of voice 1.5×
Brand Direct brand name queries, testing description accuracy and entity recognition 1×
Buying Intent High-commercial-intent queries, testing presence at the moment buyers are ready to shortlist 2×
Use Case Scenario-specific queries, testing fit for specific buyer problems and workflow needs 2×
Objection Skeptical buyer queries, testing how AI handles pricing, risk, and trust concerns 1.5×

Discovery and Buying Intent queries deliberately omit the company name. These stages test whether the company appears when a buyer describes their problem without naming a specific vendor; these are the queries that drive new pipeline. Buying Intent and Use Case stages carry the highest weight (2×) in the overall score, reflecting their direct commercial influence on vendor shortlists.

How are scores calculated?

Each audit produces five benchmark metrics alongside a composite AI Visibility Score. These metrics are designed to give teams a commercially useful picture of not just whether they appear, but how they appear and how that compares across buyer journey stages.

The five benchmark metrics

  • Mention rate: the share of queries in which the company is named at all, in any context.
  • Positive citation rate: the share of responses where the brand is mentioned with a positive or recommendatory framing.
  • Recommendation rate: the share of responses where the model actively recommends or endorses the company as a top pick.
  • Net sentiment score: positive and neutral mentions minus negative and hallucinated mentions, as a percentage of total mentions.
  • Hallucination rate: the share of responses where the model makes clearly unsupported or fabricated claims about the brand.
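The five metrics above can be sketched as follows. The per-response labels (`mentioned`, `framing`, `recommended`) are illustrative assumptions for this sketch, not Persipica's internal schema:

```python
from dataclasses import dataclass

@dataclass
class Response:
    mentioned: bool    # was the company named at all?
    framing: str       # "positive", "neutral", "negative", or "hallucinated"
    recommended: bool  # did the model actively endorse the company?

def benchmark_metrics(responses: list[Response]) -> dict[str, float]:
    total = len(responses)
    mentions = [r for r in responses if r.mentioned]
    n = len(mentions)
    negative = sum(1 for r in mentions if r.framing in ("negative", "hallucinated"))
    return {
        # shares of all queries in the audit
        "mention_rate": n / total,
        "positive_citation_rate": sum(r.framing == "positive" for r in mentions) / total,
        "recommendation_rate": sum(r.recommended for r in mentions) / total,
        # (positive + neutral) minus (negative + hallucinated), over total mentions
        "net_sentiment": (n - 2 * negative) / n if n else 0.0,
        "hallucination_rate": sum(r.framing == "hallucinated" for r in mentions) / total,
    }
```

Note that net sentiment is normalised by mentions, not by total queries: a company mentioned once, positively, scores 100% net sentiment even at a low mention rate.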

The composite AI Visibility Score is a multi-factor calculation, not a simple citation count. Each response is assessed for recommendation strength (listed, recommended, or top pick), sentiment quality, and brand prominence (whether the company appears first, second, third, or later in the response). These factors are combined into a per-response citation score, which is then averaged across all queries and weighted by the commercial significance of each buyer journey stage.

How the citation score is built

Each evaluated response starts with a recommendation tier base: 40 for a factual listing, 70 for a recommendation, 100 for a strong endorsement. That base is adjusted by a sentiment multiplier (positive: full credit; neutral: half credit; negative or hallucinated: penalised) and a prominence multiplier (appearing first carries a small bonus; appearing fourth or later carries a small discount). The resulting score is clamped to a 0–100 range. A response where the company is absent scores zero and is counted separately in the mention rate.
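A minimal sketch of that per-response calculation follows. The 40/70/100 tier bases, the neutral halving, the zero score for absence, and the 0–100 clamp come from the description above; the exact negative/hallucinated penalty (0.25) and the prominence factors (±10%) are illustrative assumptions:

```python
TIER_BASE = {"listed": 40, "recommended": 70, "top_pick": 100}
SENTIMENT_MULT = {"positive": 1.0, "neutral": 0.5,
                  "negative": 0.25, "hallucinated": 0.25}  # penalty value assumed

def citation_score(mentioned: bool, tier: str, sentiment: str, position: int) -> float:
    if not mentioned:
        return 0.0  # absence scores zero; counted separately via the mention rate
    score = TIER_BASE[tier] * SENTIMENT_MULT[sentiment]
    if position == 1:
        score *= 1.10  # small first-position bonus (assumed factor)
    elif position >= 4:
        score *= 0.90  # small discount for fourth or later (assumed factor)
    return max(0.0, min(100.0, score))  # clamp to the 0-100 range
```

For example, a strong endorsement in first position saturates at 100 after the clamp, while a neutral recommendation in second position scores 35.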

The weighted commercial score aggregates per-stage citation scores using stage weights that reflect commercial intent. Stages with direct purchase relevance carry more weight than awareness-only stages. The overall AI Visibility Score is the average of the weighted commercial score across all evaluated models.
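The aggregation step can be sketched like this. The 2× weights for Buying Intent and Use Case and the 1.5× weights for Comparison and Objection follow the stage table above; the 1× baseline for Discovery and Brand is an assumption:

```python
STAGE_WEIGHTS = {"Discovery": 1.0, "Comparison": 1.5, "Brand": 1.0,
                 "Buying Intent": 2.0, "Use Case": 2.0, "Objection": 1.5}

def weighted_commercial_score(stage_means: dict[str, float]) -> float:
    """Weighted mean of per-stage average citation scores (0-100 each)."""
    total = sum(STAGE_WEIGHTS[s] for s in stage_means)
    return sum(STAGE_WEIGHTS[s] * v for s, v in stage_means.items()) / total

def ai_visibility_score(per_model: dict[str, dict[str, float]]) -> float:
    """Simple average of the weighted commercial score across evaluated models."""
    return sum(weighted_commercial_score(m) for m in per_model.values()) / len(per_model)
```

Because weights are normalised over the stages actually run, an audit scoped to fewer than six stages still yields a 0–100 score.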

Why results vary between runs

AI models are non-deterministic: the same query submitted twice may produce different responses. A statistically meaningful audit therefore requires sufficient query volume per stage, which is why Persipica recommends a depth level that produces at least 20 to 40 queries per company. Single queries or small samples produce unreliable citation rates.
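A quick way to see why small samples mislead is the normal-approximation margin of error on an observed citation rate (an illustration, not part of the Persipica methodology):

```python
import math

def citation_rate_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error on an observed rate from n queries
    (normal approximation; rough at small n, but enough to show the trend)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)
```

An observed 50% mention rate from 5 queries carries a margin of roughly ±44 percentage points; from 40 queries it narrows to roughly ±15 points, which is why single-digit samples tell you very little.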

Where do the key statistics cited across this site come from?

Persipica references several third-party data sources across the site. Full attribution is provided below.

Statistic Figure Source
AI search conversion rate 14.2% Opollo, 2026 AI Search Benchmark Report
Google organic conversion rate 2.8% Opollo, 2026 AI Search Benchmark Report
AI referral traffic YoY growth 975% Opollo, 2026 AI Search Benchmark Report
Buyers using AI in research 73% Loganix, 2026 AI Buying Behavior Analysis
Buyer journey before first contact 61% Forrester, 2025 Buyers Journey Survey
Marketers not tracking AI visibility 78% McKinsey, State of AI 2025

Persipica's own research data

The audit findings presented on the Research page are first-party data from audits conducted by Persipica in April 2026. Each company was tested across a set of buyer journey queries on ChatGPT (GPT-5.4 Mini), Claude (Claude Sonnet 4.6), and Gemini (Gemini 3.1 Pro Preview). The finding that every audited company scored 0% at Discovery and Buying Intent is based on this first-party dataset.

What are the limitations of GEO auditing?

GEO auditing has inherent limitations that Persipica discloses transparently.

Model non-determinism. AI responses are probabilistic. The same query can produce different results across sessions. Persipica mitigates this through query volume (typically 20 to 40 queries per audit) and by running across multiple platforms, but individual query results should not be treated as deterministic.

Model version changes. AI platforms update their models regularly. A published benchmark conducted at a specific point in time may produce different results when re-tested against an updated model. This is why ongoing tracking, not one-time audits, is necessary for reliable GEO measurement.

Retrieval variability. AI platforms with real-time web retrieval (such as Perplexity and ChatGPT with browsing) may produce different results depending on what web content is available at the time of the query. Content published after an audit may change citation rates.

Sample scope. Persipica's published research covers a specific cohort of enterprise companies audited in April 2026. The findings are consistent across this cohort but should not be extrapolated to all industries or company sizes without further validation.

Next step

Want to see how your company scores?

A Persipica audit covers the core benchmark stages across current frontier models from OpenAI, Anthropic, and Google, with extended diagnostic stages where needed. Two weeks to a complete picture.

Book Your Audit