
Unified Analytics: Combining GEO, Web, and Social Data Into One Decision Layer

Santosh Pradhan · March 23, 2026

Most marketing analytics stacks measure three separate realities and treat them as one. Web analytics — Google Analytics 4, Adobe Analytics, Mixpanel — captures what happens on your domain after someone arrives. Social analytics — Meta Insights, LinkedIn Analytics, Sprout Social — captures what happens on platform before someone clicks. GEO analytics captures whether AI engines are citing your brand in the responses that increasingly shape what someone decides to click at all. Each discipline has mature tooling. None of them talk to the others. The result is a fundamental blind spot: marketers optimise the channels they can see while the channel they cannot see — AI-mediated discovery — quietly shapes the behaviour they observe everywhere else.

This article makes the case for unifying all three, defines the data model required to do it, maps the metrics that matter at each stage of the customer lifecycle, and identifies the business decisions that become possible only when the three data streams are combined. It builds on the eight-phase lifecycle framework from the GEO and AEO Optimisation Playbook and is designed for marketing architects and data teams who need a structural foundation, not a dashboard template.

The Three-Silo Problem

Before building anything, it is worth naming exactly what is lost when analytics silos exist. The loss is not just inconvenience — it is systematic misattribution that leads to wrong investment decisions.

Consider a realistic enterprise buyer journey in 2026: A procurement manager at a mid-market manufacturer queries Perplexity for "best ERP systems for discrete manufacturing". Your brand appears in the third citation. She then searches your brand on LinkedIn, reads two posts, and visits your website directly the next day. She downloads a whitepaper, re-engages two weeks later via a LinkedIn retargeting ad, and books a demo. Your web analytics reports the demo as a direct session or, at best, a LinkedIn-attributed conversion. Your social analytics reports two post engagements and one ad click. Your GEO analytics — if it exists at all — has no connection to this individual. The AI citation that opened the journey is invisible in every report.

This is not an edge case. As of 2025, Gartner estimates that AI-assisted search influences over 40 percent of B2B purchase journeys above €50,000 in deal value. The share is rising. A measurement architecture that cannot trace the AI-to-social-to-web path is systematically over-crediting late-funnel channels and under-crediting the content and schema investments that create AI visibility.

The three silos each have a distinct blind spot:

  • Web analytics captures post-arrival behaviour but has no visibility into pre-click intent, AI-referred journeys (most AI engines do not send referrer headers), or social dark traffic (direct visits from links shared on messaging platforms).
  • Social analytics captures platform engagement but cannot connect a LinkedIn post view to a web conversion, a Perplexity citation, or a CRM opportunity. Platform analytics are session-level, not journey-level.
  • GEO analytics captures AI citation events — when, where, and how prominently your brand appears in AI responses — but without a connection to web sessions or social signals, it cannot demonstrate business impact.

Unification does not eliminate these limitations. It creates a shared entity layer — the content asset and the lifecycle stage — against which signals from all three channels can be mapped, correlated, and reported together.

[Diagram: Event Model · GEO/AEO Flow (Command, Event, Read Model, Processor)]

The Unified Data Model

The unified analytics model has two structural layers: dimensions that define the entities all three channels share, and fact tables that record the signals each channel produces. The join key across all fact tables is the content asset — the specific page, post, or piece of content that appears in an AI response, receives social engagement, and drives web sessions.

Dimension Tables

Dimensions are the shared reference objects. They need to be defined once and maintained centrally — typically in a data warehouse or a simple metadata registry if you are starting small.

dim_content_asset

| Field | Type | Description |
|---|---|---|
| asset_id | string (PK) | Canonical URL path, e.g. /blogs/geo-aeo-playbook |
| title | string | Page or content title |
| content_type | enum | blog_post, product_page, landing_page, support_article, case_study |
| lifecycle_stage | enum | discovery, consideration, sales, onboarding, support, commerce, retention, advocacy |
| primary_intent | enum | informational, navigational, transactional, commercial |
| schema_types | string[] | JSON-LD types applied: ["Article","FAQPage","HowTo"] |
| published_at | timestamp | First publication date |
| modified_at | timestamp | Most recent significant update |
| word_count | integer | Approximate content length |
| has_faq | boolean | FAQPage or QAPage schema present |
| has_stats | boolean | Contains quantitative data points |
| has_comparison_table | boolean | Contains a structured comparison table |

dim_lifecycle_stage

| Stage ID | Stage Name | Primary Query Intent | Primary AI Engine Behaviour |
|---|---|---|---|
| 1 | Discovery | Informational — "what is X" | Category synthesis, listicle extraction |
| 2 | Consideration | Commercial — "best X vs Y" | Comparison tables, rating extraction |
| 3 | Sales | Transactional — "X pricing, buy X" | Offer schema, product cards |
| 4 | Onboarding | Instructional — "how to set up X" | HowTo step extraction |
| 5 | Support | Diagnostic — "X error, X not working" | QAPage verbatim extraction |
| 6 | Commerce | Shopping — "buy X, X deal" | Product feed, Merchant Center cards |
| 7 | Retention | Evaluative — "alternatives to X" | Comparison, ROI content |
| 8 | Advocacy | Social proof — "X reviews, X case study" | Review schema, co-citation clusters |

dim_channel

| Field | Type | Description |
|---|---|---|
| channel_id | string (PK) | e.g. ai_perplexity, social_linkedin, web_organic |
| channel_family | enum | ai_engine, social_platform, web |
| platform_name | string | Perplexity, ChatGPT, LinkedIn, GA4, etc. |
| is_paid | boolean | Distinguishes organic vs. paid signals |
| sends_referrer | boolean | Whether the channel passes referrer headers to web analytics |

Fact Tables

Each fact table records time-stamped signals from one channel family, linked to asset_id and lifecycle_stage for cross-channel joining.

fact_ai_citation — one row per AI probe result per asset per engine

| Field | Type | Description |
|---|---|---|
| citation_id | string (PK) | Unique probe result ID |
| asset_id | FK → dim_content_asset | Content asset probed |
| probe_date | date | Date probe was run |
| engine | enum | perplexity, chatgpt, gemini, copilot, claude |
| query_text | string | Exact query submitted to AI engine |
| query_lifecycle_stage | FK → dim_lifecycle_stage | Lifecycle stage the query targets |
| brand_cited | boolean | Whether brand was mentioned in response |
| asset_cited | boolean | Whether the specific URL was cited as a source |
| citation_rank | integer | Position of brand mention (1 = first mention) |
| sentiment_score | float | −1.0 (negative) to +1.0 (positive) |
| competitor_cited | boolean | Whether a named competitor was cited instead |
| response_excerpt | text | Relevant excerpt from AI response |

fact_web_session — one row per session per asset visit

| Field | Type | Description |
|---|---|---|
| session_id | string (PK) | GA4 session ID or equivalent |
| asset_id | FK → dim_content_asset | Page visited |
| session_date | date | Date of session |
| source | string | UTM source or inferred source |
| medium | string | organic, social, referral, direct, cpc |
| inferred_ai_referral | boolean | Heuristic flag: direct visit within the attribution window (e.g. 48 hours) of a positive AI probe event for the same asset |
| engaged_session | boolean | GA4 engaged session (≥10 s, ≥2 pageviews, or conversion) |
| scroll_depth_pct | integer | Max scroll depth reached (25/50/75/90/100) |
| time_on_page_sec | integer | Active time on page |
| converted | boolean | Session resulted in a defined conversion event |
| conversion_type | string | demo_request, whitepaper_download, purchase, trial_signup |
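The inferred_ai_referral heuristic reduces to a time-window check against recent probe events. A minimal Python sketch, assuming probe results arrive as (asset_id, probe_time, asset_cited) tuples; the 48-hour default follows the FAQ guidance later in this article, and the sample rows are invented for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical probe results: (asset_id, probe_time, asset_cited)
probes = [
    ("/blogs/geo-aeo-playbook", datetime(2026, 3, 2, 9, 0), True),
    ("/pricing", datetime(2026, 3, 2, 9, 5), False),
]

def infer_ai_referral(asset_id, session_time, probes, window_hours=48):
    """Flag a direct session as AI-inferred when a positive probe for the
    same asset occurred within the attribution window before the visit."""
    window = timedelta(hours=window_hours)
    return any(
        pid == asset_id and cited
        and timedelta(0) <= session_time - probe_time <= window
        for pid, probe_time, cited in probes
    )

# A direct session the day after a positive probe on the same asset
print(infer_ai_referral("/blogs/geo-aeo-playbook", datetime(2026, 3, 3, 10, 0), probes))
```

Widening window_hours increases recall at the cost of more false positives, so the chosen window should be reported alongside any AI-influence estimate built on the flag.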

fact_social_signal — one row per social engagement event per asset

| Field | Type | Description |
|---|---|---|
| signal_id | string (PK) | Unique signal event ID |
| asset_id | FK → dim_content_asset | Content asset referenced |
| signal_date | date | Date signal was recorded |
| platform | enum | linkedin, twitter_x, reddit, instagram, facebook, hacker_news |
| signal_type | enum | share, mention, comment, reaction, save, dm_link |
| reach | integer | Estimated audience reached by this signal |
| engagement_count | integer | Reactions + comments on this specific signal |
| sentiment_label | enum | positive, neutral, negative |
| is_organic | boolean | Distinguishes earned social from paid promotion |
| post_url | string | URL of the social post referencing the asset |

fact_unified_content_performance — materialised view joining all three fact tables, aggregated weekly per asset

| Field | Description |
|---|---|
| asset_id | Content asset |
| week_start | Week beginning date |
| lifecycle_stage | Assigned lifecycle stage |
| geo_citation_rate | % of AI probes this week where brand was cited |
| geo_avg_rank | Average citation position across probes where cited |
| geo_sentiment_avg | Average sentiment score across all citation events |
| web_sessions | Total sessions landing on this asset |
| web_engaged_rate | % of sessions classified as engaged |
| web_conversion_rate | % of sessions resulting in a conversion |
| social_total_reach | Cumulative reach from social signals referencing this asset |
| social_engagement_rate | Engagements / reach for social signals |
| unified_content_score | Composite score (weighted: 40% GEO, 35% web engagement, 25% social reach) |
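The composite unified_content_score can be computed directly from the weekly view's columns. A minimal Python sketch: rates are assumed to be 0–1 fractions, and social reach is normalised against the best-reaching asset so all three components share a scale (that normalisation choice is an assumption added here, not part of the table definition):

```python
def unified_content_score(geo_citation_rate, web_engaged_rate,
                          social_reach, max_social_reach,
                          weights=(0.40, 0.35, 0.25)):
    """Weighted composite: 40% GEO citation, 35% web engagement,
    25% (normalised) social reach, scaled to 0-100."""
    w_geo, w_web, w_social = weights
    social_norm = social_reach / max_social_reach if max_social_reach else 0.0
    return round(100 * (w_geo * geo_citation_rate
                        + w_web * web_engaged_rate
                        + w_social * social_norm), 1)

# Invented weekly rows: asset -> (geo_citation_rate, web_engaged_rate, reach)
assets = {
    "/blogs/geo-aeo-playbook": (0.60, 0.55, 12_000),
    "/pricing":                (0.20, 0.70, 1_500),
}
peak = max(reach for _, _, reach in assets.values())
scores = {a: unified_content_score(g, w, r, peak) for a, (g, w, r) in assets.items()}
```

Because the weights are a modelling choice, it is worth re-running the ranking under a couple of alternative weightings to confirm the top and bottom assets are stable.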

Lifecycle-Stage Metrics: What to Measure and Why

The eight-phase lifecycle from the GEO/AEO Playbook maps directly to different measurement priorities. Each stage has a different primary question and a different leading indicator across the three channels. The unified model enables cross-channel answers that no single-channel tool can provide.

Stage 1 — Discovery

The question is: are we being cited when buyers are learning about the category? Discovery queries are informational — "what is X", "best X for Y", "how does X work". At this stage, web conversion is not the metric; AI citation rate and social reach on educational content are.

  • GEO metric: brand citation rate on category-level probe queries (target: >50% citation across top 5 informational queries)
  • Web metric: organic sessions to educational content, scroll depth on definition pages (target: >75% scroll on category pages)
  • Social metric: organic share rate on educational posts (LinkedIn posts sharing stat-led content should achieve >2% engagement rate to signal relevance)
  • Unified insight: content assets with high social reach but low GEO citation rate identify pieces that humans share but AI engines do not extract — typically missing FAQ schema or lack of a definition block
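That last insight is a direct query once the weekly view exists. A hypothetical Python filter over per-asset (geo_citation_rate, social_total_reach) pairs; the thresholds and asset paths are invented for illustration:

```python
# Invented weekly rows: asset -> (geo_citation_rate, social_total_reach)
weekly = {
    "/blogs/what-is-erp":    (0.10, 9_000),   # widely shared, rarely cited
    "/blogs/erp-comparison": (0.55, 8_500),
    "/blogs/erp-glossary":   (0.05, 300),
}

def share_cite_gap(rows, min_reach=5_000, max_citation=0.20):
    """Assets humans share but AI engines do not extract: the schema
    or structure gap described above. Thresholds are illustrative."""
    return sorted(asset for asset, (geo, reach) in rows.items()
                  if reach >= min_reach and geo <= max_citation)
```

Each asset this returns is a candidate for a definition block or FAQ schema rather than more promotion spend.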

Stage 2 — Consideration

The question is: are we appearing in comparison queries? Buyers are evaluating options. AI engines are the new analyst, synthesising "X vs Y" queries from comparison tables and review aggregators.

  • GEO metric: brand rank in competitive probes ("best [category] for [use case]") — track rank 1–3 separately from rank 4+, since AI engines rarely cite below position 3
  • Web metric: sessions to comparison and case study pages, time on page (target: >180s on comparison pages, indicating genuine evaluation engagement)
  • Social metric: LinkedIn comment sentiment on comparison posts — a leading indicator of perceived competitive positioning
  • Unified insight: assets where competitor is cited instead of your brand, cross-referenced with your existing content gaps, produce a prioritised content backfill roadmap

Stage 3 — Sales

The question is: are AI engines driving transactional intent to our pricing and conversion pages? Transactional queries ("X pricing", "sign up for X") are now answered inline by AI — your pricing page either exists with Offer schema or you are absent from the response.

  • GEO metric: citation rate on transactional probe queries; presence in Perplexity Sponsored Answer positions
  • Web metric: sessions and conversion rate on /pricing, demo request form completions attributed to direct traffic (an imperfect but useful proxy for AI-referred sessions, given the referrer gap)
  • Social metric: LinkedIn ad click-through rate on retargeting audiences that previously engaged with consideration content
  • Unified insight: the gap between GEO citation rate on transactional queries and the web conversion rate on /pricing reveals whether the AI mention is credible and specific enough to drive action

Stage 4 — Onboarding

The question is: are new customers finding accurate help through AI engines before contacting support? Post-sale AI queries are "how to set up X", "X quickstart", "X first integration" — and if a competitor's documentation answers these questions more clearly, your brand is not in control of the onboarding experience.

  • GEO metric: citation rate on onboarding probe queries; track whether your documentation or a third-party forum is cited
  • Web metric: sessions to quickstart and documentation pages from customers (segment by logged-in or post-signup cohort); support ticket deflection rate
  • Social metric: community forum posts and LinkedIn comments asking onboarding questions — a volume spike signals a documentation gap AI engines cannot bridge

Stage 5 — Service and Support

The question is: when customers query AI engines about errors and issues, is your content the source of the answer? Support-stage queries are highly specific, and the risk is not just missed citation — it is a competitor's community forum answer being cited about your own product.

  • GEO metric: citation rate on diagnostic probe queries per product feature; flag probes where a non-owned source is cited
  • Web metric: sessions to support articles segmented by whether the visitor came from an AI engine referral (detectable via UTM tagging on QAPage schema url fields)
  • Social metric: Reddit and Hacker News mention volume for your brand in troubleshooting contexts — both are training sources for future LLM versions

Stages 6–8 — Commerce, Retention, Advocacy

At these stages, the unified model becomes a revenue attribution tool rather than a visibility tool. The question shifts from "are we cited?" to "do citations convert, retain, and generate referrals?"

  • Commerce: product feed impression data from Merchant Center, correlated with GEO commerce probe citation rates, and web purchase conversion rates by referral source
  • Retention: GEO citation rate on "alternatives to X" probes (a churn-risk proxy), web sessions to retention content (advanced use cases, changelog), social mention sentiment in existing customer communities
  • Advocacy: review velocity on G2/Trustpilot (a direct training signal for next-generation models), case study web sessions and social shares, inbound referral link growth from independent domains

Business Decisions the Unified Model Enables

Data models are only valuable if they inform decisions. The following are the five highest-leverage decisions that require all three channels to be unified — and that are impossible to make correctly without the model.

Decision 1: Content investment prioritisation

Without the unified model, content teams prioritise based on organic traffic or social engagement independently. The unified content score — weighting GEO citation rate, web engagement rate, and social reach together — produces a materially different ranking than any single-channel view. Content that ranks highly on GEO and social but drives low web conversion reveals a landing page quality problem. Content that converts well but has zero AI citation signals a schema or structure gap that a targeted fix can remedy. The unified score is a closer approximation of a content asset's full business value than any single-channel metric.

Decision 2: Budget allocation across channels

The most dangerous assumption in most media mix models is that the measurable channel deserves the budget. When AI citations are invisible in attribution, paid social and paid search receive disproportionate credit for conversions that began with an AI probe. Modelling the correlation between GEO citation events and subsequent direct web sessions — using the inferred_ai_referral flag — provides an estimated AI-influenced conversion volume. Even a conservative model showing 15% of direct conversions are AI-influenced justifies significant reallocation toward GEO content investment and away from late-funnel paid spend.

Decision 3: Lifecycle stage gap analysis

Aggregating the unified_content_score by lifecycle_stage immediately surfaces which stages are under-invested. Most brands discover they have strong Discovery content and reasonable Sales content, but almost nothing optimised for Support or Retention stages. These gaps are invisible in web analytics alone (support traffic is often suppressed by chatbots or gated documentation). The unified model makes them visible as low GEO citation rates on support-stage probe queries combined with high Reddit or community forum mention volume — a direct signal that your support content is losing ground to community alternatives.

Decision 4: Competitive threat assessment

The competitor_cited field in fact_ai_citation, aggregated across probe queries by lifecycle stage, produces a competitive threat map with a specificity that no traditional competitive intelligence tool can match. It answers: on which specific query types is a specific competitor displacing us, in which AI engine, and at which stage in the buyer journey? A competitor gaining citation share in Consideration-stage queries is a near-term pipeline threat. A competitor gaining share in Support-stage queries is a churn risk. Separating these signals enables proportionate, targeted responses.
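A minimal sketch of that threat map, assuming fact_ai_citation rows are available as (lifecycle_stage, engine, competitor_cited) tuples; the rows here are invented:

```python
from collections import Counter

# Invented fact_ai_citation rows: (lifecycle_stage, engine, competitor_cited)
probe_rows = [
    ("consideration", "perplexity", True),
    ("consideration", "perplexity", True),
    ("consideration", "chatgpt", False),
    ("support", "gemini", True),
]

def threat_map(rows):
    """Count probes where a competitor was cited instead of the brand,
    keyed by (lifecycle_stage, engine)."""
    return Counter((stage, engine) for stage, engine, competitor in rows
                   if competitor)

tm = threat_map(probe_rows)
```

Sorting the counter's most common keys gives the stage-and-engine combinations where displacement is concentrated, which is where the content response should start.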

Decision 5: Schema and content quality investment ROI

Schema changes — adding FAQPage, applying HowTo markup, updating dateModified — are low-cost interventions with variable ROI. The unified model makes the ROI measurable: run a structured probe set before and after each schema change, track the change in GEO citation rate, and correlate it with web session change on the affected pages. Content quality investments (adding a definition block, a comparison table, a statistics section) are similarly testable. Over the course of a quarter, deliberate A/B testing against the fact_ai_citation table produces a ranked list of the changes that actually move citation rates, as opposed to the changes theory suggests should work.
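The before/after measurement reduces to comparing citation rates across two probe sweeps. A hedged Python sketch with invented probe outcomes:

```python
def citation_rate(probe_results):
    """Share of probes in a sweep where the brand was cited."""
    return sum(probe_results) / len(probe_results) if probe_results else 0.0

# Invented brand_cited outcomes for one asset, before and after
# adding FAQPage schema (five probes per sweep)
before = [True, False, False, False, True]
after  = [True, True, False, True, True]

uplift = citation_rate(after) - citation_rate(before)
```

With probe sets this small the uplift is noisy; repeating the sweep across several weeks on either side of the change gives a more trustworthy delta.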

Implementation Architecture

The unified model can be implemented at four levels of sophistication. Each level delivers value and can serve as a foundation for the next.

Level 1 — Spreadsheet (weeks 1–4)

A Google Sheet or Notion database with one row per content asset, manually populated with monthly GEO probe results (tracked via Brand Intelligence), GA4 web session exports, and social analytics exports. This is sufficient to produce the first unified content score and identify the most obvious lifecycle-stage gaps. The manual cadence forces the discipline of asking the right questions before investing in automation.

Level 2 — Data warehouse (months 1–3)

BigQuery or Snowflake hosts the dimension and fact tables. GA4 raw data streams via Google BigQuery Export. Social analytics data ingests via API connectors (LinkedIn Marketing API, Meta Graph API, Reddit API). GEO probe results ingest from Brand Intelligence or a custom weekly probe script. A dbt model materialises fact_unified_content_performance on a weekly schedule. A Looker Studio or Metabase dashboard surfaces lifecycle-stage scorecards and the unified content score ranking.

Level 3 — Real-time pipeline (months 3–6)

GEO probe events are pushed to a Pub/Sub topic or Kafka stream within minutes of running. Web sessions stream via Measurement Protocol to a custom endpoint alongside GA4. Social signals ingest via webhooks where platforms support them, or via a daily polling job. The data warehouse becomes near-real-time. Anomaly detection on GEO citation rate drops triggers a Slack notification before the weekly report cycle. This level is appropriate for brands where AI citation share is a board-level metric.

Level 4 — AI-augmented decisioning (months 6+)

A language model with access to the unified data warehouse answers natural-language queries: "Which competitor gained the most GEO citation share last month and in which lifecycle stage?", "Which content assets have high social reach but have never appeared in an AI citation probe?", "What is the correlation between content freshness (days since last update) and GEO citation rate across our Support-stage assets?" At this level, the unified model becomes a conversational intelligence layer rather than a reporting layer.

The Metric That Does Not Yet Exist: AI-Influenced Revenue

The most commercially significant metric in the unified model — the one CFOs will eventually demand — is AI-influenced revenue: the proportion of closed deals, product purchases, or trial conversions where an AI engine citation was present at some point in the buyer journey. No current analytics stack can produce this number directly. The referrer gap, the dark traffic problem, and the session-level (not person-level) nature of most web analytics make it structurally unobservable without additional instrumentation.

The practical approximation is a three-step model. First, define the probe query set for each lifecycle stage and run it weekly across all target AI engines — this produces the numerator: citation events with timestamps. Second, identify the cohort of web sessions classified as direct in the 72 hours following a citation probe that returned a positive result for a given asset. Third, apply the base direct-to-AI conversion rate (an industry estimate, or derived from any observable window where AI referrer headers are present) to scale the estimate. This is directional, not precise — but directional is sufficient to shift budget allocation and justify content investment in a way that "we cannot measure AI" is not.
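The three-step approximation can be sketched in a few lines of Python. The probe events, sessions, and the 3% base conversion rate below are all invented placeholders; only the window logic reflects the model described above:

```python
from datetime import datetime, timedelta

# Step 1 output (invented): positive probe events per asset
citations = [("/blogs/geo-aeo-playbook", datetime(2026, 3, 2, 9, 0))]

# Direct web sessions (invented): (asset_id, session_time)
direct_sessions = [
    ("/blogs/geo-aeo-playbook", datetime(2026, 3, 3, 11, 0)),  # inside 72 h
    ("/blogs/geo-aeo-playbook", datetime(2026, 3, 9, 11, 0)),  # outside 72 h
    ("/pricing", datetime(2026, 3, 3, 11, 0)),                 # different asset
]

def ai_influenced_estimate(citations, sessions, base_cvr, window_hours=72):
    """Step 2: count direct sessions on a cited asset within the window
    after a positive probe. Step 3: scale by an assumed conversion rate
    (base_cvr is an industry estimate, not a measurement)."""
    window = timedelta(hours=window_hours)
    cohort = sum(
        1 for asset, session_time in sessions
        if any(cited_asset == asset
               and timedelta(0) <= session_time - probe_time <= window
               for cited_asset, probe_time in citations)
    )
    return cohort, cohort * base_cvr

cohort_size, est_conversions = ai_influenced_estimate(citations, direct_sessions, 0.03)
```

Reporting both numbers, the cohort size and the scaled estimate, keeps the assumption visible to whoever consumes the figure.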

Frequently Asked Questions

Do I need all three data streams to get value from this model?

No. Start with two. The highest-value starting point for most teams is GEO plus Web — connecting AI citation rates to web session behaviour per content asset. Social analytics adds depth but requires more API integration effort. The dimension tables (dim_content_asset, dim_lifecycle_stage) are the critical infrastructure: once assets are tagged by lifecycle stage, any channel data joined to them becomes lifecycle-aware.

How often should GEO probes run?

Weekly is the minimum cadence for an actionable signal. AI engines that use live retrieval (Perplexity, ChatGPT Browse, Gemini with web access) can reflect content and schema changes within days. Monthly probes obscure the change trajectory and make it impossible to attribute citation rate shifts to specific content or schema changes. For brands in active competitive markets, bi-weekly probes for Consideration-stage queries are justified given the commercial stakes of competitive citation displacement.

What should our AI probe query set include?

Five to seven queries per lifecycle stage is a practical starting set — approximately 40–56 queries in total across all eight stages. Structure probe queries as realistic buyer questions: "what is [category]", "best [product] for [use case]", "[your brand] pricing", "how to set up [your product]", "[your brand] alternatives". Run identical queries across at least three engines (Perplexity, ChatGPT, Gemini) to detect engine-specific citation gaps. Rotate the query set quarterly to prevent optimisation against a fixed probe list rather than genuine buyer intent.
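One way to keep the probe set consistent across stages is to expand it from templates. A hypothetical Python sketch: the stages shown are a subset of the eight, and the template strings are examples, not a recommended list:

```python
# Illustrative templates for four of the eight stages; placeholders are
# filled from your own category, brand, and use-case vocabulary.
templates = {
    "discovery":     ["what is {category}", "how does {category} work"],
    "consideration": ["best {category} for {use_case}"],
    "sales":         ["{brand} pricing"],
    "retention":     ["{brand} alternatives"],
}

def build_probe_set(category, brand, use_case):
    """Expand templates into concrete probe queries per lifecycle stage."""
    return {stage: [t.format(category=category, brand=brand, use_case=use_case)
                    for t in ts]
            for stage, ts in templates.items()}

probe_set = build_probe_set("ERP systems", "Acme ERP", "discrete manufacturing")
```

Storing the templates rather than the literal queries makes the quarterly rotation a one-line change per stage.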

How do I handle AI engines that do not pass referrer headers?

Three partial solutions exist. First, segment direct traffic by citation exposure: maintain the list of assets with positive probe results and report direct sessions to those assets as a separate cohort, since elevated direct traffic to AI-cited pages is itself a signal. Second, use the time-window correlation approach: flag direct sessions landing on AI-cited assets within 48 hours of a positive probe event as inferred_ai_referral = true. Finally, implement a lightweight first-party signal by adding a query parameter to the URLs in your llms.txt and structured data url fields, and monitor for that parameter in your web analytics.

What is the minimum viable tech stack to implement this?

Google Sheets (content asset registry) + GA4 (web sessions) + Brand Intelligence (GEO probes) + a social analytics export from your primary platform. This covers Level 1 and produces the first unified content score within a week. The entire setup requires no engineering resources — only the discipline to run weekly probes and update the sheet. The jump from Level 1 to Level 2 (data warehouse) is warranted when the asset registry grows beyond 50 content pieces or when weekly manual updates consume more than two hours.

Starting This Week

Unified analytics is not a project — it is a practice. The following four actions can be completed this week without engineering support:

  1. Tag your top 20 content assets by lifecycle stage. Open your GA4 content report, identify the 20 highest-traffic pages, and assign each one a lifecycle stage from the eight-phase framework. This is the seed data for your dim_content_asset table.
  2. Define your probe query set. Write five queries per lifecycle stage for your category. Use the buyer language you hear in sales calls and support tickets, not your internal product vocabulary. Forty queries is a complete starting set.
  3. Run your first GEO probe sweep. Use Brand Intelligence or query each engine manually and record whether your brand is cited, at what position, and with what sentiment. Record results in a spreadsheet alongside your content asset list.
  4. Identify your highest-priority gap. Which lifecycle stage has the lowest GEO citation rate and the highest web traffic? That gap — visible content that AI engines do not cite — is the highest-ROI content or schema fix available to you right now. A definition block, a FAQ section, or a structured table is typically all it takes to close it.

The organisations that will win on AI-mediated discovery are not those that invest the most — they are those that measure correctly, identify their gaps precisely, and fix the structural issues that prevent citation. The unified analytics model is the measurement layer that makes that precision possible.

Santosh Pradhan

MarTech Solutions Architect · Munich