Refactor the monolithic seo-analyzer into two specialist agents orchestrated in parallel by the /seo skill, plus a standalone /geo skill for AI-only audits.

Changes
- agents/seo-analyzer.md: refocused on classical engines (Google, Bing, DuckDuckGo). Adds Core Web Vitals 2.0 (LCP/INP/CLS + VSI), CSP + full security headers, hreflang audit, video SEO (transcripts), accessibility as ranking signal, image/video sitemaps.
- agents/geo-analyzer.md: new agent for AI engines (ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews, Copilot). Covers AI crawler policy, llms.txt/llms-full.txt, Schema.org for AI extraction (QAPage, Speakable, Person+Article, Organization graph), entity SEO (Wikidata, sameAs, Knowledge Panel), content shape (Definition Lead, TL;DR, Q->A, citable stats, freshness), AI visibility testing.
- agents/resources/: shared knowledge base referenced by both agents — ai-crawlers-2026.md (25+ bots, training vs retrieval categories, permissive/restrictive templates), llms-txt-template.md, geo-schemas.md (incl. deprecated list: ClaimReview, CourseInfo, etc. removed June 2025), entity-seo.md, content-shape-for-ai.md, ai-visibility-tools.md, automation-catalog.md.
- skills/seo/SKILL.md: becomes parallel dispatcher. Collects context once (depth + business), spawns both agents in a single message for concurrent execution, merges envelopes into unified SEO.md. Includes authoritative file-ownership matrix to prevent parallel-edit races.
- skills/geo/SKILL.md: new standalone wrapper for GEO-only audits.

Scoring
- Combined score: GLOBAL = 0.80 * SEO + 0.20 * GEO (local B2C), 0.75 * SEO + 0.25 * GEO (SaaS/national/content).
- GEO axis weight raised from 5% (old) to first-class dimension.

Policy
- AI crawlers: permissive default (maximise AI citations). Restrictive template available for premium/regulated content.
- Every user action in SEO.md section 11 must cite automation options from automation-catalog.md.

Tools
- WebFetch + WebSearch added to allowed-tools of both skills and both agents (needed for live CWV via PageSpeed API, AI visibility testing, Wikidata/Knowledge Panel lookups, competitor analysis).

Research basis (2026 state of the art validated via WebSearch):
- Core Web Vitals 2.0 (VSI signal, Google core update March 2026)
- AI Overviews trigger on ~48% of Google searches
- ClaimReview + 6 other schema types deprecated June 2025
- Definition Lead Architecture (CMU KDD 2024, +impression score)
- Citations + stats add up to 40% AI visibility (Aggarwal 2024)
- Wikidata grounds every major LLM (ChatGPT, Claude, Gemini, Perplexity)

Backup
- agents/seo-analyzer.md.bak kept for rollback reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Content shape for LLM extraction

How to write pages so AI engines quote, cite, and recommend them.
Based on peer-reviewed GEO research (CMU KDD 2024, Aggarwal et al.)
and tracked citation patterns across ChatGPT, Perplexity, Claude,
Gemini, Google AI Overviews (2025-2026).

## The six patterns that measurably increase AI citations

### 1. Definition Lead Architecture

Open the page (or the first paragraph after each major heading) with:

> **[Entity] is a [category] that [differentiator].**

Research backing: CMU GEO framework (KDD 2024). Pages with explicit
definitional openings achieve significantly higher impression scores
in LLM retrieval.

**Good**: "Astro is a static site generator that ships zero JavaScript by default, producing HTML at build time that search engines and AI crawlers can index without running a browser."

**Bad**: "In today's fast-paced digital landscape, choosing the right framework can feel overwhelming. At Acme, we know how important it is to..."

### 2. TL;DR / Answer Box above the fold

Insert an explicit summary block at the top of long content. AI engines
preferentially quote from these blocks because the content is
pre-summarised.

```html
<aside class="tldr">
  <strong>TL;DR</strong> —
  Next.js 13 introduced the App Router (app/ directory) as the
  recommended alternative to the pages/ directory, which remains
  supported. Migrating means rewriting route handlers, layouts, and
  data fetching. Estimated effort: 2-5 days for a medium project.
</aside>
```

No specific CSS class is required, but mark the block semantically
(e.g. `aria-label="summary"`, or a Speakable schema whose `cssSelector`
targets it).

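The Speakable option can be sketched as follows. This is a minimal illustration, assuming the summary block uses the `tldr` class from the example above; the helper name `speakable_jsonld`, the sample URL, and the sample headline are hypothetical, while `SpeakableSpecification` and `cssSelector` are Schema.org terms.

```python
import json


def speakable_jsonld(page_url: str, headline: str, selectors: list[str]) -> str:
    """Return a JSON-LD string that marks the given CSS selectors as speakable."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "name": headline,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": selectors,
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical page; the URL and headline are placeholders.
print(speakable_jsonld(
    "https://example.com/guide",
    "App Router migration guide",
    ["aside.tldr"],
))
```

The printed JSON would go inside a `<script type="application/ld+json">` element on the page. Whether a given AI engine reads Speakable markup is not established here, so treat it as one signal among several.
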
### 3. Question-then-direct-answer structure

Phrase each H2/H3 heading as a likely user query. Make the first
sentence after the heading a single-sentence direct answer, then
follow with supporting detail.

**Pattern**:
```
## How much does a Qualibat RGE certification cost in France?

A Qualibat RGE certification costs between 500 and 1500 EUR for the
initial audit, plus an annual fee of 200-400 EUR. The cost varies by
trade category and company size.

[Detailed breakdown follows...]
```

Why it works: LLMs grade passages by answer-density relative to the
query. A one-sentence self-contained answer has the highest density.

### 4. Citations and statistics (strongest measured lever)

Adding statistics with clearly cited sources increases AI visibility
**by up to 40%** (Aggarwal et al., 2024 "GEO: Generative Engine
Optimization").

Pattern: embed specific numbers with attribution.

**Good**: "According to the ADEME 2024 energy report, French households spent an average of 2,137 EUR on heating in 2023 — a 12% increase from 2021."

**Bad**: "Heating costs have increased a lot recently."

Source attribution matters: link the citation to the original source
(`<a href>`), ideally with the source title wrapped in a `<cite>`
element. AI engines use link graphs to validate factual claims.

### 5. Structured lists and comparison tables

LLMs quote list items and table rows more readily than prose of the
same content. Convert what you can:

**Before** (prose):
"The best frameworks for public sites are Astro for static content,
Next.js for dynamic server-rendered apps, and Nuxt for Vue-based
projects."

**After** (list):
"Best frameworks for public sites by use case:
- **Astro** — static content (blog, docs, portfolio)
- **Next.js** — dynamic SSR with React
- **Nuxt** — dynamic SSR with Vue"

Comparison tables are even stronger. Structure:

| Framework | Rendering | Best for | JS by default |
|---|---|---|---|
| Astro | SSG + islands | Public content | 0 KB |
| Next.js | SSG + SSR | Hybrid apps | Large |

### 6. Freshness signals

Pages not updated at least quarterly are **3x more likely to lose AI
citations** (LLMRefs 2026 study).

What to maintain:
- Visible "Last updated: YYYY-MM-DD" at the top of content pages
- `dateModified` in Article/BlogPosting JSON-LD (ISO 8601)
- HTTP header `Last-Modified` kept in sync with content changes
- Changelog on evergreen reference pages

Do NOT fake dates — AI engines and Google increasingly validate
freshness against actual content diffs.

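One way to keep the visible date, `dateModified`, and `Last-Modified` from drifting apart is to derive all three from a single timestamp. The sketch below assumes that timestamp comes from the real content change (for example a CMS field or a commit time); the function name `freshness_signals` and the sample date are illustrative only.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime


def freshness_signals(modified: datetime) -> dict[str, str]:
    """Derive the three freshness signals from one timestamp."""
    # Naive datetimes are interpreted as local time by astimezone().
    utc = modified.astimezone(timezone.utc)
    return {
        "visible": f"Last updated: {utc.date().isoformat()}",
        "dateModified": utc.isoformat(timespec="seconds"),  # ISO 8601
        "Last-Modified": format_datetime(utc, usegmt=True),  # HTTP date format
    }


# Sample timestamp for illustration; use the actual content-change time.
signals = freshness_signals(datetime(2026, 3, 14, 9, 30, tzinfo=timezone.utc))
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "dateModified": signals["dateModified"],
}
print(signals["visible"])
print(json.dumps(article))
print("Last-Modified:", signals["Last-Modified"])
```

Because every signal is computed from the same value, a page cannot show one date to readers and another to crawlers.
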
## Anti-patterns — what to avoid

### Pronoun-heavy writing

LLMs have to resolve pronouns from the surrounding context, which
costs them confidence. Prefer explicit entity names.

**Bad**: "It was founded in 2015. Its founders wanted to solve a problem. They saw that..."

**Good**: "Acme Corp was founded in 2015. Acme's founders, Jane Doe and John Smith, wanted to solve..."

### Marketing fluff before facts

AI engines typically truncate retrieval windows. Fluff at the top
wastes the budget. Put factual claims FIRST.

**Bad** (first 200 chars wasted): "In today's fast-moving digital landscape, businesses are constantly looking for ways to stay competitive..."

**Good** (first 200 chars dense): "Our API processes 50M requests/day at p99 latency of 47ms across 8 regions, with a 99.99% SLA. Pricing starts at 99 EUR/month for the 10K requests tier."

### Claims without sources

Any numerical or comparative claim without a linked source degrades
trust. AI engines can detect the pattern "number without citation" and
weight those passages lower.

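The "number without citation" pattern is easy to check before publishing. The sketch below is a rough heuristic, not how any AI engine works: it flags `<p>` elements that contain a digit but no `<a href>`. The function name `unsourced_numeric_paragraphs`, the regex-based parsing, and the sample HTML are assumptions made for illustration.

```python
import re

TAG = re.compile(r"<[^>]+>")
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
NUMBER = re.compile(r"\d")
LINK = re.compile(r"<a\b[^>]*\bhref=", re.IGNORECASE)


def unsourced_numeric_paragraphs(html: str) -> list[str]:
    """Return the text of paragraphs that contain a number but no link."""
    flagged = []
    for inner in PARAGRAPH.findall(html):
        text = TAG.sub("", inner).strip()
        if NUMBER.search(text) and not LINK.search(inner):
            flagged.append(text)
    return flagged


# Illustrative sample; the link target is a placeholder.
sample = """
<p>Heating costs rose 12% between 2021 and 2023.</p>
<p>Households spent 2,137 EUR on heating in 2023
   (<a href="https://example.com/report">energy report</a>).</p>
<p>Heat pumps are increasingly popular.</p>
"""
for paragraph in unsourced_numeric_paragraphs(sample):
    print("Needs a source:", paragraph)
```

Only the first paragraph is flagged: the second carries a link and the third makes no numerical claim. A regex pass like this will miss nested markup edge cases, so a real audit would likely use a proper HTML parser.
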
### Cookie-cutter content across pages (especially city pages)

The 30/70 rule: when creating per-city or per-service variants,
at most 30% of the content should be templated. 70% must be
unique per page (local landmarks, specific testimonials, unique
stats, real photos).

Generic city pages get filtered out as "doorway pages" by both
classical search and AI engines.

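The 30/70 rule can be approximated by counting how many sentences of one variant reappear verbatim in a sibling page. This is a crude sketch under that assumption (sentence-level exact matching after lowercasing); the helper names and the two sample texts are made up for illustration, and a production check would probably compare against all sibling pages.

```python
import re


def sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation and normalise case and spacing."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p]


def templated_share(page: str, sibling: str) -> float:
    """Fraction of sentences in `page` that also appear verbatim in `sibling`."""
    own = sentences(page)
    if not own:
        return 0.0
    shared = set(sentences(sibling))
    return sum(1 for s in own if s in shared) / len(own)


# Made-up city page variants.
lyon = ("We repair boilers. Call us today. "
        "Our Lyon team covers Croix-Rousse. Most jobs involve pre-1950 buildings.")
nice = ("We repair boilers. Call us today. "
        "Our Nice team covers the Old Town. Sea air corrodes many outdoor units.")

share = templated_share(lyon, nice)
print(f"templated: {share:.0%}", "OK" if share <= 0.30 else "too templated")
```

Here two of the four sentences are shared, so the Lyon page is 50% templated and fails the 30% ceiling.
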
## Page templates by type

### Service page (local business)

```
<h1>[Service] in [City] — [Business Name]</h1>

<div class="tldr">
  <strong>In short:</strong> [Business] offers [service] in [city + surrounding].
  [Key differentiator — price, response time, certifications]. Open [hours].
  Call [phone] or request a quote online.
</div>

<h2>What is [service]?</h2>
<p>[Service] is a [category] that [differentiator]. In [city], demand
is driven by [local factor — housing stock, climate, regulations].</p>

<h2>How much does [service] cost in [city]?</h2>
<p>[Specific price range] for a typical [job type], based on [n]
projects completed in [year]. Factors affecting cost: [list].</p>

<h2>Why choose [Business] for [service]?</h2>
<ul>
  <li>[Certification 1] — [what it means]</li>
  <li>[Certification 2]</li>
  <li>[N+ years] experience on [specific housing stock]</li>
</ul>

<h2>FAQ</h2>
[QAPage or FAQPage schema + visible Q&A]
```

### Blog post / guide

```
<h1>[Clear, question-style or noun-phrase headline]</h1>
<p class="byline">By [Author Name] — Updated [Date]</p>

<div class="tldr">
  [3-5 sentence summary. Include the key number, the key conclusion,
  and any nuance.]
</div>

<h2>[Question 1]</h2>
<p>[One-sentence answer.] [Supporting detail with cited statistics.]</p>

<h2>[Question 2]</h2>
...

<h2>Sources</h2>
<ul>
  <li><a href="...">Source 1 — author, year</a></li>
  <li><a href="...">Source 2 — author, year</a></li>
</ul>
```

### Homepage / landing

```
<h1>[Entity] is a [category] that [differentiator].</h1>
<!-- The H1 IS the Definition Lead. Yes, really. -->

<p class="hero-subtitle">
  [Elaboration on the H1. Include one concrete stat or proof point.]
</p>

[Primary CTA]

<section>
  <h2>What [Entity] does</h2>
  <p>[Functional description, one paragraph.]</p>
</section>

<section>
  <h2>Who uses [Entity]</h2>
  <ul><li>[Use case 1]</li><li>[Use case 2]</li>...</ul>
</section>

<section>
  <h2>How it works</h2>
  <!-- HowTo schema + visible steps -->
</section>

<section>
  <h2>Frequently asked</h2>
  <!-- FAQPage schema + visible Q&A -->
</section>
```

## Self-audit — is this page AI-friendly?

- [ ] First sentence: `[Entity] is a [category] that [differentiator]`?
- [ ] TL;DR or summary block above the fold?
- [ ] Every H2/H3 phrased as a likely user question?
- [ ] First sentence under each heading: direct answer?
- [ ] At least 2-3 specific numerical claims with linked sources?
- [ ] Visible "Last updated" date + matching `dateModified` in JSON-LD?
- [ ] Lists or tables instead of dense prose where possible?
- [ ] Entity names used explicitly, not pronouns?
- [ ] If it's a city/service variant: ≥70% unique content?

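A few of these checks are mechanical enough to script. The sketch below covers three of them (question-style H2/H3 headings, a visible "Last updated" date, and a `dateModified` key somewhere in the HTML) and leaves the rest to a human reviewer. The function name `audit`, the check names, and the regex-based parsing are assumptions for illustration only.

```python
import re

HEADING = re.compile(r"<h[23]\b[^>]*>(.*?)</h[23]>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")
LAST_UPDATED = re.compile(r"Last updated:\s*\d{4}-\d{2}-\d{2}")


def audit(html: str) -> dict[str, bool]:
    """Run three mechanical checks from the list above on raw HTML."""
    headings = [TAG.sub("", h).strip() for h in HEADING.findall(html)]
    return {
        "headings_are_questions": bool(headings)
        and all(h.endswith("?") for h in headings),
        "visible_last_updated": LAST_UPDATED.search(html) is not None,
        "jsonld_date_modified": '"dateModified"' in html,
    }


# Made-up page fragment for illustration.
sample = """
<p>Last updated: 2026-03-14</p>
<script type="application/ld+json">{"@type": "Article", "dateModified": "2026-03-14"}</script>
<h2>How much does a boiler service cost?</h2>
<h2>Our approach</h2>
"""
for check, passed in audit(sample).items():
    print("[x]" if passed else "[ ]", check)
```

In this sample the heading check fails because "Our approach" is not phrased as a question, while both date checks pass. The script does not verify that the visible date and `dateModified` actually match, which still needs a manual look or a stricter parser.
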