claude/agents/resources/content-shape-for-ai.md
bastien 4b01b487b5 feat(seo/geo): split into parallel seo + geo agents with shared resources
Refactor the monolithic seo-analyzer into two specialist agents
orchestrated in parallel by the /seo skill, plus a standalone /geo
skill for AI-only audits.

Changes
- agents/seo-analyzer.md: refocused on classical engines (Google, Bing,
  DuckDuckGo). Adds Core Web Vitals 2.0 (LCP/INP/CLS + VSI), CSP + full
  security headers, hreflang audit, video SEO (transcripts), accessibility
  as ranking signal, image/video sitemaps.
- agents/geo-analyzer.md: new agent for AI engines (ChatGPT, Claude,
  Perplexity, Gemini, Google AI Overviews, Copilot). Covers AI crawler
  policy, llms.txt/llms-full.txt, Schema.org for AI extraction (QAPage,
  Speakable, Person+Article, Organization graph), entity SEO (Wikidata,
  sameAs, Knowledge Panel), content shape (Definition Lead, TL;DR,
  Q->A, citable stats, freshness), AI visibility testing.
- agents/resources/: shared knowledge base referenced by both agents —
  ai-crawlers-2026.md (25+ bots, training vs retrieval categories,
  permissive/restrictive templates), llms-txt-template.md, geo-schemas.md
  (incl. deprecated list: ClaimReview, CourseInfo, etc. removed June 2025),
  entity-seo.md, content-shape-for-ai.md, ai-visibility-tools.md,
  automation-catalog.md.
- skills/seo/SKILL.md: becomes parallel dispatcher. Collects context
  once (depth + business), spawns both agents in a single message for
  concurrent execution, merges envelopes into unified SEO.md. Includes
  authoritative file-ownership matrix to prevent parallel-edit races.
- skills/geo/SKILL.md: new standalone wrapper for GEO-only audits.

Scoring
- Combined score: GLOBAL = 0.80 * SEO + 0.20 * GEO (local B2C),
  0.75 * SEO + 0.25 * GEO (SaaS/national/content).
- GEO axis weight raised from 5% (old) to first-class dimension.

Policy
- AI crawlers: permissive default (maximise AI citations). Restrictive
  template available for premium/regulated content.
- Every user action in SEO.md section 11 must cite automation options
  from automation-catalog.md.

Tools
- WebFetch + WebSearch added to allowed-tools of both skills and
  both agents (needed for live CWV via PageSpeed API, AI visibility
  testing, Wikidata/Knowledge Panel lookups, competitor analysis).

Research basis (2026 state of the art validated via WebSearch):
- Core Web Vitals 2.0 (VSI signal, Google core update March 2026)
- AI Overviews trigger on ~48% of Google searches
- ClaimReview + 6 other schema types deprecated June 2025
- Definition Lead Architecture (CMU KDD 2024, +impression score)
- Citations + stats add up to 40% AI visibility (Aggarwal 2024)
- Wikidata grounds every major LLM (ChatGPT, Claude, Gemini, Perplexity)

Backup
- agents/seo-analyzer.md.bak kept for rollback reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 16:16:30 +02:00

251 lines
8.0 KiB
Markdown

# Content shape for LLM extraction
How to write pages so AI engines quote, cite, and recommend them.
Based on peer-reviewed GEO research (CMU KDD 2024, Aggarwal et al.)
and tracked citation patterns across ChatGPT, Perplexity, Claude,
Gemini, Google AI Overviews (2025-2026).
## The six patterns that measurably increase AI citations
### 1. Definition Lead Architecture
Open the page (or first paragraph after each major heading) with:
> **[Entity] is a [category] that [differentiator].**
Research backing: CMU GEO framework (KDD 2024) — pages with explicit
definitional openings score significantly higher in LLM retrieval
impression scores.
**Good**: "Astro is a static site generator that ships zero JavaScript by default, producing HTML at build time that search engines and AI crawlers can index without running a browser."
**Bad**: "In today's fast-paced digital landscape, choosing the right framework can feel overwhelming. At Acme, we know how important it is to..."
### 2. TL;DR / Answer Box above the fold
Insert an explicit summary block at the top of long content. AI engines
preferentially quote from these blocks because the content is
pre-summarised.
```html
<aside class="tldr">
<strong>TL;DR</strong>
Next.js 15 removes the pages/ directory entirely in favour of App
Router. Migration requires rewriting route handlers, layouts, and
data fetching. Estimated effort: 2-5 days for a medium project.
</aside>
```
CSS: no class requirement, but mark it semantically (e.g. `aria-label="summary"`
or Speakable schema targeting this selector).
### 3. Question-then-direct-answer structure
Each H2/H3 heading phrased as a likely user query. First sentence
after the heading: a single-sentence direct answer. Supporting detail
follows.
**Pattern**:
```
## How much does a Qualibat RGE certification cost in France?
A Qualibat RGE certification costs between 500 and 1500 EUR for the
initial audit, plus an annual fee of 200-400 EUR. The cost varies by
trade category and company size.
[Detailed breakdown follows...]
```
Why it works: LLMs grade passages by answer-density relative to the
query. A one-sentence self-contained answer has the highest density.
### 4. Citations and statistics (strongest measured lever)
Adding peer-cited statistics with clear sources increases AI visibility
**by up to 40%** (Aggarwal et al., 2024 "GEO: Generative Engine
Optimization").
Pattern: embed specific numbers with attribution.
**Good**: "According to the ADEME 2024 energy report, French households spent an average of 2,137 EUR on heating in 2023 — a 12% increase from 2021."
**Bad**: "Heating costs have increased a lot recently."
Source attribution matters: link the citation to the original source
(`<a href>`), ideally with `rel="cite"`. AI engines use link graphs
to validate factual claims.
### 5. Structured lists and comparison tables
LLMs quote list items and table rows more readily than prose of the
same content. Convert what you can:
**Before** (prose):
"The best frameworks for public sites are Astro for static content,
Next.js for dynamic server-rendered apps, and Nuxt for Vue-based
projects."
**After** (list):
"Best frameworks for public sites by use case:
- **Astro** — static content (blog, docs, portfolio)
- **Next.js** — dynamic SSR with React
- **Nuxt** — dynamic SSR with Vue"
Comparison tables are even stronger. Structure:
| Framework | Rendering | Best for | JS by default |
|---|---|---|---|
| Astro | SSG + islands | Public content | 0 KB |
| Next.js | SSG + SSR | Hybrid apps | Large |
### 6. Freshness signals
Pages not updated at least quarterly are **3x more likely to lose AI
citations** (LLMRefs 2026 study).
What to maintain:
- Visible "Last updated: YYYY-MM-DD" at the top of content pages
- `dateModified` in Article/BlogPosting JSON-LD (ISO 8601)
- HTTP header `Last-Modified` in sync with content change
- Changelog on evergreen reference pages
Do NOT fake dates — AI engines and Google increasingly validate
freshness against actual content diffs.
## Anti-patterns — what to avoid
### Pronoun-heavy writing
LLMs resolve pronouns by context window, which costs them confidence.
Prefer explicit entity names.
**Bad**: "It was founded in 2015. Its founders wanted to solve a problem. They saw that..."
**Good**: "Acme Corp was founded in 2015. Acme's founders, Jane Doe and John Smith, wanted to solve..."
### Marketing fluff before facts
AI engines typically truncate retrieval windows. Fluff at the top
wastes the budget. Put factual claims FIRST.
**Bad** (first 200 chars wasted): "In today's fast-moving digital landscape, businesses are constantly looking for ways to stay competitive..."
**Good** (first 200 chars dense): "Our API processes 50M requests/day at p99 latency of 47ms across 8 regions, with a 99.99% SLA. Pricing starts at 99 EUR/month for the 10K requests tier."
### Claims without sources
Any numerical or comparative claim without a linked source degrades
trust. AI engines can detect the pattern "number without citation" and
weight those passages lower.
### Cookie-cutter content across pages (especially city pages)
The 30/70 rule: when creating per-city or per-service variants,
at most 30% of the content should be templated. 70% must be
unique per page (local landmarks, specific testimonials, unique
stats, real photos).
Generic city pages get filtered out as "doorway pages" by both
classical search and AI engines.
## Page templates by type
### Service page (local business)
```
<h1>[Service] in [City] — [Business Name]</h1>
<div class="tldr">
<strong>En résumé :</strong> [Business] offers [service] in [city + surrounding].
[Key differentiator — price, response time, certifications]. Open [hours].
Call [phone] or request a quote online.
</div>
<h2>What is [service]?</h2>
<p>[Service] is a [category] that [differentiator]. In [city], demand
is driven by [local factor — housing stock, climate, regulations].</p>
<h2>How much does [service] cost in [city]?</h2>
<p>[Specific price range] for a typical [job type], based on [n]
projects completed in [year]. Factors affecting cost: [list].</p>
<h2>Why choose [Business] for [service]?</h2>
<ul>
<li>[Certification 1] — [what it means]</li>
<li>[Certification 2]</li>
<li>[N+ years] experience on [specific housing stock]</li>
</ul>
<h2>FAQ</h2>
[QAPage or FAQPage schema + visible Q&A]
```
### Blog post / guide
```
<h1>[Clear, question-style or noun-phrase headline]</h1>
<p class="byline">By [Author Name] — Updated [Date]</p>
<div class="tldr">
[3-5 sentence summary. Include the key number, the key conclusion,
and any nuance.]
</div>
<h2>[Question 1]</h2>
<p>[One-sentence answer.] [Supporting detail with cited statistics.]</p>
<h2>[Question 2]</h2>
...
<h2>Sources</h2>
<ul>
<li><a href="...">Source 1 — author, year</a></li>
<li><a href="...">Source 2 — author, year</a></li>
</ul>
```
### Homepage / landing
```
<h1>[Entity] is a [category] that [differentiator].</h1>
<!-- The H1 IS the Definition Lead. Yes, really. -->
<p class="hero-subtitle">
[Elaboration on the H1. Include one concrete stat or proof point.]
</p>
[Primary CTA]
<section>
<h2>What [Entity] does</h2>
<p>[Functional description, one paragraph.]</p>
</section>
<section>
<h2>Who uses [Entity]</h2>
<ul><li>[Use case 1]</li><li>[Use case 2]</li>...</ul>
</section>
<section>
<h2>How it works</h2>
<!-- HowTo schema + visible steps -->
</section>
<section>
<h2>Frequently asked</h2>
<!-- FAQPage schema + visible Q&A -->
</section>
```
## Self-audit — is this page AI-friendly?
- [ ] First sentence: `[Entity] is a [category] that [differentiator]` ?
- [ ] TL;DR or summary block above the fold ?
- [ ] Every H2/H3 phrased as a likely user question ?
- [ ] First sentence under each heading: direct answer ?
- [ ] At least 2-3 specific numerical claims with linked sources ?
- [ ] Visible "Last updated" date + matching `dateModified` in JSON-LD ?
- [ ] Lists or tables instead of dense prose where possible ?
- [ ] Entity names used explicitly, not pronouns ?
- [ ] If it's a city/service variant: ≥70% unique content ?