bastien 95347d2e47 feat(seo/geo): split into parallel seo + geo agents with shared resources

Refactor the monolithic seo-analyzer into two specialist agents
orchestrated in parallel by the /seo skill, plus a standalone /geo
skill for AI-only audits.

Changes
- agents/seo-analyzer.md: refocused on classical engines (Google, Bing,
  DuckDuckGo). Adds Core Web Vitals 2.0 (LCP/INP/CLS + VSI), CSP + full
  security headers, hreflang audit, video SEO (transcripts), accessibility
  as ranking signal, image/video sitemaps.
- agents/geo-analyzer.md: new agent for AI engines (ChatGPT, Claude,
  Perplexity, Gemini, Google AI Overviews, Copilot). Covers AI crawler
  policy, llms.txt/llms-full.txt, Schema.org for AI extraction (QAPage,
  Speakable, Person+Article, Organization graph), entity SEO (Wikidata,
  sameAs, Knowledge Panel), content shape (Definition Lead, TL;DR,
  Q->A, citable stats, freshness), AI visibility testing.
- agents/resources/: shared knowledge base referenced by both agents —
  ai-crawlers-2026.md (25+ bots, training vs retrieval categories,
  permissive/restrictive templates), llms-txt-template.md, geo-schemas.md
  (incl. deprecated list: ClaimReview, CourseInfo, etc. removed June 2025),
  entity-seo.md, content-shape-for-ai.md, ai-visibility-tools.md,
  automation-catalog.md.
- skills/seo/SKILL.md: becomes parallel dispatcher. Collects context
  once (depth + business), spawns both agents in a single message for
  concurrent execution, merges envelopes into unified SEO.md. Includes
  authoritative file-ownership matrix to prevent parallel-edit races.
- skills/geo/SKILL.md: new standalone wrapper for GEO-only audits.

Scoring
- Combined score: GLOBAL = 0.80 * SEO + 0.20 * GEO (local B2C),
  0.75 * SEO + 0.25 * GEO (SaaS/national/content).
- GEO axis weight raised from 5% (old) to first-class dimension.

Policy
- AI crawlers: permissive default (maximise AI citations). Restrictive
  template available for premium/regulated content.
- Every user action in SEO.md section 11 must cite automation options
  from automation-catalog.md.

Tools
- WebFetch + WebSearch added to allowed-tools of both skills and
  both agents (needed for live CWV via PageSpeed API, AI visibility
  testing, Wikidata/Knowledge Panel lookups, competitor analysis).

Research basis (2026 state of the art validated via WebSearch):
- Core Web Vitals 2.0 (VSI signal, Google core update March 2026)
- AI Overviews trigger on ~48% of Google searches
- ClaimReview + 6 other schema types deprecated June 2025
- Definition Lead Architecture (CMU KDD 2024, +impression score)
- Citations + stats add up to 40% AI visibility (Aggarwal 2024)
- Wikidata grounds every major LLM (ChatGPT, Claude, Gemini, Perplexity)

Backup
- agents/seo-analyzer.md.bak kept for rollback reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-21 16:16:30 +02:00


llms.txt / llms-full.txt — template and strategy

Status as of 2026-04

Honest assessment: llms.txt is a proposed standard from Jeremy Howard (Answer.AI, Sept 2024). No major AI crawler has publicly confirmed that it extracts content via /llms.txt. A Search Engine Land study (2025) found that 8 of 9 sites saw no measurable traffic change after adoption.

Why include it anyway:

Low cost (small static file).
Real value for developer-facing sites — AI coding assistants (Cursor, Continue, Claude Code, GitHub Copilot Chat) DO read it for doc retrieval.
Signals intent to the AI ecosystem. Early-mover advantage if adoption grows.
Reduces RAG token consumption when third parties ingest your content.

Do not promise ranking gains. Frame as "no-regret hedge", not "quick win".

Where it goes

/llms.txt — root of domain. Index of your content in markdown.
/llms-full.txt — root of domain. Full text of your most important pages concatenated. Optional but recommended for docs/blog/knowledge base.

Both MUST be reachable over HTTPS, served with content-type text/plain or text/markdown, and NOT blocked in robots.txt.
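The three serving requirements above can be checked mechanically. The following is a minimal sketch, not a full robots.txt parser: `ServedFile`, `isDisallowed`, and `checkServing` are hypothetical names introduced here, and the robots check only looks at plain `Disallow` prefixes in the `User-agent: *` group (no wildcards, no `Allow` overrides, no bot-specific groups).

```typescript
// Hypothetical shape describing a response that has already been fetched.
interface ServedFile {
  url: string;         // absolute URL the file was fetched from
  contentType: string; // raw Content-Type header value
}

// Returns true if the "User-agent: *" group disallows `path`.
// Simplified on purpose: prefix matching only.
function isDisallowed(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;
  let prevWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      inStarGroup = (prevWasAgent && inStarGroup) || value === '*';
      prevWasAgent = true;
    } else {
      prevWasAgent = false;
      if (field === 'disallow' && inStarGroup && value !== '' && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}

// Returns a list of human-readable problems; an empty list means all checks passed.
function checkServing(file: ServedFile, robotsTxt: string): string[] {
  const problems: string[] = [];
  if (!file.url.startsWith('https://')) problems.push('not served over HTTPS');
  const mime = file.contentType.replace(/;.*$/, '').trim().toLowerCase();
  if (mime !== 'text/plain' && mime !== 'text/markdown') {
    problems.push(`unexpected content-type: ${mime}`);
  }
  const path = file.url.replace(/^https?:\/\/[^/]+/, '') || '/';
  if (isDisallowed(robotsTxt, path)) problems.push(`${path} is blocked in robots.txt`);
  return problems;
}
```

Run it once for /llms.txt and once for /llms-full.txt, passing the site's robots.txt body as the second argument. Because AI crawlers often have their own `User-agent` groups, a real audit would also need to inspect those groups.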

Canonical structure

# <Site or Project Name>

> <One-sentence elevator pitch. This is the single line AI systems extract
> as your site summary. Be concrete. Include entity + category + differentiator.>

<Optional free-form paragraph providing more context. Keep under 400 chars.>

## Docs

- [Getting started](https://example.com/docs/getting-started): What it does, how to install.
- [API reference](https://example.com/docs/api): All endpoints with examples.
- [Tutorials](https://example.com/docs/tutorials): Step-by-step walkthroughs.

## Examples

- [Quickstart example](https://example.com/examples/quickstart.md): Minimal working demo.

## Optional

- [Changelog](https://example.com/changelog.md): Version history.
- [Blog](https://example.com/blog/index.md): In-depth articles.

Structure rules (Jeremy Howard spec)

First line: # <Name> (H1 with project/site name).
Second non-comment line: > summary (blockquote, one sentence).
Optional paragraphs of free-form context after the blockquote.
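H2 sections grouping links: ## Docs, ## Examples, ## Optional, etc. In the spec, ## Optional has a special meaning: its links can be skipped when a shorter context is needed.
Each link: [Title](URL): description. Keep the description under 120 chars.
Prefer links that point to a .md version of the page.
Total file: target under 8 KB. If larger, keep llms.txt as an index and move the full text into llms-full.txt.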
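The structure rules above lend themselves to an automated lint. The sketch below is one possible way to check them; `validateLlmsTxt`, `LlmsTxtIssue`, and `utf8Length` are hypothetical names, "comment" is assumed to mean a single-line HTML comment, and the 8 KB and 120-char limits are the targets stated in this document rather than spec requirements.

```typescript
interface LlmsTxtIssue {
  line: number; // 1-based line number in the file
  message: string;
}

// Matches "- [Title](URL)" with an optional ": description" suffix.
const LINK_LINE = /^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;

// UTF-8 byte length computed by hand, so no runtime-specific encoder is needed.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

function validateLlmsTxt(text: string, maxBytes = 8 * 1024, maxDescription = 120): LlmsTxtIssue[] {
  const issues: LlmsTxtIssue[] = [];
  // Keep non-blank lines, skipping single-line HTML comments.
  const meaningful = text
    .split(/\r?\n/)
    .map((raw, index) => ({ text: raw.trim(), line: index + 1 }))
    .filter((e) => e.text !== '' && !e.text.startsWith('<!--'));

  const first = meaningful[0];
  if (first === undefined || !/^# \S/.test(first.text)) {
    issues.push({ line: first ? first.line : 1, message: 'first line must be an H1 ("# <Name>")' });
  }
  const second = meaningful[1];
  if (second === undefined || !second.text.startsWith('> ')) {
    issues.push({ line: second ? second.line : 1, message: 'second line must be a "> summary" blockquote' });
  }
  for (const entry of meaningful) {
    if (!entry.text.startsWith('- ')) continue; // only list items are link lines
    const match = LINK_LINE.exec(entry.text);
    if (match === null) {
      issues.push({ line: entry.line, message: 'list item is not "[Title](URL): description"' });
    } else if ((match[3] || '').length > maxDescription) {
      issues.push({ line: entry.line, message: `description exceeds ${maxDescription} chars` });
    }
  }
  if (utf8Length(text) > maxBytes) {
    issues.push({ line: 1, message: `file exceeds ${maxBytes} bytes` });
  }
  return issues;
}
```

An empty result means the file satisfies the rules as interpreted here. The check does not verify that H2 sections exist or that URLs resolve.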

llms-full.txt

Concatenation of the full text (stripped of nav/footer/ads) of your most important pages. Separator between pages:

---
URL: https://example.com/docs/getting-started
Title: Getting Started
---
<full markdown content of that page>

---
URL: https://example.com/docs/api
Title: API Reference
---
<full markdown content of that page>

Target under 500 KB. If your corpus is larger, trim to highest-value pages (most-linked, most-traffic, most-updated).
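The concatenation and trimming described above can be sketched as follows. This is an illustration under stated assumptions: `Page`, `renderPage`, and `buildLlmsFull` are hypothetical names, `priority` stands in for whatever ranking you use (links, traffic, recency), and the budget is counted in characters as a rough proxy for bytes.

```typescript
interface Page {
  url: string;
  title: string;
  markdown: string; // page body already stripped of nav/footer/ads
  priority: number; // higher = more important (hypothetical ranking value)
}

// Renders one page with the "---" / URL / Title / "---" separator block.
function renderPage(page: Page): string {
  return ['---', `URL: ${page.url}`, `Title: ${page.title}`, '---', page.markdown.trim(), ''].join('\n');
}

// Adds pages in priority order until the size budget is reached.
function buildLlmsFull(pages: Page[], maxChars = 500 * 1024): string {
  const byPriority = [...pages].sort((a, b) => b.priority - a.priority);
  const parts: string[] = [];
  let used = 0;
  for (const page of byPriority) {
    const block = renderPage(page);
    const cost = block.length + (parts.length > 0 ? 1 : 0); // +1 for the joining newline
    if (used + cost > maxChars) continue; // too large: skip, a smaller page may still fit
    parts.push(block);
    used += cost;
  }
  return parts.join('\n'); // leaves one blank line between pages
}
```

Skipping an oversized page instead of stopping lets smaller, lower-priority pages fill the remaining budget. Stopping at the first page that does not fit would be an equally reasonable policy.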

Generation patterns

Static sites (Astro, Hugo, Jekyll, 11ty, Next.js SSG)

Best practice: generate both files at build time from the same source as your regular pages. Examples:

Astro: add a src/pages/llms.txt.ts endpoint:

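import { getCollection } from 'astro:content';

// Static endpoint: Astro writes the returned Response body to /llms.txt at build time.
export async function GET() {
  const docs = await getCollection('docs');
  const body = [
    '# My Project',
    '',
    '> One-sentence pitch.',
    '',
    '## Docs',
    // One link line per entry. Assumes entries expose `slug`; content-layer
    // collections in newer Astro versions use `id` instead.
    ...docs.map(d => `- [${d.data.title}](https://example.com/docs/${d.slug}): ${d.data.description}`),
  ].join('\n');
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}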

Next.js App Router: app/llms.txt/route.ts:

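export async function GET() {
  // Same idea as the Astro endpoint: build `body` from your CMS, MDX files, or database.
  // The literal lines below are placeholders.
  const body = ['# My Project', '', '> One-sentence pitch.'].join('\n');
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}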

Hugo: define a custom output format (named llms here) and render it to llms.txt with a matching template in layouts.

CMS (WordPress, Drupal, Ghost)

Use a plugin OR a cron job that regenerates files weekly. Flag stale files (older than site content) in audits.

Static HTML / PHP

Hand-maintained file. Flag in audits if older than 90 days.
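Both audit flags in this section (file older than the site content, file older than 90 days) reduce to a date comparison. A minimal sketch, assuming you already have the relevant modification dates from the file system, the CMS, or Last-Modified headers; `checkStaleness` and `Staleness` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

type Staleness = 'fresh' | 'older-than-content' | 'older-than-max-age';

// llmsModified: last modification of llms.txt
// latestContentModified: most recent modification among the site's pages
// now: passed in explicitly so the check is deterministic in audits and tests
function checkStaleness(
  llmsModified: Date,
  latestContentModified: Date,
  now: Date,
  maxAgeDays = 90
): Staleness {
  if (llmsModified.getTime() < latestContentModified.getTime()) {
    return 'older-than-content';
  }
  if (now.getTime() - llmsModified.getTime() > maxAgeDays * DAY_MS) {
    return 'older-than-max-age';
  }
  return 'fresh';
}
```

The content comparison is tested first because it is the stronger signal: a file that predates the newest page is stale regardless of its age in days.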

Automation tools (for SEO.md §11 "automatisation possible")

llms-txt-action (GitHub Action) — generates on each deploy
Mintlify — auto-generates for Mintlify-hosted docs
Fern — auto-generates for Fern-generated API docs
llmstxt-hub — community directory of examples
Custom script + cron — works for any static content source

What NOT to put in llms.txt

Login walls / private content
Pricing tables (change frequently → stale risk)
Testimonials (authenticity risk if AI quotes them)
Marketing fluff without factual anchors

Validation checklist

File reachable at /llms.txt over HTTPS
Content-type text/plain or text/markdown
H1 + blockquote present as first two non-comment lines
All linked URLs resolve (200)
No broken markdown (valid CommonMark)
Mentioned in /sitemap.xml? Optional, debated
NOT blocked in /robots.txt
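The "all linked URLs resolve" item is the part of the checklist that most benefits from automation. The sketch below extracts every markdown link target from an llms.txt body and reports the ones that do not return 200. `extractLinkUrls`, `findBrokenLinks`, and `StatusFetcher` are hypothetical names; the HTTP call is injected as a function so the logic stays independent of any particular runtime or HTTP client.

```typescript
// Collects unique link targets of the form [Title](URL).
function extractLinkUrls(llmsTxt: string): string[] {
  const urls = new Set<string>();
  const pattern = /\[[^\]]+\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(llmsTxt)) !== null) {
    const url = match[1];
    if (url !== undefined) urls.add(url);
  }
  return Array.from(urls);
}

// Caller-supplied function that resolves to the HTTP status code for a URL.
type StatusFetcher = (url: string) => Promise<number>;

// Returns the URLs whose status is anything other than 200.
async function findBrokenLinks(llmsTxt: string, fetchStatus: StatusFetcher): Promise<string[]> {
  const urls = extractLinkUrls(llmsTxt);
  const results = await Promise.all(
    urls.map(async (url) => ({ url, status: await fetchStatus(url) }))
  );
  return results.filter((r) => r.status !== 200).map((r) => r.url);
}
```

In a runtime with a global `fetch`, one possible `fetchStatus` is `async (url) => (await fetch(url)).status`. Redirects are followed by default there, so a link that redirects to a 200 page would count as resolving.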
