claude/agents/resources/llms-txt-template.md
bastien 95347d2e47 feat(seo/geo): split into parallel seo + geo agents with shared resources
Refactor the monolithic seo-analyzer into two specialist agents
orchestrated in parallel by the /seo skill, plus a standalone /geo
skill for AI-only audits.

Changes
- agents/seo-analyzer.md: refocused on classical engines (Google, Bing,
  DuckDuckGo). Adds Core Web Vitals 2.0 (LCP/INP/CLS + VSI), CSP + full
  security headers, hreflang audit, video SEO (transcripts), accessibility
  as ranking signal, image/video sitemaps.
- agents/geo-analyzer.md: new agent for AI engines (ChatGPT, Claude,
  Perplexity, Gemini, Google AI Overviews, Copilot). Covers AI crawler
  policy, llms.txt/llms-full.txt, Schema.org for AI extraction (QAPage,
  Speakable, Person+Article, Organization graph), entity SEO (Wikidata,
  sameAs, Knowledge Panel), content shape (Definition Lead, TL;DR,
  Q->A, citable stats, freshness), AI visibility testing.
- agents/resources/: shared knowledge base referenced by both agents —
  ai-crawlers-2026.md (25+ bots, training vs retrieval categories,
  permissive/restrictive templates), llms-txt-template.md, geo-schemas.md
  (incl. deprecated list: ClaimReview, CourseInfo, etc. removed June 2025),
  entity-seo.md, content-shape-for-ai.md, ai-visibility-tools.md,
  automation-catalog.md.
- skills/seo/SKILL.md: becomes parallel dispatcher. Collects context
  once (depth + business), spawns both agents in a single message for
  concurrent execution, merges envelopes into unified SEO.md. Includes
  authoritative file-ownership matrix to prevent parallel-edit races.
- skills/geo/SKILL.md: new standalone wrapper for GEO-only audits.

Scoring
- Combined score: GLOBAL = 0.80 * SEO + 0.20 * GEO (local B2C),
  0.75 * SEO + 0.25 * GEO (SaaS/national/content).
- GEO axis weight raised from 5% (old) to first-class dimension.

Policy
- AI crawlers: permissive default (maximise AI citations). Restrictive
  template available for premium/regulated content.
- Every user action in SEO.md section 11 must cite automation options
  from automation-catalog.md.

Tools
- WebFetch + WebSearch added to allowed-tools of both skills and
  both agents (needed for live CWV via PageSpeed API, AI visibility
  testing, Wikidata/Knowledge Panel lookups, competitor analysis).

Research basis (2026 state of the art validated via WebSearch):
- Core Web Vitals 2.0 (VSI signal, Google core update March 2026)
- AI Overviews trigger on ~48% of Google searches
- ClaimReview + 6 other schema types deprecated June 2025
- Definition Lead Architecture (CMU KDD 2024, +impression score)
- Citations + stats add up to 40% AI visibility (Aggarwal 2024)
- Wikidata grounds every major LLM (ChatGPT, Claude, Gemini, Perplexity)

Backup
- agents/seo-analyzer.md.bak kept for rollback reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 16:16:30 +02:00

llms.txt / llms-full.txt — template and strategy

Status as of 2026-04

Honest assessment: llms.txt is a standard proposed by Jeremy Howard (Answer.AI, Sept 2024). No major AI crawler operator has publicly confirmed that its crawler extracts content via /llms.txt. A Search Engine Land study (2025) found that 8 of 9 sites saw no measurable traffic change after adoption.

Why include it anyway:

  • Low cost (small static file).
  • Real value for developer-facing sites — AI coding assistants (Cursor, Continue, Claude Code, GitHub Copilot Chat) DO read it for doc retrieval.
  • Signals intent to the AI ecosystem, with an early-mover advantage if adoption grows.
  • Reduces RAG token consumption when third parties ingest your content.

Do not promise ranking gains. Frame as "no-regret hedge", not "quick win".

Where it goes

  • /llms.txt — root of domain. Index of your content in markdown.
  • /llms-full.txt — root of domain. Full text of your most important pages concatenated. Optional but recommended for docs/blog/knowledge base.

Both MUST be reachable over HTTPS, content-type text/plain or text/markdown, and NOT blocked in robots.txt.
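
The three serving requirements above can be checked mechanically. The following is a minimal sketch, assuming the response metadata and the robots.txt body have already been fetched. `checkServing`, `isDisallowedForAll` and the `ServingInfo` shape are hypothetical names, and the robots.txt matching is deliberately simplified (only the `User-agent: *` group, plain prefix matching, no wildcards).

```typescript
// Hypothetical input shape: values you would collect with any HTTP client.
interface ServingInfo {
  url: string;         // final URL after redirects
  contentType: string; // raw Content-Type header value
  robotsTxt: string;   // body of /robots.txt ("" if the site has none)
}

// Simplified robots.txt check: only the "User-agent: *" group, prefix
// matching, longest matching rule wins (Allow wins ties).
function isDisallowedForAll(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;
  let lastWasAgent = false;
  let best = { length: -1, allow: true };
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim();
    const m = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(line);
    if (!m) continue;
    const field = m[1].toLowerCase();
    const value = m[2].trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      if (!lastWasAgent) inStarGroup = false;
      if (value === '*') inStarGroup = true;
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!inStarGroup || (field !== 'allow' && field !== 'disallow')) continue;
    if (value === '' || !path.startsWith(value)) continue;
    const allow = field === 'allow';
    if (value.length > best.length || (value.length === best.length && allow)) {
      best = { length: value.length, allow };
    }
  }
  return !best.allow;
}

// Returns a list of problems; an empty list means all three checks passed.
function checkServing(info: ServingInfo): string[] {
  const problems: string[] = [];
  const parsed = new URL(info.url);
  if (parsed.protocol !== 'https:') problems.push('not served over HTTPS');
  const mime = info.contentType.split(';')[0].trim().toLowerCase();
  if (mime !== 'text/plain' && mime !== 'text/markdown') {
    problems.push(`unexpected content-type: ${mime || '(missing)'}`);
  }
  if (isDisallowedForAll(info.robotsTxt, parsed.pathname)) {
    problems.push('blocked in robots.txt for User-agent: *');
  }
  return problems;
}
```

An audit would call `checkServing` once for /llms.txt and once for /llms-full.txt. Bot-specific groups (for example a group that targets a single AI crawler) are not covered by this sketch and would need a fuller robots.txt parser.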

Canonical structure

# <Site or Project Name>

> <One-sentence elevator pitch. This is the single line AI systems extract
> as your site summary. Be concrete. Include entity + category + differentiator.>

<Optional free-form paragraph providing more context. Keep under 400 chars.>

## Docs

- [Getting started](https://example.com/docs/getting-started): What it does, how to install.
- [API reference](https://example.com/docs/api): All endpoints with examples.
- [Tutorials](https://example.com/docs/tutorials): Step-by-step walkthroughs.

## Examples

- [Quickstart example](https://example.com/examples/quickstart.md): Minimal working demo.

## Optional

- [Changelog](https://example.com/changelog.md): Version history.
- [Blog](https://example.com/blog/index.md): In-depth articles.
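
The template above can also be produced from structured data rather than written by hand. A minimal sketch, assuming a hypothetical `LlmsDoc` shape and a hypothetical `renderLlmsTxt` helper (neither is part of the llms.txt proposal):

```typescript
// Hypothetical data model for the canonical structure.
interface LlmsLink { title: string; url: string; description: string; }
interface LlmsSection { heading: string; links: LlmsLink[]; }
interface LlmsDoc {
  name: string;          // becomes the H1
  summary: string;       // becomes the one-sentence blockquote
  context?: string;      // optional free-form paragraph
  sections: LlmsSection[];
}

function renderLlmsTxt(doc: LlmsDoc): string {
  // Collapse whitespace so every field stays on a single line.
  const oneLine = (s: string): string => s.replace(/\s+/g, ' ').trim();
  const out: string[] = [`# ${oneLine(doc.name)}`, '', `> ${oneLine(doc.summary)}`, ''];
  if (doc.context) out.push(oneLine(doc.context), '');
  for (const section of doc.sections) {
    out.push(`## ${oneLine(section.heading)}`, '');
    for (const link of section.links) {
      out.push(`- [${oneLine(link.title)}](${link.url}): ${oneLine(link.description)}`);
    }
    out.push('');
  }
  // Exactly one trailing newline.
  return out.join('\n').trimEnd() + '\n';
}
```

Keeping the data model separate from the rendering makes it straightforward to reuse the same source for llms.txt and for regular pages.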

Structure rules (Jeremy Howard spec)

  1. First line: # <Name> (H1 with project/site name).
  2. Second non-comment line: > summary (blockquote, one sentence).
  3. Optional paragraphs of free-form context after the blockquote.
  4. H2 sections grouping links: ## Docs, ## Examples, ## Optional, etc.
  5. Each link: [Title](URL): description. — description under 120 chars.
  6. Prefer links that point to a .md version of the page.
  7. Total file: target under 8 KB. If larger, keep llms.txt as the index and move long-form content into llms-full.txt.
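
The rules above lend themselves to an automated lint. A minimal sketch, assuming that "comment" means a single-line HTML comment and that the 120-character and 8 KB limits are the targets stated in this list; `validateLlmsTxt` and `LlmsIssue` are hypothetical names:

```typescript
interface LlmsIssue { line: number; message: string; }

function validateLlmsTxt(text: string, maxBytes = 8 * 1024, maxDesc = 120): LlmsIssue[] {
  const issues: LlmsIssue[] = [];
  // Keep only lines that carry content (skip blanks and single-line HTML comments).
  const content = text
    .split(/\r?\n/)
    .map((value, index) => ({ value: value.trim(), line: index + 1 }))
    .filter(l => l.value !== '' && !l.value.startsWith('<!--'));

  // Rule 1: H1 with the project or site name.
  if (content.length === 0 || !/^# \S/.test(content[0].value)) {
    issues.push({ line: content[0]?.line ?? 1, message: 'first line must be an H1 ("# Name")' });
  }
  // Rule 2: blockquote summary.
  if (content.length < 2 || !content[1].value.startsWith('> ')) {
    issues.push({ line: content[1]?.line ?? 1, message: 'second line must be a blockquote summary' });
  }
  // Rules 4 and 5: links live under H2 sections and follow "[Title](URL): description".
  const linkItem = /^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;
  let inSection = false;
  for (const l of content) {
    if (l.value.startsWith('## ')) { inSection = true; continue; }
    if (!l.value.startsWith('- ')) continue;
    const m = linkItem.exec(l.value);
    if (!m) {
      issues.push({ line: l.line, message: 'list item is not "[Title](URL): description"' });
      continue;
    }
    if (!inSection) issues.push({ line: l.line, message: 'link appears before any H2 section' });
    if ((m[3] ?? '').length > maxDesc) {
      issues.push({ line: l.line, message: `description longer than ${maxDesc} chars` });
    }
  }
  // Rule 7: size target, measured in UTF-8 bytes.
  const bytes = new TextEncoder().encode(text).length;
  if (bytes > maxBytes) issues.push({ line: 1, message: `file is ${bytes} bytes, target is under ${maxBytes}` });
  return issues;
}
```

An empty result means the file passes these checks. It does not verify that the markdown is valid CommonMark or that links resolve.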

llms-full.txt

Concatenation of the full text (stripped of nav/footer/ads) of your most important pages. Separator between pages:

---
URL: https://example.com/docs/getting-started
Title: Getting Started
---
<full markdown content of that page>

---
URL: https://example.com/docs/api
Title: API Reference
---
<full markdown content of that page>

Target under 500 KB. If your corpus is larger, trim to highest-value pages (most-linked, most-traffic, most-updated).
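
The concatenation and the size budget can be combined in one pass. A minimal sketch, assuming each page already has clean markdown and a numeric `priority` that stands in for "most-linked, most-traffic, most-updated"; `FullPage`, `priority` and `buildLlmsFull` are hypothetical names:

```typescript
interface FullPage {
  url: string;
  title: string;
  markdown: string; // page body with nav/footer/ads already stripped
  priority: number; // higher = more valuable (hypothetical ranking signal)
}

function buildLlmsFull(pages: FullPage[], maxBytes = 500 * 1024): string {
  const encoder = new TextEncoder();
  // Highest-value pages first, without mutating the caller's array.
  const ranked = pages.slice().sort((a, b) => b.priority - a.priority);
  const blocks: string[] = [];
  let used = 0;
  for (const page of ranked) {
    const block = `---\nURL: ${page.url}\nTitle: ${page.title}\n---\n${page.markdown.trim()}\n`;
    const size = encoder.encode(block).length + 1; // +1 for the blank line between blocks
    if (used + size > maxBytes) continue; // skip pages that would exceed the budget
    blocks.push(block);
    used += size;
  }
  return blocks.join('\n');
}
```

Skipping an oversized page instead of stopping lets smaller, lower-priority pages still fill the remaining budget. Stopping at the first miss would be an equally reasonable policy.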

Generation patterns

Static sites (Astro, Hugo, Jekyll, 11ty, Next.js SSG)

Best practice: generate both files at build time from the same source as your regular pages. Examples:

Astro: add a src/pages/llms.txt.ts endpoint:

import { getCollection } from 'astro:content';

export async function GET() {
  // Assumes a content collection named 'docs' with title and description fields.
  const docs = await getCollection('docs');
  const body = [
    '# My Project',
    '',
    '> One-sentence pitch.',
    '',
    '## Docs',
    '',
    // Legacy collections expose `slug`; Content Layer collections (Astro 5) use `id` instead.
    ...docs.map(d => `- [${d.data.title}](https://example.com/docs/${d.slug}): ${d.data.description}`),
  ].join('\n');
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}

Next.js App Router: app/llms.txt/route.ts:

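export async function GET() {
  // Placeholder body: build it as in the Astro example, pulling from your CMS/MDX/db.
  const body = ['# My Project', '', '> One-sentence pitch.', ''].join('\n');
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}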

Hugo: custom output format llms, plus an llms.txt template in layouts.

CMS (WordPress, Drupal, Ghost)

Use a plugin OR a cron job that regenerates files weekly. Flag stale files (older than site content) in audits.

Static HTML / PHP

Hand-maintained file. Flag in audits if older than 90 days.
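
Both staleness rules (older than the site content, older than 90 days) reduce to timestamp comparisons. A minimal sketch, assuming the audit has already collected last-modified dates; `llmsTxtStaleness` is a hypothetical helper name:

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the staleness flags that apply; an empty list means the file is fresh.
function llmsTxtStaleness(
  llmsModified: Date,      // last-modified time of /llms.txt
  contentModified: Date[], // last-modified times of the pages it indexes
  now: Date,
  maxAgeDays = 90,
): string[] {
  const flags: string[] = [];
  // Newest content timestamp; -Infinity when no content dates are known.
  const newest = contentModified.reduce((max, d) => Math.max(max, d.getTime()), -Infinity);
  if (newest > llmsModified.getTime()) flags.push('older than newest site content');
  const ageDays = (now.getTime() - llmsModified.getTime()) / DAY_MS;
  if (ageDays > maxAgeDays) flags.push(`older than ${maxAgeDays} days`);
  return flags;
}
```

Passing `now` explicitly keeps the check deterministic, which helps when the audit is replayed against archived data.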

Automation tools (for SEO.md §11 "automatisation possible")

  • llms-txt-action (GitHub Action) — generates on each deploy
  • Mintlify — auto-generates for Mintlify-hosted docs
  • Fern — auto-generates for Fern-generated API docs
  • llmstxt-hub — community directory of examples
  • Custom script + cron — works for any static content source

What NOT to put in llms.txt

  • Login walls / private content
  • Pricing tables (change frequently → stale risk)
  • Testimonials (authenticity risk if AI quotes them)
  • Marketing fluff without factual anchors

Validation checklist

  • File reachable at /llms.txt over HTTPS
  • Content-type text/plain or text/markdown
  • H1 + blockquote present as first two non-comment lines
  • All linked URLs resolve (200)
  • No broken markdown (valid CommonMark)
  • Mentioned in /sitemap.xml? Optional, debated
  • NOT blocked in /robots.txt
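
The "all linked URLs resolve" item is the one most worth automating. A minimal sketch, assuming links use the `[Title](URL)` form with absolute http(s) URLs. `extractLlmsLinks` and `findBrokenLinks` are hypothetical names, and the `probe` callback is injected so the HTTP client stays the caller's choice:

```typescript
// Collect the distinct absolute URLs that appear in markdown links.
function extractLlmsLinks(text: string): string[] {
  const urls = new Set<string>();
  const pattern = /\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) urls.add(m[1]);
  return Array.from(urls);
}

// `probe` returns the HTTP status for a URL; a rejected promise counts as broken.
async function findBrokenLinks(
  text: string,
  probe: (url: string) => Promise<number>,
): Promise<string[]> {
  const urls = extractLlmsLinks(text);
  const statuses = await Promise.all(urls.map(u => probe(u).catch(() => 0)));
  return urls.filter((_, i) => statuses[i] !== 200);
}
```

On a runtime with a global `fetch`, a probe could be as small as `async url => (await fetch(url)).status`. Since `fetch` follows redirects by default, a redirected link that ends in 200 counts as resolving in that setup.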