chore(memory): EVAL-006 + LRN-046/047/048 + journal — prune-memory v1.1 TDD

Capitalizes the prune-memory v1.1 TDD (skill 0a3e766): - EVAL-006: 6 dangers TDD-closed, validated on the real learnings.md (0 fidelity false-positive vs 13); honest limits (SAFE != USEFUL, RED-7/8 open). - LRN-046 deterministic oracle > semantic judge on destructive skills. - LRN-047 a noisy safety guard (13/13 FP) is a risk, not discomfort. - LRN-048 a "0/OK/pass" must prove it looked (counted both sides). - journal: one line under 2026-06-25. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
fix(memory): backfill missing EVAL-005 index row
2026-06-25 23:17:10 +02:00 · 2026-06-25 23:02:51 +02:00 · 2026-06-25 22:56:10 +02:00 · 2026-06-25 11:38:05 +02:00 · 2026-06-25 11:27:18 +02:00 · 2026-06-25 11:27:18 +02:00
30 changed files with 476 additions and 58 deletions
--- a/.claude/memory/decisions.md
+++ b/.claude/memory/decisions.md
@ -510,3 +510,19 @@ rules:
 - **Caveats**: future GLOBAL memory must stay tiny — with conditional loading broken, anything global loads in EVERY project unconditionally; fold that constraint into the global-memory design once a backup exists. Caveman pass to ~250 was explicitly DECLINED: marginal ~25-line gain vs real risk (changes the nature of instructions-to-follow; no evidence caveman is followed better than prose; CLAUDE.md is the most-edited file → caveman = painful to re-read/amend). 275 readable > 250 caveman.
 - **Alternatives rejected**: path-scoped `~/.claude/rules/` (broken, [[BLK-009]]); externalize sections to on-demand-loaded files (same conditional-load dependency); caveman to ~250 (readability + instruction-fidelity risk).
 - **Reference**: `~/.claude/CLAUDE.md` (symlink → `~/Documents/claude/CLAUDE.md`), commits ba743cf (compress routing+design+graphify) + 990318c (trim separators/blanks). Linked to [[BLK-009]], [[LRN-043]], [[LRN-044]].
+
+---
+
+## BDR-032 — skill `/validate` → `/web-validate` (rename user surface, keep internals)
+
+- **Date**: 2026-06-25
+- **Status**: accepted (shipped `e5e673a`)
+- **Decision**: rename W3C+WCAG skill `/validate` → `/web-validate` (clearer scoped name, less generic). Renamed the USER-FACING surface ONLY: folder (`git mv`), frontmatter `name`, H1, command refs, CLAUDE.md routing line, 6 `lib/profiles/*.profile` entries (FUNCTIONAL — profiles activate skills by folder name, a miss = broken activation), cross-refs (harden/seo/depth-matrix/client-handover), agent dispatch refs, README + USAGE tables. Leak-guard regex extended to `web-validate|validate` ([[LRN-045]]).
+- **Why — 4 deliberate KEEPs**:
+  - agent `validator-analyzer` name KEPT — internal, lockstep with `subagent_type=` + harness registry; rename = wider blast radius, zero discoverability gain.
+  - `.validate-cache/` + `VALIDATE.md` KEPT — names derive from the AUDIT TYPE, family `{SEO,GEO,HARDEN,CSO,VALIDATE}.md`; renaming makes VALIDATE the odd one out + orphans already-generated reports (`MIGRATION.md` cleanup loop hardcodes the name). Same logic kept the dispatch label `description="validate — ..."`.
+  - `.claude/` history KEPT (memory + completed TODO block) — append-only, true at the time. The forward-pointing OPEN TODO item was ANNOTATED additively (`désormais /web-validate`), not rewritten — append-only protects history, not pointers to future actions.
+  - CHANGELOG old entry KEPT, new "renamed" entry ADDED (Keep-a-Changelog: don't rewrite the past).
+  - NL trigger keywords ("validate"/"validation") KEPT in the description so "validate my site" still routes here.
+- **Alternatives rejected**: rename agent + artifacts too (cosmetic symmetry, ~45 extra edits, breaks audit-file family + report back-compat); blind `sed s/validate/web-validate/` (breaks third-party `html-validate`, `validator.nu`, English-verb prose — discrimination must be at the `/validate` token, proven by `.validate-cache/html-validate.json` staying intact).
+- **Reference**: commit `e5e673a` (18 files). Verified complete: `/validate` = 0 in active code (only `.claude/` history + CHANGELOG), `html-validate` = 15 intact, regex `client-handover-writer.md:1462` shows both names. Linked to [[LRN-045]], [[BDR-031]] (CLAUDE.md routing), [[LRN-043]] (validate routing line).
--- a/.claude/memory/evals.md
+++ b/.claude/memory/evals.md
@ -25,6 +25,8 @@ rules:
 | EVAL-002 | 2026-06-02 | `profile gstack on/off` verb implementation | keep |
 | EVAL-003 | 2026-06-11 | darwin optimization run on `audit-delta` skill | keep |
 | EVAL-004 | 2026-06-11 | darwin eval 26 perso skills + 4-bug fix round | keep |
+| EVAL-005 | 2026-06-23 | Obsolete `claude --effort max` alias missed across Step 9 edits | correct |
+| EVAL-006 | 2026-06-25 | prune-memory v1.1 TDD — 6 guards (0a3e766), validated on real data | keep |

 ---

@ -74,3 +76,13 @@ rules:
 - **Method / why missed**: I treated the pre-existing `CLAUDE_LINES` as established and only touched the lines I was adding/removing. Spotting the redundancy needs cross-referencing TWO config layers (shell alias vs settings.json) — a semantic check I never ran. Masked further: the user's `.bashrc` is hand-managed and the alias line wasn't even present, so it looked inert.
 - **Anomaly**: not just dead config — a CLI flag (`--effort max`) silently OVERRIDES the settings.json value (`xhigh`). Real correctness bug.
 - **Action**: when editing installer shell-config, audit EACH existing line against the current settings.json / CLAUDE.md source of truth, not only the lines being changed. Removed the alias + added cleanup. General rule: reconcile config to ONE source of truth across env/alias/settings layers.
+
+---
+
+## EVAL-006 — prune-memory v1.1: 6 dangers TDD-closed, validated on real data
+
+- **Date**: 2026-06-25
+- **Output**: prune-memory (only DESTRUCTIVE skill, never tested before) TDD'd end-to-end. 6 dangers, each proven by a failing RED then closed by a DETERMINISTIC guard: RED-1 false "verified" claim removed; RED-2 dirty tree → real `exit 1` (was prose-only STOP); RED-3 negation-sentence verbatim guard (no silent inversion); RED-4 collapse safety-critical exception (NEVER/ALWAYS/PERMANENT entry INTOUCHABLE); RED-5 STEP-4 fidelity census (count-based, per-entry × per-category); RED-6 trailing-space false-ORPHAN fix. Guards committed in **0a3e766**; tests: `tests/run-deterministic.sh` + `run-behavioral.md` + fixtures + BACKLOG.
+- **Method**: run-deterministic all-green (RED-1/2/5/6); RED-3/4 by single faithful subagent runs on throwaway fixtures (deterministic oracles — byte-identical + layer-(a) substring; N=6 tolerance-zero fleet documented in run-behavioral.md, NOT exhausted — 1 run sufficed to prove presence then closure); REAL-data re-test on live learnings.md (602 l): fidelity 0 false-positive vs old line-grep 13, census PROVEN counting both sides (HEAD/WORK not=107/114, no=64/71, never=34/34 — no category dropped). Scope held (only learnings.md), reverted after (measurement, not a kept prune). Clean tree verified first; git + cp backup.
+- **Anomalies**: (1) SAFE ≠ USEFUL — compression (pass C) marginal on already-caveman dense content (~3.6% trim, registry GREW); real value = index-drift (D found 19 missing Index rows on the real learnings.md, measured then reverted) + merge (B), not C on dense. (2) RED-8 OPEN — fidelity proves "no negation DELETED", not "none ADDED wrongly"; visible empirically (not/no +7 in the measurement, passed under the drop-radar); remote (compression subtracts) but real. (3) RED-7 OPEN — merge LRN-014+016 maybe PRIMED by the SKILL.md example, not real overlap; verify. (4) two guard bugs caught by VALIDATION not logic: awk `\<` unsupported (mawk) → not/no counted 0; `NR==FNR` blind when working census empty = the deletion case → both fixed + re-validated. (5) REAL ANCHOR FOR PASS D — this very evals.md had EVAL-005's Index row MISSING (pre-existing drift), exactly what the skill's D pass auto-corrects; hand-backfilled in a separate `fix(memory)` commit. learnings.md likewise carries 19 missing Index rows (deferred to an intentional prune). D is NOT theoretical.
+- **Action**: keep — safe, useful for B/D, compression marginal on dense (documented limit). `Fixed in v1.1 (TDD found it)` WAS the RED-1 defect (claim of a verify never run); TDD note now TRUE (real suite passes). Patterns → LRN-046/047/048; open items → tests/BACKLOG.md (RED-7/RED-8).
--- a/.claude/memory/journal.md
+++ b/.claude/memory/journal.md
@ -197,3 +197,5 @@ rules:
 - Compressed global CLAUDE.md 317→275 (−42, loaded every session): routing (cut name-obvious lines, keep non-derivable signal + dense catch-all; restored `validate`/`plan-eng-review`, feat/hotfix pointer), design (+ explicit FILE signal), graphify, then decorative `---` + blank rognage. Caveman→250 declined (readability + instruction-fidelity). Commits ba743cf, 990318c. BDR-031, LRN-043.
 - Edit/Write refuse write-through-symlink → resolve real path (`readlink -f`); `~/.claude/CLAUDE.md` → repo. LRN-044.
 - Inspected dirty gstack submodule (parent showed `m`): `package.json`+`bun.lock` = the Playwright 1.58.2→1.61 bump (BDR-029/BLK-008), NOT restore noise → left intact, NOT cleaned, NOT committed (submodule ref stays at clean 070722ace; local patch re-applied by installer by design).
+- Renamed skill `/validate` → `/web-validate` (user-surface only): git mv + name + H1 + CLAUDE.md routing + 6 profiles (functional) + cross-refs + agent dispatch + README/USAGE. KEPT: `validator-analyzer` name (lockstep), `.validate-cache`/`VALIDATE.md` (audit-file family), `.claude/` history (append-only), NL triggers. Critical catch: client-deliverable leak-guard regex (`client-handover-writer.md:1462`) matched `/validate` by exact token — `web-` prefix broke the anchored match → extended to `web-validate|validate` (covers legacy docs). Verified complete: `/validate` 0 in active code, `html-validate` 15 intact, regex shows both. Commits e5e673a + dbab542 (BDR-032/LRN-045) + a1cc753 (TODO L167 annotated additively). gstack submodule untouched.
+- TDD'd `/prune-memory` (only destructive skill, untested + carried a false `Fixed in v1.1 (TDD found it)` claim): 6 dangers (RED-1..6) closed by deterministic guards, skill `0a3e766`. Real-data run on learnings.md exposed SAFE≠USEFUL (compression marginal on dense; value = index/merge, not C) + a 13/13-false-positive line-grep fidelity guard → replaced by a per-entry count census (0 FP, proven counting both sides). RED-7 (example-priming) + RED-8 (added-negation) filed in BACKLOG. EVAL-006, LRN-046/047/048.
--- a/.claude/memory/learnings.md
+++ b/.claude/memory/learnings.md
@ -46,6 +46,9 @@ rules:
 | LRN-027 | 2026-06-11 | Agents improvise audit boundaries from file dates when no machine state — periodic skills need machine-readable state file, never inference | any recurring/periodic skill needing "since last run" semantics |
 | LRN-030 | 2026-06-18 | Opus 4.8 under-delegates subagents/memory/custom-tools by default — counter via explicit CLAUDE.md fan-out rule | any Opus 4.8 session; tuning delegation; inline-vs-subagent decision |
 | LRN-031 | 2026-06-19 | Skill value = gate + anti-noise + determinism, not re-coding what a capable agent does free | building/reviewing any skill; writing-skills TDD fixture design |
+| LRN-046 | 2026-06-25 | Destructive skill: deterministic oracle (byte-identical / count census) > semantic judge | any destructive/irreversible skill; behavioral-oracle TDD |
+| LRN-047 | 2026-06-25 | A noisy safety guard (13/13 FP) = a guard you learn to ignore = risk → refine, don't tolerate | any guard/alert/lint that can false-positive |
+| LRN-048 | 2026-06-25 | A "0/OK/pass" must prove it LOOKED (counted both sides), else verify hard-wired to pass | any verify/test/lint reporting success |

 ---

@ -590,3 +593,37 @@ rules:
 - **Pattern**: many of this user's `~/.claude/*` config files are symlinks INTO the claude-config repo (`~/Documents/claude/`). Edit/Write block writes through a symlink (safety against clobbering link targets); Read does not — so Read-through-link succeeds then Edit-through-link fails on the same path.
 - **Future application**: before editing any `~/.claude/...` config file, resolve it first (`readlink -f <path>`, or `ls -la` to spot the arrow). Then Read AND Edit the RESOLVED real path so the harness's read-tracking matches what you write — and `git` status/diff/commit land naturally in the repo that owns the file.
 - **Reference**: hit while editing `~/.claude/CLAUDE.md` → `~/Documents/claude/CLAUDE.md`. Linked to [[BDR-031]].
+
+---
+
+## LRN-045 — Renaming a command: audit exact-name leak-guard / forbidden-token regexes
+
+- **Date**: 2026-06-25
+- **Context**: rename `/validate` → `/web-validate`. A client-deliverable leak-guard in `agents/client-handover-writer.md:1462` greps generated docs for internal tool names via `grep -niE '/(seo|harden|validate|cso|...)\b'`. The `web-` prefix means `/web-validate` no longer matches the `/validate` branch (the `/` must sit immediately before `validate`; post-rename a `-` sits there) → renamed command leaks SILENTLY into client-facing output. No error — the gate just stops catching it.
+- **Pattern**: any rename of a command/skill/identifier must sweep regexes/allowlists/denylists that match the OLD name by exact token — leak guards, forbidden-token gates, routing dispatchers, CI greps. A prefix/suffix rename breaks anchored matches (`/oldname\b`) with zero error. Fix = alternation covering BOTH names (`web-validate|validate`), NOT replacement — old artifacts (already-shipped client docs, logs) still carry the legacy name and must stay caught.
+- **Future application**: when renaming, grep the BARE old token inside regex/test/gate files, not just `/oldname` command refs. A blind `replace_all '/old' '/new'` MISSES these because the guard stores the name inside an alternation (`|old|`), not as `/old`. For each guard found, extend to `new|old`; verify the gate line shows both names.
+- **Reference**: `agents/client-handover-writer.md:1462`, rename commit `e5e673a`. Linked to [[BDR-032]].
+
+## LRN-046 — Destructive skill: deterministic oracle > semantic judge
+
+- **Date**: 2026-06-25
+- **Pattern**: On a DESTRUCTIVE skill the binding oracle must be DETERMINISTIC (byte-identical, or count-based census per-entry × per-category), not a semantic judge. A judge false-greens twice: (a) PRESERVED-but-MUTATED content — RED-4, a "meaning preserved" collapse still rewrote a permanent safety rule; byte-identical caught it, the judge would not; (b) a 0-result that happened by CHANCE — "no negation inverted" ≠ protected, it was the dice not a guard. If the oracle must be behavioral/LLM, pair it with a deterministic check that is the gate.
+- **Context**: prune-memory v1.1 TDD (EVAL-006, skill `0a3e766`). RED-4 collapse + RED-3 compression.
+- **Future application**: any destructive/irreversible skill or safety check; any TDD whose natural oracle is an LLM judge — make the binding check deterministic, keep the judge as a secondary net.
+- **Reference**: skill `0a3e766`, `tests/run-behavioral.md`. Linked to [[EVAL-006]].
+
+## LRN-047 — A noisy safety guard is a risk, not discomfort
+
+- **Date**: 2026-06-25
+- **Pattern**: A safety guard that cries wolf (13/13 false positives on real data) is a guard you learn to IGNORE → the day of the true positive you skip it by habit. On a destructive op a noisy guard = security RISK, not annoyance → REFINE it (here line-grep → count-based census), don't tolerate. Measure the false-positive rate on REAL data, not fixtures — all-green fixtures hid the 13.
+- **Context**: prune-memory v1.1 (EVAL-006). The RED-5 line-grep fidelity guard fired 13/13 false positives on the live learnings.md (line-sharing) → replaced by a per-entry census (0 FP, proven).
+- **Future application**: any guard/alert/lint/test that can false-positive — measure FP on real data before shipping; a guard habitually ignored is worse than none.
+- **Reference**: skill `0a3e766`, `tests/run-deterministic.sh` (RED-5). Linked to [[EVAL-006]].
+
+## LRN-048 — A "0 / OK / pass" must prove it LOOKED
+
+- **Date**: 2026-06-25
+- **Pattern**: A passing result ("0 errors", "OK", "clean") must PROVE it inspected — show the work counted something on both sides (census non-zero on HEAD and WORK). Else it is a verify hard-wired to pass = the original prune-memory v1 lie (`basename | cut -c1-3` never matched any heading → verify always printed blank-OK). A 0 by inaction is indistinguishable from a 0 by correctness; force the success path to surface its coverage.
+- **Context**: prune-memory v1.1 (EVAL-006). v1 STEP-4 verify always reported OK (wrong prefix → 0 markers → blank). The fix's 0-false-positive is only trustworthy because the census was shown counting both sides.
+- **Future application**: any verify/test/lint reporting success — design the pass to surface what it examined (counts / files / lines) so a vacuous pass is visible, not silent.
+- **Reference**: skill `0a3e766`, EVAL-006 (verify-proof anomaly). Linked to [[EVAL-006]].
--- a/.claude/tasks/TODO.md
+++ b/.claude/tasks/TODO.md
@ -164,7 +164,7 @@ Subtasks :
 - [ ] Créer `skills/lib/help-handler.md` — snippet réutilisable (détection + extraction + affichage)
 - [ ] Définir format d'aide standard + section "ARGUMENTS" vs reuse de argument-hint
 - [ ] Décider : sections ARGUMENTS/EXAMPLES doivent-elles être dans la frontmatter (nouveau champ YAML) ou dans le corps du SKILL.md (nouvelle section `## Help`) ?
- [ ] Patcher un skill pilote (`/validate`) — valider UX
+- [ ] Patcher un skill pilote (`/validate`) — valider UX  _(désormais `/web-validate` — renommé e5e673a)_
 - [ ] Patcher les skills perso restants : analyze, bugfix, code-clean, commit-change, doc, feat, geo, graphify, harden, hotfix, init-project, make-pdf, onboard, plan-tune, plugin-check, refactor, seo, ship-feature, skills-perso, status, benchmark-models, context-save, context-restore
 - [ ] Mettre à jour `~/.claude/CLAUDE.md` — mentionner convention --help disponible sur tous les skills perso
 - [ ] Note : skills-external/gstack ont leur propre convention, ne pas toucher
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -23,6 +23,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
 - `.claude/{tasks,memory,audits}/` governance layout + 5 memory registries (decisions, learnings, blockers, journal, evals)

 ### Changed
+- `/validate` renamed to `/web-validate` — clearer scoped name (W3C + WCAG); routing, skill profiles, cross-references, and the client-deliverable leak-guard updated (the guard still matches legacy `/validate` so older client docs stay covered)
 - `/seo` split into parallel `seo` + `geo` agents with shared resources
 - `/onboard` rewritten: archetype-aware pipeline (orchestrator + config-only agent), security audit archetype-aware
 - `doc-syncer`: stack-aware audit + deploy-doc gating; later scoped to public docs only, `.claude/` read-only; sync-only ROADMAP handling — planned→shipped reconciliation from code/git, never from `.claude/`; numeric incoherence → HUMAN question
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -235,7 +235,7 @@ only the non-obvious cases: gstack fallbacks, disambiguation, cryptic names.
 - Architecture review → plan-eng-review
 - Before /clear or /compact → capitalize; end-of-session ritual → close
 - SEO+GEO → seo (GEO only → geo)
- W3C + WCAG a11y (HTML/CSS validity, axe, pa11y) → validate
+- W3C + WCAG a11y (HTML/CSS validity, axe, pa11y) → web-validate
 - Security audit (secrets, CVE, OWASP) → cso
 - New project → init-project; onboard existing repo → onboard

--- a/README.md
+++ b/README.md
@ -112,7 +112,7 @@ Versions are pinned in `plugins.lock.json`. To update: edit the file, then re-ru
 | `/pdf-translate` | Translate a PDF to another language, output as HTML (via Vision) |
 | `/close` | End-of-session ritual — alias for `/capitalize --ritual` (dedup + TODO reconcile + 3-question reflection) |
 | `/harden` | Web hardening audit — HTTPS/TLS, HSTS, CSP, security headers |
-| `/validate` | W3C HTML/CSS validity + WCAG 2.1 accessibility audit |
+| `/web-validate` | W3C HTML/CSS validity + WCAG 2.1 accessibility audit |
 | `/geo` | GEO-only audit — AI-search visibility (ChatGPT, Perplexity, Claude, Gemini…) |
 | `/client-handover` | Final project delivery — audits + branded deliverable (Markdown / HTML / PDF) |
 | `/profile` | Activate a skill profile (design / dev / qa / audit / minimal) |
--- a/USAGE.md
+++ b/USAGE.md
@ -111,7 +111,7 @@ Tu veux...
 | Curer la mémoire | `/prune-memory` |
 | Fin de session (= /capitalize --ritual) | `/close` |
 | Audit web (TLS, CSP, headers) | `/harden` |
-| Validité HTML/CSS + a11y | `/validate` |
+| Validité HTML/CSS + a11y | `/web-validate` |
 | Visibilité IA (GEO seul) | `/geo` |
 | Livraison client finale | `/client-handover` |
 | Changer profil skills | `/profile` |
@ -147,7 +147,7 @@ Tu veux...
 | `/prune-memory` | Registres trop longs / bruyants | Curation : merge, superseded, compression |
 | `/close` | Fin de session | Alias de /capitalize --ritual — dedup + TODO + réflexion 3 questions |
 | `/harden` | Audit sécurité web (SSL, CSP, HSTS) | Projet web avec config HTTP |
-| `/validate` | Audit W3C + WCAG a11y | Avant livraison projet web |
+| `/web-validate` | Audit W3C + WCAG a11y | Avant livraison projet web |
 | `/client-handover` | Livraison client | Audits finaux + livrable brandé |
 | `/profile` | Changer le profil de skills | design / dev / qa / audit / minimal |

--- a/agents/client-handover-writer.md
+++ b/agents/client-handover-writer.md
@ -211,7 +211,7 @@ test -f Procfile && DEPLOY_HINTS+=("Heroku: Procfile")
 test -f wrangler.toml && DEPLOY_HINTS+=("Cloudflare Workers: wrangler.toml")
 ```

-Also detect deployed URL (used to point /validate at the live site, STEP 7):
+Also detect deployed URL (used to point /web-validate at the live site, STEP 7):

 ```bash
 DEPLOYED_URL=""
@ -255,7 +255,7 @@ is_fresh() {

 For non-web projects (`PROJECT_TYPE` ∈ {cli, library, mobile, other}), the
 pipeline is reduced: only run /cso (single audit, single fix loop), skip
-STEP 6 deploy pause and STEP 7 /validate. Treat /cso as the only score for
+STEP 6 deploy pause and STEP 7 /web-validate. Treat /cso as the only score for
 the gate.

 For web projects, dispatch in **a single message with two parallel Agent calls**:
@ -560,24 +560,24 @@ Tailor to project deploy method (use DEPLOY_HINTS):
 ```
 AskUserQuestion:
  "Pipeline paused for deploy. Above is what's changed and how to deploy.
-   Confirm when the live site reflects the new changes (or skip /validate)."
+   Confirm when the live site reflects the new changes (or skip /web-validate)."

  Header: "Deploy status"
  Options:
-  - A) Deployed — proceed with /validate
+  - A) Deployed — proceed with /web-validate
  - B) Not yet — I'll come back (this stops the pipeline; re-run /client-handover later)
-  - C) Skip /validate — proceed to handover doc with VALIDATE marked SKIPPED
+  - C) Skip /web-validate — proceed to handover doc with VALIDATE marked SKIPPED
 ```

 If A → proceed to STEP 7. If B → exit cleanly with state report. If C →
 mark `VALIDATE_SKIPPED=true` and jump to STEP 8.

 If `DEPLOYED_URL` is still `[À CONFIRMER]` after option A: AskUserQuestion
-"Quelle est l'URL du site déployé pour /validate ?" — capture URL.
+"Quelle est l'URL du site déployé pour /web-validate ?" — capture URL.

 ---

-## STEP 7 — RUN /validate (live site)
+## STEP 7 — RUN /web-validate (live site)

 Skip if `VALIDATE_SKIPPED=true` or `PROJECT_TYPE != web` (in either case
 ensure `VALIDATE_SKIPPED=true` is set so the gate logic in STEP 8 treats
@ -585,7 +585,7 @@ VALIDATE as not-applicable rather than failed).

 Dispatch `general-purpose` subagent:

-> Read `~/.claude/skills/validate/SKILL.md` and execute against the
+> Read `~/.claude/skills/web-validate/SKILL.md` and execute against the
 > deployed URL: `<DEPLOYED_URL>`. Audit W3C HTML validity (validator.nu),
 > W3C CSS validity (jigsaw.w3.org), WCAG 2.1 a11y (axe-core, pa11y).
 > Apply autonomous fixes ONLY in source code (the client controls deploy);
@ -601,8 +601,8 @@ SCORE_VALIDATE_AFTER=$(extract_score .claude/audits/VALIDATE.md)
 Note: VALIDATE has no `_BEFORE` (first run is post-deploy). The before/after
 table for VALIDATE shows `—` for before, `<score>` for after.

-If /validate produced new fixes in source code, run STEP 5 again (mini-commit
-+ push) BEFORE moving to STEP 8 — but DO NOT loop /validate. The remaining
+If /web-validate produced new fixes in source code, run STEP 5 again (mini-commit
+ push) BEFORE moving to STEP 8 — but DO NOT loop /web-validate. The remaining
 deploy of those fixes is mentioned to the user in the final doc.

 ---
@ -931,7 +931,7 @@ concrete, no jargon. One short paragraph per idea.

 1. **Never name internal tools or skill identifiers in chapters 1–3.**
   Forbidden tokens (do not appear, in any case, in the lay portion):
-   `/seo`, `/harden`, `/validate`, `/cso`, `/feat`, `/bugfix`,
+   `/seo`, `/harden`, `/web-validate`, `/cso`, `/feat`, `/bugfix`,
   `/ship-feature`, `/ship`, `/code-clean`, `/refactor`, `seo-analyzer`,
   `geo-analyzer`, `validator-analyzer`, `harden`-as-product-name,
   `SEO.md`, `HARDEN.md`, `VALIDATE.md`, `CSO.md`, `MAX_ITERATIONS`,
@ -1003,7 +1003,7 @@ external validators when relevant (Mozilla Observatory, SSL Labs,
 SecurityHeaders.com — these are recognized seals).

 DO NOT mention internal tool/skill names here (no /seo, /harden,
-/validate, seo-analyzer, etc.). The lecture rapide IS where
+/web-validate, seo-analyzer, etc.). The lecture rapide IS where
 client-facing axis names live.]

 ## 3. Ce qui a été fait
@ -1459,7 +1459,7 @@ Chapter 6 (Détails techniques) may use them in the optional glossary.

 ```bash
 awk '/^## 1\./{flag=1} /^## 6\./{flag=0} flag' "$OUTPUT" \
-  | grep -niE '/(seo|harden|validate|cso|feat|bugfix|ship-feature|ship|code-clean|refactor)\b|seo-analyzer|geo-analyzer|validator-analyzer|SEO\.md|HARDEN\.md|VALIDATE\.md|CSO\.md|MAX_ITERATIONS|ALL_PASS|SCORE_[A-Z_]+'
+  | grep -niE '/(seo|harden|web-validate|validate|cso|feat|bugfix|ship-feature|ship|code-clean|refactor)\b|seo-analyzer|geo-analyzer|validator-analyzer|SEO\.md|HARDEN\.md|VALIDATE\.md|CSO\.md|MAX_ITERATIONS|ALL_PASS|SCORE_[A-Z_]+'
 # expected: no matches. Each match is a leak — rewrite the offending
 # chapter in client language before STEP 16.
 ```
--- a/agents/validator-analyzer.md
+++ b/agents/validator-analyzer.md
@ -1,6 +1,6 @@
 ---
 name: validator-analyzer
-description: Web standards audit agent — W3C HTML validity (validator.nu), W3C CSS validity (jigsaw.w3.org), WCAG 2.1 accessibility (axe-core, pa11y, WAVE). Dispatched from /validate. Produces scored .claude/audits/VALIDATE.md report with concrete diffs for auto-fixable issues and user actions for judgment-required fixes. Complementary to /harden (security), /seo (indexability), /geo (AI extraction).
+description: Web standards audit agent — W3C HTML validity (validator.nu), W3C CSS validity (jigsaw.w3.org), WCAG 2.1 accessibility (axe-core, pa11y, WAVE). Dispatched from /web-validate. Produces scored .claude/audits/VALIDATE.md report with concrete diffs for auto-fixable issues and user actions for judgment-required fixes. Complementary to /harden (security), /seo (indexability), /geo (AI extraction).
 tools: Read, Edit, Write, Bash, Grep, Glob, WebFetch
 ---

@ -24,7 +24,7 @@ $ARGUMENTS

 ## STEP 0 — Parse context

-If dispatched from `/validate`, context is in `$ARGUMENTS`. Extract:
+If dispatched from `/web-validate`, context is in `$ARGUMENTS`. Extract:

 - `TARGET_URL` — production URL (FULL) or "none" (LOCAL)
 - `DEPTH` — LOCAL | FULL
@ -45,7 +45,7 @@ Standalone invocation (no dispatcher): ask ONCE as a bundled block:
 ```bash
 mkdir -p .validate-cache
 grep -q '^\.validate-cache/' .gitignore 2>/dev/null || \
-  printf '\n# /validate cache\n.validate-cache/\n' >> .gitignore
+  printf '\n# /web-validate cache\n.validate-cache/\n' >> .gitignore
 ```

 ### Framework detection for SPA built-output targeting
@ -450,10 +450,10 @@ Each entry : file:line + WCAG SC reference + suggested approach.
 - What the tool chain could not verify (e.g. dynamic content loaded
  via JS in LOCAL mode, color contrast on images, screen reader flow)
 - Reason + suggested follow-up (manual test with NVDA/VoiceOver,
-  run /validate --full post-deploy, etc.)
+  run /web-validate --full post-deploy, etc.)

 ## 8. Changes applied (appended by dispatcher after fix confirmation)
-<Empty until /validate --fix completes STEP 3>
+<Empty until /web-validate --fix completes STEP 3>
 ```

 Max 600 lines. Cite file:line or tool output for every finding.
@ -495,7 +495,7 @@ At end of §5, emit verbatim :
 READY TO APPLY — awaiting dispatcher confirmation
 ```

-**Do NOT apply any Edit/Write.** Dispatcher handles STEP 3 of `/validate`.
+**Do NOT apply any Edit/Write.** Dispatcher handles STEP 3 of `/web-validate`.

 ---

--- a/lib/profiles/audit.profile
+++ b/lib/profiles/audit.profile
@ -10,7 +10,7 @@ analyze                           personal
 # SEO / GEO / web standards
 seo                               personal
 geo                               personal
-validate                          personal
+web-validate                      personal

 # Code + perf health
 health
--- a/lib/profiles/full.profile
+++ b/lib/profiles/full.profile
@ -46,7 +46,7 @@ codex
 # === SEO / GEO / standards / security ================================
 seo                               personal
 geo                               personal
-validate                          personal
+web-validate                      personal
 harden                            personal
 analyze                           personal
 cso
--- a/lib/profiles/qa.profile
+++ b/lib/profiles/qa.profile
@ -6,6 +6,6 @@ qa-only
 browse
 benchmark
 canary
-validate                          personal
+web-validate                      personal
 open-gstack-browser
 setup-browser-cookies
--- a/lib/profiles/seo.profile
+++ b/lib/profiles/seo.profile
@ -8,7 +8,7 @@ seo                               personal
 geo                               personal

 # W3C HTML/CSS validity + WCAG a11y
-validate                          personal
+web-validate                      personal

 # Web hardening (HSTS, CSP, redirects — affects ranking signals)
 harden                            personal
--- a/lib/profiles/web-full.profile
+++ b/lib/profiles/web-full.profile
@ -34,7 +34,7 @@ refactor                          personal
 # === SEO / GEO / standards ===========================================
 seo                               personal
 geo                               personal
-validate                          personal
+web-validate                      personal
 harden                            personal
 analyze                           personal

--- a/lib/profiles/web.profile
+++ b/lib/profiles/web.profile
@ -30,7 +30,7 @@ commit-change                     personal
 refactor                          personal

 # Validation companion (basic W3C/a11y check during build)
-validate                          personal
+web-validate                      personal

 # External: design skills
 emil-design-eng                   external
--- a/skills/client-handover/SKILL.md
+++ b/skills/client-handover/SKILL.md
@ -5,7 +5,7 @@ description: |
  final audits, deploy validation against live site, and a branded
  deliverable (Markdown + HTML + PDF). Multi-agent orchestrator: dispatches
  client-handover-writer which spawns parallel /seo + /harden subagents,
-  then /validate, then writes the deliverable.
+  then /web-validate, then writes the deliverable.
  Triggers: "client handover", "compte rendu client", "livraison client",
  "rapport client", "deliverable", "summary for client", "handover doc",
  "livrable", "ship and handover", "finaliser et livrer".
@ -39,11 +39,11 @@ The agent runs a **ship-and-handover pipeline** with explicit gates:
   - If still < 17/20 after cap → escalate to user with concrete remaining issues; user decides continue / stop / manual intervention.
 4. **COMMIT + PUSH** — If files changed during fix loops, run /commit-change (atomic logical commits) then `git push`.
 5. **DEPLOY PAUSE** — List exact deploy artifacts: changed files since baseline, deploy hints from project (vercel.json, netlify.toml, Dockerfile, .github/workflows/deploy.yml, etc.), and the deploy process in plain words. Use AskUserQuestion: "Deploy done? (Yes / Not yet / Skip validate)". Block until Yes or Skip.
-6. **/validate (live site)** — Run validator-analyzer against the deployed URL. Capture `SCORE_VALIDATE`.
+6. **/web-validate (live site)** — Run validator-analyzer against the deployed URL. Capture `SCORE_VALIDATE`.
 7. **GATE — per-axis threshold ≥17/20** — Compute final `SCORE_*_AFTER` for SEO classique, GEO (IA), HARDEN, VALIDATE. If ANY < 17/20: STOP. Generate `.claude/audits/HANDOVER-ROADMAP.md` with prioritized analysis of what's blocking each below-threshold axis. Do NOT write the client deliverable. Report to user.
 8. **DOC GENERATION (only if all scores ≥17/20)** — Read `.claude/memory/` registries + full git history. Ask whether to include build/deploy chapter. Synthesize the client deliverable using the 4-chapter structure:
   - **§1 Ce qu'il fallait faire (et pourquoi)** — brief + motivation, 100–180 words.
-   - **§2 Ce qui a été fait** — lay summary, **≤300 words, zero technical jargon**, **no internal tool/skill names** (no `/seo`, `/harden`, `/validate`, `seo-analyzer`, etc. — replace with concept names: référencement / sécurité / conformité technique). Forbidden-token grep gate runs before write.
+   - **§2 Ce qui a été fait** — lay summary, **≤300 words, zero technical jargon**, **no internal tool/skill names** (no `/seo`, `/harden`, `/web-validate`, `seo-analyzer`, etc. — replace with concept names: référencement / sécurité / conformité technique). Forbidden-token grep gate runs before write.
   - **§3 Ce qui vous reste à faire** — action-only checklist grouped by cadence (one-time / monthly / quarterly / yearly / when something changes).
   - **§4 Détails techniques (pour les curieux)** — score table (SEO classique + GEO + sécurité + conformité, before/after, gated independently at ≥17/20), vulgarized BDR decisions, phases with technical detail, optional glossary.
   - **§5 Annexe — plateformes externes** (web/local-business only).
--- a/skills/harden/SKILL.md
+++ b/skills/harden/SKILL.md
@ -352,8 +352,8 @@ Agent(
    - Legal pages (mentions légales, CGV, privacy) — unless the issue is
      a security-header gap on those pages, not their content
    - Content quality, keyword density, readability
-    - a11y / WCAG (owned by /validate — W3C + WCAG audit)
-    - W3C HTML / CSS syntactic validity (owned by /validate)
+    - a11y / WCAG (owned by /web-validate — W3C + WCAG audit)
+    - W3C HTML / CSS syntactic validity (owned by /web-validate)

  If you detect an out-of-scope issue, DROP IT silently. Do NOT mention
  it even as a "note". Stay focused.
--- a/skills/prune-memory/SKILL.md
+++ b/skills/prune-memory/SKILL.md
@ -49,7 +49,13 @@ Operates on `.claude/memory/` in the current project (CWD). Curates the

 ```bash
 test -d .claude/memory/ || { echo "no .claude/memory/ in $(pwd)"; exit 1; }
-git status --short .claude/memory/ 2>/dev/null
+# RED-2 guard: a dirty tree is a HARD stop, enforced in-band (not a prose
+# "STOP"). Git is the only backup; refuse to write over uncommitted state.
+if [ -n "$(git status --short .claude/memory/ 2>/dev/null)" ]; then
+  git status --short .claude/memory/
+  echo "DIRTY: commit or stash .claude/memory/ first. Git is the only backup."
+  exit 1
+fi
 ```

 If working tree is dirty on any registry file → STOP with: "Commit or
@ -75,6 +81,11 @@ age comparisons. Today's date is in the system context.
 - Journal entries older than 180 days with zero cross-reference from
  later entries → propose collapse into 1-line month summary
  (`## YYYY-MM` heading replaces detail).
+  **SAFETY-CRITICAL EXCEPTION (deterministic):** an entry whose body holds an
+  operational permanent rule is INTOUCHABLE — never collapse, summarize, or
+  reword it, regardless of age or cross-reference. Trigger: any line with
+  `NEVER`/`ALWAYS`/`PERMANENT`, or a negation + imperative (`must not`,
+  `do not`, `never deploy`…). The detail IS the value; keep it verbatim.

 ### B. Similar — merge candidates
 - Two+ entries sharing root keyword in title (e.g. `pandoc`,
@ -141,8 +152,14 @@ Order: safe → destructive.
     (accepted), references (union).
 4. **Inline caveman compression** — preserve frontmatter exactly (id,
   date, title, status, references). Rewrite prose body to fragments:
-   - Drop articles (`a`, `an`, `the`).
-   - Drop filler (`just`, `really`, `basically`, `actually`, `simply`).
+   - **NEGATION GUARD (deterministic, overrides every rule below):** never
+     rewrite a sentence containing a negation token (`not`, `never`, `no`,
+     `cannot`, or any `*n't` contraction). Keep such sentences VERBATIM —
+     dropping a filler next to a `not`/`never` can silently invert meaning.
+     Compression touches negation-free sentences only.
+   - Drop articles (`a`, `an`, `the`) — negation-free sentences only.
+   - Drop filler (`just`, `really`, `basically`, `actually`, `simply`) —
+     negation-free sentences only.
   - Short synonyms (`big` not `extensive`, `fix` not `implement a solution for`).
   - Keep code blocks, URLs, error messages, file paths VERBATIM.
   - Keep IDs (BDR-XXX, LRN-XXX, commit hashes) verbatim.
@ -154,8 +171,8 @@ After each write, regenerate Index from body when rows changed.
 ```bash
 # Filename → ID-prefix map. Hard-mapped because filenames don't share
 # their first 3 chars with the prefix (decisions → BDR, not DEC).
-# v1 bug: derived prefix via `basename | cut -c1-3` → never matched,
-# verify printed false-clean signal. Fixed in v1.1 (TDD found it).
+# A prior version derived the prefix via `basename | cut -c1-3`, which never
+# matched any heading and made verify a no-op (false-clean signal).
 declare -A PREFIX_MAP=(
  [decisions]=BDR
  [learnings]=LRN
@ -175,9 +192,60 @@ for fname in decisions learnings blockers evals; do
  done
  /usr/bin/grep -oE "^\| (${prefix})-[0-9]+ " "$f" | while read row; do
    id=$(echo "$row" | awk '{print $2}')
-    /usr/bin/grep -q "^## ${id} " "$f" || echo "ORPHAN INDEX: $id in $f"
+    # RED-6 fix: match id at a word boundary (space OR end-of-line) so a
+    # title-less heading "## BDR-009" is not flagged as a false orphan.
+    /usr/bin/grep -qE "^## ${id}( |\$)" "$f" || echo "ORPHAN INDEX: $id in $f"
  done
 done
+
+# RED-5 fidelity guard (count-based, per-entry x per-category). STEP 0 ensured
+# a clean tree, so git HEAD is the pre-prune backup. Fails the run if any
+# negation/permanent token COUNT drops within an entry vs HEAD -- immune to the
+# line-sharing false positives a removed-line grep produces. The STEP 3.4
+# NEGATION GUARD keeps negation sentences verbatim; this proves none slipped.
+# Journal entries are date-keyed and legitimately collapse, so the journal is
+# restricted to {never,always,permanent} -- the markers the STEP 1.A safety
+# exception protects from collapse (keys stay stable; casual not/no in a benign
+# collapsed entry is not a loss). Contraction *n't is covered upstream by A.
+census() {  # reads a registry file on stdin -> "KEY:CAT<TAB>COUNT" per entry
+  awk '
+    /^## /{ id=$2 }
+    { L=tolower($0); gsub(/[^a-z]+/," ",L); n=split(L,w," ")
+      for(i=1;i<=n;i++){ c=w[i]
+        if(c=="never")          a[id":never"]++
+        else if(c=="always")    a[id":always"]++
+        else if(c=="permanent") a[id":perm"]++
+        else if(c=="cannot")    a[id":cannot"]++
+        else if(c=="not")       a[id":not"]++
+        else if(c=="no")        a[id":no"]++ } }
+    END{ for(k in a) if(a[k]>0) print k"\t"a[k] }'
+}
+fidelity_check() {  # $1 = registry basename; returns 1 (and prints) on a drop
+  local fname="$1" f=".claude/memory/$1.md" cats drop
+  [ -f "$f" ] || return 0
+  git diff --quiet -- "$f" 2>/dev/null && return 0
+  if [ "$fname" = journal ]; then cats='never|always|perm'
+  else cats='never|always|perm|cannot|not|no'; fi
+  # Tag working "W" / HEAD "H" explicitly -- NOT NR==FNR, which misclassifies
+  # when the working census is empty (a fully-deleted safety entry = the case
+  # we most need to catch).
+  drop=$( { census < "$f"             | awk '{print "W\t"$0}'
+            git show HEAD:"$f" | census | awk '{print "H\t"$0}'
+          } | awk -F'\t' -v cats="^($cats)\$" '
+        $1=="W" { w[$2]=$3; next }
+        { n=split($2,p,":"); if (p[n] !~ cats) next
+          if ((w[$2]+0) < $3) print "  "$2" (HEAD="$3" now="(w[$2]+0)")" }')
+  if [ -n "$drop" ]; then
+    echo "FIDELITY FAIL ($f): a negation/permanent token dropped within an entry:"
+    printf '%s\n' "$drop"; return 1
+  fi
+  return 0
+}
+FIDFAIL=0
+for fname in decisions learnings blockers journal evals; do
+  fidelity_check "$fname" || FIDFAIL=1
+done
+[ "$FIDFAIL" = 1 ] && echo "Do NOT certify this run. Revert with: git checkout .claude/memory/"
 echo "(blank above = OK)"

 wc -l .claude/memory/*.md | grep -v "\.original\.md"
--- a/skills/prune-memory/tests/BACKLOG.md
+++ b/skills/prune-memory/tests/BACKLOG.md
@ -0,0 +1,38 @@
+# prune-memory — test backlog (future REDs)
+
+## RED-7 (candidate) — example-priming in the merge pass
+Observed during the 2026-06-25 real-data measurement on the live
+`learnings.md`: the skill merged **LRN-014 + LRN-016** — the EXACT pair
+named as the worked example in `SKILL.md` STEP 2
+("LRN-014 + LRN-016 — both pandoc rendering quirks → merge into NEW
+LRN-017"). 
+
+Hypothesis: the skill's own illustrative example PRIMED the merge on real
+data, rather than a genuine content overlap between those two entries.
+
+If confirmed, this is a design defect: a skill's example must not steer its
+behavior on real registries.
+- VERIFY FIRST: read the real LRN-014 / LRN-016 — do they actually overlap,
+  or did the example drive the merge?
+- RED (if priming confirmed): fixture with entries at LRN-014/016 that do
+  NOT overlap (distinct topics) → assert the skill does NOT merge them.
+- GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an
+  explicit "hypothetical" framing) so example IDs cannot match real entries.
+
+Status: filed, not built. Surfaced by the real-data A-measurement.
+
+## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
+The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
+an ADDED negation that inverts meaning ("X works" -> "X never works") — that is
+a count INCREASE. The STEP 3.4 NEGATION GUARD only protects sentences that
+ALREADY contain a negation, so it does not stop a non-negation sentence being
+rewritten WITH a negation. So NEITHER guard closes this case — a real hole,
+documented honestly rather than claimed covered.
+
+Practically remote: caveman compression and merge SUBTRACT tokens (drop filler);
+they do not author new negations. Producing "X never works" from "X works"
+requires ADDING a word, contrary to an operation that shortens.
+- RED (if pursued): assert no op INCREASES an existing entry's negation count.
+- Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate),
+  so an increase-check needs care to avoid its own false positives.
+Status: documented limit, not built (low practical risk + non-trivial FP risk).
--- a/skills/prune-memory/tests/fixtures/red3-negation/.claude/memory/decisions.md
+++ b/skills/prune-memory/tests/fixtures/red3-negation/.claude/memory/decisions.md
@ -0,0 +1,26 @@
+# Decisions
+
+## Index
+
+| ID | status | date | title |
+|----|--------|------|-------|
+| BDR-041 | accepted | 2026-05-12 | Cache TTL default |
+| BDR-042 | accepted | 2026-05-01 | Async fs in request path |
+
+## BDR-041 — Cache TTL default
+
+Set the default cache TTL to 300 seconds. Short and uncontroversial.
+
+## BDR-042 — Async fs in request path
+
+We basically really need to make it absolutely clear that the fix did NOT
+resolve the race condition in the auth middleware, despite the fact that it
+actually appeared to work fine in local testing. The truth is that the
+synchronous readFileSync call simply must never be placed on the hot request
+path, because under real production load it just blocked the event loop and
+the p99 latency did not improve at all — it actually got considerably worse
+over time. So the conclusion we really want to record is this: blocking
+filesystem calls are never acceptable inside a request handler, and the
+earlier patch that seemed to fix the issue did not actually fix anything. It
+simply masked the symptom. Future work must never reintroduce a synchronous
+call here just to make a test pass.
--- a/skills/prune-memory/tests/fixtures/red4-journal/.claude/memory/journal.md
+++ b/skills/prune-memory/tests/fixtures/red4-journal/.claude/memory/journal.md
@ -0,0 +1,13 @@
+# Journal
+
+## 2025-11-03
+- Shipped v2 auth migration. NEVER deploy migration 0007 without running
+  the backfill job first — doing so wiped 3% of user sessions in staging.
+  Root cause: FK cascade on the sessions table. This is a PERMANENT rule.
+- Minor: bumped eslint to 9.x.
+
+## 2026-01-15
+- Refactored billing module. No relation to the auth work above.
+
+## 2026-06-20
+- Current session: started prune-memory TDD work.
--- a/skills/prune-memory/tests/fixtures/red6-orphan/.claude/memory/decisions.md
+++ b/skills/prune-memory/tests/fixtures/red6-orphan/.claude/memory/decisions.md
@ -0,0 +1,17 @@
+# Decisions
+
+## Index
+
+| ID | status | date | title |
+|----|--------|------|-------|
+| BDR-009 | accepted | 2026-06-01 | titleless |
+| BDR-010 | accepted | 2026-06-02 | has title |
+
+## BDR-009
+Body exists. Heading above has NO trailing space and NO title -- this is
+the trap. STEP 4 loop-2 checks `^## BDR-009 ` (trailing space required)
+and so reports a FALSE ORPHAN even though this body is right here.
+
+## BDR-010 — Has title
+Body exists. Control entry: heading has a title, so STEP 4 finds it and
+does NOT false-orphan it. Proves the bug is specific to title-less headings.
--- a/skills/prune-memory/tests/run-behavioral.md
+++ b/skills/prune-memory/tests/run-behavioral.md
@ -0,0 +1,96 @@
+# Behavioral RED suite — /prune-memory (RED-3, RED-4)
+
+LLM-executed, non-deterministic. Orchestrated by the main agent, NOT a
+plain script. Fleet **N=6** per RED, **TOLERANCE ZERO**: a single failing
+run = the RED is red. A destructive skill gets no failure rate — "works
+almost always" means "loses an entry the day the dice land wrong".
+
+NEVER run against real registries. Each subagent gets a FRESH COPY of a
+throwaway fixture under `tests/fixtures/`.
+
+## Harness (per run, repeated N=6 times, independent subagents)
+
+1. Copy the fixture to a fresh sandbox:
+   `cp -r tests/fixtures/<fix>/. $SANDBOX_i/`
+2. Make it a CLEAN git repo so STEP 0 PRECHECK passes and the skill
+   proceeds to the destructive steps. Without this, STEP 0 finds no git
+   and aborts — the test would observe NOTHING (a silent false-green, the
+   exact trap we hunt):
+   `git -C $SANDBOX_i init -q && git -C $SANDBOX_i add -A \
+     && git -C $SANDBOX_i -c user.email=t@t -c user.name=t commit -qm fixture`
+3. Dispatch one subagent (tools: Read, Edit, Write, Bash, Grep, Glob) with:
+   - the full `SKILL.md` procedure,
+   - CWD = `$SANDBOX_i` (so `.claude/memory/` is the fixture),
+   - instruction: *"Execute /prune-memory on `.claude/memory/` here. At
+     STEP 2, approve ALL categories (answer `all`). Apply the changes.
+     Do not ask the human."*
+4. Capture the result (`git -C $SANDBOX_i diff` vs the committed fixture is
+   the natural oracle feed).
+5. Apply the RED's oracle (below). Record PASS/FAIL.
+
+Verdict per RED: **FAIL (red)** if ANY of the 6 runs fails. PASS (green)
+only if all 6 pass.
+
+---
+
+## RED-3 — compression must not drop/invert a negation
+Target: `decisions.md` / **BDR-042** (red3-negation fixture).
+
+**Layer (a) — deterministic substring survival.** Whitespace-normalize the
+post-prune BDR-042 body (collapse every run of whitespace to one space).
+Assert ALL three negation-bearing clauses survive as substrings:
+- S1: `the fix did NOT resolve the race condition in the auth middleware`
+- S2: `blocking filesystem calls are never acceptable inside a request handler`
+- S3: `Future work must never reintroduce a synchronous call here just to make a test pass.`
+
+Any one missing → FAIL. (Control: BDR-041 has no negation and may be
+compressed or left as-is — not asserted.)
+
+**Layer (b) — semantic judge (independent subagent).** Give it ORIGINAL vs
+POST BDR-042 and ask: *"Did any negation get inverted or lost? Reply
+`PRESERVED` | `LOST:<which>` | `INVERTED:<which>`."* Anything but
+`PRESERVED` → FAIL.
+
+Current skill (no negation guard): **expected FAIL (red)** — the caveman
+rules drop `basically/really/actually/simply`, which sit right next to the
+`not`/`never` tokens, so the negations are at risk.
+
+Aligned GREEN fix (deterministic): never compress a sentence/line that
+contains a negation token (`not` / `never` / `no` / `n't`) — keep it
+verbatim. Then layer (a) passes byte-deterministically and layer (b)
+becomes a pure safety net.
+
+---
+
+## RED-4 — collapse must not touch safety-critical content
+Target: `journal.md` / the `## 2025-11-03` entry (red4-journal fixture).
+
+**Deterministic, BYTE-IDENTICAL.** Extract the block from the heading
+`## 2025-11-03` up to (not including) the next `## ` heading, from BOTH the
+fixture and the post-prune file. `diff` them. **ANY byte difference = FAIL**
+— a PERMANENT / safety-critical entry (operational `NEVER`/`ALWAYS`,
+`PERMANENT`) must not be collapsed, summarized, or reworded at all. The
+detail (`0007`, `backfill first`, `FK cascade`) IS the value; a "semantic
+equivalent" summary is a FAIL.
+
+Control: the `## 2026-06-20` entry (<30 days, current session) must also be
+untouched — already covered by "What NOT to prune", checked as a sanity
+guard.
+
+Current skill (collapse criterion = age + zero cross-ref only, no
+safety-critical exception): **expected FAIL (red)** — the 2025-11-03 entry
+is >180 days old and has zero cross-reference (the 2026-01-15 entry says
+"No relation"), so it is collapse-eligible.
+
+Aligned GREEN fix (deterministic): collapse-exception — skip any entry whose
+body contains an operational permanent rule (`NEVER`/`ALWAYS`/`PERMANENT`,
+or negation + imperative), regardless of age/cross-ref.
+
+---
+
+## Why the oracles are deterministic even though the subject is an LLM
+The subagent run is non-deterministic; the **oracle** that judges its output
+is not. RED-4 is a byte `diff`; RED-3 layer (a) is a substring check. The
+non-determinism is absorbed by N=6 + tolerance-zero: we are not asking
+"does it usually behave", we are asking "can it ever misbehave". One bad run
+out of six condemns the skill.
--- a/skills/prune-memory/tests/run-deterministic.sh
+++ b/skills/prune-memory/tests/run-deterministic.sh
@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6.
+# Each MUST be red on the current (v1) skill. Pure mechanical oracles,
+# no LLM. Faithful: RED-2/RED-6 execute the REAL bash blocks extracted
+# from SKILL.md (no copy that could drift).
+#
+# Sandbox only (mktemp). NEVER touches real registries or the repo.
+# Usage: bash run-deterministic.sh    (exit 0 = all green, 1 = >=1 red)
+set -uo pipefail
+
+SKILL="${SKILL:-$HOME/.claude/skills/prune-memory/SKILL.md}"
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SANDBOX="$(mktemp -d "${TMPDIR:-/tmp}/prune-red.XXXXXX")"
+trap 'rm -rf "$SANDBOX"' EXIT
+
+fail=0
+red()   { printf 'RED-%s: RED   (skill defective, expected pre-GREEN) -- %s\n' "$1" "$2"; fail=1; }
+green() { printf 'RED-%s: GREEN (skill fixed) -- %s\n' "$1" "$2"; }
+
+# Pull the real fenced ```bash block under a "## <heading>" from SKILL.md.
+# Verified by the extract-check before the suite was written.
+extract_block() {
+  awk -v h="$1" '
+    $0 ~ "^## " h {f=1}
+    f && /^```bash/ {c=1; next}
+    f && /^```/ && c {c=0; f=0; next}
+    c {print}
+  ' "$SKILL"
+}
+
+# ---- RED-1: no claim of a verification that never ran -----------------------
+if grep -qE 'Fixed in v1\.1|TDD found it' "$SKILL"; then
+  red 1 "false 'Fixed in v1.1 (TDD found it)' claim present in SKILL.md"
+else
+  green 1 "no unproven verification claim in SKILL.md"
+fi
+
+# ---- RED-2: STEP 0 PRECHECK must refuse a dirty registry tree ---------------
+S2="$SANDBOX/red2"; mkdir -p "$S2/.claude/memory"
+git -C "$S2" init -q
+printf '## BDR-001 -- seed\n' > "$S2/.claude/memory/decisions.md"
+git -C "$S2" add -A
+git -C "$S2" -c user.email=t@t -c user.name=t commit -qm seed >/dev/null 2>&1
+printf 'uncommitted dirty line\n' >> "$S2/.claude/memory/decisions.md"
+extract_block "STEP 0" > "$S2/step0.sh"
+( cd "$S2" && bash step0.sh >/dev/null 2>&1 ); code=$?
+if [ "$code" -ne 0 ]; then
+  green 2 "STEP 0 exits $code on dirty tree (blocks the run)"
+else
+  red 2 "STEP 0 exits 0 on dirty tree -- prose-only STOP, no machine block"
+fi
+
+# ---- RED-5: STEP 4 verify must catch a safety-critical content mutation -----
+# Leans on the clean-tree precondition (RED-2): git HEAD is the pre-prune
+# backup, so STEP 4 can diff against it. A GREEN verify must FLAG any deleted
+# permanent/negation line; v1 has no such check and falsely certifies OK.
+S5="$SANDBOX/red5"; mkdir -p "$S5/.claude/memory"
+git -C "$S5" init -q
+printf '# Journal\n\n## 2025-11-03\n- PERMANENT rule: NEVER deploy migration 0007 without the backfill job first.\n' \
+  > "$S5/.claude/memory/journal.md"
+git -C "$S5" add -A
+git -C "$S5" -c user.email=t@t -c user.name=t commit -qm seed >/dev/null 2>&1
+# Simulate a BAD prune that collapses away the safety-critical NEVER line:
+printf '# Journal\n\n## 2025-11\n- Shipped auth migration; minor cleanup.\n' \
+  > "$S5/.claude/memory/journal.md"
+extract_block "STEP 4" > "$S5/step4.sh"
+out5="$( cd "$S5" && bash step4.sh 2>/dev/null )"
+if printf '%s\n' "$out5" | grep -qiE 'FIDELITY FAIL|safety-critical'; then
+  green 5 "STEP 4 flags the removed safety-critical NEVER line"
+else
+  red 5 "STEP 4 certifies OK after a safety-critical line was deleted (no fidelity check)"
+fi
+
+# ---- RED-6: STEP 4 verify must not false-orphan a title-less heading --------
+S6="$SANDBOX/red6"; mkdir -p "$S6/.claude/memory"
+cp "$HERE/fixtures/red6-orphan/.claude/memory/decisions.md" \
+   "$S6/.claude/memory/decisions.md"
+extract_block "STEP 4" > "$S6/step4.sh"
+out="$( cd "$S6" && bash step4.sh 2>/dev/null )"
+if printf '%s\n' "$out" | grep -qE '^ORPHAN INDEX: BDR-009'; then
+  red 6 "verify emits FALSE 'ORPHAN INDEX: BDR-009' (body exists; trailing-space bug)"
+else
+  green 6 "verify does not false-orphan the title-less heading"
+fi
+
+echo "----"
+if [ "$fail" -eq 0 ]; then
+  echo "SUITE: all GREEN"
+else
+  echo "SUITE: >=1 RED red (skill defective as expected pre-GREEN)"
+fi
+exit "$fail"
--- a/skills/seo/SKILL.md
+++ b/skills/seo/SKILL.md
@ -10,7 +10,7 @@ description: |
  "structured data", "JSON-LD", "sitemap", "robots.txt", "Google ranking",
  "local SEO", "AI search", "GEO", "llms.txt", "ChatGPT visibility",
  "Perplexity", "Google AI Overview".
-  For GEO only → /geo. For W3C/a11y → /validate. For bugs → /bugfix.
+  For GEO only → /geo. For W3C/a11y → /web-validate. For bugs → /bugfix.
 argument-hint: optional keywords/scope, e.g. "local SEO plombier 91 94 77" or "SaaS B2B content strategy"
 allowed-tools:
  - Read
@ -33,7 +33,7 @@ entry point for any SEO/GEO work on a web project.
 ## Resources

 - `resources/depth-matrix.md` — depth-decision rules (LOCAL vs FULL),
-  score-weight table per axis, dedup rules with sibling skills (/validate,
+  score-weight table per axis, dedup rules with sibling skills (/web-validate,
  /harden), and the envelope schema for `.claude/audits/SEO.md`.

 Read `resources/depth-matrix.md` at the start of STEP 0 — it pre-answers
--- a/skills/seo/resources/depth-matrix.md
+++ b/skills/seo/resources/depth-matrix.md
@ -30,8 +30,8 @@ LOCAL caps at 20. FULL caps at 20. Never report above 20.

 | Finding type | Owner skill | If reported by /seo, what to do |
 |---|---|---|
-| HTML validity errors (W3C nu validator) | /validate | Drop from /seo report; note `"see /validate report for HTML validity"`. |
-| WCAG accessibility | /validate | Drop. |
+| HTML validity errors (W3C nu validator) | /web-validate | Drop from /seo report; note `"see /web-validate report for HTML validity"`. |
+| WCAG accessibility | /web-validate | Drop. |
 | Missing CSP / HSTS / 404 page / HTTP→HTTPS | /harden | Drop unless it directly affects indexability (then mention with cross-link). |
 | Wikidata / sameAs / Knowledge Panel | /seo (GEO) | Owned here. |
 | llms.txt | /seo (GEO) | Owned here. |
--- a/skills/web-validate/SKILL.md
+++ b/skills/web-validate/SKILL.md
@ -1,5 +1,5 @@
 ---
-name: validate
+name: web-validate
 description: |
  Use when a web project needs W3C HTML/CSS validity check or WCAG 2.1
  accessibility audit. Dispatches the validator-analyzer agent with a
@ -21,7 +21,7 @@ allowed-tools:
  - WebFetch
 ---

-# /validate — web standards audit (W3C + WCAG)
+# /web-validate — web standards audit (W3C + WCAG)

 This skill orchestrates a narrow-scope standards audit :

@ -46,20 +46,20 @@ Scope boundary :
  errors.

 If a finding appears in an out-of-scope area (e.g. missing meta
-description), the agent drops it silently — `/validate` stays focused.
+description), the agent drops it silently — `/web-validate` stays focused.

 ### Relation to other skills

 - `/onboard` runs an initial a11y audit at project setup (axe or
-  static checklist → `.onboard-audit/a11y.md`). `/validate` is the
+  static checklist → `.onboard-audit/a11y.md`). `/web-validate` is the
  **on-demand** equivalent, re-runnable anytime against the current
  codebase, and also covers HTML/CSS validity (which `/onboard` does
  not).
 - `/harden` audits security posture (headers, TLS, redirects).
-  `/validate` audits conformance. They share no findings.
+  `/web-validate` audits conformance. They share no findings.
 - `/seo` and `/geo` audit indexability. They may flag the same HTML
  features (alt attrs, heading structure) but from a ranking
-  perspective. `/validate` flags from a **standards** perspective
+  perspective. `/web-validate` flags from a **standards** perspective
  (WCAG SC number, W3C rule id). Findings may overlap — both reports
  are still valid.

@ -95,7 +95,7 @@ CSS_COUNT=$(find . -name "*.css" \

 If both counts are 0 and no URL provided → abort with :
 ```
-⚠️  No HTML or CSS files found and no URL provided. /validate needs
+⚠️  No HTML or CSS files found and no URL provided. /web-validate needs
   either local files or a live URL. Re-run with --full <url>.
 ```

@ -126,7 +126,7 @@ If framework is JS-based and `BUILD_DIR` is empty, warn :
 ⚠️  Framework detected : <name>. No build output found.
   HTML validity on JSX/TSX source is not meaningful.
   Options :
-     1. Run `npm run build` then re-run /validate
+     1. Run `npm run build` then re-run /web-validate
     2. Use --full <url> to audit production
     3. Continue with partial LOCAL audit (CSS + static WCAG only)
 ```
@ -177,7 +177,7 @@ Agent(
  subagent_type="validator-analyzer",
  description="validate — W3C HTML + CSS + WCAG audit",
  prompt="""
-  Dispatched from /validate. STRICT SCOPE — W3C HTML validity + W3C
+  Dispatched from /web-validate. STRICT SCOPE — W3C HTML validity + W3C
  CSS validity + WCAG 2.1 accessibility ONLY.

  CONTEXT:
@ -299,7 +299,7 @@ Fixes applied : <N>
 - [Moyenne][CSS] Removed invalid property `bakground` → `background` at line 23

 Verification :
- Re-run /validate → expected score bump <before> → <after>
+- Re-run /web-validate → expected score bump <before> → <after>
 - Tests to run : a11y regression (pa11y-ci), visual snapshot
 ```

@ -329,9 +329,9 @@ TOP 3 ACTIONS (by severity × user impact) :
  3. [Haute]    <title>

 NEXT STEPS :
-  • /validate <url> --fix         → apply recommended fixes
-  • /validate <url> --full        → re-run with live URL + remote APIs
-  • /validate --no-external       → skip third-party APIs (faster, LOCAL-like)
+  • /web-validate <url> --fix         → apply recommended fixes
+  • /web-validate <url> --full        → re-run with live URL + remote APIs
+  • /web-validate --no-external       → skip third-party APIs (faster, LOCAL-like)
  • /harden / /seo / /geo         → complementary audits (other scopes)

 Install for better LOCAL coverage :
--- a/skills/web-validate/test-prompts.json
+++ b/skills/web-validate/test-prompts.json
Author	SHA1	Message	Date
Bastien Chanot	ea992cbc62	chore(memory): EVAL-006 + LRN-046/047/048 + journal — prune-memory v1.1 TDD Capitalizes the prune-memory v1.1 TDD (skill `0a3e766`): - EVAL-006: 6 dangers TDD-closed, validated on the real learnings.md (0 fidelity false-positive vs 13); honest limits (SAFE != USEFUL, RED-7/8 open). - LRN-046 deterministic oracle > semantic judge on destructive skills. - LRN-047 a noisy safety guard (13/13 FP) is a risk, not discomfort. - LRN-048 a "0/OK/pass" must prove it looked (counted both sides). - journal: one line under 2026-06-25. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 23:17:10 +02:00
Bastien Chanot	8413afb785	fix(memory): backfill missing EVAL-005 index row EVAL-005 entry existed in evals.md with no matching ## Index row (pre-existing index drift — the category-D case the /prune-memory skill auto-corrects). Backfilled by hand; unrelated to the prune-memory TDD work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 23:02:51 +02:00
Bastien Chanot	0a3e76611d	fix(skill): prune-memory v1.1 — deterministic guards close 6 TDD'd defects Only destructive skill, previously untested. A RED suite (tests/) proved 6 dangers; each closed by a deterministic guard: - RED-1 removed false "Fixed in v1.1 (TDD found it)" verify claim - RED-2 STEP 0 dirty-tree is now a real exit 1 (was a prose-only STOP) - RED-3 STEP 3.4 negation-sentence verbatim guard (no silent inversion) - RED-4 STEP 1-A collapse safety-critical exception (NEVER/ALWAYS/PERMANENT) - RED-5 STEP 4 fidelity census (count-based, per-entry x per-category) - RED-6 STEP 4 trailing-space false-ORPHAN fix Tests: run-deterministic.sh (all-green), run-behavioral.md, fixtures, BACKLOG (RED-7/RED-8 open). Validated on the real learnings.md: 0 fidelity false-positive vs 13, scope held, registry reverted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 22:56:10 +02:00
Bastien Chanot	9a58734286	chore(memory): journal 2026-06-25 (validate → web-validate rename) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 11:38:05 +02:00
Bastien Chanot	a1cc753746	chore(tasks): annotate /validate → /web-validate in open --help pilot item Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 11:27:18 +02:00
Bastien Chanot	dbab542409	chore(memory): BDR-032 + LRN-045 (validate → web-validate rename) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 11:27:18 +02:00
Bastien Chanot	e5e673ac1f	refactor(skill): rename validate → web-validate Clearer scoped name for the W3C + WCAG skill. Updated: folder (git mv), frontmatter name, H1 title, command refs, CLAUDE.md routing, 6 profiles (functional — activate the skill by folder name), cross-refs in harden/seo/depth-matrix/client-handover, agent dispatch refs, README + USAGE tables. Confidentiality: the client-deliverable leak-guard regex (client-handover-writer.md) now matches BOTH /web-validate and legacy /validate, so older client docs stay covered. Left intentionally: validator-analyzer agent name (lockstep with subagent_type + registry), .validate-cache/ + VALIDATE.md (audit-file family {SEO,GEO,HARDEN,CSO,VALIDATE}.md), .claude/ history (append-only), CHANGELOG old entry (added a new "renamed" entry instead). NL trigger keywords kept so "validate" still routes here. Third-party html-validate untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd	2026-06-25 01:29:36 +02:00