Compare commits

...

7 Commits

Author SHA1 Message Date
Bastien Chanot
ea992cbc62 chore(memory): EVAL-006 + LRN-046/047/048 + journal — prune-memory v1.1 TDD
Capitalizes the prune-memory v1.1 TDD (skill 0a3e766):
- EVAL-006: 6 dangers TDD-closed, validated on the real learnings.md
  (0 fidelity false-positive vs 13); honest limits (SAFE != USEFUL, RED-7/8 open).
- LRN-046 deterministic oracle > semantic judge on destructive skills.
- LRN-047 a noisy safety guard (13/13 FP) is a risk, not discomfort.
- LRN-048 a "0/OK/pass" must prove it looked (counted both sides).
- journal: one line under 2026-06-25.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 23:17:10 +02:00
Bastien Chanot
8413afb785 fix(memory): backfill missing EVAL-005 index row
EVAL-005 entry existed in evals.md with no matching ## Index row
(pre-existing index drift — the category-D case the /prune-memory skill
auto-corrects). Backfilled by hand; unrelated to the prune-memory TDD work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 23:02:51 +02:00
Bastien Chanot
0a3e76611d fix(skill): prune-memory v1.1 — deterministic guards close 6 TDD'd defects
Only destructive skill, previously untested. A RED suite (tests/) proved 6
dangers; each closed by a deterministic guard:
- RED-1 removed false "Fixed in v1.1 (TDD found it)" verify claim
- RED-2 STEP 0 dirty-tree is now a real exit 1 (was a prose-only STOP)
- RED-3 STEP 3.4 negation-sentence verbatim guard (no silent inversion)
- RED-4 STEP 1-A collapse safety-critical exception (NEVER/ALWAYS/PERMANENT)
- RED-5 STEP 4 fidelity census (count-based, per-entry x per-category)
- RED-6 STEP 4 trailing-space false-ORPHAN fix
Tests: run-deterministic.sh (all-green), run-behavioral.md, fixtures, BACKLOG
(RED-7/RED-8 open). Validated on the real learnings.md: 0 fidelity
false-positive vs 13, scope held, registry reverted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 22:56:10 +02:00
Bastien Chanot
9a58734286 chore(memory): journal 2026-06-25 (validate → web-validate rename)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 11:38:05 +02:00
Bastien Chanot
a1cc753746 chore(tasks): annotate /validate → /web-validate in open --help pilot item
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 11:27:18 +02:00
Bastien Chanot
dbab542409 chore(memory): BDR-032 + LRN-045 (validate → web-validate rename)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 11:27:18 +02:00
Bastien Chanot
e5e673ac1f refactor(skill): rename validate → web-validate
Clearer scoped name for the W3C + WCAG skill. Updated: folder (git mv),
frontmatter name, H1 title, command refs, CLAUDE.md routing, 6 profiles
(functional — activate the skill by folder name), cross-refs in
harden/seo/depth-matrix/client-handover, agent dispatch refs, README +
USAGE tables.

Confidentiality: the client-deliverable leak-guard regex
(client-handover-writer.md) now matches BOTH /web-validate and legacy
/validate, so older client docs stay covered.

Left intentionally: validator-analyzer agent name (lockstep with
subagent_type + registry), .validate-cache/ + VALIDATE.md (audit-file
family {SEO,GEO,HARDEN,CSO,VALIDATE}.md), .claude/ history (append-only),
CHANGELOG old entry (added a new "renamed" entry instead). NL trigger
keywords kept so "validate" still routes here. Third-party html-validate
untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
2026-06-25 01:29:36 +02:00
30 changed files with 476 additions and 58 deletions

View File

@ -510,3 +510,19 @@ rules:
- **Caveats**: future GLOBAL memory must stay tiny — with conditional loading broken, anything global loads in EVERY project unconditionally; fold that constraint into the global-memory design once a backup exists. Caveman pass to ~250 was explicitly DECLINED: marginal ~25-line gain vs real risk (changes the nature of instructions-to-follow; no evidence caveman is followed better than prose; CLAUDE.md is the most-edited file → caveman = painful to re-read/amend). 275 readable > 250 caveman.
- **Alternatives rejected**: path-scoped `~/.claude/rules/` (broken, [[BLK-009]]); externalize sections to on-demand-loaded files (same conditional-load dependency); caveman to ~250 (readability + instruction-fidelity risk).
- **Reference**: `~/.claude/CLAUDE.md` (symlink → `~/Documents/claude/CLAUDE.md`), commits ba743cf (compress routing+design+graphify) + 990318c (trim separators/blanks). Linked to [[BLK-009]], [[LRN-043]], [[LRN-044]].
---
## BDR-032 — skill `/validate``/web-validate` (rename user surface, keep internals)
- **Date**: 2026-06-25
- **Status**: accepted (shipped `e5e673a`)
- **Decision**: rename W3C+WCAG skill `/validate``/web-validate` (clearer scoped name, less generic). Renamed the USER-FACING surface ONLY: folder (`git mv`), frontmatter `name`, H1, command refs, CLAUDE.md routing line, 6 `lib/profiles/*.profile` entries (FUNCTIONAL — profiles activate skills by folder name, a miss = broken activation), cross-refs (harden/seo/depth-matrix/client-handover), agent dispatch refs, README + USAGE tables. Leak-guard regex extended to `web-validate|validate` ([[LRN-045]]).
- **Why — 4 deliberate KEEPs**:
- agent `validator-analyzer` name KEPT — internal, lockstep with `subagent_type=` + harness registry; rename = wider blast radius, zero discoverability gain.
- `.validate-cache/` + `VALIDATE.md` KEPT — names derive from the AUDIT TYPE, family `{SEO,GEO,HARDEN,CSO,VALIDATE}.md`; renaming makes VALIDATE the odd one out + orphans already-generated reports (`MIGRATION.md` cleanup loop hardcodes the name). Same logic kept the dispatch label `description="validate — ..."`.
- `.claude/` history KEPT (memory + completed TODO block) — append-only, true at the time. The forward-pointing OPEN TODO item was ANNOTATED additively (`désormais /web-validate`), not rewritten — append-only protects history, not pointers to future actions.
- CHANGELOG old entry KEPT, new "renamed" entry ADDED (Keep-a-Changelog: don't rewrite the past).
- NL trigger keywords ("validate"/"validation") KEPT in the description so "validate my site" still routes here.
- **Alternatives rejected**: rename agent + artifacts too (cosmetic symmetry, ~45 extra edits, breaks audit-file family + report back-compat); blind `sed s/validate/web-validate/` (breaks third-party `html-validate`, `validator.nu`, English-verb prose — discrimination must be at the `/validate` token, proven by `.validate-cache/html-validate.json` staying intact).
- **Reference**: commit `e5e673a` (18 files). Verified complete: `/validate` = 0 in active code (only `.claude/` history + CHANGELOG), `html-validate` = 15 intact, regex `client-handover-writer.md:1462` shows both names. Linked to [[LRN-045]], [[BDR-031]] (CLAUDE.md routing), [[LRN-043]] (validate routing line).

View File

@ -25,6 +25,8 @@ rules:
| EVAL-002 | 2026-06-02 | `profile gstack on/off` verb implementation | keep |
| EVAL-003 | 2026-06-11 | darwin optimization run on `audit-delta` skill | keep |
| EVAL-004 | 2026-06-11 | darwin eval 26 perso skills + 4-bug fix round | keep |
| EVAL-005 | 2026-06-23 | Obsolete `claude --effort max` alias missed across Step 9 edits | correct |
| EVAL-006 | 2026-06-25 | prune-memory v1.1 TDD — 6 guards (0a3e766), validated on real data | keep |
---
@ -74,3 +76,13 @@ rules:
- **Method / why missed**: I treated the pre-existing `CLAUDE_LINES` as established and only touched the lines I was adding/removing. Spotting the redundancy needs cross-referencing TWO config layers (shell alias vs settings.json) — a semantic check I never ran. Masked further: the user's `.bashrc` is hand-managed and the alias line wasn't even present, so it looked inert.
- **Anomaly**: not just dead config — a CLI flag (`--effort max`) silently OVERRIDES the settings.json value (`xhigh`). Real correctness bug.
- **Action**: when editing installer shell-config, audit EACH existing line against the current settings.json / CLAUDE.md source of truth, not only the lines being changed. Removed the alias + added cleanup. General rule: reconcile config to ONE source of truth across env/alias/settings layers.
---
## EVAL-006 — prune-memory v1.1: 6 dangers TDD-closed, validated on real data
- **Date**: 2026-06-25
- **Output**: prune-memory (only DESTRUCTIVE skill, never tested before) TDD'd end-to-end. 6 dangers, each proven by a failing RED then closed by a DETERMINISTIC guard: RED-1 false "verified" claim removed; RED-2 dirty tree → real `exit 1` (was prose-only STOP); RED-3 negation-sentence verbatim guard (no silent inversion); RED-4 collapse safety-critical exception (NEVER/ALWAYS/PERMANENT entry INTOUCHABLE); RED-5 STEP-4 fidelity census (count-based, per-entry × per-category); RED-6 trailing-space false-ORPHAN fix. Guards committed in **0a3e766**; tests: `tests/run-deterministic.sh` + `run-behavioral.md` + fixtures + BACKLOG.
- **Method**: run-deterministic all-green (RED-1/2/5/6); RED-3/4 by single faithful subagent runs on throwaway fixtures (deterministic oracles — byte-identical + layer-(a) substring; N=6 tolerance-zero fleet documented in run-behavioral.md, NOT exhausted — 1 run sufficed to prove presence then closure); REAL-data re-test on live learnings.md (602 l): fidelity 0 false-positive vs old line-grep 13, census PROVEN counting both sides (HEAD/WORK not=107/114, no=64/71, never=34/34 — no category dropped). Scope held (only learnings.md), reverted after (measurement, not a kept prune). Clean tree verified first; git + cp backup.
- **Anomalies**: (1) SAFE ≠ USEFUL — compression (pass C) marginal on already-caveman dense content (~3.6% trim, registry GREW); real value = index-drift (D found 19 missing Index rows on the real learnings.md, measured then reverted) + merge (B), not C on dense. (2) RED-8 OPEN — fidelity proves "no negation DELETED", not "none ADDED wrongly"; visible empirically (not/no +7 in the measurement, passed under the drop-radar); remote (compression subtracts) but real. (3) RED-7 OPEN — merge LRN-014+016 maybe PRIMED by the SKILL.md example, not real overlap; verify. (4) two guard bugs caught by VALIDATION not logic: awk `\<` unsupported (mawk) → not/no counted 0; `NR==FNR` blind when working census empty = the deletion case → both fixed + re-validated. (5) REAL ANCHOR FOR PASS D — this very evals.md had EVAL-005's Index row MISSING (pre-existing drift), exactly what the skill's D pass auto-corrects; hand-backfilled in a separate `fix(memory)` commit. learnings.md likewise carries 19 missing Index rows (deferred to an intentional prune). D is NOT theoretical.
- **Action**: keep — safe, useful for B/D, compression marginal on dense (documented limit). `Fixed in v1.1 (TDD found it)` WAS the RED-1 defect (claim of a verify never run); TDD note now TRUE (real suite passes). Patterns → LRN-046/047/048; open items → tests/BACKLOG.md (RED-7/RED-8).

View File

@ -197,3 +197,5 @@ rules:
- Compressed global CLAUDE.md 317→275 (42, loaded every session): routing (cut name-obvious lines, keep non-derivable signal + dense catch-all; restored `validate`/`plan-eng-review`, feat/hotfix pointer), design (+ explicit FILE signal), graphify, then decorative `---` + blank rognage. Caveman→250 declined (readability + instruction-fidelity). Commits ba743cf, 990318c. BDR-031, LRN-043.
- Edit/Write refuse write-through-symlink → resolve real path (`readlink -f`); `~/.claude/CLAUDE.md` → repo. LRN-044.
- Inspected dirty gstack submodule (parent showed `m`): `package.json`+`bun.lock` = the Playwright 1.58.2→1.61 bump (BDR-029/BLK-008), NOT restore noise → left intact, NOT cleaned, NOT committed (submodule ref stays at clean 070722ace; local patch re-applied by installer by design).
- Renamed skill `/validate``/web-validate` (user-surface only): git mv + name + H1 + CLAUDE.md routing + 6 profiles (functional) + cross-refs + agent dispatch + README/USAGE. KEPT: `validator-analyzer` name (lockstep), `.validate-cache`/`VALIDATE.md` (audit-file family), `.claude/` history (append-only), NL triggers. Critical catch: client-deliverable leak-guard regex (`client-handover-writer.md:1462`) matched `/validate` by exact token — `web-` prefix broke the anchored match → extended to `web-validate|validate` (covers legacy docs). Verified complete: `/validate` 0 in active code, `html-validate` 15 intact, regex shows both. Commits e5e673a + dbab542 (BDR-032/LRN-045) + a1cc753 (TODO L167 annotated additively). gstack submodule untouched.
- TDD'd `/prune-memory` (only destructive skill, untested + carried a false `Fixed in v1.1 (TDD found it)` claim): 6 dangers (RED-1..6) closed by deterministic guards, skill `0a3e766`. Real-data run on learnings.md exposed SAFE≠USEFUL (compression marginal on dense; value = index/merge, not C) + a 13/13-false-positive line-grep fidelity guard → replaced by a per-entry count census (0 FP, proven counting both sides). RED-7 (example-priming) + RED-8 (added-negation) filed in BACKLOG. EVAL-006, LRN-046/047/048.

View File

@ -46,6 +46,9 @@ rules:
| LRN-027 | 2026-06-11 | Agents improvise audit boundaries from file dates when no machine state — periodic skills need machine-readable state file, never inference | any recurring/periodic skill needing "since last run" semantics |
| LRN-030 | 2026-06-18 | Opus 4.8 under-delegates subagents/memory/custom-tools by default — counter via explicit CLAUDE.md fan-out rule | any Opus 4.8 session; tuning delegation; inline-vs-subagent decision |
| LRN-031 | 2026-06-19 | Skill value = gate + anti-noise + determinism, not re-coding what a capable agent does free | building/reviewing any skill; writing-skills TDD fixture design |
| LRN-046 | 2026-06-25 | Destructive skill: deterministic oracle (byte-identical / count census) > semantic judge | any destructive/irreversible skill; behavioral-oracle TDD |
| LRN-047 | 2026-06-25 | A noisy safety guard (13/13 FP) = a guard you learn to ignore = risk → refine, don't tolerate | any guard/alert/lint that can false-positive |
| LRN-048 | 2026-06-25 | A "0/OK/pass" must prove it LOOKED (counted both sides), else verify hard-wired to pass | any verify/test/lint reporting success |
---
@ -590,3 +593,37 @@ rules:
- **Pattern**: many of this user's `~/.claude/*` config files are symlinks INTO the claude-config repo (`~/Documents/claude/`). Edit/Write block writes through a symlink (safety against clobbering link targets); Read does not — so Read-through-link succeeds then Edit-through-link fails on the same path.
- **Future application**: before editing any `~/.claude/...` config file, resolve it first (`readlink -f <path>`, or `ls -la` to spot the arrow). Then Read AND Edit the RESOLVED real path so the harness's read-tracking matches what you write — and `git` status/diff/commit land naturally in the repo that owns the file.
- **Reference**: hit while editing `~/.claude/CLAUDE.md``~/Documents/claude/CLAUDE.md`. Linked to [[BDR-031]].
---
## LRN-045 — Renaming a command: audit exact-name leak-guard / forbidden-token regexes
- **Date**: 2026-06-25
- **Context**: rename `/validate``/web-validate`. A client-deliverable leak-guard in `agents/client-handover-writer.md:1462` greps generated docs for internal tool names via `grep -niE '/(seo|harden|validate|cso|...)\b'`. The `web-` prefix means `/web-validate` no longer matches the `/validate` branch (the `/` must sit immediately before `validate`; post-rename a `-` sits there) → renamed command leaks SILENTLY into client-facing output. No error — the gate just stops catching it.
- **Pattern**: any rename of a command/skill/identifier must sweep regexes/allowlists/denylists that match the OLD name by exact token — leak guards, forbidden-token gates, routing dispatchers, CI greps. A prefix/suffix rename breaks anchored matches (`/oldname\b`) with zero error. Fix = alternation covering BOTH names (`web-validate|validate`), NOT replacement — old artifacts (already-shipped client docs, logs) still carry the legacy name and must stay caught.
- **Future application**: when renaming, grep the BARE old token inside regex/test/gate files, not just `/oldname` command refs. A blind `replace_all '/old' '/new'` MISSES these because the guard stores the name inside an alternation (`|old|`), not as `/old`. For each guard found, extend to `new|old`; verify the gate line shows both names.
- **Reference**: `agents/client-handover-writer.md:1462`, rename commit `e5e673a`. Linked to [[BDR-032]].
## LRN-046 — Destructive skill: deterministic oracle > semantic judge
- **Date**: 2026-06-25
- **Pattern**: On a DESTRUCTIVE skill the binding oracle must be DETERMINISTIC (byte-identical, or count-based census per-entry × per-category), not a semantic judge. A judge false-greens twice: (a) PRESERVED-but-MUTATED content — RED-4, a "meaning preserved" collapse still rewrote a permanent safety rule; byte-identical caught it, the judge would not; (b) a 0-result that happened by CHANCE — "no negation inverted" ≠ protected, it was the dice not a guard. If the oracle must be behavioral/LLM, pair it with a deterministic check that is the gate.
- **Context**: prune-memory v1.1 TDD (EVAL-006, skill `0a3e766`). RED-4 collapse + RED-3 compression.
- **Future application**: any destructive/irreversible skill or safety check; any TDD whose natural oracle is an LLM judge — make the binding check deterministic, keep the judge as a secondary net.
- **Reference**: skill `0a3e766`, `tests/run-behavioral.md`. Linked to [[EVAL-006]].
## LRN-047 — A noisy safety guard is a risk, not discomfort
- **Date**: 2026-06-25
- **Pattern**: A safety guard that cries wolf (13/13 false positives on real data) is a guard you learn to IGNORE → the day of the true positive you skip it by habit. On a destructive op a noisy guard = security RISK, not annoyance → REFINE it (here line-grep → count-based census), don't tolerate. Measure the false-positive rate on REAL data, not fixtures — all-green fixtures hid the 13.
- **Context**: prune-memory v1.1 (EVAL-006). The RED-5 line-grep fidelity guard fired 13/13 false positives on the live learnings.md (line-sharing) → replaced by a per-entry census (0 FP, proven).
- **Future application**: any guard/alert/lint/test that can false-positive — measure FP on real data before shipping; a guard habitually ignored is worse than none.
- **Reference**: skill `0a3e766`, `tests/run-deterministic.sh` (RED-5). Linked to [[EVAL-006]].
## LRN-048 — A "0 / OK / pass" must prove it LOOKED
- **Date**: 2026-06-25
- **Pattern**: A passing result ("0 errors", "OK", "clean") must PROVE it inspected — show the work counted something on both sides (census non-zero on HEAD and WORK). Else it is a verify hard-wired to pass = the original prune-memory v1 lie (`basename | cut -c1-3` never matched any heading → verify always printed blank-OK). A 0 by inaction is indistinguishable from a 0 by correctness; force the success path to surface its coverage.
- **Context**: prune-memory v1.1 (EVAL-006). v1 STEP-4 verify always reported OK (wrong prefix → 0 markers → blank). The fix's 0-false-positive is only trustworthy because the census was shown counting both sides.
- **Future application**: any verify/test/lint reporting success — design the pass to surface what it examined (counts / files / lines) so a vacuous pass is visible, not silent.
- **Reference**: skill `0a3e766`, EVAL-006 (verify-proof anomaly). Linked to [[EVAL-006]].

View File

@ -164,7 +164,7 @@ Subtasks :
- [ ] Créer `skills/lib/help-handler.md` — snippet réutilisable (détection + extraction + affichage)
- [ ] Définir format d'aide standard + section "ARGUMENTS" vs reuse de argument-hint
- [ ] Décider : sections ARGUMENTS/EXAMPLES doivent-elles être dans la frontmatter (nouveau champ YAML) ou dans le corps du SKILL.md (nouvelle section `## Help`) ?
- [ ] Patcher un skill pilote (`/validate`) — valider UX
- [ ] Patcher un skill pilote (`/validate`) — valider UX _(désormais `/web-validate` — renommé e5e673a)_
- [ ] Patcher les skills perso restants : analyze, bugfix, code-clean, commit-change, doc, feat, geo, graphify, harden, hotfix, init-project, make-pdf, onboard, plan-tune, plugin-check, refactor, seo, ship-feature, skills-perso, status, benchmark-models, context-save, context-restore
- [ ] Mettre à jour `~/.claude/CLAUDE.md` — mentionner convention --help disponible sur tous les skills perso
- [ ] Note : skills-external/gstack ont leur propre convention, ne pas toucher

View File

@ -23,6 +23,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
- `.claude/{tasks,memory,audits}/` governance layout + 5 memory registries (decisions, learnings, blockers, journal, evals)
### Changed
- `/validate` renamed to `/web-validate` — clearer scoped name (W3C + WCAG); routing, skill profiles, cross-references, and the client-deliverable leak-guard updated (the guard still matches legacy `/validate` so older client docs stay covered)
- `/seo` split into parallel `seo` + `geo` agents with shared resources
- `/onboard` rewritten: archetype-aware pipeline (orchestrator + config-only agent), security audit archetype-aware
- `doc-syncer`: stack-aware audit + deploy-doc gating; later scoped to public docs only, `.claude/` read-only; sync-only ROADMAP handling — planned→shipped reconciliation from code/git, never from `.claude/`; numeric incoherence → HUMAN question

View File

@ -235,7 +235,7 @@ only the non-obvious cases: gstack fallbacks, disambiguation, cryptic names.
- Architecture review → plan-eng-review
- Before /clear or /compact → capitalize; end-of-session ritual → close
- SEO+GEO → seo (GEO only → geo)
- W3C + WCAG a11y (HTML/CSS validity, axe, pa11y) → validate
- W3C + WCAG a11y (HTML/CSS validity, axe, pa11y) → web-validate
- Security audit (secrets, CVE, OWASP) → cso
- New project → init-project; onboard existing repo → onboard

View File

@ -112,7 +112,7 @@ Versions are pinned in `plugins.lock.json`. To update: edit the file, then re-ru
| `/pdf-translate` | Translate a PDF to another language, output as HTML (via Vision) |
| `/close` | End-of-session ritual — alias for `/capitalize --ritual` (dedup + TODO reconcile + 3-question reflection) |
| `/harden` | Web hardening audit — HTTPS/TLS, HSTS, CSP, security headers |
| `/validate` | W3C HTML/CSS validity + WCAG 2.1 accessibility audit |
| `/web-validate` | W3C HTML/CSS validity + WCAG 2.1 accessibility audit |
| `/geo` | GEO-only audit — AI-search visibility (ChatGPT, Perplexity, Claude, Gemini…) |
| `/client-handover` | Final project delivery — audits + branded deliverable (Markdown / HTML / PDF) |
| `/profile` | Activate a skill profile (design / dev / qa / audit / minimal) |

View File

@ -111,7 +111,7 @@ Tu veux...
| Curer la mémoire | `/prune-memory` |
| Fin de session (= /capitalize --ritual) | `/close` |
| Audit web (TLS, CSP, headers) | `/harden` |
| Validité HTML/CSS + a11y | `/validate` |
| Validité HTML/CSS + a11y | `/web-validate` |
| Visibilité IA (GEO seul) | `/geo` |
| Livraison client finale | `/client-handover` |
| Changer profil skills | `/profile` |
@ -147,7 +147,7 @@ Tu veux...
| `/prune-memory` | Registres trop longs / bruyants | Curation : merge, superseded, compression |
| `/close` | Fin de session | Alias de /capitalize --ritual — dedup + TODO + réflexion 3 questions |
| `/harden` | Audit sécurité web (SSL, CSP, HSTS) | Projet web avec config HTTP |
| `/validate` | Audit W3C + WCAG a11y | Avant livraison projet web |
| `/web-validate` | Audit W3C + WCAG a11y | Avant livraison projet web |
| `/client-handover` | Livraison client | Audits finaux + livrable brandé |
| `/profile` | Changer le profil de skills | design / dev / qa / audit / minimal |

View File

@ -211,7 +211,7 @@ test -f Procfile && DEPLOY_HINTS+=("Heroku: Procfile")
test -f wrangler.toml && DEPLOY_HINTS+=("Cloudflare Workers: wrangler.toml")
```
Also detect deployed URL (used to point /validate at the live site, STEP 7):
Also detect deployed URL (used to point /web-validate at the live site, STEP 7):
```bash
DEPLOYED_URL=""
@ -255,7 +255,7 @@ is_fresh() {
For non-web projects (`PROJECT_TYPE` ∈ {cli, library, mobile, other}), the
pipeline is reduced: only run /cso (single audit, single fix loop), skip
STEP 6 deploy pause and STEP 7 /validate. Treat /cso as the only score for
STEP 6 deploy pause and STEP 7 /web-validate. Treat /cso as the only score for
the gate.
For web projects, dispatch in **a single message with two parallel Agent calls**:
@ -560,24 +560,24 @@ Tailor to project deploy method (use DEPLOY_HINTS):
```
AskUserQuestion:
"Pipeline paused for deploy. Above is what's changed and how to deploy.
Confirm when the live site reflects the new changes (or skip /validate)."
Confirm when the live site reflects the new changes (or skip /web-validate)."
Header: "Deploy status"
Options:
- A) Deployed — proceed with /validate
- A) Deployed — proceed with /web-validate
- B) Not yet — I'll come back (this stops the pipeline; re-run /client-handover later)
- C) Skip /validate — proceed to handover doc with VALIDATE marked SKIPPED
- C) Skip /web-validate — proceed to handover doc with VALIDATE marked SKIPPED
```
If A → proceed to STEP 7. If B → exit cleanly with state report. If C →
mark `VALIDATE_SKIPPED=true` and jump to STEP 8.
If `DEPLOYED_URL` is still `[À CONFIRMER]` after option A: AskUserQuestion
"Quelle est l'URL du site déployé pour /validate ?" — capture URL.
"Quelle est l'URL du site déployé pour /web-validate ?" — capture URL.
---
## STEP 7 — RUN /validate (live site)
## STEP 7 — RUN /web-validate (live site)
Skip if `VALIDATE_SKIPPED=true` or `PROJECT_TYPE != web` (in either case
ensure `VALIDATE_SKIPPED=true` is set so the gate logic in STEP 8 treats
@ -585,7 +585,7 @@ VALIDATE as not-applicable rather than failed).
Dispatch `general-purpose` subagent:
> Read `~/.claude/skills/validate/SKILL.md` and execute against the
> Read `~/.claude/skills/web-validate/SKILL.md` and execute against the
> deployed URL: `<DEPLOYED_URL>`. Audit W3C HTML validity (validator.nu),
> W3C CSS validity (jigsaw.w3.org), WCAG 2.1 a11y (axe-core, pa11y).
> Apply autonomous fixes ONLY in source code (the client controls deploy);
@ -601,8 +601,8 @@ SCORE_VALIDATE_AFTER=$(extract_score .claude/audits/VALIDATE.md)
Note: VALIDATE has no `_BEFORE` (first run is post-deploy). The before/after
table for VALIDATE shows `—` for before, `<score>` for after.
If /validate produced new fixes in source code, run STEP 5 again (mini-commit
+ push) BEFORE moving to STEP 8 — but DO NOT loop /validate. The remaining
If /web-validate produced new fixes in source code, run STEP 5 again (mini-commit
+ push) BEFORE moving to STEP 8 — but DO NOT loop /web-validate. The remaining
deploy of those fixes is mentioned to the user in the final doc.
---
@ -931,7 +931,7 @@ concrete, no jargon. One short paragraph per idea.
1. **Never name internal tools or skill identifiers in chapters 13.**
Forbidden tokens (do not appear, in any case, in the lay portion):
`/seo`, `/harden`, `/validate`, `/cso`, `/feat`, `/bugfix`,
`/seo`, `/harden`, `/web-validate`, `/cso`, `/feat`, `/bugfix`,
`/ship-feature`, `/ship`, `/code-clean`, `/refactor`, `seo-analyzer`,
`geo-analyzer`, `validator-analyzer`, `harden`-as-product-name,
`SEO.md`, `HARDEN.md`, `VALIDATE.md`, `CSO.md`, `MAX_ITERATIONS`,
@ -1003,7 +1003,7 @@ external validators when relevant (Mozilla Observatory, SSL Labs,
SecurityHeaders.com — these are recognized seals).
DO NOT mention internal tool/skill names here (no /seo, /harden,
/validate, seo-analyzer, etc.). The lecture rapide IS where
/web-validate, seo-analyzer, etc.). The lecture rapide IS where
client-facing axis names live.]
## 3. Ce qui a été fait
@ -1459,7 +1459,7 @@ Chapter 6 (Détails techniques) may use them in the optional glossary.
```bash
awk '/^## 1\./{flag=1} /^## 6\./{flag=0} flag' "$OUTPUT" \
| grep -niE '/(seo|harden|validate|cso|feat|bugfix|ship-feature|ship|code-clean|refactor)\b|seo-analyzer|geo-analyzer|validator-analyzer|SEO\.md|HARDEN\.md|VALIDATE\.md|CSO\.md|MAX_ITERATIONS|ALL_PASS|SCORE_[A-Z_]+'
| grep -niE '/(seo|harden|web-validate|validate|cso|feat|bugfix|ship-feature|ship|code-clean|refactor)\b|seo-analyzer|geo-analyzer|validator-analyzer|SEO\.md|HARDEN\.md|VALIDATE\.md|CSO\.md|MAX_ITERATIONS|ALL_PASS|SCORE_[A-Z_]+'
# expected: no matches. Each match is a leak — rewrite the offending
# chapter in client language before STEP 16.
```

View File

@ -1,6 +1,6 @@
---
name: validator-analyzer
description: Web standards audit agent — W3C HTML validity (validator.nu), W3C CSS validity (jigsaw.w3.org), WCAG 2.1 accessibility (axe-core, pa11y, WAVE). Dispatched from /validate. Produces scored .claude/audits/VALIDATE.md report with concrete diffs for auto-fixable issues and user actions for judgment-required fixes. Complementary to /harden (security), /seo (indexability), /geo (AI extraction).
description: Web standards audit agent — W3C HTML validity (validator.nu), W3C CSS validity (jigsaw.w3.org), WCAG 2.1 accessibility (axe-core, pa11y, WAVE). Dispatched from /web-validate. Produces scored .claude/audits/VALIDATE.md report with concrete diffs for auto-fixable issues and user actions for judgment-required fixes. Complementary to /harden (security), /seo (indexability), /geo (AI extraction).
tools: Read, Edit, Write, Bash, Grep, Glob, WebFetch
---
@ -24,7 +24,7 @@ $ARGUMENTS
## STEP 0 — Parse context
If dispatched from `/validate`, context is in `$ARGUMENTS`. Extract:
If dispatched from `/web-validate`, context is in `$ARGUMENTS`. Extract:
- `TARGET_URL` — production URL (FULL) or "none" (LOCAL)
- `DEPTH` — LOCAL | FULL
@ -45,7 +45,7 @@ Standalone invocation (no dispatcher): ask ONCE as a bundled block:
```bash
mkdir -p .validate-cache
grep -q '^\.validate-cache/' .gitignore 2>/dev/null || \
printf '\n# /validate cache\n.validate-cache/\n' >> .gitignore
printf '\n# /web-validate cache\n.validate-cache/\n' >> .gitignore
```
### Framework detection for SPA built-output targeting
@ -450,10 +450,10 @@ Each entry : file:line + WCAG SC reference + suggested approach.
- What the tool chain could not verify (e.g. dynamic content loaded
via JS in LOCAL mode, color contrast on images, screen reader flow)
- Reason + suggested follow-up (manual test with NVDA/VoiceOver,
run /validate --full post-deploy, etc.)
run /web-validate --full post-deploy, etc.)
## 8. Changes applied (appended by dispatcher after fix confirmation)
<Empty until /validate --fix completes STEP 3>
<Empty until /web-validate --fix completes STEP 3>
```
Max 600 lines. Cite file:line or tool output for every finding.
@ -495,7 +495,7 @@ At end of §5, emit verbatim :
READY TO APPLY — awaiting dispatcher confirmation
```
**Do NOT apply any Edit/Write.** Dispatcher handles STEP 3 of `/validate`.
**Do NOT apply any Edit/Write.** Dispatcher handles STEP 3 of `/web-validate`.
---

View File

@ -10,7 +10,7 @@ analyze personal
# SEO / GEO / web standards
seo personal
geo personal
validate personal
web-validate personal
# Code + perf health
health

View File

@ -46,7 +46,7 @@ codex
# === SEO / GEO / standards / security ================================
seo personal
geo personal
validate personal
web-validate personal
harden personal
analyze personal
cso

View File

@ -6,6 +6,6 @@ qa-only
browse
benchmark
canary
validate personal
web-validate personal
open-gstack-browser
setup-browser-cookies

View File

@ -8,7 +8,7 @@ seo personal
geo personal
# W3C HTML/CSS validity + WCAG a11y
validate personal
web-validate personal
# Web hardening (HSTS, CSP, redirects — affects ranking signals)
harden personal

View File

@ -34,7 +34,7 @@ refactor personal
# === SEO / GEO / standards ===========================================
seo personal
geo personal
validate personal
web-validate personal
harden personal
analyze personal

View File

@ -30,7 +30,7 @@ commit-change personal
refactor personal
# Validation companion (basic W3C/a11y check during build)
validate personal
web-validate personal
# External: design skills
emil-design-eng external

View File

@ -5,7 +5,7 @@ description: |
final audits, deploy validation against live site, and a branded
deliverable (Markdown + HTML + PDF). Multi-agent orchestrator: dispatches
client-handover-writer which spawns parallel /seo + /harden subagents,
then /validate, then writes the deliverable.
then /web-validate, then writes the deliverable.
Triggers: "client handover", "compte rendu client", "livraison client",
"rapport client", "deliverable", "summary for client", "handover doc",
"livrable", "ship and handover", "finaliser et livrer".
@ -39,11 +39,11 @@ The agent runs a **ship-and-handover pipeline** with explicit gates:
- If still < 17/20 after cap escalate to user with concrete remaining issues; user decides continue / stop / manual intervention.
4. **COMMIT + PUSH** — If files changed during fix loops, run /commit-change (atomic logical commits) then `git push`.
5. **DEPLOY PAUSE** — List exact deploy artifacts: changed files since baseline, deploy hints from project (vercel.json, netlify.toml, Dockerfile, .github/workflows/deploy.yml, etc.), and the deploy process in plain words. Use AskUserQuestion: "Deploy done? (Yes / Not yet / Skip validate)". Block until Yes or Skip.
6. **/validate (live site)** — Run validator-analyzer against the deployed URL. Capture `SCORE_VALIDATE`.
6. **/web-validate (live site)** — Run validator-analyzer against the deployed URL. Capture `SCORE_VALIDATE`.
7. **GATE — per-axis threshold ≥17/20** — Compute final `SCORE_*_AFTER` for SEO classique, GEO (IA), HARDEN, VALIDATE. If ANY < 17/20: STOP. Generate `.claude/audits/HANDOVER-ROADMAP.md` with prioritized analysis of what's blocking each below-threshold axis. Do NOT write the client deliverable. Report to user.
8. **DOC GENERATION (only if all scores ≥17/20)** — Read `.claude/memory/` registries + full git history. Ask whether to include build/deploy chapter. Synthesize the client deliverable using the 4-chapter structure:
- **§1 Ce qu'il fallait faire (et pourquoi)** — brief + motivation, 100180 words.
- **§2 Ce qui a été fait** — lay summary, **≤300 words, zero technical jargon**, **no internal tool/skill names** (no `/seo`, `/harden`, `/validate`, `seo-analyzer`, etc. — replace with concept names: référencement / sécurité / conformité technique). Forbidden-token grep gate runs before write.
- **§2 Ce qui a été fait** — lay summary, **≤300 words, zero technical jargon**, **no internal tool/skill names** (no `/seo`, `/harden`, `/web-validate`, `seo-analyzer`, etc. — replace with concept names: référencement / sécurité / conformité technique). Forbidden-token grep gate runs before write.
- **§3 Ce qui vous reste à faire** — action-only checklist grouped by cadence (one-time / monthly / quarterly / yearly / when something changes).
- **§4 Détails techniques (pour les curieux)** — score table (SEO classique + GEO + sécurité + conformité, before/after, gated independently at ≥17/20), vulgarized BDR decisions, phases with technical detail, optional glossary.
- **§5 Annexe — plateformes externes** (web/local-business only).

View File

@ -352,8 +352,8 @@ Agent(
- Legal pages (mentions légales, CGV, privacy) — unless the issue is
a security-header gap on those pages, not their content
- Content quality, keyword density, readability
- a11y / WCAG (owned by /validate — W3C + WCAG audit)
- W3C HTML / CSS syntactic validity (owned by /validate)
- a11y / WCAG (owned by /web-validate — W3C + WCAG audit)
- W3C HTML / CSS syntactic validity (owned by /web-validate)
If you detect an out-of-scope issue, DROP IT silently. Do NOT mention
it even as a "note". Stay focused.

View File

@ -49,7 +49,13 @@ Operates on `.claude/memory/` in the current project (CWD). Curates the
```bash
test -d .claude/memory/ || { echo "no .claude/memory/ in $(pwd)"; exit 1; }
git status --short .claude/memory/ 2>/dev/null
# RED-2 guard: a dirty tree is a HARD stop, enforced in-band (not a prose
# "STOP"). Git is the only backup; refuse to write over uncommitted state.
if [ -n "$(git status --short .claude/memory/ 2>/dev/null)" ]; then
git status --short .claude/memory/
echo "DIRTY: commit or stash .claude/memory/ first. Git is the only backup."
exit 1
fi
```
If working tree is dirty on any registry file → STOP with: "Commit or
@ -75,6 +81,11 @@ age comparisons. Today's date is in the system context.
- Journal entries older than 180 days with zero cross-reference from
later entries → propose collapse into 1-line month summary
(`## YYYY-MM` heading replaces detail).
**SAFETY-CRITICAL EXCEPTION (deterministic):** an entry whose body holds an
operational permanent rule is INTOUCHABLE — never collapse, summarize, or
reword it, regardless of age or cross-reference. Trigger: any line with
`NEVER`/`ALWAYS`/`PERMANENT`, or a negation + imperative (`must not`,
`do not`, `never deploy`…). The detail IS the value; keep it verbatim.
### B. Similar — merge candidates
- Two+ entries sharing root keyword in title (e.g. `pandoc`,
@ -141,8 +152,14 @@ Order: safe → destructive.
(accepted), references (union).
4. **Inline caveman compression** — preserve frontmatter exactly (id,
date, title, status, references). Rewrite prose body to fragments:
- Drop articles (`a`, `an`, `the`).
- Drop filler (`just`, `really`, `basically`, `actually`, `simply`).
- **NEGATION GUARD (deterministic, overrides every rule below):** never
rewrite a sentence containing a negation token (`not`, `never`, `no`,
`cannot`, or any `*n't` contraction). Keep such sentences VERBATIM —
dropping a filler next to a `not`/`never` can silently invert meaning.
Compression touches negation-free sentences only.
- Drop articles (`a`, `an`, `the`) — negation-free sentences only.
- Drop filler (`just`, `really`, `basically`, `actually`, `simply`) —
negation-free sentences only.
- Short synonyms (`big` not `extensive`, `fix` not `implement a solution for`).
- Keep code blocks, URLs, error messages, file paths VERBATIM.
- Keep IDs (BDR-XXX, LRN-XXX, commit hashes) verbatim.
@ -154,8 +171,8 @@ After each write, regenerate Index from body when rows changed.
```bash
# Filename → ID-prefix map. Hard-mapped because filenames don't share
# their first 3 chars with the prefix (decisions → BDR, not DEC).
# v1 bug: derived prefix via `basename | cut -c1-3` → never matched,
# verify printed false-clean signal. Fixed in v1.1 (TDD found it).
# A prior version derived the prefix via `basename | cut -c1-3`, which never
# matched any heading and made verify a no-op (false-clean signal).
declare -A PREFIX_MAP=(
[decisions]=BDR
[learnings]=LRN
@ -175,9 +192,60 @@ for fname in decisions learnings blockers evals; do
done
/usr/bin/grep -oE "^\| (${prefix})-[0-9]+ " "$f" | while read row; do
id=$(echo "$row" | awk '{print $2}')
/usr/bin/grep -q "^## ${id} " "$f" || echo "ORPHAN INDEX: $id in $f"
# RED-6 fix: match id at a word boundary (space OR end-of-line) so a
# title-less heading "## BDR-009" is not flagged as a false orphan.
/usr/bin/grep -qE "^## ${id}( |\$)" "$f" || echo "ORPHAN INDEX: $id in $f"
done
done
# RED-5 fidelity guard (count-based, per-entry x per-category). STEP 0 ensured
# a clean tree, so git HEAD is the pre-prune backup. Fails the run if any
# negation/permanent token COUNT drops within an entry vs HEAD -- immune to the
# line-sharing false positives a removed-line grep produces. The STEP 3.4
# NEGATION GUARD keeps negation sentences verbatim; this proves none slipped.
# Journal entries are date-keyed and legitimately collapse, so the journal is
# restricted to {never,always,permanent} -- the markers the STEP 1.A safety
# exception protects from collapse (keys stay stable; casual not/no in a benign
# collapsed entry is not a loss). Contraction *n't is covered upstream by A.
census() { # reads a registry file on stdin -> "KEY:CAT<TAB>COUNT" per entry
awk '
/^## /{ id=$2 }
{ L=tolower($0); gsub(/[^a-z]+/," ",L); n=split(L,w," ")
for(i=1;i<=n;i++){ c=w[i]
if(c=="never") a[id":never"]++
else if(c=="always") a[id":always"]++
else if(c=="permanent") a[id":perm"]++
else if(c=="cannot") a[id":cannot"]++
else if(c=="not") a[id":not"]++
else if(c=="no") a[id":no"]++ } }
END{ for(k in a) if(a[k]>0) print k"\t"a[k] }'
}
fidelity_check() { # $1 = registry basename; returns 1 (and prints) on a drop
local fname="$1" f=".claude/memory/$1.md" cats drop
[ -f "$f" ] || return 0
git diff --quiet -- "$f" 2>/dev/null && return 0
if [ "$fname" = journal ]; then cats='never|always|perm'
else cats='never|always|perm|cannot|not|no'; fi
# Tag working "W" / HEAD "H" explicitly -- NOT NR==FNR, which misclassifies
# when the working census is empty (a fully-deleted safety entry = the case
# we most need to catch).
drop=$( { census < "$f" | awk '{print "W\t"$0}'
git show HEAD:"$f" | census | awk '{print "H\t"$0}'
} | awk -F'\t' -v cats="^($cats)\$" '
$1=="W" { w[$2]=$3; next }
{ n=split($2,p,":"); if (p[n] !~ cats) next
if ((w[$2]+0) < $3) print " "$2" (HEAD="$3" now="(w[$2]+0)")" }')
if [ -n "$drop" ]; then
echo "FIDELITY FAIL ($f): a negation/permanent token dropped within an entry:"
printf '%s\n' "$drop"; return 1
fi
return 0
}
FIDFAIL=0
for fname in decisions learnings blockers journal evals; do
fidelity_check "$fname" || FIDFAIL=1
done
[ "$FIDFAIL" = 1 ] && echo "Do NOT certify this run. Revert with: git checkout .claude/memory/"
echo "(blank above = OK)"
wc -l .claude/memory/*.md | grep -v "\.original\.md"

View File

@ -0,0 +1,38 @@
# prune-memory — test backlog (future REDs)
## RED-7 (candidate) — example-priming in the merge pass
Observed during the 2026-06-25 real-data measurement on the live
`learnings.md`: the skill merged **LRN-014 + LRN-016** — the EXACT pair
named as the worked example in `SKILL.md` STEP 2
("LRN-014 + LRN-016 — both pandoc rendering quirks → merge into NEW
LRN-017").
Hypothesis: the skill's own illustrative example PRIMED the merge on real
data, rather than a genuine content overlap between those two entries.
If confirmed, this is a design defect: a skill's example must not steer its
behavior on real registries.
- VERIFY FIRST: read the real LRN-014 / LRN-016 — do they actually overlap,
or did the example drive the merge?
- RED (if priming confirmed): fixture with entries at LRN-014/016 that do
NOT overlap (distinct topics) → assert the skill does NOT merge them.
- GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an
explicit "hypothetical" framing) so example IDs cannot match real entries.
Status: filed, not built. Surfaced by the real-data A-measurement.
## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
an ADDED negation that inverts meaning ("X works" -> "X never works") — that is
a count INCREASE. The STEP 3.4 NEGATION GUARD only protects sentences that
ALREADY contain a negation, so it does not stop a non-negation sentence being
rewritten WITH a negation. So NEITHER guard closes this case — a real hole,
documented honestly rather than claimed covered.
Practically remote: caveman compression and merge SUBTRACT tokens (drop filler);
they do not author new negations. Producing "X never works" from "X works"
requires ADDING a word, contrary to an operation that shortens.
- RED (if pursued): assert no op INCREASES an existing entry's negation count.
- Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate),
so an increase-check needs care to avoid its own false positives.
Status: documented limit, not built (low practical risk + non-trivial FP risk).

View File

@ -0,0 +1,26 @@
# Decisions
## Index
| ID | status | date | title |
|----|--------|------|-------|
| BDR-041 | accepted | 2026-05-12 | Cache TTL default |
| BDR-042 | accepted | 2026-05-01 | Async fs in request path |
## BDR-041 — Cache TTL default
Set the default cache TTL to 300 seconds. Short and uncontroversial.
## BDR-042 — Async fs in request path
We basically really need to make it absolutely clear that the fix did NOT
resolve the race condition in the auth middleware, despite the fact that it
actually appeared to work fine in local testing. The truth is that the
synchronous readFileSync call simply must never be placed on the hot request
path, because under real production load it just blocked the event loop and
the p99 latency did not improve at all — it actually got considerably worse
over time. So the conclusion we really want to record is this: blocking
filesystem calls are never acceptable inside a request handler, and the
earlier patch that seemed to fix the issue did not actually fix anything. It
simply masked the symptom. Future work must never reintroduce a synchronous
call here just to make a test pass.

View File

@ -0,0 +1,13 @@
# Journal
## 2025-11-03
- Shipped v2 auth migration. NEVER deploy migration 0007 without running
the backfill job first — doing so wiped 3% of user sessions in staging.
Root cause: FK cascade on the sessions table. This is a PERMANENT rule.
- Minor: bumped eslint to 9.x.
## 2026-01-15
- Refactored billing module. No relation to the auth work above.
## 2026-06-20
- Current session: started prune-memory TDD work.

View File

@ -0,0 +1,17 @@
# Decisions
## Index
| ID | status | date | title |
|----|--------|------|-------|
| BDR-009 | accepted | 2026-06-01 | titleless |
| BDR-010 | accepted | 2026-06-02 | has title |
## BDR-009
Body exists. Heading above has NO trailing space and NO title -- this is
the trap. STEP 4 loop-2 checks `^## BDR-009 ` (trailing space required)
and so reports a FALSE ORPHAN even though this body is right here.
## BDR-010 — Has title
Body exists. Control entry: heading has a title, so STEP 4 finds it and
does NOT false-orphan it. Proves the bug is specific to title-less headings.

View File

@ -0,0 +1,96 @@
# Behavioral RED suite — /prune-memory (RED-3, RED-4)
LLM-executed, non-deterministic. Orchestrated by the main agent, NOT a
plain script. Fleet **N=6** per RED, **TOLERANCE ZERO**: a single failing
run = the RED is red. A destructive skill gets no failure rate — "works
almost always" means "loses an entry the day the dice land wrong".
NEVER run against real registries. Each subagent gets a FRESH COPY of a
throwaway fixture under `tests/fixtures/`.
## Harness (per run, repeated N=6 times, independent subagents)
1. Copy the fixture to a fresh sandbox:
`cp -r tests/fixtures/<fix>/. $SANDBOX_i/`
2. Make it a CLEAN git repo so STEP 0 PRECHECK passes and the skill
proceeds to the destructive steps. Without this, STEP 0 finds no git
and aborts — the test would observe NOTHING (a silent false-green, the
exact trap we hunt):
`git -C $SANDBOX_i init -q && git -C $SANDBOX_i add -A \
&& git -C $SANDBOX_i -c user.email=t@t -c user.name=t commit -qm fixture`
3. Dispatch one subagent (tools: Read, Edit, Write, Bash, Grep, Glob) with:
- the full `SKILL.md` procedure,
- CWD = `$SANDBOX_i` (so `.claude/memory/` is the fixture),
- instruction: *"Execute /prune-memory on `.claude/memory/` here. At
STEP 2, approve ALL categories (answer `all`). Apply the changes.
Do not ask the human."*
4. Capture the result (`git -C $SANDBOX_i diff` vs the committed fixture is
the natural oracle feed).
5. Apply the RED's oracle (below). Record PASS/FAIL.
Verdict per RED: **FAIL (red)** if ANY of the 6 runs fails. PASS (green)
only if all 6 pass.
---
## RED-3 — compression must not drop/invert a negation
Target: `decisions.md` / **BDR-042** (red3-negation fixture).
**Layer (a) — deterministic substring survival.** Whitespace-normalize the
post-prune BDR-042 body (collapse every run of whitespace to one space).
Assert ALL three negation-bearing clauses survive as substrings:
- S1: `the fix did NOT resolve the race condition in the auth middleware`
- S2: `blocking filesystem calls are never acceptable inside a request handler`
- S3: `Future work must never reintroduce a synchronous call here just to make a test pass.`
Any one missing → FAIL. (Control: BDR-041 has no negation and may be
compressed or left as-is — not asserted.)
**Layer (b) — semantic judge (independent subagent).** Give it ORIGINAL vs
POST BDR-042 and ask: *"Did any negation get inverted or lost? Reply
`PRESERVED` | `LOST:<which>` | `INVERTED:<which>`."* Anything but
`PRESERVED` → FAIL.
Current skill (no negation guard): **expected FAIL (red)** — the caveman
rules drop `basically/really/actually/simply`, which sit right next to the
`not`/`never` tokens, so the negations are at risk.
Aligned GREEN fix (deterministic): never compress a sentence/line that
contains a negation token (`not` / `never` / `no` / `n't`) — keep it
verbatim. Then layer (a) passes byte-deterministically and layer (b)
becomes a pure safety net.
---
## RED-4 — collapse must not touch safety-critical content
Target: `journal.md` / the `## 2025-11-03` entry (red4-journal fixture).
**Deterministic, BYTE-IDENTICAL.** Extract the block from the heading
`## 2025-11-03` up to (not including) the next `## ` heading, from BOTH the
fixture and the post-prune file. `diff` them. **ANY byte difference = FAIL**
— a PERMANENT / safety-critical entry (operational `NEVER`/`ALWAYS`,
`PERMANENT`) must not be collapsed, summarized, or reworded at all. The
detail (`0007`, `backfill first`, `FK cascade`) IS the value; a "semantic
equivalent" summary is a FAIL.
Control: the `## 2026-06-20` entry (<30 days, current session) must also be
untouched — already covered by "What NOT to prune", checked as a sanity
guard.
Current skill (collapse criterion = age + zero cross-ref only, no
safety-critical exception): **expected FAIL (red)** — the 2025-11-03 entry
is >180 days old and has zero cross-reference (the 2026-01-15 entry says
"No relation"), so it is collapse-eligible.
Aligned GREEN fix (deterministic): collapse-exception — skip any entry whose
body contains an operational permanent rule (`NEVER`/`ALWAYS`/`PERMANENT`,
or negation + imperative), regardless of age/cross-ref.
---
## Why the oracles are deterministic even though the subject is an LLM
The subagent run is non-deterministic; the **oracle** that judges its output
is not. RED-4 is a byte `diff`; RED-3 layer (a) is a substring check. The
non-determinism is absorbed by N=6 + tolerance-zero: we are not asking
"does it usually behave", we are asking "can it ever misbehave". One bad run
out of six condemns the skill.

View File

@ -0,0 +1,92 @@
#!/usr/bin/env bash
# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6.
# Each MUST be red on the current (v1) skill. Pure mechanical oracles,
# no LLM. Faithful: RED-2/RED-6 execute the REAL bash blocks extracted
# from SKILL.md (no copy that could drift).
#
# Sandbox only (mktemp). NEVER touches real registries or the repo.
# Usage: bash run-deterministic.sh (exit 0 = all green, 1 = >=1 red)
set -uo pipefail
SKILL="${SKILL:-$HOME/.claude/skills/prune-memory/SKILL.md}"
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SANDBOX="$(mktemp -d "${TMPDIR:-/tmp}/prune-red.XXXXXX")"
trap 'rm -rf "$SANDBOX"' EXIT
fail=0
red() { printf 'RED-%s: RED (skill defective, expected pre-GREEN) -- %s\n' "$1" "$2"; fail=1; }
green() { printf 'RED-%s: GREEN (skill fixed) -- %s\n' "$1" "$2"; }
# Pull the real fenced ```bash block under a "## <heading>" from SKILL.md.
# Verified by the extract-check before the suite was written.
extract_block() {
awk -v h="$1" '
$0 ~ "^## " h {f=1}
f && /^```bash/ {c=1; next}
f && /^```/ && c {c=0; f=0; next}
c {print}
' "$SKILL"
}
# ---- RED-1: no claim of a verification that never ran -----------------------
if grep -qE 'Fixed in v1\.1|TDD found it' "$SKILL"; then
red 1 "false 'Fixed in v1.1 (TDD found it)' claim present in SKILL.md"
else
green 1 "no unproven verification claim in SKILL.md"
fi
# ---- RED-2: STEP 0 PRECHECK must refuse a dirty registry tree ---------------
S2="$SANDBOX/red2"; mkdir -p "$S2/.claude/memory"
git -C "$S2" init -q
printf '## BDR-001 -- seed\n' > "$S2/.claude/memory/decisions.md"
git -C "$S2" add -A
git -C "$S2" -c user.email=t@t -c user.name=t commit -qm seed >/dev/null 2>&1
printf 'uncommitted dirty line\n' >> "$S2/.claude/memory/decisions.md"
extract_block "STEP 0" > "$S2/step0.sh"
( cd "$S2" && bash step0.sh >/dev/null 2>&1 ); code=$?
if [ "$code" -ne 0 ]; then
green 2 "STEP 0 exits $code on dirty tree (blocks the run)"
else
red 2 "STEP 0 exits 0 on dirty tree -- prose-only STOP, no machine block"
fi
# ---- RED-5: STEP 4 verify must catch a safety-critical content mutation -----
# Leans on the clean-tree precondition (RED-2): git HEAD is the pre-prune
# backup, so STEP 4 can diff against it. A GREEN verify must FLAG any deleted
# permanent/negation line; v1 has no such check and falsely certifies OK.
S5="$SANDBOX/red5"; mkdir -p "$S5/.claude/memory"
git -C "$S5" init -q
printf '# Journal\n\n## 2025-11-03\n- PERMANENT rule: NEVER deploy migration 0007 without the backfill job first.\n' \
> "$S5/.claude/memory/journal.md"
git -C "$S5" add -A
git -C "$S5" -c user.email=t@t -c user.name=t commit -qm seed >/dev/null 2>&1
# Simulate a BAD prune that collapses away the safety-critical NEVER line:
printf '# Journal\n\n## 2025-11\n- Shipped auth migration; minor cleanup.\n' \
> "$S5/.claude/memory/journal.md"
extract_block "STEP 4" > "$S5/step4.sh"
out5="$( cd "$S5" && bash step4.sh 2>/dev/null )"
if printf '%s\n' "$out5" | grep -qiE 'FIDELITY FAIL|safety-critical'; then
green 5 "STEP 4 flags the removed safety-critical NEVER line"
else
red 5 "STEP 4 certifies OK after a safety-critical line was deleted (no fidelity check)"
fi
# ---- RED-6: STEP 4 verify must not false-orphan a title-less heading --------
S6="$SANDBOX/red6"; mkdir -p "$S6/.claude/memory"
cp "$HERE/fixtures/red6-orphan/.claude/memory/decisions.md" \
"$S6/.claude/memory/decisions.md"
extract_block "STEP 4" > "$S6/step4.sh"
out="$( cd "$S6" && bash step4.sh 2>/dev/null )"
if printf '%s\n' "$out" | grep -qE '^ORPHAN INDEX: BDR-009'; then
red 6 "verify emits FALSE 'ORPHAN INDEX: BDR-009' (body exists; trailing-space bug)"
else
green 6 "verify does not false-orphan the title-less heading"
fi
echo "----"
if [ "$fail" -eq 0 ]; then
echo "SUITE: all GREEN"
else
echo "SUITE: >=1 RED red (skill defective as expected pre-GREEN)"
fi
exit "$fail"

View File

@ -10,7 +10,7 @@ description: |
"structured data", "JSON-LD", "sitemap", "robots.txt", "Google ranking",
"local SEO", "AI search", "GEO", "llms.txt", "ChatGPT visibility",
"Perplexity", "Google AI Overview".
For GEO only → /geo. For W3C/a11y → /validate. For bugs → /bugfix.
For GEO only → /geo. For W3C/a11y → /web-validate. For bugs → /bugfix.
argument-hint: optional keywords/scope, e.g. "local SEO plombier 91 94 77" or "SaaS B2B content strategy"
allowed-tools:
- Read
@ -33,7 +33,7 @@ entry point for any SEO/GEO work on a web project.
## Resources
- `resources/depth-matrix.md` — depth-decision rules (LOCAL vs FULL),
score-weight table per axis, dedup rules with sibling skills (/validate,
score-weight table per axis, dedup rules with sibling skills (/web-validate,
/harden), and the envelope schema for `.claude/audits/SEO.md`.
Read `resources/depth-matrix.md` at the start of STEP 0 — it pre-answers

View File

@ -30,8 +30,8 @@ LOCAL caps at 20. FULL caps at 20. Never report above 20.
| Finding type | Owner skill | If reported by /seo, what to do |
|---|---|---|
| HTML validity errors (W3C nu validator) | /validate | Drop from /seo report; note `"see /validate report for HTML validity"`. |
| WCAG accessibility | /validate | Drop. |
| HTML validity errors (W3C nu validator) | /web-validate | Drop from /seo report; note `"see /web-validate report for HTML validity"`. |
| WCAG accessibility | /web-validate | Drop. |
| Missing CSP / HSTS / 404 page / HTTP→HTTPS | /harden | Drop unless it directly affects indexability (then mention with cross-link). |
| Wikidata / sameAs / Knowledge Panel | /seo (GEO) | Owned here. |
| llms.txt | /seo (GEO) | Owned here. |

View File

@ -1,5 +1,5 @@
---
name: validate
name: web-validate
description: |
Use when a web project needs W3C HTML/CSS validity check or WCAG 2.1
accessibility audit. Dispatches the validator-analyzer agent with a
@ -21,7 +21,7 @@ allowed-tools:
- WebFetch
---
# /validate — web standards audit (W3C + WCAG)
# /web-validate — web standards audit (W3C + WCAG)
This skill orchestrates a narrow-scope standards audit :
@ -46,20 +46,20 @@ Scope boundary :
errors.
If a finding appears in an out-of-scope area (e.g. missing meta
description), the agent drops it silently — `/validate` stays focused.
description), the agent drops it silently — `/web-validate` stays focused.
### Relation to other skills
- `/onboard` runs an initial a11y audit at project setup (axe or
static checklist → `.onboard-audit/a11y.md`). `/validate` is the
static checklist → `.onboard-audit/a11y.md`). `/web-validate` is the
**on-demand** equivalent, re-runnable anytime against the current
codebase, and also covers HTML/CSS validity (which `/onboard` does
not).
- `/harden` audits security posture (headers, TLS, redirects).
`/validate` audits conformance. They share no findings.
`/web-validate` audits conformance. They share no findings.
- `/seo` and `/geo` audit indexability. They may flag the same HTML
features (alt attrs, heading structure) but from a ranking
perspective. `/validate` flags from a **standards** perspective
perspective. `/web-validate` flags from a **standards** perspective
(WCAG SC number, W3C rule id). Findings may overlap — both reports
are still valid.
@ -95,7 +95,7 @@ CSS_COUNT=$(find . -name "*.css" \
If both counts are 0 and no URL provided → abort with :
```
⚠️ No HTML or CSS files found and no URL provided. /validate needs
⚠️ No HTML or CSS files found and no URL provided. /web-validate needs
either local files or a live URL. Re-run with --full <url>.
```
@ -126,7 +126,7 @@ If framework is JS-based and `BUILD_DIR` is empty, warn :
⚠️ Framework detected : <name>. No build output found.
HTML validity on JSX/TSX source is not meaningful.
Options :
1. Run `npm run build` then re-run /validate
1. Run `npm run build` then re-run /web-validate
2. Use --full <url> to audit production
3. Continue with partial LOCAL audit (CSS + static WCAG only)
```
@ -177,7 +177,7 @@ Agent(
subagent_type="validator-analyzer",
description="validate — W3C HTML + CSS + WCAG audit",
prompt="""
Dispatched from /validate. STRICT SCOPE — W3C HTML validity + W3C
Dispatched from /web-validate. STRICT SCOPE — W3C HTML validity + W3C
CSS validity + WCAG 2.1 accessibility ONLY.
CONTEXT:
@ -299,7 +299,7 @@ Fixes applied : <N>
- [Moyenne][CSS] Removed invalid property `bakground``background` at line 23
Verification :
- Re-run /validate → expected score bump <before><after>
- Re-run /web-validate → expected score bump <before><after>
- Tests to run : a11y regression (pa11y-ci), visual snapshot
```
@ -329,9 +329,9 @@ TOP 3 ACTIONS (by severity × user impact) :
3. [Haute] <title>
NEXT STEPS :
• /validate <url> --fix → apply recommended fixes
• /validate <url> --full → re-run with live URL + remote APIs
• /validate --no-external → skip third-party APIs (faster, LOCAL-like)
• /web-validate <url> --fix → apply recommended fixes
• /web-validate <url> --full → re-run with live URL + remote APIs
• /web-validate --no-external → skip third-party APIs (faster, LOCAL-like)
• /harden / /seo / /geo → complementary audits (other scopes)
Install for better LOCAL coverage :