4 dagar sedan · f4ceed20bb
--- a/.claude/audits/DARWIN-SKILL-2026-05-12.md
+++ b/.claude/audits/DARWIN-SKILL-2026-05-12.md
@@ -0,0 +1,139 @@
 
				+# Darwin Skill Optimization — 2026-05-12
			
 
				+
			
 
				+**Branch:** `auto-optimize/20260512-1319`
			
 
				+**Scope:** 28 dirs in `~/Documents/claude/skills/` (cwd = source of truth, ~/.claude/skills is runtime mirror)
			
 
				+**Eval mode:** structural dim 1-7 full / dim 8 `dry_run`
			
 
				+**Phase reached:** 2 (round 1 per skill — rounds 2-3 skipped for diminishing returns)
			
 
				+**Runtime:** ~30 min
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Baseline scorecard (23 scoreable + 5 excluded)
			
 
				+
			
 
				+23 personal skills scored. 5 broken gstack symlinks excluded (not user-owned).
			
 
				+
			
 
				+| Rank | Skill            | Baseline | Weakest |
			
 
				+|------|------------------|---------:|---------|
			
 
				+| 1    | status           | 45.3 | D3 |
			
 
				+| 2    | refactor         | 48.4 | D4 |
			
 
				+| 3    | plugin-check     | 59.2 | D4 |
			
 
				+| 4    | skills-perso     | 66.4 | D3 |
			
 
				+| 5    | commit-change    | 69.6 | D3 |
			
 
				+| 6    | hotfix           | 74.1 | D4 |
			
 
				+| 7    | code-clean       | 74.7 | D3 |
			
 
				+| 8    | feat             | 74.7 | D4 |
			
 
				+| 9    | doc              | 76.2 | D3 |
			
 
				+| 10   | analyze          | 76.4 | D4 |
			
 
				+| 11   | validate         | 76.4 | D4 |
			
 
				+| 12   | geo              | 76.5 | D4 |
			
 
				+| 13   | profile          | 78.3 | D3 |
			
 
				+| 14   | close            | 79.2 | D3 |
			
 
				+| 15   | bugfix           | 79.3 | D4 |
			
 
				+| 16   | seo              | 82.5 | D4 |
			
 
				+| 17   | onboard          | 83.5 | D8 |
			
 
				+| 18   | client-handover  | 83.9 | D8 |
			
 
				+| 19   | init-project     | 84.1 | D8 |
			
 
				+| 20   | ship-feature     | 84.3 | D8 |
			
 
				+| 21   | prune-memory     | 84.6 | D8 |
			
 
				+| 22   | harden           | 85.5 | D8 |
			
 
				+| *    | graphify         | 29.0 | D4 |
			
 
				+
			
 
				+graphify scored on partial read (62KB SKILL.md exceeded token limit). Real fix likely a Phase 2.5 exploratory rewrite, not hill-climb — deferred.
			
 
				+
			
 
				+**Excluded broken symlinks (out of scope):** benchmark-models, context-restore, context-save, make-pdf, plan-tune. All point to `skills-external/gstack/<name>/SKILL.md` which doesn't exist.
			
 
				+
			
 
				+**Average across 23: 75.6 / 100.**
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Phase 2 — optimized skills (bottom 5)
			
 
				+
			
 
				+User-confirmed queue: bottom 5 by baseline (graphify deferred).
			
 
				+
			
 
				+| Skill          | Before | After | Δ      | Round | Status | Change                                       |
			
 
				+|----------------|-------:|------:|-------:|:-----:|:------:|----------------------------------------------|
			
 
				+| status         | 45.3   | 76.2  | +30.9  | 1     | keep   | D3 fallback + frontmatter triggers           |
			
 
				+| refactor       | 48.4   | 74.3  | +25.9  | 1     | keep   | D3 fallback + frontmatter triggers           |
			
 
				+| plugin-check   | 59.2   | 76.8  | +17.6  | 1     | keep   | D3 fallback + triggers + advisory note       |
			
 
				+| skills-perso   | 66.4   | 80.1  | +13.7  | 1     | keep   | D3 "Known limits of heuristic" section       |
			
 
				+| commit-change  | 69.6   | 83.5  | +13.9  | 1     | keep   | D3 fallback + pre-flight git checks (HEAD/identity) |
			
 
				+
			
 
				+**Average:** 58.0 → 78.2 (+20.2 per skill).
			
 
				+**Kept:** 5/5. **Reverted:** 0.
			
 
				+
			
 
				+### Per-skill commit
			
 
				+
			
 
				+- `512df48` optimize status: round 1 — D3 fallback + triggers
			
 
				+- `079074d` optimize refactor: round 1 — D3 fallback + D1 triggers
			
 
				+- `d3dd31c` optimize plugin-check: round 1 — D3 fallback + D1 triggers + advisory clarification
			
 
				+- `134561d` optimize skills-perso + commit-change: round 1 — D3 edge cases
			
 
				+
			
 
				+### Same pattern across the bottom 4 dispatchers
			
 
				+
			
 
				+status, refactor, plugin-check, commit-change are all thin dispatchers (15-30 lines, body = "load and follow agents/<x>.md + $ARGUMENTS"). Baseline rubric penalized them harshly across most dims because there's "nothing to score". Round 1 fix is invariant:
			
 
				+
			
 
				+1. Add fallback clause: "If agent file unreachable, emit `<X> agent missing.` and STOP."
			
 
				+2. Add explicit triggers in frontmatter description.
			
 
				+3. For destructive skills (refactor, commit-change) add safety rationale ("never improvise — silent behavior change").
			
 
				+
			
 
				+skills-perso is the outlier — full implementation, not a dispatcher. Improvement was a "Known limits of the heuristic" section that names false-positives (agent refs in fenced code blocks), false-negatives (custom agent paths), the `owner: user` override path, and frontmatter-malformed silent-skip behavior.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Methodology notes & caveats
			
 
				+
			
 
				+1. **Eval math drift.** Round-1 subagents occasionally used `Σ(dim×weight)/100` instead of the correct `/10`, and one used D8 weight 7 instead of 25. Totals reported above are recomputed by the main thread from the subagents' dim-by-dim judgments, so absolute numbers are trustworthy even though the subagents' written totals sometimes weren't.
			
 
				+
			
 
				+2. **Score recalibration at round 1.** The two passes (baseline subagent vs round-1 eval subagent) are different model invocations and re-anchor independently. The +30 jump for `status` reflects partly real improvement (fallback section) and partly that the baseline subagent under-scored "by-design thin dispatchers" across the board. Δ direction is reliable; absolute magnitudes are noisy.
			
 
				+
			
 
				+3. **150% size cap is tight for thin dispatchers.** `status` (15 lines, 776B) and `refactor` (15 lines, 379B) had ~22-line / ~570B caps. Multiple trim cycles per skill to fit. Recommend the spec consider an absolute-byte floor (e.g., min cap = max(150%, 1000B)) for cases where the original was minimal.
			
 
				+
			
 
				+4. **D8 (effect) untested.** Spec allows `dry_run`; all 5 logged as such. To upgrade, would need to run each skill twice (with/without optimized SKILL.md) and compare outputs. Saved for a follow-up pass.
			
 
				+
			
 
				+5. **Rounds 2-3 skipped.** Round 1 Δ for all 5 was big (+13.7 to +30.9). Round 2 expected Δ is small (returns flatten near 80). Better marginal value optimizing the next-lowest skills (hotfix, code-clean, feat at 74-75) than re-grinding the bottom 5.
			
 
				+
			
 
				+6. **Screenshot.mjs is macOS-only.** Script hardcodes `/Users/alchain/.npm-global/...`. To generate PNG cards on Linux, swap to `npx playwright screenshot file://… out.png --viewport-size=960,1280` or fix the script's `require()` path.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Suggested next passes
			
 
				+
			
 
				+| Priority | Action |
			
 
				+|----------|--------|
			
 
				+| P1 | Hill-climb rank 6-12 (hotfix, code-clean, feat, doc, analyze, validate, geo) — same dispatcher fallback pattern; expected +5 to +15 per skill |
			
 
				+| P1 | Phase 2.5 exploratory rewrite of `graphify` (62KB SKILL.md, baseline 29) |
			
 
				+| P2 | Real dim-8 effect testing on the top 5 (onboard, client-handover, init-project, ship-feature, harden) to confirm baseline isn't masking issues |
			
 
				+| P3 | Round 2 on bottom 5 — target next-weakest dim (D4 checkpoints for destructive skills, D5 specificity for status) |
			
 
				+| P3 | Fix `scripts/screenshot.mjs` for Linux + generate PNG cards |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Files touched (committed on `auto-optimize/20260512-1319`)
			
 
				+
			
 
				+```
			
 
				+skills/status/SKILL.md          (+9 -2)
			
 
				+skills/refactor/SKILL.md        (+3 -4)
			
 
				+skills/plugin-check/SKILL.md    (+5 -5)
			
 
				+skills/skills-perso/SKILL.md    (+18 -0)
			
 
				+skills/commit-change/SKILL.md   (+5 -3)
			
 
				+skills/close/test-prompts.json       (new)
			
 
				+skills/graphify/test-prompts.json    (new)
			
 
				+skills/harden/test-prompts.json      (new)
			
 
				+skills/profile/test-prompts.json     (new)
			
 
				+skills/prune-memory/test-prompts.json (new)
			
 
				+```
			
 
				+
			
 
				+5 commits on branch. To merge:
			
 
				+
			
 
				+```bash
			
 
				+git checkout master
			
 
				+git merge --no-ff auto-optimize/20260512-1319
			
 
				+```
			
 
				+
			
 
				+Or review per-commit and cherry-pick. No master commits yet — branch is purely additive.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## results.tsv
			
 
				+
			
 
				+Persisted to `~/.agents/skills/darwin-skill/results.tsv` (28 baseline rows + 5 round-1 rows = 33 total).
			
--- a/skills/close/test-prompts.json
+++ b/skills/close/test-prompts.json
@@ -0,0 +1,5 @@
 
				+[
			
 
				+  {"id": 1, "prompt": "ferme la session", "expected": "Skill runs PRECHECK on .claude/memory/, gathers git log + diff + status, presents the 3-question ritual (decided/learned/blocked) with pre-filled drafts from conversation context, waits for approval, then appends approved entries (English, caveman) to decisions.md/learnings.md/blockers.md plus one journal line."},
			
 
				+  {"id": 2, "prompt": "close — retro rapide avant que je parte", "expected": "Same 3-question ritual; if a registry has nothing notable to log, skill emits '(rien à logger cette session)' for that question and still writes the journal timeline line under today's date heading."},
			
 
				+  {"id": 3, "prompt": "checkpoint memory — but .claude/memory/ doesn't exist in this repo", "expected": "PRECHECK detects missing .claude/memory/, STOPS with the warning telling user to run /onboard or /init-project first — does NOT create the directory itself."}
			
 
				+]
			
--- a/skills/commit-change/SKILL.md
+++ b/skills/commit-change/SKILL.md
@@ -18,9 +18,12 @@ allowed-tools:
 
				   - AskUserQuestion
			
 
				 ---
			
 
				 
			
 
				-Load and follow strictly:
			
 
				-- $HOME/.claude/agents/commit-changer.md
			
 
				+Load and follow strictly: `$HOME/.claude/agents/commit-changer.md`.
			
 
				 
			
 
				-Execute the COMMIT-CHANGER agent on the current working directory.
			
 
				+If unreachable, emit `Commit-changer agent missing.` and STOP. Never auto-commit blind — a wrong group is harder to undo than not committing.
			
 
				+
			
 
				+Pre-flight checks (the agent should also perform, but flag here):
			
 
				+- Detached HEAD or unmerged conflicts → STOP, report state.
			
 
				+- Identity unconfigured (`git config user.email` empty) → STOP, ask user.
			
 
				 
			
 
				 $ARGUMENTS
			
--- a/skills/graphify/test-prompts.json
+++ b/skills/graphify/test-prompts.json
@@ -0,0 +1,5 @@
 
				+[
			
 
				+  {"id": 1, "prompt": "/graphify .", "expected": "Skill runs full pipeline on CWD: ensures graphify python installed, detects files into categories (code/docs/papers/images/video), shows clean corpus summary, dispatches AST + parallel semantic subagents (general-purpose, NOT Explore), merges chunks, builds graph + communities, writes graph.json + GRAPH_REPORT.md + interactive HTML in graphify-out/."},
			
 
				+  {"id": 2, "prompt": "how does AuthModule relate to the Database layer in this codebase?", "expected": "Skill recognizes a cross-module architecture question on a graphified project — runs `graphify path 'AuthModule' 'Database'` (or `graphify query` if path is too literal) to traverse EXTRACTED + INFERRED edges instead of grepping files, then narrates the connection."},
			
 
				+  {"id": 3, "prompt": "/graphify https://github.com/anthropics/anthropic-sdk-python https://github.com/openai/openai-python", "expected": "Skill executes Step 0 multi-repo flow: `graphify clone` each URL into ~/.graphify/repos/<owner>/<repo>, runs the pipeline on each, then `graphify merge-graphs` into a single cross-repo-graph.json with `repo` attribute on each node."}
			
 
				+]
			
--- a/skills/harden/test-prompts.json
+++ b/skills/harden/test-prompts.json
@@ -0,0 +1,5 @@
 
				+[
			
 
				+  {"id": 1, "prompt": "/harden https://example.com --full", "expected": "Skill runs FULL-mode hardening audit: parses URL+domain, detects framework configs (.htaccess/nginx.conf/next.config.js/etc.), launches external validators in parallel (Mozilla Observatory + SecurityHeaders.com + SSL Labs async), dispatches seo-analyzer with STRICT narrow scope (transport/HSTS/CSP/headers/canonical/404/server-config ONLY), polls SSL Labs, writes .claude/audits/HARDEN.md with score + external grades + top-3 actions."},
			
 
				+  {"id": 2, "prompt": "audit sécurité web sur ce repo, mais en local — pas d'appels externes", "expected": "Skill defaults DEPTH=LOCAL (no URL), auto-skips Step 0b external validators (EXTERNAL forced off in LOCAL), dispatches seo-analyzer with narrow scope on detected config files only, produces .claude/audits/HARDEN.md without the External validators section."},
			
 
				+  {"id": 3, "prompt": "/harden https://example.com — and check my meta tags + sitemap too", "expected": "Skill respects scope boundary: silently DROPS the meta-tags/sitemap request (those are /seo's territory), runs ONLY the 6 in-scope hardening areas, and ideally points the user to /seo for the dropped concerns — does NOT mix scopes."}
			
 
				+]
			
--- a/skills/plugin-check/SKILL.md
+++ b/skills/plugin-check/SKILL.md
@@ -1,15 +1,15 @@
 
				 ---
			
 
				 name: plugin-check
			
 
				-description: Audit active plugins vs project needs. Recommends enable/disable actions.
			
 
				+description: Audit active plugins vs project needs. Read-only advisory recommending enable/disable. Triggers: "plugin-check", "quels plugins".
			
 
				 argument-hint: [ex: "React + FastAPI" or "Rust CLI, no frontend"]
			
 
				 disable-model-invocation: true
			
 
				 allowed-tools: Read, Bash, Glob, Grep
			
 
				 ---
			
 
				 
			
 
				-Load and follow strictly:
			
 
				-- $HOME/.claude/agents/plugin-advisor.md
			
 
				+Load and follow strictly: `$HOME/.claude/agents/plugin-advisor.md`.
			
 
				 
			
 
				-Analyze active plugins and the following context,
			
 
				-then produce the full PLUGIN ADVISOR REPORT:
			
 
				+Analyze active plugins + context below, produce PLUGIN ADVISOR REPORT.
			
 
				+
			
 
				+If `$HOME/.claude/agents/plugin-advisor.md` unreachable: emit `Plugin advisor agent missing.` and STOP. Never write — user toggles via `claude plugin enable/disable`.
			
 
				 
			
 
				 $ARGUMENTS
			
--- a/skills/profile/test-prompts.json
+++ b/skills/profile/test-prompts.json
@@ -0,0 +1,5 @@
 
				+[
			
 
				+  {"id": 1, "prompt": "profile list", "expected": "Skill runs `bash $HOME/.claude/lib/profile.sh list` and prints the table of available profiles (web, seo, web-full, backend, design, dev, qa, audit, minimal) without extra commentary."},
			
 
				+  {"id": 2, "prompt": "active les skills design — désactive le bruit gstack", "expected": "Skill interprets this as `set design` (destructive — disables non-listed gstack skills), confirms first since `set` is destructive, then runs `bash $HOME/.claude/lib/profile.sh set design` and reports the count of skills moved plus the reminder to start a new Claude session to pick up changes."},
			
 
				+  {"id": 3, "prompt": "quel profil est actif?", "expected": "Skill runs `bash $HOME/.claude/lib/profile.sh current` and reports the detected active profile with its match percentage; does NOT toggle any symlinks."}
			
 
				+]
			
--- a/skills/prune-memory/test-prompts.json
+++ b/skills/prune-memory/test-prompts.json
@@ -0,0 +1,5 @@
 
				+[
			
 
				+  {"id": 1, "prompt": "prune memory — registres trop longs", "expected": "Skill runs PRECHECK (refuses if working tree dirty on .claude/memory/), audits all 5 registries for A) obsolete/superseded entries B) similar merge candidates C) bloated prose D) Index drift, presents the per-registry PRUNE PLAN with category-level approval gate, applies only approved changes (mark-superseded never hard-delete), then runs Index sanity verify at STEP 4."},
			
 
				+  {"id": 2, "prompt": "/prune-memory decisions", "expected": "Skill filters audit to decisions.md only — leaves learnings/blockers/journal/evals untouched, presents plan for that single registry, and exits cleanly after applying approved changes with the same append-only + caveman rules."},
			
 
				+  {"id": 3, "prompt": "compresse les memoires — mais j'ai des changements pas commités dans .claude/memory/", "expected": "STEP 0 PRECHECK detects dirty working tree on registry files and STOPS immediately: 'Commit or stash pending changes in .claude/memory/ first. Skill writes in-place. Git is the only backup.' — no writes, no plan."}
			
 
				+]
			
--- a/skills/refactor/SKILL.md
+++ b/skills/refactor/SKILL.md
@@ -1,14 +1,13 @@
 
				 ---
			
 
				 name: refactor
			
 
				-description: Improve code quality without changing behavior — strict norm enforcement
			
 
				+description: Improve code quality without changing behavior — strict norm enforcement. Triggers: "refactor", "clean up code", "normaliser".
			
 
				 argument-hint: <file, function, or module to refactor>
			
 
				 disable-model-invocation: true
			
 
				 allowed-tools: Read, Write, Edit, Grep, Glob, Bash
			
 
				 ---
			
 
				 
			
 
				-Load and follow strictly:
			
 
				-- $HOME/.claude/agents/refactorer.md
			
 
				+Load and follow strictly: `$HOME/.claude/agents/refactorer.md`.
			
 
				 
			
 
				-Execute the REFACTORER agent on the following target:
			
 
				+If unreachable, emit `Refactorer agent missing.` and STOP. Never improvise — silent behavior change is unsafe.
			
 
				 
			
 
				 $ARGUMENTS
			
--- a/skills/skills-perso/SKILL.md
+++ b/skills/skills-perso/SKILL.md
@@ -99,3 +99,20 @@ fi
 
				 ```
			
 
				 
			
 
				 Keep descriptions to one line (~80 chars max, truncate with "..." if needed).
			
 
				+
			
 
				+## Known limits of the detection heuristic
			
 
				+
			
 
				+- **False positive (rare):** agent references buried in fenced code blocks
			
 
				+  (` ``` ... ``` `) match Signal 2 even though they are not active delegations.
			
 
				+  Mitigation: skill author adds `owner: user` (Signal 1) — explicit always wins.
			
 
				+- **False negative:** personal skills that delegate to agents under non-standard
			
 
				+  paths (e.g. `~/.config/myagents/`, `agents-shared/`) won't match Signal 2.
			
 
				+  Mitigation: same — add `owner: user` to frontmatter.
			
 
				+- **Frontmatter malformed / missing:** `is_personal()` returns false (skill
			
 
				+  silently excluded). The "0 personal skills detected" diagnostic catches the
			
 
				+  zero case but not partial misses.
			
 
				+- **Description extract edge cases:** plain multi-line YAML (no `|`/`>`) is
			
 
				+  read as first line only. For users of `description: |` block scalars this is
			
 
				+  intended; otherwise inspect raw `SKILL.md` if a description looks truncated.
			
 
				+- **Override:** to force-include a framework skill, fork it into `~/.claude/skills/`
			
 
				+  and add `owner: user`. The fork is then yours to maintain.
			
--- a/skills/status/SKILL.md
+++ b/skills/status/SKILL.md
@@ -1,14 +1,21 @@
 
				 ---
			
 
				 name: status
			
 
				-description: Consolidated project snapshot — plugins active, token cost, git state, recent commits, GSD v2 milestone progress. Read-only. Run at session start or after a break.
			
 
				+description: Consolidated project snapshot — plugins, token cost, git state, recent commits, GSD v2 milestone progress. Read-only. Run at session start or after a break. Triggers: "status", "sitrep", "where are we", "project state", "after break".
			
 
				 argument-hint: (no arguments needed)
			
 
				 disable-model-invocation: true
			
 
				 allowed-tools: Read, Bash, Glob, Grep
			
 
				 ---
			
 
				 
			
 
				 Load and follow strictly:
			
 
				-- $HOME/.claude/agents/status-reporter.md
			
 
				+- `$HOME/.claude/agents/status-reporter.md`
			
 
				 
			
 
				 Produce the full PROJECT STATUS report for the current working directory.
			
 
				 
			
 
				+## Fallback when agent file missing
			
 
				+
			
 
				+If `$HOME/.claude/agents/status-reporter.md` is unreachable (deleted, permission denied, broken symlink):
			
 
				+
			
 
				+1. Emit: `Status agent missing — restore ~/.claude/agents/status-reporter.md.`
			
 
				+2. STOP. Do not improvise a manual report — partial snapshots mislead.
			
 
				+
			
 
				 $ARGUMENTS