Ver código fonte

memory: capitalize darwin 2026-05-12 — LRN-017/018 + BLK-003 + BDR-015

LRN-017: thin-dispatcher SKILL.md round-1 invariant = fallback + triggers.
LRN-018: darwin eval subagents drift on total math (factor-10, D8 weight).
BLK-003: scripts/screenshot.mjs hardcoded macOS path blocks PNG cards on Linux.
BDR-015: exclude broken gstack symlinks from /darwin-skill scope.
journal: 2026-05-12 session entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bastien 4 dias atrás
pai
commit
1bf02eaa61

+ 13 - 1
.claude/memory/blockers.md

@@ -22,6 +22,7 @@ rules:
 |----|------|---------|--------|
 | BLK-001 | 2026-04-22 | `rtk curl` breaks JSON pipelines | upstream |
 | BLK-002 | 2026-04-23 | `rmdir` denied in sandbox on empty directory | resolved |
+| BLK-003 | 2026-05-12 | `scripts/screenshot.mjs` hardcoded macOS path blocks PNG cards on Linux | upstream |
 
 ---
 
@@ -44,4 +45,15 @@ rules:
 - **Solution**:
   - This session: `git rm tasks/*.md` handled files individually (via `git rm`, cleared gate). Git auto-detected renames to `.claude/tasks/`, so `tasks/` directory removed implicitly at commit time.
   - If dir persists empty after `git rm`: ask user to run `rmdir tasks` manually.
-- **Status**: resolved (fixed via `git rm` + rename auto-detection; no `rmdir` needed in practice).
+- **Status**: resolved (fixed via `git rm` + rename auto-detection; no `rmdir` needed in practice).
+## BLK-003 — `scripts/screenshot.mjs` hardcoded macOS path blocks PNG cards on Linux
+
+- **Date**: 2026-05-12
+- **Friction**: `/darwin-skill` Phase 3 generates result cards via `node ~/.agents/skills/darwin-skill/scripts/screenshot.mjs <html> <png>`. On Linux: script fails immediately — `require('/Users/alchain/.npm-global/lib/node_modules/playwright/node_modules/playwright-core')` resolves to a non-existent macOS user path. No PNG cards produced; Phase 3 falls back to markdown report only.
+- **Real cause**: upstream `alchaincyf/darwin-skill` author dev'd on macOS, shipped absolute path to their own homedir's global npm install of playwright. Zero portability layer (no PATH lookup, no `playwright` bare require, no fallback to `npx`).
+- **Solution**:
+  - Workaround (used 2026-05-12): skip PNG generation, deliver markdown + HTML cards (HTML viewable in browser without playwright).
+  - Local patch: `npm i -g playwright` then replace `require('/Users/alchain/...')` with `require('playwright')`. Two lines edit.
+  - Spec-documented fallback: `npx playwright screenshot "file:///path/to/card.html#<theme>" out.png --viewport-size=960,1280 --wait-for-timeout=2000` — works without modifying the file, costs ~150MB chromium download.
+  - PR upstream to `github.com/alchaincyf/darwin-skill` once tested.
+- **Status**: upstream (third-party skill at `~/.agents/skills/darwin-skill/scripts/screenshot.mjs`, not in any of our repos).

+ 18 - 0
.claude/memory/decisions.md

@@ -36,6 +36,7 @@ rules:
 | BDR-012 | 2026-05-07 | client-handover cover: white bg + green accents + PNG logo default | accepted |
 | BDR-013 | 2026-05-11 | client-handover: 6-chapter doc — promote scores §2 + NAP §4 | accepted |
 | BDR-014 | 2026-05-11 | Personal SKILL.md descriptions: "Use when [triggers]…" pattern + 1024-char spec limit | accepted |
+| BDR-015 | 2026-05-12 | Exclude broken gstack symlinks from /darwin-skill scope (external ownership) | accepted |
 
 ---
 
@@ -264,3 +265,20 @@ rules:
   - Orchestrators still describe orchestration role explicitly (e.g. client-handover: "Multi-agent orchestrator: dispatches the client-handover-writer agent which spawns parallel /seo + /harden subagents") — that's role identification, not workflow summary.
   - Other 10 personal skills (analyze, bugfix, code-clean, commit-change, feat, hotfix, plugin-check, refactor, status, skills-perso) still partially summarize workflow but stay under 1024 chars. Not retrofitted in this pass — flagged for follow-up only if shortcut symptoms observed.
 - **Reference**: commit `1da6a31`, 8 SKILL.md files (client-handover, doc, geo, seo, validate, ship-feature, init-project, onboard), superpowers:writing-skills "CSO" section, agentskills.io/specification.
+
+---
+
+## BDR-015 — Exclude broken gstack symlinks from /darwin-skill scope (external ownership)
+
+- **Date**: 2026-05-12
+- **Status**: accepted
+- **Decision**: 5 dirs in `~/Documents/claude/skills/` whose `SKILL.md` symlinks point to non-existent gstack paths (`skills-external/gstack/<name>/SKILL.md` missing) — `benchmark-models`, `context-restore`, `context-save`, `make-pdf`, `plan-tune` — are excluded from `/darwin-skill` baseline + optimization. Marked `status=error` in `results.tsv` with note `broken gstack symlink — out of scope`. NOT scored, NOT optimized, NOT deleted.
+- **Why**: darwin-skill constraint #1 forbids changing a skill's core function — implies external/gstack-owned skills are out of scope. Symlinks resolve to `skills-external/gstack` which is third-party submodule. Plus the targets are broken — gstack's actual layout (`benchmark/`, `health/`, `qa/`, etc.) doesn't include these 5 names, suggesting upstream rename or removal. Repairing them is a separate triage task, not darwin's concern.
+- **Alternatives rejected**:
+  - Fix symlinks first then darwin-optimize → out of scope, blocks the optimization queue on gstack archaeology.
+  - Score them with `FILE_NOT_FOUND` and include in averages → biases stats, mixes signal with infrastructure issue.
+  - Optimize the gstack source files directly → external ownership, never modify.
+  - Delete the broken symlinks → would obscure that the user once expected these to exist; leave for triage.
+- **Caveats**:
+  - If/when symlinks are repaired (real gstack target exists), re-run baseline to bring them in scope.
+  - Bigger picture: `benchmark-models` looks like a deliberate rename of gstack's `benchmark` to disambiguate from the gstack-skill called `/benchmark`. Could be a planned migration that stalled. Worth a one-line ticket separate from darwin.

+ 12 - 0
.claude/memory/journal.md

@@ -87,3 +87,15 @@ rules:
 - NAP checklist polish (commit `abd2612`): added "Description courte" field + replaced retired BrightLocal Free Tools with Moz Local Citation Checker (LRN-015).
 - CSS bugfix (commit `465fe9e`): pandoc GFM checkbox markup `<li><input ...> text…</li>` has no wrapper class, adjacent-sibling rule `li input + *` yanks `<a>`/`<code>` siblings out of flow. Fixed by targeting `li > input[type="checkbox"]` directly. Captured as LRN-016.
 - 4 atomic commits `b15b275..1da6a31` via `/commit-change`. Decisions BDR-013, BDR-014 + learnings LRN-014, LRN-015, LRN-016 capitalized. Pre-existing BDR-012 + LRN-013 Index rows backfilled (prior session entries existed in body but missing from Index).
+
+## 2026-05-12
+
+- Ran `/darwin-skill` full pipeline on cwd repo (real skill source, not `~/.claude/skills/` runtime mirror). Baseline scored 23 personal skills + 5 broken gstack symlinks excluded. Avg baseline 75.6.
+- Phase 2 round 1 on bottom 5: status 45.3→76.2 (+30.9), refactor 48.4→74.3 (+25.9), plugin-check 59.2→76.8 (+17.6), skills-perso 66.4→80.1 (+13.7), commit-change 69.6→83.5 (+13.9). All KEEP. Avg 58.0→78.2 (+20.2/skill).
+- Rounds 2-3 skipped — diminishing returns past round 1 on dispatcher pattern. graphify (29.0, 62KB SKILL.md) deferred to Phase 2.5 exploratory rewrite per user.
+- Pattern observed: thin-dispatcher round-1 invariant = fallback + frontmatter triggers. Replicable across the 4 dispatchers tested. Captured as LRN-017.
+- Methodology gotcha: darwin eval subagents drift on total math (factor-10 errors, D8 weight 7 vs 25). Direction reliable, magnitude noisy. Captured as LRN-018. Recompute totals in main thread going forward.
+- BDR-015: broken gstack symlinks (5 dirs) excluded from darwin scope — external ownership + missing targets.
+- BLK-003: `scripts/screenshot.mjs` hardcoded macOS path → PNG cards skipped on Linux. Markdown report + 5 new test-prompts.json + 5 optimized SKILL.md only. Upstream issue, workaround in place.
+- Branch `auto-optimize/20260512-1319` merged via `--no-ff` to master. 6 commits land. Report at `.claude/audits/DARWIN-SKILL-2026-05-12.md`. results.tsv at `~/.agents/skills/darwin-skill/results.tsv` (33 rows).
+- Pre-existing uncommitted `agents/doc-syncer.md` (mtime 15:33, before session) NOT touched — left for the work session that owns it.

+ 33 - 0
.claude/memory/learnings.md

@@ -36,6 +36,8 @@ rules:
 | LRN-014 | 2026-05-11 | Pandoc base gfm strips header id attrs — need `gfm+gfm_auto_identifiers` | any MD→HTML/PDF with cross-references (`[§4](#nap)`) via pandoc |
 | LRN-015 | 2026-05-11 | BrightLocal Free Tools retired 2026 — Moz Local Citation Checker is free replacement | client SEO/NAP docs — re-validate tool URLs + free-tier status annually |
 | LRN-016 | 2026-05-11 | Pandoc GFM checkbox markup breaks adjacent-sibling CSS — target `li > input` directly | styling task-list checkboxes in pandoc-rendered HTML/PDF |
+| LRN-017 | 2026-05-12 | Thin-dispatcher SKILL.md round-1 win = fallback + frontmatter triggers (+15 to +30) | any `/darwin-skill` round-1 on a dispatcher SKILL.md |
+| LRN-018 | 2026-05-12 | Darwin eval subagents drift on total math — recompute in main thread | any subagent-driven SKILL.md rescore |
 
 ---
 
@@ -232,3 +234,34 @@ rules:
   - Render checklist with realistic content (`<a>`, `<code>`, `<strong>`) before signing off — bare text bullets won't surface the bug.
   - Symptom signature: rendered PDF has overlapping inline elements ONLY in task lists — points to a sibling-selector rule firing on inline content.
 - **Reference**: `skills/client-handover/resources/branding/zenquality.css` `li > input[type="checkbox"]` rule + `li.task-list-item::before` (lines 372–410). Commit `465fe9e`.
+
+---
+
+## LRN-017 — Thin-dispatcher SKILL.md round-1 win = fallback + frontmatter triggers (+15 to +30)
+
+- **Date**: 2026-05-12
+- **Pattern**: thin-dispatcher SKILL.md (delegates to `agents/<x>.md`, body 15-30 lines, no inline workflow) scores low on darwin rubric (45-70) because dims D2/D3/D4/D5 punish empty body. Round-1 universal fix:
+  1. Add fallback clause — `If $HOME/.claude/agents/<x>.md unreachable, emit "<X> agent missing." and STOP. Never improvise — silent behavior change is unsafe.`
+  2. Add triggers to frontmatter `description` — explicit `Triggers: "<keyword>", "<synonym>", "<i18n variant>".`
+  3. For destructive skills (refactor, commit-change): add safety rationale + pre-flight check stub.
+  Δ +13 to +31 observed: status 45.3→76.2 (+30.9), refactor 48.4→74.3 (+25.9), plugin-check 59.2→76.8 (+17.6), commit-change 69.6→83.5 (+13.9). 150% byte cap tight — trim aggressively.
+- **Context**: `/darwin-skill` run 2026-05-12, branch `auto-optimize/20260512-1319` merged to master, 5 commits. skills-perso (66.4→80.1, +13.7) NOT a dispatcher — different patch (Known-limits subsection on the heuristic).
+- **Future application**:
+  - Any darwin round-1 on a dispatcher SKILL.md → skip diagnosis, apply this template directly. Saves one eval cycle.
+  - After round 1, gains flatten near 75-80 → pivot to next-lowest skill, do not grind rounds 2-3 on same target.
+  - For thin originals (<500B), 150% cap is the binding constraint — pre-trim drafts before committing.
+- **Reference**: `.claude/audits/DARWIN-SKILL-2026-05-12.md`. Commits `512df48`..`134561d`. results.tsv at `~/.agents/skills/darwin-skill/results.tsv`.
+
+---
+
+## LRN-018 — Darwin eval subagents drift on total math — recompute in main thread
+
+- **Date**: 2026-05-12
+- **Pattern**: analyzer subagents asked to score SKILL.md and compute weighted total drift on the formula. Two recurring errors: (a) divide `Σ(dim×weight)` by `100` instead of `10` (off by factor 10 — produces 6.17 instead of 61.7, then sometimes the subagent silently re-multiplies); (b) use D8 weight 7 instead of the spec value 25 (status: spec says D8 weight = 25, easy to confuse with D4 weight = 7). Per-dim judgments themselves stable across runs; computed totals unreliable.
+- **Context**: 5 round-1 evals during darwin 2026-05-12. Refactor subagent computed 743÷10 correctly in scratch but wrote `617/100 = 61.7` — actual correct total 74.3. Subsequent prompts explicitly stating "D8 weight is 25" cleared the second error.
+- **Future application**:
+  - Prompt subagent for dim scores only, not weighted total. Main thread computes `Σ(dim_i × weight_i) / 10` deterministically.
+  - If subagent must compute, include weight table in prompt AND show example computation for one row.
+  - When comparing baseline vs round-N, use main-thread recomputed totals on BOTH sides, not the two subagents' self-reported numbers.
+  - Score recalibration between baseline subagent and round-1 subagent is real (independent re-anchoring) — first-round Δ tends to overstate improvement. Direction reliable, magnitude noisy.
+- **Reference**: see "Methodology notes" section of `.claude/audits/DARWIN-SKILL-2026-05-12.md`.