diff --git a/.claude/memory/blockers.md b/.claude/memory/blockers.md
index bfee0e5..eab7db2 100644
--- a/.claude/memory/blockers.md
+++ b/.claude/memory/blockers.md
@@ -22,6 +22,7 @@ rules:
 |----|------|---------|--------|
 | BLK-001 | 2026-04-22 | `rtk curl` breaks JSON pipelines | upstream |
 | BLK-002 | 2026-04-23 | `rmdir` denied in sandbox on empty directory | resolved |
+| BLK-003 | 2026-05-12 | `scripts/screenshot.mjs` hardcoded macOS path blocks PNG cards on Linux | upstream |
 
 ---
 
@@ -44,4 +45,15 @@ rules:
 - **Solution**:
   - This session: `git rm tasks/*.md` handled files individually (via `git rm`, cleared gate). Git auto-detected renames to `.claude/tasks/`, so `tasks/` directory removed implicitly at commit time.
   - If dir persists empty after `git rm`: ask user to run `rmdir tasks` manually.
-- **Status**: resolved (fixed via `git rm` + rename auto-detection; no `rmdir` needed in practice).
\ No newline at end of file
+- **Status**: resolved (fixed via `git rm` + rename auto-detection; no `rmdir` needed in practice).
+## BLK-003 — `scripts/screenshot.mjs` hardcoded macOS path blocks PNG cards on Linux
+
+- **Date**: 2026-05-12
+- **Friction**: `/darwin-skill` Phase 3 generates result cards via `node ~/.agents/skills/darwin-skill/scripts/screenshot.mjs `. On Linux the script fails immediately: `require('/Users/alchain/.npm-global/lib/node_modules/playwright/node_modules/playwright-core')` resolves to a non-existent macOS user path. No PNG cards are produced; Phase 3 falls back to the markdown report only.
+- **Real cause**: the upstream `alchaincyf/darwin-skill` author developed on macOS and shipped an absolute path into their own home directory's global npm install of playwright. There is no portability layer (no PATH lookup, no bare `playwright` require, no fallback to `npx`).
+- **Solution**:
+  - Workaround (used 2026-05-12): skip PNG generation and deliver markdown + HTML cards (the HTML is viewable in a browser without playwright).
+  - Local patch: `npm i -g playwright`, then replace `require('/Users/alchain/...')` with `require('playwright')`. Two-line edit. Node's bare `require` does not search the global npm root, so either run with `NODE_PATH="$(npm root -g)"` or install playwright next to the script instead.
+  - Spec-documented fallback: `npx playwright screenshot "file:///path/to/card.html#" out.png --viewport-size=960,1280 --wait-for-timeout=2000`. Works without modifying the file but costs a ~150MB Chromium download.
+  - Open a PR upstream to `github.com/alchaincyf/darwin-skill` once tested.
+- **Status**: upstream (third-party skill at `~/.agents/skills/darwin-skill/scripts/screenshot.mjs`, not in any of our repos).
diff --git a/.claude/memory/decisions.md b/.claude/memory/decisions.md
index f6402ef..58700c0 100644
--- a/.claude/memory/decisions.md
+++ b/.claude/memory/decisions.md
@@ -36,6 +36,7 @@ rules:
 | BDR-012 | 2026-05-07 | client-handover cover: white bg + green accents + PNG logo default | accepted |
 | BDR-013 | 2026-05-11 | client-handover: 6-chapter doc — promote scores §2 + NAP §4 | accepted |
 | BDR-014 | 2026-05-11 | Personal SKILL.md descriptions: "Use when [triggers]…" pattern + 1024-char spec limit | accepted |
+| BDR-015 | 2026-05-12 | Exclude broken gstack symlinks from /darwin-skill scope (external ownership) | accepted |
 
 ---
 
@@ -264,3 +265,20 @@ rules:
 - Orchestrators still describe orchestration role explicitly (e.g. client-handover: "Multi-agent orchestrator: dispatches the client-handover-writer agent which spawns parallel /seo + /harden subagents") — that's role identification, not workflow summary.
 - Other 10 personal skills (analyze, bugfix, code-clean, commit-change, feat, hotfix, plugin-check, refactor, status, skills-perso) still partially summarize workflow but stay under 1024 chars. Not retrofitted in this pass — flagged for follow-up only if shortcut symptoms observed.
 - **Reference**: commit `1da6a31`, 8 SKILL.md files (client-handover, doc, geo, seo, validate, ship-feature, init-project, onboard), superpowers:writing-skills "CSO" section, agentskills.io/specification.
+ +--- + +## BDR-015 — Exclude broken gstack symlinks from /darwin-skill scope (external ownership) + +- **Date**: 2026-05-12 +- **Status**: accepted +- **Decision**: 5 dirs in `~/Documents/claude/skills/` whose `SKILL.md` symlinks point to non-existent gstack paths (`skills-external/gstack//SKILL.md` missing) — `benchmark-models`, `context-restore`, `context-save`, `make-pdf`, `plan-tune` — are excluded from `/darwin-skill` baseline + optimization. Marked `status=error` in `results.tsv` with note `broken gstack symlink — out of scope`. NOT scored, NOT optimized, NOT deleted. +- **Why**: darwin-skill constraint #1 forbids changing a skill's core function — implies external/gstack-owned skills are out of scope. Symlinks resolve to `skills-external/gstack` which is third-party submodule. Plus the targets are broken — gstack's actual layout (`benchmark/`, `health/`, `qa/`, etc.) doesn't include these 5 names, suggesting upstream rename or removal. Repairing them is a separate triage task, not darwin's concern. +- **Alternatives rejected**: + - Fix symlinks first then darwin-optimize → out of scope, blocks the optimization queue on gstack archaeology. + - Score them with `FILE_NOT_FOUND` and include in averages → biases stats, mixes signal with infrastructure issue. + - Optimize the gstack source files directly → external ownership, never modify. + - Delete the broken symlinks → would obscure that the user once expected these to exist; leave for triage. +- **Caveats**: + - If/when symlinks are repaired (real gstack target exists), re-run baseline to bring them in scope. + - Bigger picture: `benchmark-models` looks like a deliberate rename of gstack's `benchmark` to disambiguate from the gstack-skill called `/benchmark`. Could be a planned migration that stalled. Worth a one-line ticket separate from darwin. 
diff --git a/.claude/memory/journal.md b/.claude/memory/journal.md
index ec16e70..f69f901 100644
--- a/.claude/memory/journal.md
+++ b/.claude/memory/journal.md
@@ -87,3 +87,15 @@ rules:
 - NAP checklist polish (commit `abd2612`): added "Description courte" field + replaced retired BrightLocal Free Tools with Moz Local Citation Checker (LRN-015).
 - CSS bugfix (commit `465fe9e`): pandoc GFM checkbox markup `
  • text…
  • ` has no wrapper class, adjacent-sibling rule `li input + *` yanks ``/`` siblings out of flow. Fixed by targeting `li > input[type="checkbox"]` directly. Captured as LRN-016.
 - 4 atomic commits `b15b275..1da6a31` via `/commit-change`. Decisions BDR-013, BDR-014 + learnings LRN-014, LRN-015, LRN-016 capitalized. Pre-existing BDR-012 + LRN-013 Index rows backfilled (prior session entries existed in body but missing from Index).
+
+## 2026-05-12
+
+- Ran the `/darwin-skill` full pipeline on the cwd repo (real skill source, not the `~/.claude/skills/` runtime mirror). Baseline scored 23 personal skills; 5 broken gstack symlinks excluded. Avg baseline 75.6.
+- Phase 2 round 1 on the bottom 5: status 45.3→76.2 (+30.9), refactor 48.4→74.3 (+25.9), plugin-check 59.2→76.8 (+17.6), skills-perso 66.4→80.1 (+13.7), commit-change 69.6→83.5 (+13.9). All KEEP. Avg 57.8→78.2 (+20.4/skill).
+- Rounds 2-3 skipped: diminishing returns past round 1 on the dispatcher pattern. graphify (29.0, 62KB SKILL.md) deferred to a Phase 2.5 exploratory rewrite per user.
+- Pattern observed: the thin-dispatcher round-1 invariant = fallback + frontmatter triggers. Replicable across the 4 dispatchers tested. Captured as LRN-017.
+- Methodology gotcha: darwin eval subagents drift on total math (factor-10 errors, D8 weight 7 vs 25). Direction reliable, magnitude noisy. Captured as LRN-018. Recompute totals in the main thread going forward.
+- BDR-015: broken gstack symlinks (5 dirs) excluded from darwin scope (external ownership + missing targets).
+- BLK-003: `scripts/screenshot.mjs` hardcoded macOS path → PNG cards skipped on Linux. Delivered markdown report + 5 new test-prompts.json + 5 optimized SKILL.md only. Upstream issue, workaround in place.
+- Branch `auto-optimize/20260512-1319` merged via `--no-ff` to master. 6 commits landed. Report at `.claude/audits/DARWIN-SKILL-2026-05-12.md`. results.tsv at `~/.agents/skills/darwin-skill/results.tsv` (33 rows).
+- Pre-existing uncommitted `agents/doc-syncer.md` (mtime 15:33, before session) NOT touched; left for the work session that owns it.
diff --git a/.claude/memory/learnings.md b/.claude/memory/learnings.md
index fdd9bdf..b592bc9 100644
--- a/.claude/memory/learnings.md
+++ b/.claude/memory/learnings.md
@@ -36,6 +36,8 @@ rules:
 | LRN-014 | 2026-05-11 | Pandoc base gfm strips header id attrs — need `gfm+gfm_auto_identifiers` | any MD→HTML/PDF with cross-references (`[§4](#nap)`) via pandoc |
 | LRN-015 | 2026-05-11 | BrightLocal Free Tools retired 2026 — Moz Local Citation Checker is free replacement | client SEO/NAP docs — re-validate tool URLs + free-tier status annually |
 | LRN-016 | 2026-05-11 | Pandoc GFM checkbox markup breaks adjacent-sibling CSS — target `li > input` directly | styling task-list checkboxes in pandoc-rendered HTML/PDF |
+| LRN-017 | 2026-05-12 | Thin-dispatcher SKILL.md round-1 win = fallback + frontmatter triggers (+15 to +30) | any `/darwin-skill` round-1 on a dispatcher SKILL.md |
+| LRN-018 | 2026-05-12 | Darwin eval subagents drift on total math — recompute in main thread | any subagent-driven SKILL.md rescore |
 
 ---
 
@@ -232,3 +234,34 @@ rules:
   - Render checklist with realistic content (``, ``, ``) before signing off — bare text bullets won't surface the bug.
   - Symptom signature: rendered PDF has overlapping inline elements ONLY in task lists — points to a sibling-selector rule firing on inline content.
 - **Reference**: `skills/client-handover/resources/branding/zenquality.css` `li > input[type="checkbox"]` rule + `li.task-list-item::before` (lines 372–410). Commit `465fe9e`.
+
+---
+
+## LRN-017 — Thin-dispatcher SKILL.md round-1 win = fallback + frontmatter triggers (+15 to +30)
+
+- **Date**: 2026-05-12
+- **Pattern**: a thin-dispatcher SKILL.md (delegates to `agents/.md`, body 15-30 lines, no inline workflow) scores low on the darwin rubric (45-70) because dims D2/D3/D4/D5 punish an empty body. Universal round-1 fix:
+  1. Add a fallback clause: `If $HOME/.claude/agents/.md unreachable, emit " agent missing." and STOP. Never improvise — silent behavior change is unsafe.`
+  2. Add triggers to the frontmatter `description`: explicit `Triggers: "", "", "".`
+  3. For destructive skills (refactor, commit-change): add a safety rationale + pre-flight check stub.
+  Δ +13.9 to +30.9 observed on the 4 dispatchers: status 45.3→76.2 (+30.9), refactor 48.4→74.3 (+25.9), plugin-check 59.2→76.8 (+17.6), commit-change 69.6→83.5 (+13.9). The 150% byte cap is tight; trim aggressively.
+- **Context**: `/darwin-skill` run 2026-05-12, branch `auto-optimize/20260512-1319` merged to master, 5 commits. skills-perso (66.4→80.1, +13.7) is NOT a dispatcher and got a different patch (Known-limits subsection on the heuristic).
+- **Future application**:
+  - Any darwin round-1 on a dispatcher SKILL.md → skip diagnosis and apply this template directly. Saves one eval cycle.
+  - After round 1, gains flatten near 75-80 → pivot to the next-lowest skill; do not grind rounds 2-3 on the same target.
+  - For thin originals (<500B), the 150% cap is the binding constraint; pre-trim drafts before committing.
+- **Reference**: `.claude/audits/DARWIN-SKILL-2026-05-12.md`. Commits `512df48`..`134561d`. results.tsv at `~/.agents/skills/darwin-skill/results.tsv`.
+
+---
+
+## LRN-018 — Darwin eval subagents drift on total math — recompute in main thread
+
+- **Date**: 2026-05-12
+- **Pattern**: analyzer subagents asked to score a SKILL.md and compute the weighted total drift on the formula. Two recurring errors: (a) dividing `Σ(dim×weight)` by `100` instead of `10` (off by a factor of 10: produces 6.17 instead of 61.7, after which the subagent sometimes silently re-multiplies); (b) using D8 weight 7 instead of the spec value 25 (the spec sets D8 weight = 25, which is easy to confuse with D4 weight = 7). Per-dim judgments are stable across runs; computed totals are unreliable.
+- **Context**: 5 round-1 evals during darwin 2026-05-12. The refactor subagent computed 743÷10 correctly in scratch but wrote `617/100 = 61.7`; the actual correct total was 74.3. Subsequent prompts explicitly stating "D8 weight is 25" cleared the second error.
+- **Future application**:
+  - Prompt the subagent for dim scores only, not the weighted total. The main thread computes `Σ(dim_i × weight_i) / 10` deterministically.
+  - If the subagent must compute, include the weight table in the prompt AND show an example computation for one row.
+  - When comparing baseline vs round-N, use main-thread recomputed totals on BOTH sides, not the two subagents' self-reported numbers.
+  - Score recalibration between the baseline subagent and the round-1 subagent is real (independent re-anchoring), so the first-round Δ tends to overstate improvement. Direction reliable, magnitude noisy.
+- **Reference**: see the "Methodology notes" section of `.claude/audits/DARWIN-SKILL-2026-05-12.md`.
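LRN-018 recommends keeping the arithmetic in the main thread. The following is a small Node.js sketch of a deterministic weighted total plus the before/after average. It assumes every dimension is scored 0-10 and the weights sum to 100, so `Σ(dim × weight) / 10` lands on a 0-100 scale. Only D4 = 7 and D8 = 25 come from these notes. `PLACEHOLDER_WEIGHTS`, `weightedTotal` and `mean` are hypothetical names, and the placeholder table must be replaced with the real one from the darwin-skill spec.

```javascript
// Sketch (LRN-018): recompute darwin totals deterministically in the main thread.
// ASSUMPTIONS: every dimension is scored 0-10 and the weights sum to 100, so
// sum(dim * weight) / 10 lands on a 0-100 scale.

// HYPOTHETICAL weight table: only D4 = 7 and D8 = 25 are documented in these
// notes. Replace with the real table from the darwin-skill spec.
const PLACEHOLDER_WEIGHTS = { D1: 10, D2: 10, D3: 12, D4: 7, D5: 12, D6: 12, D7: 12, D8: 25 };

// Weighted total from per-dimension scores (the only thing the subagent returns).
function weightedTotal(scores, weights) {
  const dims = Object.keys(weights);
  const weightSum = dims.reduce((acc, d) => acc + weights[d], 0);
  // A wrong D8 weight (7 instead of 25) breaks this sum, so error (b) fails loudly.
  if (weightSum !== 100) {
    throw new Error(`weights must sum to 100, got ${weightSum}`);
  }
  let sum = 0;
  for (const d of dims) {
    if (typeof scores[d] !== 'number') throw new Error(`missing score for ${d}`);
    sum += scores[d] * weights[d];
  }
  return sum / 10; // divide by 10, not 100 (error (a))
}

// Plain mean, used to compare baseline and round-N totals computed the same way.
function mean(values) {
  return values.reduce((acc, v) => acc + v, 0) / values.length;
}

// Bottom-5 totals from the 2026-05-12 journal entry
// (status, refactor, plugin-check, skills-perso, commit-change).
const baselineTotals = [45.3, 48.4, 59.2, 66.4, 69.6];
const round1Totals = [76.2, 74.3, 76.8, 80.1, 83.5];
const before = mean(baselineTotals);
const after = mean(round1Totals);
console.log(`avg ${before.toFixed(1)} -> ${after.toFixed(1)} (+${(after - before).toFixed(1)}/skill)`);
// prints "avg 57.8 -> 78.2 (+20.4/skill)"
```

Rejecting any weight table that does not sum to 100 catches error (b) mechanically: swapping D8's 25 for 7 drops the sum to 82. Rounding is left to display time (`toFixed`), so no intermediate value is truncated.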