diff --git a/.claude/memory/evals.md b/.claude/memory/evals.md index 0d55720..66ec3cd 100644 --- a/.claude/memory/evals.md +++ b/.claude/memory/evals.md @@ -30,6 +30,7 @@ rules: | EVAL-007 | 2026-06-26 | Coupled-capitalize machinery — TDD 13 + e2e, surgical scope proven | keep | | EVAL-008 | 2026-06-27 | Doc-sync coupled machinery — 28/28 real-exec, swap-sweep caught prior debt | keep | | EVAL-009 | 2026-06-27 | deploy skill subagent-driven build: multi-stage review + pressure-test net-positive | keep | +| EVAL-010 | 2026-06-29 | prune-memory hardening: RED-7 deterministic fix + RED-8 accept + 34-row index backfill | keep | --- @@ -113,3 +114,9 @@ rules: - **method**: per-task review (sonnet; opus on the keystone) + writing-skills pressure-test (fresh agent on a `PENDING.json`+moved-HEAD fixture) + final whole-branch review (opus). - **anomalies**: (1) the PLAN's code carried 3 latent bugs — missing `git add` for new files, SC2086 unquoted `$viol`, comment-before-shebang SC1128 — all caught by the implementer's TDD+shellcheck gate → plan-code is a DRAFT, the test gate is load-bearing. (2) the final whole-branch review caught 2 Important seam-bugs INVISIBLE to per-task reviews: target-repo `.claude/`-ignored silent no-op ([[LRN-066]]) + `NEXT.sh`-absence non-regeneration → holistic review earns its keep. (3) pressure-test confirmed the cold-resume discipline holds under temptation (the agent excluded the moved-HEAD `0034`). (4) a reviewer subagent bugged out once (user killed it) → re-dispatched clean (transient, not a finding). - **action**: keep. Multi-stage adversarial review + a behavioral pressure-test caught classes of bug single-pass review misses — worth the cost on a keystone skill. + +## EVAL-010 — prune-memory hardening (RED-7 fix + RED-8 accept + 34-row index backfill) +- **Date**: 2026-06-29 +- **method**: read-first cartography (sub-agent, confirmed) → RED-7 closed by a DETERMINISTIC test ([[LRN-046]]) + STEP-2 example fictionalized → RED-8 re-reviewed, consciously accepted ([[LRN-047]]) → 34 missing Index rows composed + inserted in ID-sorted slots → STEP-4 verify zero MISSING/ORPHAN; deterministic suite all-green, shellcheck clean. +- **anomalies**: (1) RED-7 test FALSE-GREEN caught in real time — ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep ([[LRN-074]]). The RED was WATCHED, not assumed. (2) RED-7 premise verified: LRN-014/016 ARE complementary → the old example modeled a WRONG merge, not just primed it. (3) backfill: 4/5 title-derived Applies-to (the awk-missed entries) missed a real future-app nuance on re-read → corrected before insert (without the 5-check, 4 Index rows would have diverged from source, engraved forever). (4) almost wrote a colliding EVAL-009 (deploy) — read the file first → EVAL-010. (5) pre-existing LRN-021 Index row out of ID-order → moved. +- **action**: keep. RED-7 GREEN (deterministic), RED-8 documented-accept, drift 34→0. [[LRN-073]] + [[LRN-074]] engraved — 2 pattern-families this session (fail-silent [[LRN-066]]/[[LRN-071]] + command-assumption [[LRN-074]]). diff --git a/.claude/memory/journal.md b/.claude/memory/journal.md index deef675..5decc46 100644 --- a/.claude/memory/journal.md +++ b/.claude/memory/journal.md @@ -234,4 +234,11 @@ rules: ## 2026-06-29 (cont. 2) — BLK-011 resolved by REMOVAL (init-project GSD bootstrap) - User challenge reframed the chantier: don't plumb a commit for the stranded ROADMAP — ask if gsd belongs at init AT ALL. Read REFUTED both my option-premises (gsd ≫ roadmap; TODO ≠ gsd ROADMAP) but conclusion A (remove STEP 12) held for a STRONGER reason: speculative auto-bootstrap of an unused multi-session engine at creation is bad per se. Best fix = NEGATIVE diff ([[LRN-072]]). - Removed init-project STEP 12 (+ header 12→11-step, 10c note, 4 USAGE coherence fixes). Coherence sweep = zero dangling STEP-12 refs (the "test" for a removal). Deliberate gsd use KEPT (onboarder PHASE 6, plugin-advisor, status-reporter). [[BLK-011]] → resolved. -- Branch `bugfix/blk-011-gsd-roadmap`; no finish yet (awaiting signal). +- Branch `bugfix/blk-011-gsd-roadmap`; FINISHED → develop (`ce4391a`) on explicit signal; pushed develop to origin (6 commits, SSH). + +## 2026-06-29 (cont. 3) — prune-memory hardening (RED-7/8 + index backfill) +- Read-first cartography (confirmed my own measurements). RED-7 (example-priming): the STEP-2 example named live LRN-014+016 and modeled merging them — verified COMPLEMENTARY, a merge the skill forbids. Fix = fictionalize example to 9xx + DETERMINISTIC test ([[LRN-046]], not flaky behavioral). [[LRN-073]]. +- RED-7 test caught its OWN false-green in real time: ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep. 4th command-assumption miss this session → [[LRN-074]] (2nd engraved pattern-family, alongside fail-silent [[LRN-066]]/[[LRN-071]]). +- RED-8 (added-negation): consciously ACCEPTED as documented limit ([[LRN-047]] — FP-prone guard worse than honest limit on a destructive skill). +- Index backfill: 34 missing rows (decisions 11, learnings 21, blockers 2) composed + ID-sorted insert; drift 34→0, STEP-4 verify OK. Re-read the 5 awk-missed Applies-to → 4 corrected a nuance the title dropped. Moved pre-existing out-of-order LRN-021. [[EVAL-010]]. +- Branch `bugfix/prune-memory-hardening`; no finish yet (awaiting signal). LAST of 3 chantiers. diff --git a/.claude/memory/learnings.md b/.claude/memory/learnings.md index bd7f0ad..051ebb5 100644 --- a/.claude/memory/learnings.md +++ b/.claude/memory/learnings.md @@ -92,6 +92,8 @@ rules: | LRN-070 | 2026-06-29 | clean-tree-gated migration blocked by a dirty submodule → diagnose pointer-vs-content; for a local edit use `submodule..ignore=dirty`, never blind reset | migrating/releasing a superproject whose submodule carries intentional local edits | | LRN-071 | 2026-06-29 | fail-loud must cover the helper's OWN commit, not just its inputs — 3rd occurrence of the swallowed-commit pattern (a failed op masked by a later returning-0 statement) | any helper whose return value gates a downstream "success" — audit every fallible internal op propagates, esp. the commit | | LRN-072 | 2026-06-29 | a stranded-artifact bug can be fixed by NOT creating the artifact (negative diff), not by plumbing its commit — if the producing step is speculative/unused, delete it | a stranded/duplicated/uncommitted-artifact bug — before building machinery, ask if the PRODUCING step is wanted; speculative-at-creation → remove, deliberate-on-demand → keep | +| LRN-073 | 2026-06-29 | a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior) | any skill/agent with a worked example over the SAME data it operates on — use reserved/fictional ids; test deterministically that no live id appears | +| LRN-074 | 2026-06-29 | system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red (4th command-assumption miss this session) | any shell test/guard riding on grep/awk/sed semantics — pin `/usr/bin/`, run the assertion, confirm it reds on the defect before trusting green | --- @@ -828,3 +830,13 @@ rules: - **pattern**: 3rd member of the post-FINISH-artifact class (memory, docs, GSD ROADMAP) — but UNLIKE the first two (real artifacts ALWAYS produced → couple a commit), the GSD artifact came from a SPECULATIVE, opt-in, rarely-used producer (init-project auto-bootstrapping a multi-session engine at project creation). The reflex fix (reorder + build `gsd-commit.sh` + tests) would have added machinery to faithfully commit an artifact nobody uses. The right fix was a NEGATIVE diff: delete the producer → orphan never created → bug dissolves, zero new code (BLK-011). - **the refutation that got there**: the framing "ROADMAP redundant with TODO" was WRONG (gsd ≫ roadmap = state machine/crash-recovery/cost/parallel/worktree; TODO ≠ gsd ROADMAP = different altitude + consumer). Reading REFUTED both premises, yet the CONCLUSION (remove the step) held for a STRONGER reason: speculatively scaffolding a heavy engine the sole user doesn't use, at creation, is bad per se. Right answer, reason corrected before engraving — change the QUESTION before changing the code. - **future application**: a stranded / duplicated / uncommitted-artifact bug → BEFORE building machinery to handle the artifact, ask whether the step that PRODUCES it is actually used / wanted / non-speculative. Speculative or unused (esp. a personal/single-user repo) → DELETE the producer; the cleanest fix is the absent one. Distinguish speculative-at-creation (REMOVE) from deliberate-on-demand (KEEP). Family: [[BLK-010]], [[BLK-011]], [[BDR-036]]. + +## LRN-073 — a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior) +- **pattern**: prune-memory's STEP-2 plan example named real LRN-014 + LRN-016 ("merge these"). A real-data run merged exactly that pair — though they're COMPLEMENTARY (header-ids vs checkbox-CSS), a merge its own rule forbids. Example ids that match live entries, in context at audit time, PRIME the action: you can't tell "judged correctly" from "pattern-matched its own example". +- **fix**: fictionalize example ids (9xx — can't match a live registry) + make the example model a CORRECT action. Lock it DETERMINISTICALLY ([[LRN-046]]): assert the example carries only fictional ids — not a flaky behavioral "did priming fire" test (RED-7). +- **future application**: any skill/agent whose instructions contain a worked example over the SAME data it operates on (registries/files/records) → use reserved/fictional identifiers; test deterministically that no live id appears in the example block. + +## LRN-074 — system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red +- **pattern**: a RED-7 test used `grep -vE '-9[0-9][0-9]$'`; the system grep is UGREP → parsed the leading `-9..` as an OPTION → errored → empty → FALSE GREEN (a RED that never goes red). Caught only because the output was READ, not assumed. 4th time this session an assumed command behavior was false on execution (after `set -o pipefail` + `grep -q` SIGPIPE, …). The skill's own verify already hard-codes `/usr/bin/grep` (line 189) for this exact reason — re-learned. +- **fix**: `/usr/bin/grep` (GNU) where GNU semantics matter; avoid leading-dash regex args (or use `-e`/`--`); never trust the system tool is GNU/POSIX (mawk≠gawk, ugrep≠grep). +- **future application**: any shell test/guard whose correctness rides on grep/awk/sed semantics → pin `/usr/bin/` AND run the assertion, confirming it goes red on the defect before trusting green. Execute, don't assume command behavior. RECURRENT motif — audit any "assumed tool behavior" the way the fail-silent family ([[LRN-066]]/[[LRN-071]]) is audited. diff --git a/.claude/tasks/TODO.md b/.claude/tasks/TODO.md index f2089f8..a87ecff 100644 --- a/.claude/tasks/TODO.md +++ b/.claude/tasks/TODO.md @@ -296,4 +296,13 @@ stronger reason: speculative auto-bootstrap of an unused engine at creation is b - [x] Ref-coherence sweep ("test" for a removal) — header 12→11-step, 10c note, 4 USAGE refs; zero dangling STEP-12 refs repo-wide - [x] Scope guardrails — deliberate gsd use KEPT (onboarder PHASE 6, plugin-advisor, status-reporter) - [x] Capitalize — [[BLK-011]] resolved (true reason + premise trace) + [[LRN-072]] + CHANGELOG Removed + journal 2026-06-29 (cont. 2) -- [ ] FINISH — merge bugfix/blk-011-gsd-roadmap → develop (awaiting explicit human signal) +- [x] FINISH — merged bugfix/blk-011-gsd-roadmap → develop (`ce4391a`); develop pushed to origin (6 commits, SSH) + +## 2026-06-29 — prune-memory hardening (RED-7/8 + index backfill) [branch bugfix/prune-memory-hardening] +LAST of 3 chantiers. Read-first cartography confirmed RED-7/8 + measured 34-row index drift. +- [x] RED-7 (example-priming) — fictionalized STEP-2 example to 9xx ids (live ids primed a wrong merge of complementary LRN-014/016); DETERMINISTIC test (run-deterministic.sh) per [[LRN-046]]. Caught its own ugrep false-green → /usr/bin/grep ([[LRN-074]]). [[LRN-073]] +- [x] RED-8 (added-negation inversion) — consciously ACCEPTED as documented limit in BACKLOG ([[LRN-047]]); no fragile guard built +- [x] Index backfill — 34 missing rows (decisions 11, learnings 21, blockers 2) composed + ID-sorted insert; drift 34→0, STEP-4 verify OK; moved pre-existing out-of-order LRN-021 +- [x] Capitalize — [[LRN-073]] + [[LRN-074]] + [[EVAL-010]] + journal 2026-06-29 (cont. 3) +- [ ] FINISH — merge bugfix/prune-memory-hardening → develop (awaiting explicit human signal) +- [ ] PUSH — develop → origin after the 3 chantiers land (awaiting explicit human signal)