diff --git a/.claude/memory/blockers.md b/.claude/memory/blockers.md
index e9a1e56..8f6e83b 100644
--- a/.claude/memory/blockers.md
+++ b/.claude/memory/blockers.md
@@ -27,6 +27,8 @@ rules:
 | BLK-005 | 2026-05-21 | gstack submodule rename (checkpoint→context-save) breaks profile entries | resolved |
 | BLK-006 | 2026-05-21 | `profile.sh current` false-negative via `~/.claude` symlink (`cd` not `cd -P`) | resolved |
 | BLK-007 | 2026-06-02 | 6 gstack source skills (ios-*, spec) unlinked post-bump — invisible to profiles + `gstack on` | resolved |
+| BLK-008 | 2026-06-23 | gstack ./setup on Ubuntu 26.04: Playwright chromium unsupported → gstack browser (/browse, /qa, screenshots) silently dead | resolved (211c7d4) |
+| BLK-009 | 2026-06-25 | user-level path-scoped rules (`paths:` frontmatter in `~/.claude/rules/`) never inject — broken in CC 2.1.190 (#21858) | upstream, open |
 | BLK-010 | 2026-06-27 | init-project: scaffold (STEP 5) + bootstrap README (5b) have no deterministic commit owner; worktree `add -b` on unborn HEAD | resolved (uncommitted) |
 | BLK-011 | 2026-06-27 | init-project STEP 13 GSD post-FINISH creates ROADMAP.md → stranded doc (3rd post-FINISH artifact) | resolved (STEP 12 removed) |
 | BLK-012 | 2026-06-29 | gitflow_init half-applied: socle-commit failure swallowed → hook activated on partial run → re-run self-blocks | resolved |
diff --git a/.claude/memory/decisions.md b/.claude/memory/decisions.md
index 781e2a5..a24aac8 100644
--- a/.claude/memory/decisions.md
+++ b/.claude/memory/decisions.md
@@ -42,8 +42,19 @@ rules:
 | BDR-018 | 2026-06-02 | `profile gstack on/off` verb — toggle gstack keeping active-profile label | accepted |
 | BDR-019 | 2026-06-09 | Remove `disable-model-invocation` repo-wide — align skills with CLAUDE.md routing | accepted |
 | BDR-020 | 2026-06-11 | `/audit-delta`: per-axis SHA markers + always-on fix gate + unreachable-first-run = full report-only | accepted |
+| BDR-021 | 2026-06-27 | CLAUDE.md restructure: contradiction purge, project-specific sections labeled, critical sections never compressed | accepted |
 | BDR-022 | 2026-06-18 | doc-syncer scoped to public docs; `.claude/` + `CLAUDE.md` read-only context, never targets; conventions + clean mode | accepted |
 | BDR-023 | 2026-06-19 | Merge /close into /capitalize — 2 modes + TODO reconcile; /close alias | accepted |
+| BDR-024 | 2026-06-27 | `profile show --plain` = claude-free parse contract for the design gate | accepted |
+| BDR-025 | 2026-06-27 | Design gate profile-based; remedy `/profile design`; magic required-but-manual; unknown → fail-visible; claude via PATH-repair | accepted |
+| BDR-026 | 2026-06-27 | Secret source-of-truth outside the repo (`~/.claude/.env`) reached via a `repo/.env` symlink | accepted |
+| BDR-027 | 2026-06-27 | Minimal npm-via-nvm bootstrap over a centralized prereq lib | accepted |
+| BDR-028 | 2026-06-27 | Hand-curated config install-immutable (auto-revert guard) + de-vendor installer-managed skills | accepted |
+| BDR-029 | 2026-06-27 | Installer auto-fixes gstack browser on an OS newer than its pinned Playwright supports | accepted |
+| BDR-030 | 2026-06-27 | gstack skills activated ON-DEMAND per profile, not pre-installed; OFF by default stays | accepted |
+| BDR-031 | 2026-06-27 | global CLAUDE.md lightening = COMPRESSION, not path-scope / externalization | accepted |
+| BDR-032 | 2026-06-27 | skill `/validate` → `/web-validate` (rename user surface, keep internals) | accepted |
+| BDR-033 | 2026-06-27 | design-gate §4: anim-lib suggestion — suggest-only, non-blocking, stateless 1-line | accepted |
 | BDR-034 | 2026-06-26 | Coupled-capitalize invariant v1 — memory commit auto per dev flow (Frame 2) | accepted |
 | BDR-035 | 2026-06-26 | Analyze-before-plan invariant v1 — read-before bookend of coupled-capitalize | accepted |
 | BDR-036 | 2026-06-27 | Doc-sync coupled invariant — commit docs doc-syncer patches (twin of BDR-034, BUILT not reordered) | accepted |
diff --git a/.claude/memory/evals.md b/.claude/memory/evals.md
index 0d55720..66ec3cd 100644
--- a/.claude/memory/evals.md
+++ b/.claude/memory/evals.md
@@ -30,6 +30,7 @@ rules:
 | EVAL-007 | 2026-06-26 | Coupled-capitalize machinery — TDD 13 + e2e, surgical scope proven | keep |
 | EVAL-008 | 2026-06-27 | Doc-sync coupled machinery — 28/28 real-exec, swap-sweep caught prior debt | keep |
 | EVAL-009 | 2026-06-27 | deploy skill subagent-driven build: multi-stage review + pressure-test net-positive | keep |
+| EVAL-010 | 2026-06-29 | prune-memory hardening: RED-7 deterministic fix + RED-8 accept + 34-row index backfill | keep |
 
 ---
 
@@ -113,3 +114,9 @@ rules:
 - **method**: per-task review (sonnet; opus on the keystone) + writing-skills pressure-test (fresh agent on a `PENDING.json`+moved-HEAD fixture) + final whole-branch review (opus).
 - **anomalies**: (1) the PLAN's code carried 3 latent bugs — missing `git add` for new files, SC2086 unquoted `$viol`, comment-before-shebang SC1128 — all caught by the implementer's TDD+shellcheck gate → plan-code is a DRAFT, the test gate is load-bearing. (2) the final whole-branch review caught 2 Important seam-bugs INVISIBLE to per-task reviews: target-repo `.claude/`-ignored silent no-op ([[LRN-066]]) + `NEXT.sh`-absence non-regeneration → holistic review earns its keep. (3) pressure-test confirmed the cold-resume discipline holds under temptation (the agent excluded the moved-HEAD `0034`). (4) a reviewer subagent bugged out once (user killed it) → re-dispatched clean (transient, not a finding).
 - **action**: keep. Multi-stage adversarial review + a behavioral pressure-test caught classes of bug single-pass review misses — worth the cost on a keystone skill.
+
+## EVAL-010 — prune-memory hardening (RED-7 fix + RED-8 accept + 34-row index backfill)
+- **Date**: 2026-06-29
+- **method**: read-first cartography (sub-agent, confirmed) → RED-7 closed by a DETERMINISTIC test ([[LRN-046]]) + STEP-2 example fictionalized → RED-8 re-reviewed, consciously accepted ([[LRN-047]]) → 34 missing Index rows composed + inserted in ID-sorted slots → STEP-4 verify zero MISSING/ORPHAN; deterministic suite all-green, shellcheck clean.
+- **anomalies**: (1) RED-7 test FALSE-GREEN caught in real time — ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep ([[LRN-074]]). The RED was WATCHED, not assumed. (2) RED-7 premise verified: LRN-014/016 ARE complementary → the old example modeled a WRONG merge, not just primed it. (3) backfill: 4/5 title-derived Applies-to (the awk-missed entries) missed a real future-app nuance on re-read → corrected before insert (without the 5-check, 4 Index rows would have diverged from source, engraved forever). (4) almost wrote a colliding EVAL-009 (deploy) — read the file first → EVAL-010. (5) pre-existing LRN-021 Index row out of ID-order → moved.
+- **action**: keep. RED-7 GREEN (deterministic), RED-8 documented-accept, drift 34→0. [[LRN-073]] + [[LRN-074]] engraved — 2 pattern-families this session (fail-silent [[LRN-066]]/[[LRN-071]] + command-assumption [[LRN-074]]).
diff --git a/.claude/memory/journal.md b/.claude/memory/journal.md
index deef675..5decc46 100644
--- a/.claude/memory/journal.md
+++ b/.claude/memory/journal.md
@@ -234,4 +234,11 @@ rules:
 ## 2026-06-29 (cont. 2) — BLK-011 resolved by REMOVAL (init-project GSD bootstrap)
 - User challenge reframed the chantier: don't plumb a commit for the stranded ROADMAP — ask if gsd belongs at init AT ALL. Read REFUTED both my option-premises (gsd ≫ roadmap; TODO ≠ gsd ROADMAP) but conclusion A (remove STEP 12) held for a STRONGER reason: speculative auto-bootstrap of an unused multi-session engine at creation is bad per se. Best fix = NEGATIVE diff ([[LRN-072]]).
 - Removed init-project STEP 12 (+ header 12→11-step, 10c note, 4 USAGE coherence fixes). Coherence sweep = zero dangling STEP-12 refs (the "test" for a removal). Deliberate gsd use KEPT (onboarder PHASE 6, plugin-advisor, status-reporter). [[BLK-011]] → resolved.
-- Branch `bugfix/blk-011-gsd-roadmap`; no finish yet (awaiting signal).
+- Branch `bugfix/blk-011-gsd-roadmap`; FINISHED → develop (`ce4391a`) on explicit signal; pushed develop to origin (6 commits, SSH).
+
+## 2026-06-29 (cont. 3) — prune-memory hardening (RED-7/8 + index backfill)
+- Read-first cartography (confirmed my own measurements). RED-7 (example-priming): the STEP-2 example named live LRN-014+016 and modeled merging them — verified COMPLEMENTARY, a merge the skill forbids. Fix = fictionalize example to 9xx + DETERMINISTIC test ([[LRN-046]], not flaky behavioral). [[LRN-073]].
+- RED-7 test caught its OWN false-green in real time: ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep. 4th command-assumption miss this session → [[LRN-074]] (2nd engraved pattern-family, alongside fail-silent [[LRN-066]]/[[LRN-071]]).
+- RED-8 (added-negation): consciously ACCEPTED as documented limit ([[LRN-047]] — FP-prone guard worse than honest limit on a destructive skill).
+- Index backfill: 34 missing rows (decisions 11, learnings 21, blockers 2) composed + ID-sorted insert; drift 34→0, STEP-4 verify OK. Re-read the 5 awk-missed Applies-to → 4 corrected a nuance the title dropped. Moved pre-existing out-of-order LRN-021. [[EVAL-010]].
+- Branch `bugfix/prune-memory-hardening`; no finish yet (awaiting signal). LAST of 3 chantiers.
diff --git a/.claude/memory/learnings.md b/.claude/memory/learnings.md
index 60b911f..051ebb5 100644
--- a/.claude/memory/learnings.md
+++ b/.claude/memory/learnings.md
@@ -28,7 +28,6 @@ rules:
 | LRN-006 | 2026-05-03 | `caveman-shrink` (and any MCP middleware proxy) non-functional without upstream wrapper | any MCP middleware/proxy package — never `claude mcp add` it bare |
 | LRN-007 | 2026-05-06 | `toggle-external.sh enable` missed source-only state (3rd lifecycle case) | toggle scripts for tools with separate install + symlink steps |
 | LRN-008 | 2026-05-06 | Biggest skill-quality wins from edge-case tables, not workflow rewrites | any skill <85 — first check for FAILURE PATHS / EDGE CASES / ERROR HANDLING section |
-| LRN-021 | 2026-05-20 | Refactor commands→skills must sweep `~/.claude/commands/` for orphan wrappers | any refactor moving `agents/foo.md` → `skills/foo/SKILL.md`; onboard/init-project audits |
 | LRN-009 | 2026-05-06 | Dry-run scoring noise wrongly triggers reverts on already-strong skills | darwin-skill ratchet on skills >91 — relax or use real subagent eval |
 | LRN-010 | 2026-05-06 | `~/.claude/skills,agents` symlink to Documents/claude — git from `~/.claude` fails | any optimization or batch edit on personal skills/agents |
 | LRN-011 | 2026-05-07 | Single subagent emits N independently-gated scores → labeled extraction + axis-aware loop + per-axis escalation | any audit pipeline shipping multiple gated metrics from one subagent |
@@ -40,15 +39,37 @@ rules:
 | LRN-017 | 2026-05-12 | Thin-dispatcher SKILL.md round-1 win = fallback + frontmatter triggers (+15 to +30) | any `/darwin-skill` round-1 on a dispatcher SKILL.md |
 | LRN-018 | 2026-05-12 | Darwin eval subagents drift on total math — recompute in main thread | any subagent-driven SKILL.md rescore |
 | LRN-019 | 2026-05-15 | Deployable-project doc split: README dev-quickstart + DEPLOY 14-section prod-VPS topology | any onboard/doc-syncer/scaffold producing docs for a deployable project |
+| LRN-020 | 2026-05-18 | profile-sentinel-collision: literal labels in cmd output must not match profile filenames | a CLI reporting a real named-identifier OR a "nothing applied" state — keep sentinels outside the namespace (string-eq consumers break) |
+| LRN-021 | 2026-05-20 | Refactor commands→skills must sweep `~/.claude/commands/` for orphan wrappers | any refactor moving `agents/foo.md` → `skills/foo/SKILL.md`; onboard/init-project audits |
+| LRN-022 | 2026-05-21 | audit `lib/profiles/*.profile` against the gstack skill list after every submodule bump | any gstack submodule bump / external-skill-source move; a "missing:" warning = upstream rename/deletion (link.sh can't fix) |
+| LRN-023 | 2026-05-21 | scripts invoked via symlink must resolve `$REPO` with `cd -P` (physical), not logical `cd` | any script invoked via a symlink that derives its repo root from `$BASH_SOURCE` (cd -P/realpath; Python .resolve()) |
 | LRN-024 | 2026-06-02 | New sibling command sharing logic → extract helper + refactor existing caller, never copy-paste; assert pre/post state equality | adding a subcommand/branch reusing logic inline in a peer command |
 | LRN-025 | 2026-06-02 | `.gitignore` gstack allowlist must cover ALL toggleable skills (incl. parked) — else enabling one = untracked git noise | any toggle that moves local-symlink skills into a tracked dir; post-submodule-bump reconcile |
 | LRN-026 | 2026-06-09 | `disable-model-invocation: false` = ENABLED not blocking; only `true` blocks (model + orchestrator); binary, no per-caller | Claude Code skill frontmatter; deciding self-route/chain vs human-only entry point |
 | LRN-027 | 2026-06-11 | Agents improvise audit boundaries from file dates when no machine state — periodic skills need machine-readable state file, never inference | any recurring/periodic skill needing "since last run" semantics |
+| LRN-028 | 2026-06-11 | "no-skill" subagent baselines invalid when the skill is installed globally | any A/B skill eval / TDD RED baseline / darwin with-vs-without — control must REMOVE the capability, not omit mention |
+| LRN-029 | 2026-06-11 | an edit adding an exception to a blanket rule will contradict it — counterbalanced blind judges catch it | skill/doc/spec edits adding a branch/exception; scoring any self-modified artifact (counterbalanced blind judges) |
 | LRN-030 | 2026-06-18 | Opus 4.8 under-delegates subagents/memory/custom-tools by default — counter via explicit CLAUDE.md fan-out rule | any Opus 4.8 session; tuning delegation; inline-vs-subagent decision |
 | LRN-031 | 2026-06-19 | Skill value = gate + anti-noise + determinism, not re-coding what a capable agent does free | building/reviewing any skill; writing-skills TDD fixture design |
+| LRN-032 | 2026-06-19 | a rule has a domain; applying it outside = category error — check artifact type first | invoking a limit/convention/style rule — confirm it governs THIS artifact class |
+| LRN-033 | 2026-06-19 | multibyte separator breaks `printf %-Ns` byte-width padding — pad via `${#}` char-count | aligning any column with non-ASCII (·, —, box-drawing, accents) |
+| LRN-034 | 2026-06-21 | narrated state ≠ ground truth; the missed alarm was internal contradiction — verify vs git | anyone asserts "X is done" — verify (git/file/grep) before building on it |
+| LRN-035 | 2026-06-21 | honest dedup: name-mention ≠ definition-instance; a dosage rule can make "dedup" a no-op | any "X repeated N times → factor it" — audit what each occurrence IS |
+| LRN-036 | 2026-06-21 | `command -v ` in a shelled-out script depends on PATH carrying the cli's bin, not the alias | any script shelling out a CLI from a hook/subshell |
+| LRN-037 | 2026-06-21 | verify the load-bearing scenario on the REAL subject in REAL context, not a stub/logic argument | any "fixed/works" claim on a critical path — produce the real run output |
+| LRN-038 | 2026-06-23 | Playwright host-platform override for distros newer than its hardcoded support list | any pinned tool with an OS allowlist breaking on a fresh OS upgrade |
+| LRN-039 | 2026-06-23 | installers drift hand-curated config → snapshot+trap-restore guard; anchor gitignore for pollution | audit a fresh install with `git status` right after `make install` |
+| LRN-040 | 2026-06-23 | OS newer than a pinned tool = TWO layers (version build + security policy) | "tool X broke after an OS upgrade" — check both build-support and OS hardening |
+| LRN-041 | 2026-06-23 | a check reading a symlink an earlier install step makes → false negative if that step's precondition unmet | any "X not found in FILE" where FILE is a symlink/derived path |
+| LRN-042 | 2026-06-23 | `npx skills add` / gstack `./setup` resolve install target RELATIVE TO CWD — repo CWD = wrong dir | before any `npx add` / ` init` that materializes a dotfile dir, set CWD |
+| LRN-043 | 2026-06-25 | CLAUDE.md skill-routing: cut name-obvious lines, keep only non-derivable signal + dense catch-all | compressing any routing/dispatch table whose entries the model sees elsewhere |
+| LRN-044 | 2026-06-25 | Edit/Write refuse to write THROUGH a symlink — pass the resolved real path | before editing any `~/.claude/...` config file — resolve it first |
+| LRN-045 | 2026-06-25 | renaming a command: audit exact-name leak-guard / forbidden-token regexes | when renaming, grep the BARE old token inside regex/test/gate files |
 | LRN-046 | 2026-06-25 | Destructive skill: deterministic oracle (byte-identical / count census) > semantic judge | any destructive/irreversible skill; behavioral-oracle TDD |
 | LRN-047 | 2026-06-25 | A noisy safety guard (13/13 FP) = a guard you learn to ignore = risk → refine, don't tolerate | any guard/alert/lint that can false-positive |
 | LRN-048 | 2026-06-25 | A "0/OK/pass" must prove it LOOKED (counted both sides), else verify hard-wired to pass | any verify/test/lint reporting success |
+| LRN-049 | 2026-06-25 | non-destructive repeated nudge: stateless-minimal surface > state marker (conditional on stakes) | any repeated advisory in a stateless surface — bound noise before reaching for a marker |
+| LRN-050 | 2026-06-25 | on a symlinked/live file, show-before-write is the ONLY control gate | before editing any file — check if it is live, treat pre-write diff as an approval gate |
 | LRN-051 | 2026-06-26 | `git commit -- pathspec` strict on no-match → filter scoped commits to changed paths | any scoped-commit automation |
 | LRN-052 | 2026-06-26 | Hash-anchoring: 2 cases it does NOT apply (pre-code founding, squash-merge) | capitalizing founding/arch decisions; squash repos |
 | LRN-053 | 2026-06-26 | Read-before teeth = verifiable disposition in the artifact, not the act of reading | any read-before / check-before wiring |
@@ -71,6 +92,8 @@ rules:
 | LRN-070 | 2026-06-29 | clean-tree-gated migration blocked by a dirty submodule → diagnose pointer-vs-content; for a local edit use `submodule..ignore=dirty`, never blind reset | migrating/releasing a superproject whose submodule carries intentional local edits |
 | LRN-071 | 2026-06-29 | fail-loud must cover the helper's OWN commit, not just its inputs — 3rd occurrence of the swallowed-commit pattern (a failed op masked by a later returning-0 statement) | any helper whose return value gates a downstream "success" — audit every fallible internal op propagates, esp. the commit |
 | LRN-072 | 2026-06-29 | a stranded-artifact bug can be fixed by NOT creating the artifact (negative diff), not by plumbing its commit — if the producing step is speculative/unused, delete it | a stranded/duplicated/uncommitted-artifact bug — before building machinery, ask if the PRODUCING step is wanted; speculative-at-creation → remove, deliberate-on-demand → keep |
+| LRN-073 | 2026-06-29 | a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior) | any skill/agent with a worked example over the SAME data it operates on — use reserved/fictional ids; test deterministically that no live id appears |
+| LRN-074 | 2026-06-29 | system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red (4th command-assumption miss this session) | any shell test/guard riding on grep/awk/sed semantics — pin `/usr/bin/`, run the assertion, confirm it reds on the defect before trusting green |
 
 ---
 
@@ -807,3 +830,13 @@ rules:
 - **pattern**: 3rd member of the post-FINISH-artifact class (memory, docs, GSD ROADMAP) — but UNLIKE the first two (real artifacts ALWAYS produced → couple a commit), the GSD artifact came from a SPECULATIVE, opt-in, rarely-used producer (init-project auto-bootstrapping a multi-session engine at project creation). The reflex fix (reorder + build `gsd-commit.sh` + tests) would have added machinery to faithfully commit an artifact nobody uses. The right fix was a NEGATIVE diff: delete the producer → orphan never created → bug dissolves, zero new code (BLK-011).
 - **the refutation that got there**: the framing "ROADMAP redundant with TODO" was WRONG (gsd ≫ roadmap = state machine/crash-recovery/cost/parallel/worktree; TODO ≠ gsd ROADMAP = different altitude + consumer). Reading REFUTED both premises, yet the CONCLUSION (remove the step) held for a STRONGER reason: speculatively scaffolding a heavy engine the sole user doesn't use, at creation, is bad per se. Right answer, reason corrected before engraving — change the QUESTION before changing the code.
 - **future application**: a stranded / duplicated / uncommitted-artifact bug → BEFORE building machinery to handle the artifact, ask whether the step that PRODUCES it is actually used / wanted / non-speculative. Speculative or unused (esp. a personal/single-user repo) → DELETE the producer; the cleanest fix is the absent one. Distinguish speculative-at-creation (REMOVE) from deliberate-on-demand (KEEP). Family: [[BLK-010]], [[BLK-011]], [[BDR-036]].
+
+## LRN-073 — a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior)
+- **pattern**: prune-memory's STEP-2 plan example named real LRN-014 + LRN-016 ("merge these"). A real-data run merged exactly that pair — though they're COMPLEMENTARY (header-ids vs checkbox-CSS), a merge its own rule forbids. Example ids that match live entries, in context at audit time, PRIME the action: you can't tell "judged correctly" from "pattern-matched its own example".
+- **fix**: fictionalize example ids (9xx — can't match a live registry) + make the example model a CORRECT action. Lock it DETERMINISTICALLY ([[LRN-046]]): assert the example carries only fictional ids — not a flaky behavioral "did priming fire" test (RED-7).
+- **future application**: any skill/agent whose instructions contain a worked example over the SAME data it operates on (registries/files/records) → use reserved/fictional identifiers; test deterministically that no live id appears in the example block.
+
+## LRN-074 — system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red
+- **pattern**: a RED-7 test used `grep -vE '-9[0-9][0-9]$'`; the system grep is UGREP → parsed the leading `-9..` as an OPTION → errored → empty → FALSE GREEN (a RED that never goes red). Caught only because the output was READ, not assumed. 4th time this session an assumed command behavior was false on execution (after `set -o pipefail` + `grep -q` SIGPIPE, …). The skill's own verify already hard-codes `/usr/bin/grep` (line 189) for this exact reason — re-learned.
+- **fix**: `/usr/bin/grep` (GNU) where GNU semantics matter; avoid leading-dash regex args (or use `-e`/`--`); never trust the system tool is GNU/POSIX (mawk≠gawk, ugrep≠grep).
+- **future application**: any shell test/guard whose correctness rides on grep/awk/sed semantics → pin `/usr/bin/` AND run the assertion, confirming it goes red on the defect before trusting green. Execute, don't assume command behavior. RECURRENT motif — audit any "assumed tool behavior" the way the fail-silent family ([[LRN-066]]/[[LRN-071]]) is audited.
diff --git a/.claude/tasks/TODO.md b/.claude/tasks/TODO.md
index f2089f8..a87ecff 100644
--- a/.claude/tasks/TODO.md
+++ b/.claude/tasks/TODO.md
@@ -296,4 +296,13 @@ stronger reason: speculative auto-bootstrap of an unused engine at creation is b
 - [x] Ref-coherence sweep ("test" for a removal) — header 12→11-step, 10c note, 4 USAGE refs; zero dangling STEP-12 refs repo-wide
 - [x] Scope guardrails — deliberate gsd use KEPT (onboarder PHASE 6, plugin-advisor, status-reporter)
 - [x] Capitalize — [[BLK-011]] resolved (true reason + premise trace) + [[LRN-072]] + CHANGELOG Removed + journal 2026-06-29 (cont. 2)
-- [ ] FINISH — merge bugfix/blk-011-gsd-roadmap → develop (awaiting explicit human signal)
+- [x] FINISH — merged bugfix/blk-011-gsd-roadmap → develop (`ce4391a`); develop pushed to origin (6 commits, SSH)
+
+## 2026-06-29 — prune-memory hardening (RED-7/8 + index backfill) [branch bugfix/prune-memory-hardening]
+LAST of 3 chantiers. Read-first cartography confirmed RED-7/8 + measured 34-row index drift.
+- [x] RED-7 (example-priming) — fictionalized STEP-2 example to 9xx ids (live ids primed a wrong merge of complementary LRN-014/016); DETERMINISTIC test (run-deterministic.sh) per [[LRN-046]]. Caught its own ugrep false-green → /usr/bin/grep ([[LRN-074]]). [[LRN-073]]
+- [x] RED-8 (added-negation inversion) — consciously ACCEPTED as documented limit in BACKLOG ([[LRN-047]]); no fragile guard built
+- [x] Index backfill — 34 missing rows (decisions 11, learnings 21, blockers 2) composed + ID-sorted insert; drift 34→0, STEP-4 verify OK; moved pre-existing out-of-order LRN-021
+- [x] Capitalize — [[LRN-073]] + [[LRN-074]] + [[EVAL-010]] + journal 2026-06-29 (cont. 3)
+- [ ] FINISH — merge bugfix/prune-memory-hardening → develop (awaiting explicit human signal)
+- [ ] PUSH — develop → origin after the 3 chantiers land (awaiting explicit human signal)
diff --git a/skills/prune-memory/SKILL.md b/skills/prune-memory/SKILL.md
index 26af0af..9a26e33 100644
--- a/skills/prune-memory/SKILL.md
+++ b/skills/prune-memory/SKILL.md
@@ -112,21 +112,23 @@ Print one block per registry. Example:
 ```
 PRUNE PLAN — decisions.md (N entries → M after if approved)
+(IDs below are FICTIONAL — 9xx range, never a live registry entry — so this
+ worked example cannot PRIME a real prune. Apply the same shapes to real IDs.)
 
 [A. Obsolete — mark superseded]
-  BDR-003 — Gitignore wildcard pattern — status: proposed since 2026-03-12
+  BDR-901 — example proposed decision — status: proposed since <90+ days ago>
     → mark: status: deprecated (no follow-up after 90 days)
 
-  BDR-011 — Client handover 4-chapter — body says superseded by BDR-013
-    → fix Index: status = "superseded by BDR-013"
+  BDR-902 — example decision — body says superseded by BDR-903
+    → fix Index: status = "superseded by BDR-903"
 
 [B. Similar — merge]
-  LRN-014 + LRN-016 — both pandoc rendering quirks
-    → propose: merge into NEW LRN-017 ("Pandoc rendering quirks")
-      with both bodies appended + caveman pass; sources marked
-      status: superseded by LRN-017
+  LRN-901 + LRN-902 — SAME concept (two notes on the identical bug)
+    → propose: merge into NEW LRN-904 with both bodies appended +
+      caveman pass; sources marked status: superseded by LRN-904
+      (merge ONLY same-concept entries — complementary/different-angle stays split)
 
 [C. Bloated — inline caveman rewrite]
-  BDR-011 — body 612 words, filler density 7.2% → ~380 expected (-38%)
+  BDR-902 — body 612 words, filler density 7.2% → ~380 expected (-38%)
 
 [D. Index drift]
   (none)
diff --git a/skills/prune-memory/tests/BACKLOG.md b/skills/prune-memory/tests/BACKLOG.md
index 2528e03..7d26046 100644
--- a/skills/prune-memory/tests/BACKLOG.md
+++ b/skills/prune-memory/tests/BACKLOG.md
@@ -19,7 +19,15 @@ behavior on real registries.
 - GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an explicit
   "hypothetical" framing) so example IDs cannot match real entries.
 
-Status: filed, not built. Surfaced by the real-data A-measurement.
+Status: RESOLVED 2026-06-29. VERIFY-FIRST done — the real LRN-014 (pandoc header-id
+stripping) and LRN-016 (pandoc checkbox CSS overlap) are COMPLEMENTARY (different
+angles), NOT overlapping: the SKILL.md example modeled a *wrong* merge AND used live
+IDs that primed it on real data. GREEN: the whole STEP-2 example fictionalized to 9xx
+IDs (cannot match any live registry) + the merge example now models a same-concept
+merge with an explicit "merge ONLY same-concept" note. Closed by a DETERMINISTIC test
+(run-deterministic.sh RED-7: the STEP-2 example must carry only 9xx ids) — not the
+flaky behavioral fixture originally proposed, per LRN-046 (deterministic oracle >
+semantic judge on a destructive skill). Test caught its own ugrep false-green first.
 
 ## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
 The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
@@ -35,4 +43,12 @@ requires ADDING a word, contrary to an operation that shortens.
 - RED (if pursued): assert no op INCREASES an existing entry's negation count.
 - Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate),
   so an increase-check needs care to avoid its own false positives.
-Status: documented limit, not built (low practical risk + non-trivial FP risk).
+Status: CONSCIOUSLY ACCEPTED as a documented limit 2026-06-29 (re-reviewed, not built).
+Rationale held on re-read: (1) remote — caveman/merge SUBTRACT tokens; authoring a new
+negation runs against the operation; no evidence in the real-data measurement (the
+"+7 not/no" in EVAL-006 is new/merged-entry ids going 0→N, NOT an existing entry
+inverted). (2) An FP-safe increase-check is non-trivial: the census only emits non-zero
+counts, so a 0→1 ADD produces a working-line with NO HEAD-line to compare — catching it
+needs the HEAD entry-id set to exclude legitimately-new/merged ids. A noisy increase-check
+= a guard you learn to ignore (LRN-047), worse than the honest documented limit on a
+destructive skill. Revisit only if a real inversion is ever observed.
diff --git a/skills/prune-memory/tests/run-deterministic.sh b/skills/prune-memory/tests/run-deterministic.sh
index 7744e60..d8ad2ce 100644
--- a/skills/prune-memory/tests/run-deterministic.sh
+++ b/skills/prune-memory/tests/run-deterministic.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6.
+# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6, RED-7.
 # Each MUST be red on the current (v1) skill. Pure mechanical oracles,
 # no LLM. Faithful: RED-2/RED-6 execute the REAL bash blocks extracted
 # from SKILL.md (no copy that could drift).
@@ -83,6 +83,21 @@ else
   green 6 "verify does not false-orphan the title-less heading"
 fi
 
+# ---- RED-7: STEP 2 plan example must use FICTIONAL ids, never live registry ids
+# Live ids in the worked example PRIME the skill to act on those exact entries on
+# real data (observed 2026-06-25: it merged the example's LRN-014 + LRN-016 on the
+# live learnings.md, though they are complementary, not overlapping). Fictional ids
+# (9xx) cannot match a real registry. Reads SKILL.md only — sandbox-safe.
+# /usr/bin/grep (not the system grep, which may be ugrep — a leading-dash pattern
+# like `-9..` is then misparsed as an option, erroring to an empty + FALSE GREEN).
+ex7="$(awk '/^PRUNE PLAN/{f=1} f{print} /^Approve per category/{f=0; exit}' "$SKILL")"
+bad7="$(printf '%s\n' "$ex7" | /usr/bin/grep -oE '(BDR|LRN|BLK|EVAL)-[0-9]+' | /usr/bin/grep -vE '9[0-9][0-9]$' | sort -u | tr '\n' ' ')"
+if [ -n "${bad7// /}" ]; then
+  red 7 "STEP 2 example uses LIVE-range ids (prime real-data ops): ${bad7% }"
+else
+  green 7 "STEP 2 example uses only fictional (9xx) ids"
+fi
+
 echo "----"
 if [ "$fail" -eq 0 ]; then
   echo "SUITE: all GREEN"
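
The leading-dash pitfall behind LRN-074, and the "watch the RED go red" step, can be sketched in isolation. This is a minimal illustration and not part of `run-deterministic.sh`: the helper name `live_ids`, the canary fixture, and the fallback to whatever `grep` is on PATH are all assumptions made for the example. It passes the pattern through `-e`, so a pattern that begins with a dash should not be parsed as an option, and it checks a known-bad input before a clean result is trusted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the suite): print ids that fall outside the
# fictional 9xx range, space-separated and sorted.
GREP=/usr/bin/grep
[ -x "$GREP" ] || GREP="$(command -v grep)"   # assumption: fall back when that path is absent

live_ids() {
  # -e marks the next argument as the pattern, so '-9[0-9][0-9]$' is not
  # read as an option even though it starts with a dash.
  printf '%s\n' "$@" | "$GREP" -vE -e '-9[0-9][0-9]$' | sort -u | tr '\n' ' ' | sed 's/ $//'
}

# Watch the RED go red: the check has to flag a known-bad fixture first.
# LRN-014 is live-range, LRN-901 is fictional, so only LRN-014 may come back.
canary="$(live_ids LRN-014 LRN-901)"
if [ "$canary" != "LRN-014" ]; then
  echo "check is blind: expected LRN-014, got '${canary}'" >&2
  exit 1
fi
echo "canary ok: flagged ${canary}"
```

An empty result from `live_ids` only counts as green once the canary has shown that the same pipeline reports a defect when one is present.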