diff --git a/skills/prune-memory/SKILL.md b/skills/prune-memory/SKILL.md
index 26af0af..9a26e33 100644
--- a/skills/prune-memory/SKILL.md
+++ b/skills/prune-memory/SKILL.md
@@ -112,21 +112,23 @@ Print one block per registry. Example:
 ```
 PRUNE PLAN — decisions.md (N entries → M after if approved)
+(IDs below are FICTIONAL — 9xx range, never a live registry entry — so this
+ worked example cannot PRIME a real prune. Apply the same shapes to real IDs.)
 [A. Obsolete — mark superseded]
-  BDR-003 — Gitignore wildcard pattern — status: proposed since 2026-03-12
+  BDR-901 — example proposed decision — status: proposed since <90+ days ago>
     → mark: status: deprecated (no follow-up after 90 days)
-  BDR-011 — Client handover 4-chapter — body says superseded by BDR-013
-    → fix Index: status = "superseded by BDR-013"
+  BDR-902 — example decision — body says superseded by BDR-903
+    → fix Index: status = "superseded by BDR-903"
 [B. Similar — merge]
-  LRN-014 + LRN-016 — both pandoc rendering quirks
-    → propose: merge into NEW LRN-017 ("Pandoc rendering quirks")
-      with both bodies appended + caveman pass; sources marked
-      status: superseded by LRN-017
+  LRN-901 + LRN-902 — SAME concept (two notes on the identical bug)
+    → propose: merge into NEW LRN-904 with both bodies appended +
+      caveman pass; sources marked status: superseded by LRN-904
+      (merge ONLY same-concept entries — complementary/different-angle stays split)
 [C. Bloated — inline caveman rewrite]
-  BDR-011 — body 612 words, filler density 7.2% → ~380 expected (-38%)
+  BDR-902 — body 612 words, filler density 7.2% → ~380 expected (-38%)
 [D. Index drift]
   (none)
diff --git a/skills/prune-memory/tests/BACKLOG.md b/skills/prune-memory/tests/BACKLOG.md
index 2528e03..7d26046 100644
--- a/skills/prune-memory/tests/BACKLOG.md
+++ b/skills/prune-memory/tests/BACKLOG.md
@@ -19,7 +19,15 @@ behavior on real registries.
 - GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an explicit "hypothetical" framing) so example IDs cannot match real entries.
-Status: filed, not built. Surfaced by the real-data A-measurement.
+Status: RESOLVED 2026-06-29. VERIFY-FIRST done — the real LRN-014 (pandoc header-id
+stripping) and LRN-016 (pandoc checkbox CSS overlap) are COMPLEMENTARY (different
+angles), NOT overlapping: the SKILL.md example modeled a *wrong* merge AND used live
+IDs that primed it on real data. GREEN: the whole STEP-2 example fictionalized to 9xx
+IDs (cannot match any live registry) + the merge example now models a same-concept
+merge with an explicit "merge ONLY same-concept" note. Closed by a DETERMINISTIC test
+(run-deterministic.sh RED-7: the STEP-2 example must carry only 9xx ids) — not the
+flaky behavioral fixture originally proposed, per LRN-046 (deterministic oracle >
+semantic judge on a destructive skill). Test caught its own ugrep false-green first.
 ## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
 The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
@@ -35,4 +43,12 @@ requires ADDING a word, contrary to an operation that shortens.
 - RED (if pursued): assert no op INCREASES an existing entry's negation count.
 - Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate), so an increase-check needs care to avoid its own false positives.
-Status: documented limit, not built (low practical risk + non-trivial FP risk).
+Status: CONSCIOUSLY ACCEPTED as a documented limit 2026-06-29 (re-reviewed, not built).
+Rationale held on re-read: (1) remote — caveman/merge SUBTRACT tokens; authoring a new
+negation runs against the operation; no evidence in the real-data measurement (the
+"+7 not/no" in EVAL-006 is new/merged-entry ids going 0→N, NOT an existing entry
+inverted). (2) An FP-safe increase-check is non-trivial: the census only emits non-zero
+counts, so a 0→1 ADD produces a working-line with NO HEAD-line to compare — catching it
+needs the HEAD entry-id set to exclude legitimately-new/merged ids. A noisy increase-check
+= a guard you learn to ignore (LRN-047), worse than the honest documented limit on a
+destructive skill. Revisit only if a real inversion is ever observed.
diff --git a/skills/prune-memory/tests/run-deterministic.sh b/skills/prune-memory/tests/run-deterministic.sh
index 7744e60..d8ad2ce 100644
--- a/skills/prune-memory/tests/run-deterministic.sh
+++ b/skills/prune-memory/tests/run-deterministic.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6.
+# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6, RED-7.
 # Each MUST be red on the current (v1) skill. Pure mechanical oracles,
 # no LLM. Faithful: RED-2/RED-6 execute the REAL bash blocks extracted
 # from SKILL.md (no copy that could drift).
@@ -83,6 +83,21 @@ else
   green 6 "verify does not false-orphan the title-less heading"
 fi
+# ---- RED-7: STEP 2 plan example must use FICTIONAL ids, never live registry ids
+# Live ids in the worked example PRIME the skill to act on those exact entries on
+# real data (observed 2026-06-25: it merged the example's LRN-014 + LRN-016 on the
+# live learnings.md, though they are complementary, not overlapping). Fictional ids
+# (9xx) cannot match a real registry. Reads SKILL.md only — sandbox-safe.
+# /usr/bin/grep (not the system grep, which may be ugrep — a leading-dash pattern
+# like `-9..` is then misparsed as an option, erroring to an empty + FALSE GREEN).
+ex7="$(awk '/^PRUNE PLAN/{f=1} f{print} /^Approve per category/{f=0; exit}' "$SKILL")"
+bad7="$(printf '%s\n' "$ex7" | /usr/bin/grep -oE '(BDR|LRN|BLK|EVAL)-[0-9]+' | /usr/bin/grep -vE '9[0-9][0-9]$' | sort -u | tr '\n' ' ')"
+if [ -n "${bad7// /}" ]; then
+  red 7 "STEP 2 example uses LIVE-range ids (prime real-data ops): ${bad7% }"
+else
+  green 7 "STEP 2 example uses only fictional (9xx) ids"
+fi
+
 echo "----"
 if [ "$fail" -eq 0 ]; then
   echo "SUITE: all GREEN"