fix(prune-memory): RED-7 fictional example IDs + RED-8 accepted limit

RED-7 (example-priming): the STEP-2 worked example named live IDs (LRN-014 + LRN-016) and modeled merging them, but the two are complementary (header-ids vs checkbox-CSS), a merge the skill's own rule forbids. Live IDs in an example prime the skill to act on those exact entries on real data. Fictionalized the whole STEP-2 example to 9xx IDs (which cannot match a live registry); the merge example now models a same-concept merge. Closed by a DETERMINISTIC test (run-deterministic.sh RED-7: the example must carry only 9xx ids) per LRN-046, not a flaky behavioral fixture. The test first caught its own ugrep false-green (a leading-dash pattern parsed as an option); fixed via /usr/bin/grep, the same dodge the skill's verify already uses at line 189.

RED-8 (added-negation inversion): re-reviewed and consciously accepted as a documented limit in BACKLOG. The risk is remote (compression subtracts tokens), and an FP-safe increase check is non-trivial (it needs the HEAD entry-id set to exclude legitimately new/merged entries going 0->N); a noisy guard is worse than the honest limit on a destructive skill (LRN-047).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01C6bUdvHnajCNzgVQefZowj
parent ce4391a62f
commit 5821ce2017
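The RED-7 filter could be tightened in a small way. This is only a sketch and is not part of this commit.

- **Looser match than intended.** The committed `grep -vE '9[0-9][0-9]$'` looks only at the last three digits, so it would also accept an id such as `LRN-1901` as fictional.
- **Anchoring on the dash.** Using `-9[0-9]{2}$` closes that gap.
- **Passing the pattern with POSIX `-e`.** This means a pattern that starts with a dash is never read as an option, which is the false-green failure mode the commit describes.

The sample ids below are made up. It is an assumption that `-e` is honoured by every grep the suite might meet, ugrep included, and that is worth checking before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: a stricter "fictional id" filter for a RED-7-style check.
# Sample ids are hypothetical; LRN-1901 is the case the committed
# '9[0-9][0-9]$' filter would wrongly accept as fictional.
ids='LRN-901
BDR-903
LRN-014
LRN-1901'

# -e marks the next argument as the pattern, so '-9...' is not parsed as an option.
# -v keeps only ids that do NOT end in "-9" plus two digits, i.e. live-range ids.
live="$(printf '%s\n' "$ids" | grep -vE -e '-9[0-9]{2}$' | sort -u | tr '\n' ' ')"

echo "live-range ids: ${live% }"
```

Run as-is, this prints `live-range ids: LRN-014 LRN-1901`. The stricter pattern therefore reports `LRN-1901` as live, whereas the committed filter would let it pass.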
@@ -112,21 +112,23 @@ Print one block per registry. Example:
 
 ```
 PRUNE PLAN — decisions.md (N entries → M after if approved)
+(IDs below are FICTIONAL — 9xx range, never a live registry entry — so this
+worked example cannot PRIME a real prune. Apply the same shapes to real IDs.)
 
 [A. Obsolete — mark superseded]
-BDR-003 — Gitignore wildcard pattern — status: proposed since 2026-03-12
+BDR-901 — example proposed decision — status: proposed since <90+ days ago>
 → mark: status: deprecated (no follow-up after 90 days)
-BDR-011 — Client handover 4-chapter — body says superseded by BDR-013
-→ fix Index: status = "superseded by BDR-013"
+BDR-902 — example decision — body says superseded by BDR-903
+→ fix Index: status = "superseded by BDR-903"
 
 [B. Similar — merge]
-LRN-014 + LRN-016 — both pandoc rendering quirks
-→ propose: merge into NEW LRN-017 ("Pandoc rendering quirks")
-with both bodies appended + caveman pass; sources marked
-status: superseded by LRN-017
+LRN-901 + LRN-902 — SAME concept (two notes on the identical bug)
+→ propose: merge into NEW LRN-904 with both bodies appended +
+caveman pass; sources marked status: superseded by LRN-904
+(merge ONLY same-concept entries — complementary/different-angle stays split)
 
 [C. Bloated — inline caveman rewrite]
-BDR-011 — body 612 words, filler density 7.2% → ~380 expected (-38%)
+BDR-902 — body 612 words, filler density 7.2% → ~380 expected (-38%)
 
 [D. Index drift]
 (none)
@@ -19,7 +19,15 @@ behavior on real registries.
 - GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an
 explicit "hypothetical" framing) so example IDs cannot match real entries.
 
-Status: filed, not built. Surfaced by the real-data A-measurement.
+Status: RESOLVED 2026-06-29. VERIFY-FIRST done — the real LRN-014 (pandoc header-id
+stripping) and LRN-016 (pandoc checkbox CSS overlap) are COMPLEMENTARY (different
+angles), NOT overlapping: the SKILL.md example modeled a *wrong* merge AND used live
+IDs that primed it on real data. GREEN: the whole STEP-2 example fictionalized to 9xx
+IDs (cannot match any live registry) + the merge example now models a same-concept
+merge with an explicit "merge ONLY same-concept" note. Closed by a DETERMINISTIC test
+(run-deterministic.sh RED-7: the STEP-2 example must carry only 9xx ids) — not the
+flaky behavioral fixture originally proposed, per LRN-046 (deterministic oracle >
+semantic judge on a destructive skill). Test caught its own ugrep false-green first.
 
 ## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
 The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
@@ -35,4 +43,12 @@ requires ADDING a word, contrary to an operation that shortens.
 - RED (if pursued): assert no op INCREASES an existing entry's negation count.
 - Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate),
 so an increase-check needs care to avoid its own false positives.
-Status: documented limit, not built (low practical risk + non-trivial FP risk).
+Status: CONSCIOUSLY ACCEPTED as a documented limit 2026-06-29 (re-reviewed, not built).
+Rationale held on re-read: (1) remote — caveman/merge SUBTRACT tokens; authoring a new
+negation runs against the operation; no evidence in the real-data measurement (the
+"+7 not/no" in EVAL-006 is new/merged-entry ids going 0→N, NOT an existing entry
+inverted). (2) An FP-safe increase-check is non-trivial: the census only emits non-zero
+counts, so a 0→1 ADD produces a working-line with NO HEAD-line to compare — catching it
+needs the HEAD entry-id set to exclude legitimately-new/merged ids. A noisy increase-check
+= a guard you learn to ignore (LRN-047), worse than the honest documented limit on a
+destructive skill. Revisit only if a real inversion is ever observed.
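The RED-8 status above says an FP-safe increase-check would need the HEAD entry-id set, because the census emits only non-zero counts. The sketch below shows roughly what that check would involve. It is not built in this commit, and RED-8 stays a documented limit. The following are all hypothetical:

- the census line format (`<entry-id> <count>`);
- the temp file names;
- how the HEAD id set would be produced.

The logic works as follows:

- An id that is missing from the HEAD census counts as 0, so a 0→1 ADD on an existing entry is flagged.
- Ids that are absent from the HEAD id set are treated as new or merged entries and skipped.

```shell
#!/usr/bin/env bash
# Sketch only (RED-8 is NOT built): flag EXISTING entries whose negation count rose.
# Hypothetical census format: "<entry-id> <count>", one line per NON-ZERO count.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'LRN-901 2' 'LRN-902 1' > "$tmp/head.census"                 # HEAD counts
printf '%s\n' 'LRN-901' 'LRN-902' 'LRN-903' > "$tmp/head.ids"               # every HEAD id, incl. zero-count
printf '%s\n' 'LRN-901 2' 'LRN-902 2' 'LRN-903 1' 'LRN-904 3' > "$tmp/work.census"

flagged="$(awk '
  NF == 0 { next }
  FILENAME == ARGV[1] { base[$1] = $2 + 0; next }   # HEAD census
  FILENAME == ARGV[2] { existed[$1] = 1; next }     # HEAD entry-id set
  # Working census: consider only ids that existed at HEAD; no HEAD line means 0.
  ($1 in existed) && ($2 + 0) > (($1 in base) ? base[$1] : 0) { print $1 }
' "$tmp/head.census" "$tmp/head.ids" "$tmp/work.census")"

# Unquoted on purpose: word-splitting joins the flagged ids with spaces.
echo "increased on existing entries:" $flagged
```

With these toy inputs, `LRN-902` (1→2) and `LRN-903` (0→1, the case the census alone cannot see) are flagged. `LRN-904` is skipped because it never existed at HEAD.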
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6.
+# Deterministic RED suite for /prune-memory — RED-1, RED-2, RED-5, RED-6, RED-7.
 # Each MUST be red on the current (v1) skill. Pure mechanical oracles,
 # no LLM. Faithful: RED-2/RED-6 execute the REAL bash blocks extracted
 # from SKILL.md (no copy that could drift).
@@ -83,6 +83,21 @@ else
 green 6 "verify does not false-orphan the title-less heading"
 fi
 
+# ---- RED-7: STEP 2 plan example must use FICTIONAL ids, never live registry ids
+# Live ids in the worked example PRIME the skill to act on those exact entries on
+# real data (observed 2026-06-25: it merged the example's LRN-014 + LRN-016 on the
+# live learnings.md, though they are complementary, not overlapping). Fictional ids
+# (9xx) cannot match a real registry. Reads SKILL.md only — sandbox-safe.
+# /usr/bin/grep (not the system grep, which may be ugrep — a leading-dash pattern
+# like `-9..` is then misparsed as an option, erroring to an empty + FALSE GREEN).
+ex7="$(awk '/^PRUNE PLAN/{f=1} f{print} /^Approve per category/{f=0; exit}' "$SKILL")"
+bad7="$(printf '%s\n' "$ex7" | /usr/bin/grep -oE '(BDR|LRN|BLK|EVAL)-[0-9]+' | /usr/bin/grep -vE '9[0-9][0-9]$' | sort -u | tr '\n' ' ')"
+if [ -n "${bad7// /}" ]; then
+red 7 "STEP 2 example uses LIVE-range ids (prime real-data ops): ${bad7% }"
+else
+green 7 "STEP 2 example uses only fictional (9xx) ids"
+fi
+
 echo "----"
 if [ "$fail" -eq 0 ]; then
 echo "SUITE: all GREEN"
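To see what the RED-7 oracle actually extracts, here is a self-contained run of the same awk range and grep pipeline against two toy fixtures. One fixture is shaped like the old STEP-2 example and one like the new. The `red`/`green` helpers and the fixture contents are hypothetical stand-ins for the suite's own. Plain `grep` is used because neither pattern starts with a dash. The awk range is inclusive: it prints the `Approve per category` line and then exits, so ids after the example are never scanned.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the suite's red/green helpers (the real ones also track $fail).
red()   { echo "RED-$1: $2"; }
green() { echo "GREEN-$1: $2"; }

# Same extraction + filter as the committed RED-7 block, wrapped in a function.
check7() {
  local ex7 bad7
  # Print from the "PRUNE PLAN" line through the "Approve per category" line, then stop.
  ex7="$(awk '/^PRUNE PLAN/{f=1} f{print} /^Approve per category/{f=0; exit}' "$1")"
  # Collect ids, drop those ending in 9xx, dedupe, join with spaces.
  bad7="$(printf '%s\n' "$ex7" | grep -oE '(BDR|LRN|BLK|EVAL)-[0-9]+' | grep -vE '9[0-9][0-9]$' | sort -u | tr '\n' ' ')"
  if [ -n "${bad7// /}" ]; then
    red 7 "live-range ids: ${bad7% }"
  else
    green 7 "only 9xx ids"
  fi
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/old.md" <<'EOF'
PRUNE PLAN (toy fixture, old-style example)
BDR-003: Gitignore wildcard pattern
LRN-014 + LRN-016: both pandoc rendering quirks
Approve per category
BDR-777: after the example, never scanned
EOF

cat > "$tmp/new.md" <<'EOF'
PRUNE PLAN (toy fixture, new-style example)
BDR-901: example proposed decision
LRN-901 + LRN-902: SAME concept
Approve per category
EOF

old_result="$(check7 "$tmp/old.md")"
new_result="$(check7 "$tmp/new.md")"
echo "$old_result"
echo "$new_result"
```

The old-style fixture goes red with `BDR-003 LRN-014 LRN-016`, and `BDR-777` is ignored because it sits past the range. The new-style fixture goes green.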