Only destructive skill, previously untested. A RED suite (tests/) proved 6
dangers; each closed by a deterministic guard:

- RED-1 removed false "Fixed in v1.1 (TDD found it)" verify claim
- RED-2 STEP 0 dirty-tree is now a real exit 1 (was a prose-only STOP)
- RED-3 STEP 3.4 negation-sentence verbatim guard (no silent inversion)
- RED-4 STEP 1-A collapse safety-critical exception (NEVER/ALWAYS/PERMANENT)
- RED-5 STEP 4 fidelity census (count-based, per-entry x per-category)
- RED-6 STEP 4 trailing-space false-ORPHAN fix

Tests: run-deterministic.sh (all-green), run-behavioral.md, fixtures,
BACKLOG (RED-7/RED-8 open).

Validated on the real learnings.md: 0 fidelity false-positive vs 13,
scope held, registry reverted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd
39 lines
2.1 KiB
Markdown
# prune-memory — test backlog (future REDs)

## RED-7 (candidate) — example-priming in the merge pass
Observed during the 2026-06-25 real-data measurement on the live
`learnings.md`: the skill merged **LRN-014 + LRN-016** — the EXACT pair
named as the worked example in `SKILL.md` STEP 2
("LRN-014 + LRN-016 — both pandoc rendering quirks → merge into NEW
LRN-017").

Hypothesis: the skill's own illustrative example PRIMED the merge on real
data, rather than a genuine content overlap between those two entries.

If confirmed, this is a design defect: a skill's example must not steer its
behavior on real registries.
- VERIFY FIRST: read the real LRN-014 / LRN-016 — do they actually overlap,
or did the example drive the merge?
- RED (if priming confirmed): fixture with entries at LRN-014/016 that do
NOT overlap (distinct topics) → assert the skill does NOT merge them.
- GREEN: fictionalize the SKILL.md example (obviously-fake IDs, or an
explicit "hypothetical" framing) so example IDs cannot match real entries.

Status: filed, not built. Surfaced by the real-data A-measurement.
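
One way the GREEN step for RED-7 could be backed by a deterministic check: fail if any ID used as an example in `SKILL.md` also exists in the live registry. The sketch below is only an illustration. It assumes IDs always match `LRN-<digits>`, and the function names and sample strings are hypothetical stand-ins, not part of the skill or its test suite.

```python
import re

# Assumption: learning IDs look like "LRN-014"; the real registry format may differ.
ID_PATTERN = re.compile(r"\bLRN-\d+\b")


def extract_ids(text: str) -> set[str]:
    """Return every LRN-style ID mentioned in the text."""
    return set(ID_PATTERN.findall(text))


def colliding_example_ids(skill_text: str, registry_text: str) -> set[str]:
    """IDs that appear both in the skill's examples and in the real registry."""
    return extract_ids(skill_text) & extract_ids(registry_text)


# Hypothetical stand-ins for SKILL.md (before/after fictionalizing) and learnings.md.
skill_before = "LRN-014 + LRN-016: both pandoc rendering quirks, merge into NEW LRN-017"
skill_after = "LRN-9001 + LRN-9002 (hypothetical): both rendering quirks, merge into NEW LRN-9003"
registry = "## LRN-014\n(entry body)\n## LRN-015\n(entry body)\n## LRN-016\n(entry body)"

print(sorted(colliding_example_ids(skill_before, registry)))  # ['LRN-014', 'LRN-016']
print(sorted(colliding_example_ids(skill_after, registry)))   # []
```

A check like this only proves the example IDs cannot collide with real entries. It does not by itself show whether priming caused the observed merge, which is what the VERIFY FIRST step is for.
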
## RED-8 (candidate) — added-negation inversion (documented limit, not a test yet)
The RED-5 fidelity guard flags negation/permanent token DROPS; it cannot catch
an ADDED negation that inverts meaning ("X works" -> "X never works") — that is
a count INCREASE. The STEP 3.4 NEGATION GUARD only protects sentences that
ALREADY contain a negation, so it does not stop a non-negation sentence being
rewritten WITH a negation. So NEITHER guard closes this case — a real hole,
documented honestly rather than claimed covered.

Practically remote: caveman compression and merge SUBTRACT tokens (drop filler);
they do not author new negations. Producing "X never works" from "X works"
requires ADDING a word, contrary to an operation that shortens.
- RED (if pursued): assert no op INCREASES an existing entry's negation count.
- Caveat: must exclude new/merged-entry ids (HEAD count 0 -> N is legitimate),
so an increase-check needs care to avoid its own false positives.

Status: documented limit, not built (low practical risk + non-trivial FP risk).
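
If RED-8 is ever pursued, the increase-check could look roughly like the sketch below. It compares per-entry negation counts before and after an operation, and skips ids missing from the "before" snapshot so that new or merged entries are not flagged. The token list, the dict-of-entries shape, and the function names are all assumptions made for illustration; the real guard's categories and registry parsing may differ.

```python
import re

# Assumption: a small fixed token list; the real guard's categories may differ.
NEGATION = re.compile(r"\b(?:never|not|no|cannot|don't|doesn't|won't)\b", re.IGNORECASE)


def negation_count(text: str) -> int:
    """Count negation tokens in one entry's text."""
    return len(NEGATION.findall(text))


def added_negations(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Map entry id -> (old count, new count) for entries whose count rose.

    Ids absent from `before` are new or merged entries and are skipped,
    since going from 0 to N is legitimate for them.
    """
    flagged = {}
    for entry_id, new_text in after.items():
        if entry_id not in before:
            continue
        old, new = negation_count(before[entry_id]), negation_count(new_text)
        if new > old:
            flagged[entry_id] = (old, new)
    return flagged


# Hypothetical entries keyed by id (HEAD snapshot vs. post-prune snapshot).
before = {"LRN-001": "X works", "LRN-002": "Never use Y"}
after = {
    "LRN-001": "X never works",   # inverted meaning: count 0 -> 1
    "LRN-002": "Never use Y",     # unchanged
    "LRN-003": "Do not merge Z",  # new entry, excluded from the check
}

print(added_negations(before, after))  # {'LRN-001': (0, 1)}
```

The false-positive risk named in the caveat remains: a legitimate rewrite can raise the count without inverting meaning, so a flag from a check like this would call for review rather than prove an inversion.
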