Bastien Chanot 0a3e76611d fix(skill): prune-memory v1.1 — deterministic guards close 6 TDD'd defects

Only destructive skill, previously untested. A RED suite (tests/) proved 6
dangers; each closed by a deterministic guard:
- RED-1 removed false "Fixed in v1.1 (TDD found it)" verify claim
- RED-2 STEP 0 dirty-tree is now a real exit 1 (was a prose-only STOP)
- RED-3 STEP 3.4 negation-sentence verbatim guard (no silent inversion)
- RED-4 STEP 1-A collapse safety-critical exception (NEVER/ALWAYS/PERMANENT)
- RED-5 STEP 4 fidelity census (count-based, per-entry x per-category)
- RED-6 STEP 4 trailing-space false-ORPHAN fix
Tests: run-deterministic.sh (all-green), run-behavioral.md, fixtures, BACKLOG
(RED-7/RED-8 open). Validated on the real learnings.md: 0 fidelity
false-positive vs 13, scope held, registry reverted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01W9sqAwZxBMZSynZoVrEJhd

2026-06-25 22:56:10 +02:00

4.5 KiB

Behavioral RED suite — /prune-memory (RED-3, RED-4)

LLM-executed and non-deterministic: the main agent orchestrates it; it is NOT a plain script. Each RED runs a fleet of N=6 with TOLERANCE ZERO: a single failing run means the RED is red. A destructive skill gets no failure budget, because "works almost always" means "loses an entry the day the dice land wrong".

NEVER run against real registries. Each subagent gets a FRESH COPY of a throwaway fixture under tests/fixtures/.

Harness (per run, repeated N=6 times, independent subagents)

1. Copy the fixture to a fresh sandbox:
   cp -r tests/fixtures/<fix>/. $SANDBOX_i/
2. Make it a CLEAN git repo so STEP 0 PRECHECK passes and the skill proceeds to the destructive steps. Without this, STEP 0 finds no git and aborts, and the test observes NOTHING (a silent false-green, the exact trap we hunt):
   git -C $SANDBOX_i init -q && git -C $SANDBOX_i add -A \
     && git -C $SANDBOX_i -c user.email=t@t -c user.name=t commit -qm fixture
3. Dispatch one subagent (tools: Read, Edit, Write, Bash, Grep, Glob) with:
   - the full SKILL.md procedure,
   - CWD = $SANDBOX_i (so .claude/memory/ is the fixture),
   - instruction: "Execute /prune-memory on .claude/memory/ here. At STEP 2, approve ALL categories (answer all). Apply the changes. Do not ask the human."
4. Capture the result: git -C $SANDBOX_i diff HEAD (the working tree against the committed fixture) is the natural oracle feed.
5. Apply the RED's oracle (below). Record PASS/FAIL.
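A minimal POSIX sh sketch of the sandbox setup and the oracle feed described above, assuming git is on PATH and the fixture directory holds the .claude/ tree. The function names make_sandbox and oracle_feed are hypothetical; the subagent dispatch stays with the orchestrating agent and is not modelled here.

```sh
# make_sandbox FIXTURE_DIR SANDBOX_DIR  (hypothetical helper)
# Copies the fixture, dotfiles included, into a NEW directory and commits it,
# so STEP 0 PRECHECK sees a clean git tree.
make_sandbox() {
  local fixture=$1 sandbox=$2
  mkdir "$sandbox" || return 1          # no -p: an existing dir is refused, every run is fresh
  cp -r "$fixture"/. "$sandbox"/ || return 1
  git -C "$sandbox" init -q || return 1
  git -C "$sandbox" add -A || return 1
  git -C "$sandbox" -c user.email=t@t -c user.name=t commit -qm fixture || return 1
  # Guard against the silent false-green: the tree must be clean before dispatch.
  [ -z "$(git -C "$sandbox" status --porcelain)" ]
}

# oracle_feed SANDBOX_DIR  (hypothetical helper)
# What the skill changed relative to the committed fixture: edits to tracked
# files (diff against HEAD) plus the names of any new untracked files.
oracle_feed() {
  local sandbox=$1
  git -C "$sandbox" diff HEAD
  git -C "$sandbox" ls-files --others --exclude-standard
}
```

For example, run make_sandbox tests/fixtures/red3-negation "$SANDBOX_1" before dispatch and oracle_feed "$SANDBOX_1" after it. An empty feed after a run that was supposed to prune is itself worth a second look.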

Verdict per RED: FAIL (red) if ANY of the 6 runs fails. PASS (green) only if all 6 pass.
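The tolerance-zero verdict fits in a tiny function. red_verdict is a hypothetical helper, and treating any run count other than six as FAIL is an assumption added by this sketch, so that a crashed or missing run can never count toward green.

```sh
# red_verdict RESULT...  (hypothetical helper)
# One argument per run, PASS or FAIL. Prints PASS only if exactly six runs
# were recorded and every one of them is PASS; otherwise prints FAIL.
red_verdict() {
  local expected_runs=6 r
  if [ "$#" -ne "$expected_runs" ]; then
    echo FAIL            # a missing run is a failure, never a pass
    return 1
  fi
  for r in "$@"; do
    if [ "$r" != "PASS" ]; then
      echo FAIL
      return 1
    fi
  done
  echo PASS
}
```

Usage: red_verdict PASS PASS PASS PASS PASS FAIL prints FAIL; one bad run is enough.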

RED-3 — compression must not drop/invert a negation

Target: decisions.md / BDR-042 (red3-negation fixture).

Layer (a) — deterministic substring survival. Whitespace-normalize the post-prune BDR-042 body (collapse every run of whitespace to one space). Assert ALL three negation-bearing clauses survive as substrings:

S1: the fix did NOT resolve the race condition in the auth middleware
S2: blocking filesystem calls are never acceptable inside a request handler
S3: Future work must never reintroduce a synchronous call here just to make a test pass.

Any one missing → FAIL. (Control: BDR-041 has no negation and may be compressed or left as-is — not asserted.)
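Layer (a) as a sketch, assuming the harness has already cut the post-prune BDR-042 body out of decisions.md into its own file (how entries are delimited is not specified here). normalize_ws and red3_layer_a are hypothetical names; the three clauses are S1 to S3 verbatim.

```sh
# normalize_ws: collapse every run of whitespace (newlines included) to one
# space, then trim a leading and a trailing space.
normalize_ws() {
  tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# red3_layer_a BODY_FILE  (hypothetical helper)
# Prints PASS, or FAIL plus the first clause that did not survive.
red3_layer_a() {
  local body s
  body=$(normalize_ws < "$1") || return 1
  for s in \
    "the fix did NOT resolve the race condition in the auth middleware" \
    "blocking filesystem calls are never acceptable inside a request handler" \
    "Future work must never reintroduce a synchronous call here just to make a test pass."
  do
    # -F: fixed-string match, no regex; case-sensitive on purpose (NOT stays NOT).
    if ! printf '%s\n' "$body" | grep -qF -- "$s"; then
      echo "FAIL: missing: $s"
      return 1
    fi
  done
  echo PASS
}
```

Because the body is normalized first, a clause that the skill merely re-wrapped across lines still passes; a clause with a word changed does not.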

Layer (b) — semantic judge (independent subagent). Give it ORIGINAL vs POST BDR-042 and ask: "Did any negation get inverted or lost? Reply PRESERVED | LOST:<which> | INVERTED:<which>." Anything but PRESERVED → FAIL.
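A strict reader for the judge's reply, as a sketch; judge_verdict is a hypothetical helper. Only an exact PRESERVED, after trimming surrounding blanks, passes. An empty reply, a LOST:/INVERTED: verdict, or hedged chatter around the keyword is a FAIL.

```sh
# judge_verdict REPLY  (hypothetical helper)
judge_verdict() {
  local reply
  # Strip blanks at both ends of each line; command substitution also drops
  # trailing newlines. Any other extra text or line makes reply != PRESERVED.
  reply=$(printf '%s\n' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  if [ "$reply" = "PRESERVED" ]; then
    echo PASS
  else
    echo FAIL
    return 1
  fi
}
```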

Current skill (no negation guard): expected FAIL (red). The caveman rules drop filler words (basically/really/actually/simply) that sit right next to the not/never tokens, so a negation can be dropped or inverted along with them.

Aligned GREEN fix (deterministic): never compress a sentence/line that contains a negation token (not / never / no / n't) — keep it verbatim. Then layer (a) passes byte-deterministically and layer (b) becomes a pure safety net.
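The guard reduces to a token test on each sentence or line before compression. A sketch with a hypothetical has_negation helper that matches exactly the listed tokens: not, never and no as whole words in any case, plus the n't contraction.

```sh
# has_negation LINE  (hypothetical helper)
# Succeeds if LINE carries a negation token; such a line is copied verbatim
# and never handed to the compression rules.
has_negation() {
  # Whole word = bounded by line start/end or a non-word character.
  local re='(^|[^[:alnum:]_])(not|never|no)([^[:alnum:]_]|$)'
  printf '%s\n' "$1" | grep -Eiq -- "$re|n't"
}
```

As written, forms outside that list, such as cannot or none, are not matched; extending the alternation is a one-word change per form.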

RED-4 — collapse must not touch safety-critical content

Target: journal.md / the ## 2025-11-03 entry (red4-journal fixture).

Deterministic, BYTE-IDENTICAL. Extract the block from the heading ## 2025-11-03 up to (not including) the next ## heading, from BOTH the fixture and the post-prune file, and diff them. ANY byte difference = FAIL: a safety-critical entry (an operational NEVER/ALWAYS rule, or one marked PERMANENT) must not be collapsed, summarized, or reworded at all. The detail (0007, backfill first, FK cascade) IS the value; a "semantically equivalent" summary is a FAIL.

Control: the ## 2026-06-20 entry (<30 days, current session) must also be untouched — already covered by "What NOT to prune", checked as a sanity guard.
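The byte-identity oracle, usable for both the target and the control entry, sketched in POSIX sh with awk. extract_entry and red4_check are hypothetical names, and the sketch assumes each entry starts with a line beginning "## <date>"; "### " sub-headings stay inside the block.

```sh
# extract_entry HEADING FILE  (hypothetical helper)
# Prints from the first line starting with HEADING up to, not including,
# the next "## " heading.
extract_entry() {
  awk -v h="$1" '
    !inblk && index($0, h) == 1 { inblk = 1; print; next }
    inblk && /^## /             { exit }
    inblk                       { print }
  ' "$2"
}

# red4_check HEADING FIXTURE_FILE POST_FILE  (hypothetical helper)
# PASS only if the two extracted blocks are byte-identical.
red4_check() {
  local heading=$1 fixture=$2 post=$3 tmp
  tmp=$(mktemp -d) || return 2
  extract_entry "$heading" "$fixture" > "$tmp/before"
  extract_entry "$heading" "$post"    > "$tmp/after"
  # Guard: an entry absent from the fixture would make two empty blocks
  # compare equal, which is exactly a silent false-green.
  if [ ! -s "$tmp/before" ]; then
    echo "ERROR: $heading not found in fixture"
    rm -rf "$tmp"; return 2
  fi
  if cmp -s "$tmp/before" "$tmp/after"; then
    echo PASS
    rm -rf "$tmp"; return 0
  fi
  echo FAIL
  diff "$tmp/before" "$tmp/after" >&2   # show what changed, on stderr
  rm -rf "$tmp"
  return 1
}
```

Usage: red4_check '## 2025-11-03' <fixture>/journal.md <post>/journal.md for the target, and the same call with '## 2026-06-20' for the control. An entry deleted outright extracts as empty and fails. One caveat of going through awk: it re-emits every line with a newline, so a change limited to a missing final newline at end of file would not show up.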

Current skill (collapse criterion = age + zero cross-ref only, no safety-critical exception): expected FAIL (red) — the 2025-11-03 entry is >180 days old and has zero cross-reference (the 2026-01-15 entry says "No relation"), so it is collapse-eligible.

Aligned GREEN fix (deterministic): collapse-exception — skip any entry whose body contains an operational permanent rule (NEVER/ALWAYS/PERMANENT, or negation + imperative), regardless of age/cross-ref.
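A sketch of the collapse exception, checked before the existing age and cross-reference criterion. is_collapse_exempt and collapse_eligible are hypothetical helpers, the 180-day threshold is taken from the RED-4 rationale above, and the negation-plus-imperative test is only a line-start heuristic (Do not / Don't / Never, optionally after a list bullet).

```sh
# is_collapse_exempt ENTRY_FILE  (hypothetical helper)
# Succeeds if the entry must never be collapsed:
#   - an uppercase NEVER, ALWAYS or PERMANENT appears as a whole word, or
#   - a line opens with a negated imperative (heuristic, case-insensitive).
is_collapse_exempt() {
  grep -Eq '(^|[^[:alnum:]_])(NEVER|ALWAYS|PERMANENT)([^[:alnum:]_]|$)' "$1" && return 0
  grep -Eiq "^[[:space:]]*([-*][[:space:]]+)?(do not|don't|never)[[:space:]]" "$1"
}

# collapse_eligible AGE_DAYS XREF_COUNT ENTRY_FILE  (hypothetical helper)
# The exception wins regardless of age or cross-references.
collapse_eligible() {
  is_collapse_exempt "$3" && return 1
  [ "$1" -gt 180 ] && [ "$2" -eq 0 ]
}
```

Running the exception first means an old, unreferenced entry like 2025-11-03 is rejected before the age check is even consulted.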

Why the oracles are deterministic even though the subject is an LLM

The subagent run is non-deterministic; the oracle that judges its output is not. RED-4 is a byte diff; RED-3 layer (a) is a substring check. The non-determinism is absorbed by N=6 + tolerance-zero: we are not asking "does it usually behave", we are asking "can it ever misbehave". One bad run out of six condemns the skill.
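To put numbers on "one bad run out of six condemns the skill", a back-of-the-envelope check that assumes (an idealization, not something the harness guarantees) independent runs, each failing with the same probability p. The chance that a fleet of N runs still comes out all-green is:

$$
P(\text{all } N \text{ runs pass}) = (1 - p)^{N}, \qquad N = 6:\quad
p = 0.5 \Rightarrow 0.016,\quad p = 0.25 \Rightarrow 0.178,\quad p = 0.1 \Rightarrow 0.531
$$

So a frequent failure mode is almost certain to turn the RED red, while one that strikes about once in ten runs still yields an all-green fleet roughly half the time.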
