chore(memory): capitalize /reconcile — BDR-041 + LRN-075/076/077 + EVAL-011

TODO write-back (chantier /reconcile subtasks ticked, S3 split honestly) + registry entries (body + Index rows) for the shipped skill. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01C6bUdvHnajCNzgVQefZowj
2026-06-30 13:42:24 +02:00 · 2026-06-30 13:42:24 +02:00 · 6b512be9af
commit 6b512be9af
parent 82e6322a9f
5 changed files with 55 additions and 9 deletions
--- a/.claude/memory/decisions.md
+++ b/.claude/memory/decisions.md
@ -62,6 +62,7 @@ rules:
 | BDR-038 | 2026-06-27 | deploy skill: per-project learning runbook, two-moment cold-resume | accepted |
 | BDR-039 | 2026-06-29 | Gitea branch protection = Option-1 owner-pushable, not require-PR | accepted |
 | BDR-040 | 2026-06-29 | doc-syncer MINOR-shape oracle: deterministic floor under LLM's MINOR call | accepted |
+| BDR-041 | 2026-06-30 | /reconcile = deterministic declared-vs-real engine + thin gated skill (reconciler, not lister) | accepted |

 ---

@ -635,3 +636,12 @@ rules:
 - **ENGRAVED LIMIT — do not over-read the guarantee**: oracle catches STRUCTURAL/size significance, NOT semantic. A 3-line edit that CHANGES MEANING, no heading, small → still reads MINOR (rc 0) and auto-commits. Deterministic FLOOR under LLM judgment = REDUCTION of RISK-1's gross cases, NOT elimination. LLM owns the semantic call above the floor; the visible surface ([[BDR-036]]) stays the content backstop.
 - **Scope tranché**: ① oracle + ② [[LRN-071]] masked-commit fix built. ③ branch-guard (doc-commit refusing main/develop) DEFERRED — duplicates the protected-base predicate a 3rd time (lib + gitflow hook + here); migrated repos have the hook → ③ guards a state that shouldn't exist. Reconsider only for repos outside `gitflow init`.
 - **Build**: TDD RED→GREEN. run-doc-shape.sh 19/19 (incl. threshold boundary + env-override) + behavioral Scenario D. Wired doc-syncer STEP A4 + doc-commit.md ACKNOWLEDGMENTS coherence. shellcheck clean.
+
+## BDR-041 — /reconcile = deterministic declared-vs-real engine + thin gated skill (reconciler, not lister)
+- **Date**: 2026-06-30
+- **Status**: accepted
+- **Decision**: `/reconcile` answers "what work is REALLY open?" by confronting DECLARATIVE sources (TODO `[x]`/`[ ]`/`[~]`, registry statuses, `## Index`) against REAL state (git/fs, registry BODY). Split: `lib/reconcile.sh` (deterministic engine — body enumeration, `reconcile_oracle_*`, BLK last-block-wins, lexical deferral sweep, contradiction candidates, pure `reconcile_verdict` kernel) + `skills/reconcile/SKILL.md` (thin orchestration + A/B/C write-back gate). Founding principle = RECURSIVE COHERENCE: never use a declarative source as oracle (Index/checkbox/status/path-name) — enumerate from BODY `## ID` headings, decide done/stale from git/fs. TESTED: run-reconcile.sh T1 reds if engine reads `## Index` (shim → 51≠72, canary LRN-020 dropped). Registries READ-ONLY (curation = /prune-memory).
+- **Why**: RED proved a capable agent reconciles when GUIDED ("use git + justify") but MIRRORS the TODO when not (false positives + missed contradiction), and even guided hits the compound-status trap (BLK-008). Value = determinism + cheapness + gate, NOT teaching ([[LRN-031]], [[LRN-075]]). Mechanical 80% → script; judgment 20% → thin skill (writing-skills: automate mechanical, document judgment).
+- **Alternatives rejected**: monolithic teaching/discipline skill (agents reconcile when guided → no teaching value); `grep '[ ]'` lister (reproduces the lie); trust Index (drift, [[LRN-055]]); blocking write-back gate (friction — A/B/C surface chosen).
+- **Honest limits (graven)**: deferral detection LEXICAL (marked-only; unmarked "à reprendre quand X" missed); contradictions = CANDIDATES (token overlap), surfaced not asserted; cross-repo "not verifiable here"; cross-ref verdicts ("[~] done because chantier X below complete") surfaced, not auto-resolved.
+- **Reference**: `lib/reconcile.sh`, `lib/tests/run-reconcile.sh` (20/20) + `lib/tests/fixtures/` (neutral-named, [[LRN-077]]), `skills/reconcile/SKILL.md`, CLAUDE.md "Skill routing". Born of the 2026-06-29 manual inventory (its known-good oracle). Built via superpowers:writing-skills. See [[LRN-075]], [[LRN-076]], [[EVAL-011]].
--- a/.claude/memory/evals.md
+++ b/.claude/memory/evals.md
@ -31,6 +31,7 @@ rules:
 | EVAL-008 | 2026-06-27 | Doc-sync coupled machinery — 28/28 real-exec, swap-sweep caught prior debt | keep |
 | EVAL-009 | 2026-06-27 | deploy skill subagent-driven build: multi-stage review + pressure-test net-positive | keep |
 | EVAL-010 | 2026-06-29 | prune-memory hardening: RED-7 deterministic fix + RED-8 accept + 34-row index backfill | keep |
+| EVAL-011 | 2026-06-30 | /reconcile build: RED contaminated→corrected (unguided control), GREEN behavioral confirmed, dogfooded on itself | keep |

 ---

@ -120,3 +121,10 @@ rules:
 - **method**: read-first cartography (sub-agent, confirmed) → RED-7 closed by a DETERMINISTIC test ([[LRN-046]]) + STEP-2 example fictionalized → RED-8 re-reviewed, consciously accepted ([[LRN-047]]) → 34 missing Index rows composed + inserted in ID-sorted slots → STEP-4 verify zero MISSING/ORPHAN; deterministic suite all-green, shellcheck clean.
 - **anomalies**: (1) RED-7 test FALSE-GREEN caught in real time — ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep ([[LRN-074]]). The RED was WATCHED, not assumed. (2) RED-7 premise verified: LRN-014/016 ARE complementary → the old example modeled a WRONG merge, not just primed it. (3) backfill: 4/5 title-derived Applies-to (the awk-missed entries) missed a real future-app nuance on re-read → corrected before insert (without the 5-check, 4 Index rows would have diverged from source, engraved forever). (4) almost wrote a colliding EVAL-009 (deploy) — read the file first → EVAL-010. (5) pre-existing LRN-021 Index row out of ID-order → moved.
 - **action**: keep. RED-7 GREEN (deterministic), RED-8 documented-accept, drift 34→0. [[LRN-073]] + [[LRN-074]] engraved — 2 pattern-families this session (fail-silent [[LRN-066]]/[[LRN-071]] + command-assumption [[LRN-074]]).
+
+## EVAL-011 — /reconcile build (TDD): contaminated RED corrected, behavioral GREEN, dogfood on itself
+- **Date**: 2026-06-30
+- **output**: `lib/reconcile.sh` (engine) + `lib/tests/run-reconcile.sh` (20/20) + `lib/tests/fixtures/` (neutral) + `skills/reconcile/SKILL.md` (thin) + CLAUDE.md routing.
+- **method**: 2-arm RED — GUIDED baselines (A/B, "use git + justify") SUCCEEDED = contaminated; UNGUIDED tempting baselines (a4872/a0f68) MIRRORED the TODO = real failure ([[LRN-075]]). RED-B = deterministic Index-ignore with TEETH (shim engine to read Index → reds). GREEN behavioral (a8404): same terrain + skill → verified via engine, stale reported done w/ SHAs, applied A/B/C gate, surfaced cross-ref as candidate. Dogfood: ran on the live repo, found its OWN chantier, marked S3 PARTIAL (routage absent per path oracle), not done.
+- **anomalies**: (1) first baseline LEADING → corrected with an unguided control ([[LRN-075]]). (2) fixture-name false-signal — a0f68 read "pre-reconcile" from the dir name → re-froze fixtures neutral ([[LRN-077]]). (3) harness caught a real bug mid-build: BLK-004 status bleed from BLK-005's header ([[LRN-076]]). (4) META: marked `[x] routage DONE` BEFORE the CLAUDE.md edit succeeded (errored — Read-first) → created the exact declared-vs-real gap `/reconcile` traps, caught by the next verify. The tool's build produced the gap it detects.
+- **action**: keep. RED watched red before green ([[LRN-074]] discipline), bug caught + fixed, behavioral loop closed.
--- a/.claude/memory/journal.md
+++ b/.claude/memory/journal.md
@ -248,3 +248,9 @@ rules:
 - Contradiction caught: chantier `--help` (STEP 0.5 per SKILL.md) contradicts [[BDR-001]] accepted (helper via session-start hook; per-SKILL.md copy REJECTED) → `--help` BLOCKED pending BDR-001 resolution (supersede or re-route).
 - Our OWN manual inventory had an error: line 26 cleanup-machine declared "auto-cleaned next make plugin" but fs shows darwin-skill still present → demoted "done"→"still deferred" after fs check. Proof-by-example the queue needs a RECONCILER (declared-vs-real), not a `[ ]`-grepper.
 - Reconciled TODO (5 ticked + 1 requalify + 1 split, annotated `reconcile 2026-06-29` w/ evidence) + queued `/reconcile` skill chantier (4-cat output, inter-registry contradiction detection, GATED TODO edit). Sequencing: /reconcile FIRST (oracle = today's inventory, perishable) → resolve BDR-001 → --help.
+
+## 2026-06-30 — /reconcile skill shipped (declared-vs-real reconciler)
+- Built `/reconcile` via superpowers:writing-skills (TDD): engine `lib/reconcile.sh` + harness 20/20 + thin gated skill. Recursive coherence (never trust a declarative source, incl. Index) made a TESTED guarantee — T1 reds on an Index-reader shim. [[BDR-041]].
+- RED 2-arm: guided baselines succeed (contaminated) / unguided mirror the TODO (real failure) → value = determinism+gate, not teaching ([[LRN-075]]). GREEN behavioral confirmed; dogfooded on its own chantier (S3 marked partial honestly). [[EVAL-011]].
+- Learnings: unguided-control RED ([[LRN-075]]); last-block-wins status + BLK-004 bleed bug ([[LRN-076]]); neutral fixture names = same symptom/distinct cause as [[LRN-074]] ([[LRN-077]]).
+- Ship: feature/reconcile-skill → develop (gitflow finish). Push to origin gated (ASK).
--- a/.claude/memory/learnings.md
+++ b/.claude/memory/learnings.md
@ -94,6 +94,9 @@ rules:
 | LRN-072 | 2026-06-29 | a stranded-artifact bug can be fixed by NOT creating the artifact (negative diff), not by plumbing its commit — if the producing step is speculative/unused, delete it | a stranded/duplicated/uncommitted-artifact bug — before building machinery, ask if the PRODUCING step is wanted; speculative-at-creation → remove, deliberate-on-demand → keep |
 | LRN-073 | 2026-06-29 | a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior) | any skill/agent with a worked example over the SAME data it operates on — use reserved/fictional ids; test deterministically that no live id appears |
 | LRN-074 | 2026-06-29 | system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red (4th command-assumption miss this session) | any shell test/guard riding on grep/awk/sed semantics — pin `/usr/bin/<tool>`, run the assertion, confirm it reds on the defect before trusting green |
+| LRN-075 | 2026-06-30 | skill-vs-no-skill RED must test the UNGUIDED control: a "use git + justify" baseline makes a capable agent succeed (contaminated RED); real failure shows only on the tempting prompt | building any skill whose value is determinism/gate over a capable agent — strip guidance AND tempt the failure; control succeeds → don't author (or rescope) |
+| LRN-076 | 2026-06-30 | append-only registry status mutates in place (UPDATE/FINAL blocks): CURRENT status = LAST status line, not Index, not first; range scan inclusive of next header bleeds a sibling's status word | parsing any in-place-mutated status; take last line, bound entry exclusive of next header |
+| LRN-077 | 2026-06-30 | test fixtures must carry NEUTRAL names — a name that telegraphs the answer lets the subject pass by reading the name, not doing the work | designing any test fixture/path; same symptom as [[LRN-074]] (passes for WRONG reason), distinct cause (leaky fixture vs assumed command) |

 ---

@ -840,3 +843,20 @@ rules:
 - **pattern**: a RED-7 test used `grep -vE '-9[0-9][0-9]$'`; the system grep is UGREP → parsed the leading `-9..` as an OPTION → errored → empty → FALSE GREEN (a RED that never goes red). Caught only because the output was READ, not assumed. 4th time this session an assumed command behavior was false on execution (after `set -o pipefail` + `grep -q` SIGPIPE, …). The skill's own verify already hard-codes `/usr/bin/grep` (line 189) for this exact reason — re-learned.
 - **fix**: `/usr/bin/grep` (GNU) where GNU semantics matter; avoid leading-dash regex args (or use `-e`/`--`); never trust the system tool is GNU/POSIX (mawk≠gawk, ugrep≠grep).
 - **future application**: any shell test/guard whose correctness rides on grep/awk/sed semantics → pin `/usr/bin/<tool>` AND run the assertion, confirming it goes red on the defect before trusting green. Execute, don't assume command behavior. RECURRENT motif — audit any "assumed tool behavior" the way the fail-silent family ([[LRN-066]]/[[LRN-071]]) is audited.
+
+## LRN-075 — skill-vs-no-skill RED must test the UNGUIDED control, not a leading one
+- **Date**: 2026-06-30
+- **pattern**: building `/reconcile`, the first baseline ("repo git — use git + justify each item") made a capable agent reconcile correctly → CONTAMINATED RED (writing-skills: control doesn't exhibit the failure → nothing to fix). Real failure surfaced only on the UNGUIDED tempting prompt ("is the queue empty?", todo-pointed, no git hint): agent MIRRORED the TODO — stale `[ ]` reported open (false positives), decisions.md never opened, contradiction missed. Variance: one rep FLAIRED staleness but wrote a disclaimer ("à vérifier") instead of verifying — a disclaimer is not a verification.
+- **why**: skill value was never "teach an agent to reconcile" (it can, guided) — it is determinism + cheapness + gate ([[LRN-031]]), so the answer never depends on phrasing or whether the agent felt like checking git. Confirmed engine-heavy / skill-thin design.
+- **future application**: any skill-vs-no-skill / TDD RED — the control must REMOVE guidance AND tempt the failure ([[LRN-028]] sibling). Unguided control still succeeds → don't author (or rescope to determinism/gate). The skill also helps the agent who SUSPECTS but doesn't verify, not only the ignorant one.
+
+## LRN-076 — append-only registry status mutates in place: LAST block wins
+- **Date**: 2026-06-30
+- **pattern**: a BLK entry evolves via `UPDATE`/`FINAL` blocks (BLK-008: `resolved` → middle `REVERTED, UPSTREAM/open` → `FINAL — RESOLVED`). A guided baseline read the MIDDLE block → reported BLK-008 upstream/open, wrong. Current status = the LAST status-bearing line in the body, never the Index, never the first. Bonus bug the harness caught mid-build: a `sed` range inclusive of the NEXT entry's header bled a sibling's status word (BLK-005 header "...upstream rename" polluted BLK-004) → drop `^## BLK-` header lines before extracting.
+- **future application**: parsing any in-place-mutated status (blockers, revision blocks) — take the LAST status line, bound the entry EXCLUSIVE of the next header. Don't trust Index/first-line.
+
+## LRN-077 — test fixtures must carry NEUTRAL names (pass for the right reason)
+- **Date**: 2026-06-30
+- **pattern**: a baseline agent on a worktree named `wt-pre-reconcile` read "pre-reconcile" FROM THE DIR NAME and inferred staleness — reasoning for the WRONG reason (the name), not the right one (verify git). Fixtures + the GREEN test were re-frozen under NEUTRAL names so the engine reaches truth by querying git, never by reading a path hint.
+- **meta — same symptom, distinct cause as [[LRN-074]]**: 074 = a COMMAND-ASSUMPTION (ugrep parsed `-9..` → false green); 077 = a LEAKY FIXTURE (name telegraphs the answer). Different mechanisms, SAME symptom: the test passes/fails for the wrong reason. Cross-cutting lesson = verify a test passes for the RIGHT reason, not merely that it passes — whether the false signal comes from an assumed command (074) or a leaky fixture (077).
+- **future application**: name fixtures/paths neutrally; for any green, ask "did it pass because the subject did the work, or because something leaked the answer?"
--- a/.claude/tasks/TODO.md
+++ b/.claude/tasks/TODO.md
@ -310,7 +310,7 @@ LAST of 3 chantiers. Read-first cartography confirmed RED-7/8 + measured 34-row
 - [x] FINISH — merged bugfix/prune-memory-hardening → develop — DONE (reconcile 2026-06-29 : merge `73e12be`)
 - [x] PUSH — develop → origin — DONE (reconcile 2026-06-29 : develop == origin/develop, 0 commit en avance)

-## 2026-06-29 — skill /reconcile (RÉCONCILIATEUR file-ouverte ↔ réel) [QUEUED, non bloqué]
+## 2026-06-29 — skill /reconcile (RÉCONCILIATEUR file-ouverte ↔ réel) [BUILT 2026-06-29 — finish pending]
 Genèse : l'inventaire manuel du 2026-06-29 a prouvé que le TODO mentait (5 cases fait-mais-non-coché
 + 1 "auto-nettoyé" qui ne l'était pas + 1 rejeté marqué "deferred"). Cet inventaire EST la spec du
 skill ET son cas de test de référence (résultat manuel connu-bon à reproduire).
@ -335,11 +335,13 @@ SORTIE = les 4 catégories de l'inventaire 2026-06-29 :
    (modifie un fichier tracké → jamais silencieux).

 Subtasks (à détailler au lancement) :
- [ ] Spec : table oracle-par-source (commit existe / branche absente / tree clean / skill linké /
-      statut registre) — chaque "déclaré" a son test réel
- [ ] Décider build : superpowers:writing-skills (TDD, RED = fixture TODO menteur reproduisant les 7 écarts)
- [ ] `skills/reconcile/SKILL.md` + routage CLAUDE.md (triggers : "reconcile", "file vraiment vide ?",
-      "qu'est-ce qui reste ouvert", "inventaire chantiers")
- [ ] Détecteur de contradictions inter-registres (BDR accepted vs chantier qui le contredit)
- [ ] Gate de réconciliation (diff TODO proposé, A/B/C confirm avant edit)
- [ ] Test final = reproduire l'inventaire 2026-06-29 (cat. 1-4 + contradiction BDR-001) comme oracle
+- [x] Spec : table oracle-par-source (commit existe / branche absente / tree clean / skill linké /
+      statut registre) — chaque "déclaré" a son test réel — DONE (lib/reconcile.sh : reconcile_oracle_*)
+- [x] Décider build : superpowers:writing-skills (TDD, RED = fixture TODO menteur reproduisant les 7 écarts) — DONE (RED a4872 miroir + RED-B Index-ignore à dents + GREEN comportemental a8404)
+- [x] `skills/reconcile/SKILL.md` — DONE (skill mince : orchestration + gate A/B/C + limites honnêtes)
+- [x] routage CLAUDE.md (triggers : "reconcile", "file vraiment vide ?",
+      "qu'est-ce qui reste ouvert", "inventaire chantiers") — DONE (CLAUDE.md "Skill routing" + link.sh)
+- [x] Détecteur de contradictions inter-registres (BDR accepted vs chantier qui le contredit) — DONE (reconcile_contradiction_candidates ; surface, n'asserte pas)
+- [x] Gate de réconciliation (diff TODO proposé, A/B/C confirm avant edit) — DONE (SKILL.md "The gate" ; registres read-only)
+- [x] Test final = reproduire l'inventaire 2026-06-29 (cat. 1-4 + contradiction BDR-001) comme oracle — DONE (run-reconcile.sh 20/20, fixtures neutres, RED prouvé rouge avant le vert)
+- NOTE : bâti, non committé (develop +1 `bdfa9bc`, livrables untracked) → finish pending (feature/reconcile-skill)