From 6b512be9af5613ce44c146868d0c3f4bf51311f4 Mon Sep 17 00:00:00 2001 From: Bastien Chanot Date: Tue, 30 Jun 2026 13:42:24 +0200 Subject: [PATCH] =?UTF-8?q?chore(memory):=20capitalize=20/reconcile=20?= =?UTF-8?q?=E2=80=94=20BDR-041=20+=20LRN-075/076/077=20+=20EVAL-011?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TODO write-back (chantier /reconcile subtasks ticked, S3 split honestly) + registry entries (body + Index rows) for the shipped skill. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01C6bUdvHnajCNzgVQefZowj --- .claude/memory/decisions.md | 10 ++++++++++ .claude/memory/evals.md | 8 ++++++++ .claude/memory/journal.md | 6 ++++++ .claude/memory/learnings.md | 20 ++++++++++++++++++++ .claude/tasks/TODO.md | 20 +++++++++++--------- 5 files changed, 55 insertions(+), 9 deletions(-) diff --git a/.claude/memory/decisions.md b/.claude/memory/decisions.md index a24aac8..7770697 100644 --- a/.claude/memory/decisions.md +++ b/.claude/memory/decisions.md @@ -62,6 +62,7 @@ rules: | BDR-038 | 2026-06-27 | deploy skill: per-project learning runbook, two-moment cold-resume | accepted | | BDR-039 | 2026-06-29 | Gitea branch protection = Option-1 owner-pushable, not require-PR | accepted | | BDR-040 | 2026-06-29 | doc-syncer MINOR-shape oracle: deterministic floor under LLM's MINOR call | accepted | +| BDR-041 | 2026-06-30 | /reconcile = deterministic declared-vs-real engine + thin gated skill (reconciler, not lister) | accepted | --- @@ -635,3 +636,12 @@ rules: - **ENGRAVED LIMIT — do not over-read the guarantee**: oracle catches STRUCTURAL/size significance, NOT semantic. A 3-line edit that CHANGES MEANING, no heading, small → still reads MINOR (rc 0) and auto-commits. Deterministic FLOOR under LLM judgment = REDUCTION of RISK-1's gross cases, NOT elimination. LLM owns the semantic call above the floor; the visible surface ([[BDR-036]]) stays the content backstop. - **Scope tranché**: ① oracle + ② [[LRN-071]] masked-commit fix built. ③ branch-guard (doc-commit refusing main/develop) DEFERRED — duplicates the protected-base predicate a 3rd time (lib + gitflow hook + here); migrated repos have the hook → ③ guards a state that shouldn't exist. Reconsider only for repos outside `gitflow init`. - **Build**: TDD RED→GREEN. run-doc-shape.sh 19/19 (incl. threshold boundary + env-override) + behavioral Scenario D. Wired doc-syncer STEP A4 + doc-commit.md ACKNOWLEDGMENTS coherence. shellcheck clean. + +## BDR-041 — /reconcile = deterministic declared-vs-real engine + thin gated skill (reconciler, not lister) +- **Date**: 2026-06-30 +- **Status**: accepted +- **Decision**: `/reconcile` answers "what work is REALLY open?" by confronting DECLARATIVE sources (TODO `[x]`/`[ ]`/`[~]`, registry statuses, `## Index`) against REAL state (git/fs, registry BODY). Split: `lib/reconcile.sh` (deterministic engine — body enumeration, `reconcile_oracle_*`, BLK last-block-wins, lexical deferral sweep, contradiction candidates, pure `reconcile_verdict` kernel) + `skills/reconcile/SKILL.md` (thin orchestration + A/B/C write-back gate). Founding principle = RECURSIVE COHERENCE: never use a declarative source as oracle (Index/checkbox/status/path-name) — enumerate from BODY `## ID` headings, decide done/stale from git/fs. TESTED: run-reconcile.sh T1 reds if engine reads `## Index` (shim → 51≠72, canary LRN-020 dropped). Registries READ-ONLY (curation = /prune-memory). +- **Why**: RED proved a capable agent reconciles when GUIDED ("use git + justify") but MIRRORS the TODO when not (false positives + missed contradiction), and even guided hits the compound-status trap (BLK-008). Value = determinism + cheapness + gate, NOT teaching ([[LRN-031]], [[LRN-075]]). Mechanical 80% → script; judgment 20% → thin skill (writing-skills: automate mechanical, document judgment). +- **Alternatives rejected**: monolithic teaching/discipline skill (agents reconcile when guided → no teaching value); `grep '[ ]'` lister (reproduces the lie); trust Index (drift, [[LRN-055]]); blocking write-back gate (friction — A/B/C surface chosen). +- **Honest limits (graven)**: deferral detection LEXICAL (marked-only; unmarked "à reprendre quand X" missed); contradictions = CANDIDATES (token overlap), surfaced not asserted; cross-repo "not verifiable here"; cross-ref verdicts ("[~] done because chantier X below complete") surfaced, not auto-resolved. +- **Reference**: `lib/reconcile.sh`, `lib/tests/run-reconcile.sh` (20/20) + `lib/tests/fixtures/` (neutral-named, [[LRN-077]]), `skills/reconcile/SKILL.md`, CLAUDE.md "Skill routing". Born of the 2026-06-29 manual inventory (its known-good oracle). Built via superpowers:writing-skills. See [[LRN-075]], [[LRN-076]], [[EVAL-011]]. diff --git a/.claude/memory/evals.md b/.claude/memory/evals.md index 66ec3cd..455b2ba 100644 --- a/.claude/memory/evals.md +++ b/.claude/memory/evals.md @@ -31,6 +31,7 @@ rules: | EVAL-008 | 2026-06-27 | Doc-sync coupled machinery — 28/28 real-exec, swap-sweep caught prior debt | keep | | EVAL-009 | 2026-06-27 | deploy skill subagent-driven build: multi-stage review + pressure-test net-positive | keep | | EVAL-010 | 2026-06-29 | prune-memory hardening: RED-7 deterministic fix + RED-8 accept + 34-row index backfill | keep | +| EVAL-011 | 2026-06-30 | /reconcile build: RED contaminated→corrected (unguided control), GREEN behavioral confirmed, dogfooded on itself | keep | --- @@ -120,3 +121,10 @@ rules: - **method**: read-first cartography (sub-agent, confirmed) → RED-7 closed by a DETERMINISTIC test ([[LRN-046]]) + STEP-2 example fictionalized → RED-8 re-reviewed, consciously accepted ([[LRN-047]]) → 34 missing Index rows composed + inserted in ID-sorted slots → STEP-4 verify zero MISSING/ORPHAN; deterministic suite all-green, shellcheck clean. - **anomalies**: (1) RED-7 test FALSE-GREEN caught in real time — ugrep parsed `-9..` as an option → empty → green; fixed via /usr/bin/grep ([[LRN-074]]). The RED was WATCHED, not assumed. (2) RED-7 premise verified: LRN-014/016 ARE complementary → the old example modeled a WRONG merge, not just primed it. (3) backfill: 4/5 title-derived Applies-to (the awk-missed entries) missed a real future-app nuance on re-read → corrected before insert (without the 5-check, 4 Index rows would have diverged from source, engraved forever). (4) almost wrote a colliding EVAL-009 (deploy) — read the file first → EVAL-010. (5) pre-existing LRN-021 Index row out of ID-order → moved. - **action**: keep. RED-7 GREEN (deterministic), RED-8 documented-accept, drift 34→0. [[LRN-073]] + [[LRN-074]] engraved — 2 pattern-families this session (fail-silent [[LRN-066]]/[[LRN-071]] + command-assumption [[LRN-074]]). + +## EVAL-011 — /reconcile build (TDD): contaminated RED corrected, behavioral GREEN, dogfood on itself +- **Date**: 2026-06-30 +- **output**: `lib/reconcile.sh` (engine) + `lib/tests/run-reconcile.sh` (20/20) + `lib/tests/fixtures/` (neutral) + `skills/reconcile/SKILL.md` (thin) + CLAUDE.md routing. +- **method**: 2-arm RED — GUIDED baselines (A/B, "use git + justify") SUCCEEDED = contaminated; UNGUIDED tempting baselines (a4872/a0f68) MIRRORED the TODO = real failure ([[LRN-075]]). RED-B = deterministic Index-ignore with TEETH (shim engine to read Index → reds). GREEN behavioral (a8404): same terrain + skill → verified via engine, stale reported done w/ SHAs, applied A/B/C gate, surfaced cross-ref as candidate. Dogfood: ran on the live repo, found its OWN chantier, marked S3 PARTIAL (routage absent per path oracle), not done. +- **anomalies**: (1) first baseline LEADING → corrected with an unguided control ([[LRN-075]]). (2) fixture-name false-signal — a0f68 read "pre-reconcile" from the dir name → re-froze fixtures neutral ([[LRN-077]]). (3) harness caught a real bug mid-build: BLK-004 status bleed from BLK-005's header ([[LRN-076]]). (4) META: marked `[x] routage DONE` BEFORE the CLAUDE.md edit succeeded (errored — Read-first) → created the exact declared-vs-real gap `/reconcile` traps, caught by the next verify. The tool's build produced the gap it detects. +- **action**: keep. RED watched red before green ([[LRN-074]] discipline), bug caught + fixed, behavioral loop closed. diff --git a/.claude/memory/journal.md b/.claude/memory/journal.md index 482772e..eacc2e2 100644 --- a/.claude/memory/journal.md +++ b/.claude/memory/journal.md @@ -248,3 +248,9 @@ rules: - Contradiction caught: chantier `--help` (STEP 0.5 per SKILL.md) contradicts [[BDR-001]] accepted (helper via session-start hook; per-SKILL.md copy REJECTED) → `--help` BLOCKED pending BDR-001 resolution (supersede or re-route). - Our OWN manual inventory had an error: line 26 cleanup-machine declared "auto-cleaned next make plugin" but fs shows darwin-skill still present → demoted "done"→"still deferred" after fs check. Proof-by-example the queue needs a RECONCILER (declared-vs-real), not a `[ ]`-grepper. - Reconciled TODO (5 ticked + 1 requalify + 1 split, annotated `reconcile 2026-06-29` w/ evidence) + queued `/reconcile` skill chantier (4-cat output, inter-registry contradiction detection, GATED TODO edit). Sequencing: /reconcile FIRST (oracle = today's inventory, perishable) → resolve BDR-001 → --help. + +## 2026-06-30 — /reconcile skill shipped (declared-vs-real reconciler) +- Built `/reconcile` via superpowers:writing-skills (TDD): engine `lib/reconcile.sh` + harness 20/20 + thin gated skill. Recursive coherence (never trust a declarative source, incl. Index) made a TESTED guarantee — T1 reds on an Index-reader shim. [[BDR-041]]. +- RED 2-arm: guided baselines succeed (contaminated) / unguided mirror the TODO (real failure) → value = determinism+gate, not teaching ([[LRN-075]]). GREEN behavioral confirmed; dogfooded on its own chantier (S3 marked partial honestly). [[EVAL-011]]. +- Learnings: unguided-control RED ([[LRN-075]]); last-block-wins status + BLK-004 bleed bug ([[LRN-076]]); neutral fixture names = same symptom/distinct cause as [[LRN-074]] ([[LRN-077]]). +- Ship: feature/reconcile-skill → develop (gitflow finish). Push to origin gated (ASK). diff --git a/.claude/memory/learnings.md b/.claude/memory/learnings.md index 051ebb5..faabb1d 100644 --- a/.claude/memory/learnings.md +++ b/.claude/memory/learnings.md @@ -94,6 +94,9 @@ rules: | LRN-072 | 2026-06-29 | a stranded-artifact bug can be fixed by NOT creating the artifact (negative diff), not by plumbing its commit — if the producing step is speculative/unused, delete it | a stranded/duplicated/uncommitted-artifact bug — before building machinery, ask if the PRODUCING step is wanted; speculative-at-creation → remove, deliberate-on-demand → keep | | LRN-073 | 2026-06-29 | a skill's worked-example must use FICTIONAL ids, never live registry ids (they prime real-data behavior) | any skill/agent with a worked example over the SAME data it operates on — use reserved/fictional ids; test deterministically that no live id appears | | LRN-074 | 2026-06-29 | system `grep`/`awk` may be ugrep/mawk: don't assume flag-parsing, use `/usr/bin/grep`, watch the RED go red (4th command-assumption miss this session) | any shell test/guard riding on grep/awk/sed semantics — pin `/usr/bin/`, run the assertion, confirm it reds on the defect before trusting green | +| LRN-075 | 2026-06-30 | skill-vs-no-skill RED must test the UNGUIDED control: a "use git + justify" baseline makes a capable agent succeed (contaminated RED); real failure shows only on the tempting prompt | building any skill whose value is determinism/gate over a capable agent — strip guidance AND tempt the failure; control succeeds → don't author (or rescope) | +| LRN-076 | 2026-06-30 | append-only registry status mutates in place (UPDATE/FINAL blocks): CURRENT status = LAST status line, not Index, not first; range scan inclusive of next header bleeds a sibling's status word | parsing any in-place-mutated status; take last line, bound entry exclusive of next header | +| LRN-077 | 2026-06-30 | test fixtures must carry NEUTRAL names — a name that telegraphs the answer lets the subject pass by reading the name, not doing the work | designing any test fixture/path; same symptom as [[LRN-074]] (passes for WRONG reason), distinct cause (leaky fixture vs assumed command) | --- @@ -840,3 +843,20 @@ rules: - **pattern**: a RED-7 test used `grep -vE '-9[0-9][0-9]$'`; the system grep is UGREP → parsed the leading `-9..` as an OPTION → errored → empty → FALSE GREEN (a RED that never goes red). Caught only because the output was READ, not assumed. 4th time this session an assumed command behavior was false on execution (after `set -o pipefail` + `grep -q` SIGPIPE, …). The skill's own verify already hard-codes `/usr/bin/grep` (line 189) for this exact reason — re-learned. - **fix**: `/usr/bin/grep` (GNU) where GNU semantics matter; avoid leading-dash regex args (or use `-e`/`--`); never trust the system tool is GNU/POSIX (mawk≠gawk, ugrep≠grep). - **future application**: any shell test/guard whose correctness rides on grep/awk/sed semantics → pin `/usr/bin/` AND run the assertion, confirming it goes red on the defect before trusting green. Execute, don't assume command behavior. RECURRENT motif — audit any "assumed tool behavior" the way the fail-silent family ([[LRN-066]]/[[LRN-071]]) is audited. + +## LRN-075 — skill-vs-no-skill RED must test the UNGUIDED control, not a leading one +- **Date**: 2026-06-30 +- **pattern**: building `/reconcile`, the first baseline ("repo git — use git + justify each item") made a capable agent reconcile correctly → CONTAMINATED RED (writing-skills: control doesn't exhibit the failure → nothing to fix). Real failure surfaced only on the UNGUIDED tempting prompt ("is the queue empty?", todo-pointed, no git hint): agent MIRRORED the TODO — stale `[ ]` reported open (false positives), decisions.md never opened, contradiction missed. Variance: one rep FLAIRED staleness but wrote a disclaimer ("à vérifier") instead of verifying — a disclaimer is not a verification. +- **why**: skill value was never "teach an agent to reconcile" (it can, guided) — it is determinism + cheapness + gate ([[LRN-031]]), so the answer never depends on phrasing or whether the agent felt like checking git. Confirmed engine-heavy / skill-thin design. +- **future application**: any skill-vs-no-skill / TDD RED — the control must REMOVE guidance AND tempt the failure ([[LRN-028]] sibling). Unguided control still succeeds → don't author (or rescope to determinism/gate). The skill also helps the agent who SUSPECTS but doesn't verify, not only the ignorant one. + +## LRN-076 — append-only registry status mutates in place: LAST block wins +- **Date**: 2026-06-30 +- **pattern**: a BLK entry evolves via `UPDATE`/`FINAL` blocks (BLK-008: `resolved` → middle `REVERTED, UPSTREAM/open` → `FINAL — RESOLVED`). A guided baseline read the MIDDLE block → reported BLK-008 upstream/open, wrong. Current status = the LAST status-bearing line in the body, never the Index, never the first. Bonus bug the harness caught mid-build: a `sed` range inclusive of the NEXT entry's header bled a sibling's status word (BLK-005 header "...upstream rename" polluted BLK-004) → drop `^## BLK-` header lines before extracting. +- **future application**: parsing any in-place-mutated status (blockers, revision blocks) — take the LAST status line, bound the entry EXCLUSIVE of the next header. Don't trust Index/first-line. + +## LRN-077 — test fixtures must carry NEUTRAL names (pass for the right reason) +- **Date**: 2026-06-30 +- **pattern**: a baseline agent on a worktree named `wt-pre-reconcile` read "pre-reconcile" FROM THE DIR NAME and inferred staleness — reasoning for the WRONG reason (the name), not the right one (verify git). Fixtures + the GREEN test were re-frozen under NEUTRAL names so the engine reaches truth by querying git, never by reading a path hint. +- **meta — same symptom, distinct cause as [[LRN-074]]**: 074 = a COMMAND-ASSUMPTION (ugrep parsed `-9..` → false green); 077 = a LEAKY FIXTURE (name telegraphs the answer). Different mechanisms, SAME symptom: the test passes/fails for the wrong reason. Cross-cutting lesson = verify a test passes for the RIGHT reason, not merely that it passes — whether the false signal comes from an assumed command (074) or a leaky fixture (077). +- **future application**: name fixtures/paths neutrally; for any green, ask "did it pass because the subject did the work, or because something leaked the answer?" diff --git a/.claude/tasks/TODO.md b/.claude/tasks/TODO.md index a9dc218..a2845ae 100644 --- a/.claude/tasks/TODO.md +++ b/.claude/tasks/TODO.md @@ -310,7 +310,7 @@ LAST of 3 chantiers. Read-first cartography confirmed RED-7/8 + measured 34-row - [x] FINISH — merged bugfix/prune-memory-hardening → develop — DONE (reconcile 2026-06-29 : merge `73e12be`) - [x] PUSH — develop → origin — DONE (reconcile 2026-06-29 : develop == origin/develop, 0 commit en avance) -## 2026-06-29 — skill /reconcile (RÉCONCILIATEUR file-ouverte ↔ réel) [QUEUED, non bloqué] +## 2026-06-29 — skill /reconcile (RÉCONCILIATEUR file-ouverte ↔ réel) [BUILT 2026-06-29 — finish pending] Genèse : l'inventaire manuel du 2026-06-29 a prouvé que le TODO mentait (5 cases fait-mais-non-coché + 1 "auto-nettoyé" qui ne l'était pas + 1 rejeté marqué "deferred"). Cet inventaire EST la spec du skill ET son cas de test de référence (résultat manuel connu-bon à reproduire). @@ -335,11 +335,13 @@ SORTIE = les 4 catégories de l'inventaire 2026-06-29 : (modifie un fichier tracké → jamais silencieux). Subtasks (à détailler au lancement) : -- [ ] Spec : table oracle-par-source (commit existe / branche absente / tree clean / skill linké / - statut registre) — chaque "déclaré" a son test réel -- [ ] Décider build : superpowers:writing-skills (TDD, RED = fixture TODO menteur reproduisant les 7 écarts) -- [ ] `skills/reconcile/SKILL.md` + routage CLAUDE.md (triggers : "reconcile", "file vraiment vide ?", - "qu'est-ce qui reste ouvert", "inventaire chantiers") -- [ ] Détecteur de contradictions inter-registres (BDR accepted vs chantier qui le contredit) -- [ ] Gate de réconciliation (diff TODO proposé, A/B/C confirm avant edit) -- [ ] Test final = reproduire l'inventaire 2026-06-29 (cat. 1-4 + contradiction BDR-001) comme oracle +- [x] Spec : table oracle-par-source (commit existe / branche absente / tree clean / skill linké / + statut registre) — chaque "déclaré" a son test réel — DONE (lib/reconcile.sh : reconcile_oracle_*) +- [x] Décider build : superpowers:writing-skills (TDD, RED = fixture TODO menteur reproduisant les 7 écarts) — DONE (RED a4872 miroir + RED-B Index-ignore à dents + GREEN comportemental a8404) +- [x] `skills/reconcile/SKILL.md` — DONE (skill mince : orchestration + gate A/B/C + limites honnêtes) +- [x] routage CLAUDE.md (triggers : "reconcile", "file vraiment vide ?", + "qu'est-ce qui reste ouvert", "inventaire chantiers") — DONE (CLAUDE.md "Skill routing" + link.sh) +- [x] Détecteur de contradictions inter-registres (BDR accepted vs chantier qui le contredit) — DONE (reconcile_contradiction_candidates ; surface, n'asserte pas) +- [x] Gate de réconciliation (diff TODO proposé, A/B/C confirm avant edit) — DONE (SKILL.md "The gate" ; registres read-only) +- [x] Test final = reproduire l'inventaire 2026-06-29 (cat. 1-4 + contradiction BDR-001) comme oracle — DONE (run-reconcile.sh 20/20, fixtures neutres, RED prouvé rouge avant le vert) +- NOTE : bâti, non committé (develop +1 `bdfa9bc`, livrables untracked) → finish pending (feature/reconcile-skill)