chore(memory): BDR-044 + LRN-083 + auto-skill-dispatch won't-build — capitalize
BDR-044: auto-skill-dispatch chantier retired won't-build — 3rd measured moot of the session (after --help, darwin re-baseline). Cartography showed L1 (superpowers "1%->MUST invoke") already over-determines routing -> reframed from "does it route" (yes) to DISCERNMENT; risk inverted under->over. Measured in real fresh sessions (8 prompts/3 classes): clear->route, ambiguous->ask, trivial->abstain — model discriminates, no over-routing. Adding L2 prose = phantom value + degradation risk. LRN-083: subagents are an invalid instrument for measuring main-loop spontaneous routing (SUBAGENT-STOP + delegated framing pin to the no-route floor) — retired the 0/6 subagent RED. LRN-080 corroborated (3-in-a-row). TODO -> won't-build. Claude composed all -> trailers (4th application of LRN-081). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017KWG7sXg94LXX1gddCGBvM
This commit is contained in:
parent
efe33b76c5
commit
53bd7beee8
@ -65,6 +65,7 @@ rules:
|
||||
| BDR-041 | 2026-06-30 | /reconcile = deterministic declared-vs-real engine + thin gated skill (reconciler, not lister) | accepted |
|
||||
| BDR-042 | 2026-06-30 | /release-candidate = thin orchestrator over gitflow release; the tag lives in the skill, not the lib | accepted |
|
||||
| BDR-043 | 2026-06-30 | BDR-015 trigger cleared — 5 ex-broken gstack symlinks repaired → darwin re-baseline back in scope (unblocked, NOT run) | accepted |
|
||||
| BDR-044 | 2026-06-30 | auto-skill-dispatch won't-build — under-routing fear inverted to over-routing by cartography, then measured: model discriminates (clear→route, ambiguous→ask, trivial→abstain) | accepted · won't-build |
|
||||
|
||||
---
|
||||
|
||||
@ -666,3 +667,16 @@ rules:
|
||||
- **Action (NOT done)**: verify `~/.agents/skills/darwin-skill/results.tsv` still marks these 5 `status=error` ("broken gstack symlink — out of scope"); if so, re-run darwin baseline to bring them in. Status = UNBLOCKED, execution PENDING — do NOT read as "re-baselined".
|
||||
- **Distinct from [[BLK-007]]**: BLK-007/`f928a53` (2026-06-02) = a DIFFERENT symlink episode (`spec` + 5 iOS device-farm skills, source-only after a submodule bump; fixed by linking `spec`, skipping iOS). NOT the 5 of BDR-015 — kept separate to avoid a false causal link.
|
||||
- **Reference**: VÉRIF audit (subagent, filesystem-only, 2026-06-30). [[BDR-015]] caveat. darwin eval log `results.tsv`.
|
||||
|
||||
---
|
||||
|
||||
## BDR-044 — auto-skill-dispatch won't-build: under→over reframe, measured — model already discriminates
|
||||
- **Date**: 2026-06-30
|
||||
- **Status**: accepted · won't-build
|
||||
- **Decision**: do NOT add L2 routing prose to CLAUDE.md for "auto-trigger skills on intent". Chantier retired won't-build — 3rd measured moot of the session (after [[BDR-001]] --help + [[BDR-043]]/[[LRN-082]] darwin re-baseline).
|
||||
- **Why — the dependent variable inverted**: the initial fear was UNDER-routing (model ignores skills, does the task by hand). Cartography refuted it — routing is a STACK and L1 (superpowers "1% chance → you MUST invoke") already SUR-determines invocation → "does it route?" = "already yes". The real open question became DISCERNMENT (clear→route, ambiguous→ASK, trivial→abstain), and the real hazard inverted to OVER-routing. Measured in REAL fresh main-loop sessions (8 prompts, 3 classes): CLEAR→routes ✓, AMBIGUOUS→asks (refuses to guess, investigates to ask a USEFUL question) ✓, TRIVIAL→abstains ✓. The L1-vs-Workflow-rules textual tension ("1% → MUST invoke" vs "ask one question if needed / pragmatic on trivial") is resolved well in behavior — the model balances. Adding L2 bounding prose = phantom value AND risks DEGRADING an already-good discernment.
|
||||
- **Alternatives rejected**:
|
||||
- Add a routing-reinforcement instruction (original intent) → phantom value: L1 already over-determines routing; more mandate worsens the only real risk (over-routing).
|
||||
- Add an over-routing bound (clear→route / ambiguous→ask / trivial→abstain) at L2 → measurement shows the model ALREADY does this; codifying it risks perturbing it, zero upside.
|
||||
- Keyword hook on intent verbs → too noisy — the design-hook mis-fired on "design" in "auto-skill-dispatch" 3× this session; intent verbs (corrige/crée) are everywhere.
|
||||
- **Reference**: cartography L0–L4 + discernment-RED (user-run, fresh sessions). Subagent under-routing RED RETIRED as non-discriminating ([[LRN-083]]). [[LRN-080]] (measure-first), [[LRN-049]] (bound noise). TODO "auto-skill-dispatch" → won't-build.
|
||||
|
||||
@ -280,3 +280,10 @@ rules:
|
||||
- Searched for results.tsv instead of assuming its state → GONE (wiped by 23/06 make-plugin reinstall; was a local May-2026 artifact, not shipped upstream). No darwin baseline survives at all → not even a re-baseline, a fresh-from-zero one.
|
||||
- BDR-043 cleared only motif (a) of BDR-015's TWO exclusion grounds (symlinks repaired ✅, 0 broken); motif (b) external-ownership INTACT — 5 resolve to skills-external/gstack/ (submodule), darwin edits SKILL.md → would dirty submodule ([[LRN-070]]). Re-baseline = unactionable score = phantom value. Twin of --help ([[LRN-080]]), distinct mechanism (residual motif vs absent value).
|
||||
- Decision A (won't-run): TODO (b) → resolved-MOOT (not done, not open). LRN-082 capitalized (multi-motif trigger lesson). The "montre la table avant de décider" gate paid off — looking found the table gone instead of assuming status=error.
|
||||
|
||||
## 2026-06-30 (cont.) — BLOC2 auto-skill-dispatch → WON'T-BUILD (discernment measured)
|
||||
- Cartography: routing = STACK L0(design-hook)→L1(superpowers "1%→MUST invoke", dominant)→L2(CLAUDE.md prose)→L3(frontmatter)→L4([[BDR-019]]). L1 over-determines invocation → "auto-call?" = already yes.
|
||||
- Reframe C (user): real question = DISCERNMENT not "does it route"; risk inverts under→OVER-routing (L1 mandate vs Workflow "ask if needed / pragmatic on trivial").
|
||||
- Subagent RED (6 reps, toy tasks) → 0/6 routed → RETIRED as non-discriminating (SUBAGENT-STOP + delegated framing = floor artifact, not signal); did NOT report as a number → [[LRN-083]].
|
||||
- Discernment-RED in REAL fresh sessions (user-run, 8 prompts / 3 classes): CLEAR→route ✓, AMBIGUOUS→ask (refuses to guess, investigates for a useful Q) ✓, TRIVIAL→abstain ✓. Over-routing risk does NOT materialize — model balances L1 vs Workflow rules.
|
||||
- Verdict: WON'T-BUILD ([[BDR-044]]) — 3rd measured moot of the session (--help, darwin re-baseline, auto-skill-dispatch). LRN-083 capitalized; [[LRN-080]] corroborated (3-in-a-row → measure-first sweep heuristic). TODO auto-skill-dispatch → won't-build. ALL actionables soldés.
|
||||
|
||||
@ -102,6 +102,7 @@ rules:
|
||||
| LRN-080 | 2026-06-30 | before adding an instruction "to make the model do X", measure if it ALREADY does X — universal conventions (--help…) it often does; the behavioral RED can KILL the chantier (phantom value) | proposing any global instruction to elicit a behavior; CLAUDE.md additions |
|
||||
| LRN-081 | 2026-06-30 | Claude commit trailers (Co-Authored-By + Claude-Session) only on Claude-COMPOSED content; a commit merely STAGING user-authored text gets none — staging ≠ authorship | committing on the user's behalf; memory-commit.sh appends trailers by default |
|
||||
| LRN-082 | 2026-06-30 | Trigger-cleared on a multi-motif exclusion lifts only the named motif — re-check the others before acting | any "exclusion lifted / precondition cleared" — verify ALL grounds, not just the named one |
|
||||
| LRN-083 | 2026-06-30 | subagents are an INVALID instrument for measuring main-loop spontaneous routing — SUBAGENT-STOP + delegated framing pin them to the no-route floor | any RED of whether the MAIN loop self-invokes; use fresh main-loop sessions, observe via the human |
|
||||
|
||||
---
|
||||
|
||||
@ -882,6 +883,7 @@ rules:
|
||||
- **pattern**: the --help chantier (implement [[BDR-001]] as a global CLAUDE.md instruction "on --help → render help + stop") was KILLED by its behavioral RED. Before writing a line, measured the control (6 reps, `/web-validate` + `/harden`, no instruction): **6/6 already rendered rich help AND stopped without dispatching** — the supposedly-absent behavior was fully present. Residual value = format consistency across 6 divergent shapes → not worth ~5 lines in a compressed CLAUDE.md on a solo repo. A phantom-value addition avoided.
|
||||
- **why it matters**: [[LRN-075]] (test the UNGUIDED control) paying off one chantier later — measuring the RED before building is what caught it. For UNIVERSAL conventions the model already honors (--help, common flags, standard shapes), a "teach it to do X" instruction buys nothing but tokens; the only thing left to buy is consistency, which must clear its own ROI bar.
|
||||
- **future application**: before adding any global instruction to ELICIT a behavior, run the behavioral control first — does the model already do it unaided? If yes, the only remaining value is standardization; price it honestly vs the cost (esp. a compressed CLAUDE.md). Often: don't add it.
|
||||
- **corroboration 2026-06-30**: 3 consecutive "make the model do X" chantiers — --help ([[BDR-001]]), darwin re-baseline ([[BDR-043]]/[[LRN-082]]), auto-skill-dispatch ([[BDR-044]]) — ALL measured won't-build/moot. A backlog of "add instruction to elicit behavior Y" has a high phantom-value rate (universal conventions + aggressive existing mandates like superpowers L1 already elicit Y) → sweep such backlogs measure-first, expect kills.
|
||||
|
||||
## LRN-081 — Commit trailers: Claude-COMPOSED content only, never on staging of user-authored text
|
||||
- **Date**: 2026-06-30
|
||||
@ -897,3 +899,10 @@ rules:
|
||||
- **why it matters**: [[BDR-015]] excluded 5 gstack skills from /darwin-skill on TWO grounds — (a) broken symlinks AND (b) external ownership (never modify a third-party submodule). [[BDR-043]] cleared (a) only (symlinks repaired, 0 broken) → marked re-baseline "unblocked". (b) intact: darwin optimizes by EDITING SKILL.md → would edit the gstack submodule = forbidden ([[LRN-070]]). Re-baseline = a score we can't act on → phantom value.
|
||||
- **context**: 2026-06-30 — measure-first: searched for results.tsv instead of assuming → GONE (wiped by 23/06 make-plugin reinstall) → no baseline survives + (b) never lifted → action resolved-MOOT, not run. Twin of [[LRN-080]] (--help): trigger fired, measurement showed phantom value (distinct mechanism: there value-absent, here residual-motif).
|
||||
- **future application**: before acting on any "exclusion lifted / precondition cleared", enumerate ALL original grounds and verify EACH is gone — not just the one the trigger names. Cleared-A says nothing about B.
|
||||
|
||||
## LRN-083 — Subagents are an INVALID instrument for measuring MAIN-LOOP spontaneous routing
|
||||
- **Date**: 2026-06-30
|
||||
- **pattern**: to measure whether the MAIN loop self-invokes a skill on implicit intent, dispatched subagents are non-discriminating — SUBAGENT-STOP tells them to SKIP the L1 routing mandate, and a delegated-execute framing suppresses meta-routing → they hand-do the task regardless of how strong/weak the main-loop prose is. Result pins to the no-route FLOOR (artifact, not signal). Complement of [[LRN-028]] (there subagents OVER-saw installed skills, invalidating a no-skill baseline; here they UNDER-route, invalidating a routing-measurement) — both = subagent ≠ main-loop condition.
|
||||
- **why it matters**: a 0/N subagent RED reads as "under-triggers → build the chantier" but is the [[LRN-028]] trap — the instrument can't tell strong prose from weak. Concluding from it = a pass/fail for the WRONG reason ([[LRN-074]]/[[LRN-077]]).
|
||||
- **context**: 2026-06-30 auto-skill-dispatch RED. 6 subagents on toy implicit-intent tasks → 0/6 routed → RETIRED as non-discriminating, NOT reported as a number. Reframed; measured instead in REAL fresh main-loop sessions.
|
||||
- **future application**: measure main-loop spontaneous routing/discernment in FRESH main-loop sessions (full L0–L4, no SUBAGENT-STOP, real user-turn). Observable instrument = the HUMAN typing the prompts + watching live — cron/schedule-spawned fresh sessions are the right CONDITION but UNOBSERVABLE to the orchestrator (they notify the owner, not the dispatcher), so they can't be the measurement vehicle. Never substitute a subagent for a fresh session in a routing RED. See [[LRN-028]], [[LRN-075]], [[LRN-080]].
|
||||
|
||||
@ -367,8 +367,9 @@ Subtasks (à détailler au lancement) :
|
||||
- [x] routage CLAUDE.md — présent (~/.claude/CLAUDE.md "Cut a release → release-candidate")
|
||||
- [x] test — prouvé par la release réelle 4.0.0 : fan-out main (709facf) + develop (4a00a60) + tag v4.0.0
|
||||
|
||||
## [NEXT — MESURE D'ABORD] Auto-déclenchement des skills par intention
|
||||
> ⏭️ NEXT, mais CADRÉ : **pas de design avant la mesure**. Jumeau méthodologique de [[BDR-001]] `--help` (won't-build après RED) — même piège architectural, même garde-fou [[LRN-080]] (mesurer avant d'instruire) + [[LRN-049]] (borner le bruit avant le marqueur). Les subtasks ci-dessous s'arrêtent à la mesure ; le design ne s'ouvre QUE si le RED valide la valeur.
|
||||
## Auto-déclenchement des skills par intention [WON'T-BUILD 2026-06-30 — mesuré : Claude discrimine déjà (3 classes)]
|
||||
> ⛔ WON'T-BUILD (2026-06-30) : 3e moot de la série (après [[BDR-001]] --help + [[BDR-043]]/[[LRN-082]] darwin re-baseline). Cartographie : routing = STACK L0(design-hook)→L1(superpowers « 1%→MUST invoke », dominant)→L2(prose CLAUDE.md)→L3(frontmatter)→L4(BDR-019). L1 SUR-détermine déjà l'invocation → « auto-call ? » = déjà oui. Reframe C : la vraie question = DISCERNEMENT, risque inversé under→**OVER**-routing. Mesure en VRAIES sessions fraîches (8 prompts / 3 classes) : CLEAR→route ✓, AMBIGUË→demande (refuse de deviner, investigue pour une question utile) ✓, TRIVIALE→s'abstient ✓. Le sur-routing soupçonné (L1 vs règles Workflow) NE se matérialise PAS — le modèle équilibre. Prose de bornage L2 = valeur fantôme + risque de DÉGRADER un discernement déjà bon. Voir [[BDR-044]] (reframe + verdict), [[LRN-083]] (RED sous-agent invalide), [[LRN-080]] (mesure-first, corroboré 3-in-a-row). RED sous-agent initial (0/6) RETIRÉ comme non-discriminant (plancher artefact). Design + subtasks ci-dessous = historique, non actionnables.
|
||||
> ⏭️ (historique) NEXT, mais CADRÉ : **pas de design avant la mesure**. Jumeau méthodologique de [[BDR-001]] `--help` (won't-build après RED) — même piège architectural, même garde-fou [[LRN-080]] (mesurer avant d'instruire) + [[LRN-049]] (borner le bruit avant le marqueur). Les subtasks ci-dessous s'arrêtent à la mesure ; le design ne s'ouvre QUE si le RED valide la valeur.
|
||||
|
||||
**Contrainte architecturale (établie pour `--help`, non négociable) :**
|
||||
Aucun mécanisme n'intercepte le message utilisateur pour *lancer* un skill. La harness ne route pas avant que le modèle réponde — un skill n'est invoqué QUE par le modèle (outil Skill). Donc « auto-call déterministe » = IMPOSSIBLE. Le seul levier sur l'invocation elle-même = instruire le MODÈLE à reconnaître l'intention et appeler le bon skill → **conformité-modèle, PAS déterminisme**. C'est une instruction de routage CLAUDE.md, pas un mécanisme.
|
||||
@ -377,10 +378,10 @@ Aucun mécanisme n'intercepte le message utilisateur pour *lancer* un skill. La
|
||||
**Substrat déjà en place :** [[BDR-019]] a retiré `disable-model-invocation` repo-wide → le modèle PEUT déjà self-router vers les skills (défaut = activé ; user l'avait vécu live : intention feature détectée, `ship-feature` voulu, jadis bloqué). Et la section « Skill routing » de CLAUDE.md existe déjà. Donc la **baseline du RED = le routage CLAUDE.md ACTUEL tel quel** ; le chantier n'a de valeur que si le RED prouve que cette prose SOUS-déclenche sur intention claire (exactement la logique --help : baseline = convention déjà là, question = est-ce qu'instruire en plus change quoi que ce soit).
|
||||
|
||||
**Le chantier COMMENCE par (rien d'autre avant) :**
|
||||
- [ ] (a) **Cartographier** le routage CLAUDE.md actuel — quels signaux → quels skills sont déjà censés router (« Skill routing » + « Design work » + descriptions de skills). État des lieux factuel, pas de jugement.
|
||||
- [ ] (b) **RED comportemental** ([[LRN-080]]) — prompts d'intention IMPLICITE, naturalistes, SANS instruction renforcée : « il y a un bug, debug », « on va créer X », « corrige ceci », « refactor ce module », « cut a release »… → le modèle invoque-t-il le bon skill, ou fait-il la tâche à la main en ignorant le skill ? N reps, plusieurs intents distincts.
|
||||
- [x] (a) **Cartographier** le routage CLAUDE.md actuel — quels signaux → quels skills sont déjà censés router (« Skill routing » + « Design work » + descriptions de skills). État des lieux factuel, pas de jugement.
|
||||
- [x] (b) **RED comportemental** ([[LRN-080]]) — prompts d'intention IMPLICITE, naturalistes, SANS instruction renforcée : « il y a un bug, debug », « on va créer X », « corrige ceci », « refactor ce module », « cut a release »… → le modèle invoque-t-il le bon skill, ou fait-il la tâche à la main en ignorant le skill ? N reps, plusieurs intents distincts.
|
||||
- Garde-fou RED : **ne PAS amorcer**. Sessions fraîches / sous-agents, prompts naturels, zéro mention de « skill » / « routage » / « test » dans le prompt mesuré (sinon le modèle route parce qu'il SAIT qu'on le teste — contamination). Le RED `--help` était mécanique donc peu sensible à l'amorçage ; l'intent-routing l'est beaucoup plus → rigueur supérieure requise.
|
||||
- [ ] (c) **Décider selon le RED** :
|
||||
- [x] (c) **Décider selon le RED** :
|
||||
- déjà bon (comme --help) → chantier MINCE, voire won't-build ; capitaliser le constat (3e état : mesuré non-rentable, ni fait ni ouvert).
|
||||
- sous-déclenche → vraie valeur : renforcer la **prose de routage** (levier modèle) sur signaux CLAIRS uniquement — PAS un hook keyword (trop bruyant, cf. nuance ci-dessus).
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user