Compare commits

19 Commits

Author SHA1 Message Date
Bastien Chanot
b03cb0b910 chore(memory): BDR-030 + LRN-042 + journal + TODO
Capitalize the install-self-sufficient / gstack-on-demand session:
- BDR-030: gstack skills activated on-demand per profile, OFF by default.
- LRN-042: npx skills add / setup resolve target relative to CWD — run
  from $HOME or artifacts land in the repo tree, unreachable by link.sh.
- journal 2026-06-23 line + TODO task block reconciled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
2026-06-24 14:22:47 +02:00
Bastien Chanot
0b92935d6d feat(profile): activate gstack skills on-demand per profile
gstack stays OFF by default (no per-skill symlink in skills/, zero context
cost). enable_skill now gains a gstack branch: a skill absent from skills/
and skills-disabled/ but present in the skills-external/gstack submodule is
symlinked in on demand when a profile lists it; disable_gstack_not_in()
parks it again on an unrelated profile.

This makes `set full` (which lists 35 gstack skills) work without 35 bogus
"missing — try: bash link.sh" warnings, without abandoning the OFF-by-default
policy. The old remedy message was wrong (link.sh never creates gstack
skills) and is replaced with submodule-aware messages.

Refs BDR-030.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
2026-06-24 14:22:36 +02:00
Bastien Chanot
29c4c9ea67 fix(install): make install self-sufficient + npx skills add from $HOME
Two root causes found via the install log (install-20260623-181416.log):

A. install.sh runs link.sh BEFORE install-plugins.sh, and install-plugins
   never re-linked, so npx/external skill symlinks were missing on a fresh
   run. Add a final Step 10 that re-runs link.sh (idempotent), so
   `make plugin`/`make install` finish with nothing left to link by hand.

B. `npx skills add` resolves its target (.agents/skills, skills-lock.json)
   relative to the CWD. Run from the repo (which carries gitignored .agents/
   and .claude/), skills landed in $REPO/.agents/skills instead of
   $HOME/.agents/skills where link.sh looks — self-reinforcing once
   $REPO/.agents exists. Run `skills add` from $HOME in both install and
   update paths, and clean the stray repo-local skills dirs (gitignored,
   safe to rm).
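
Root cause B comes down to the working directory at the moment the command
runs. A minimal sketch of the pattern, assuming a hypothetical helper named
`run_from_home` (not a name taken from the repo's scripts): a subshell changes
into $HOME first, so CWD-relative targets resolve there while the caller's own
directory stays untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from install-plugins.sh or update-all.sh.
# Runs a command with $HOME as its working directory. The parentheses
# start a subshell, so the caller's CWD is unchanged afterwards.
run_from_home() {
  ( cd "$HOME" && "$@" )
}

# Usage (arguments to `skills add` are deliberately elided):
#   run_from_home npx skills add ...
```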

Refs LRN-042.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
2026-06-24 14:22:25 +02:00
Bastien Chanot
ed5b54e87e chore(graphify): update skill to v0.8.45
Bump 0.8.13 -> 0.8.45. Extract the SKILL.md monolith (~530 lines) into
references/ for progressive disclosure: github-and-merge, transcribe,
extraction-spec, exports, update, query, add-watch, hooks. SKILL.md now
points to each reference and loads it only on the path that needs it.

Inline fixes carried by the new version: empty-extraction guard before
any write (#1392), shrink-guard ordering so GRAPH_REPORT/analysis never
describe a graph.json that was refused (#479), root= relativization for
build/manifest parity across clones (#1361/#1417), stale-cache cleanup
and code-only semantic pre-write (#1392), edge-direction preserving
merge (#801). Adds FalkorDB export (--falkordb/--falkordb-push) and
rewrites the frontmatter description (drops the obsolete trigger: field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
2026-06-24 14:22:14 +02:00
Bastien Chanot
6516b85f0f chore(memory): EVAL-005 — obsolete effort alias missed (cross-config audit gap)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 17:54:52 +02:00
Bastien Chanot
d0a3740de5 feat(install): remove obsolete claude --effort max alias
Effort is set in settings.json ("effortLevel": "xhigh") — the source of
truth. The CLI alias `claude --effort max` was redundant and, worse, would
OVERRIDE settings.json (forcing max over xhigh). Step 9 no longer adds it and
now strips it (and the older CLAUDE_EFFORT env) from the shell profile if
present, cleaning orphaned comment lines.

(A dtach `cc` launcher was prototyped here and dropped — deferred to a later
sprint per the user.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 17:54:52 +02:00
Bastien Chanot
960f0f92ce chore(memory): LRN-041 — MAGIC_API_KEY symlink false-negative
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 17:30:40 +02:00
Bastien Chanot
1b028cbc25 fix(install): MAGIC_API_KEY false-negative when repo/.env symlink missing
The magic check + link_env grep'd `^MAGIC_API_KEY=` on $REPO/.env, but on a
fresh machine ~/.claude/.env is often created AFTER link.sh runs, so the
repo/.env symlink (which toggle-external.sh sources) is never made — the key
looks absent though it's set, and the warning misleadingly points at
~/.claude/.env.

- install-plugins.sh: self-heal — if ~/.claude/.env exists but repo/.env is
  missing, create the symlink before checking. Accurate message.
- Both: tolerate optional `export ` + leading whitespace and require a
  non-empty value (regex sanity-tested), so common .env formats match.

Immediate fix for an affected machine: `make link`.
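
A tolerant key check along the lines described above might look like the
sketch below. The function name `has_magic_key` and the exact regex are
assumptions; the pattern the scripts actually use is not shown in this commit.

```shell
# Hypothetical check, not the regex from install-plugins.sh or link.sh.
# Succeeds when FILE contains a MAGIC_API_KEY line with a non-empty value.
# Tolerates leading whitespace, an optional `export ` prefix, and an
# optional opening quote; rejects empty values such as KEY= or KEY="".
has_magic_key() {
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=[\"']?[^\"'[:space:]]" "$1"
}

# Usage:
#   has_magic_key "$HOME/.claude/.env" || echo "MAGIC_API_KEY not set" >&2
```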

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 17:30:09 +02:00
Bastien Chanot
735b62a002 chore(memory): BDR-029 + LRN-040 + BLK-008 resolved (gstack browser on Ubuntu 26.04)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 16:59:23 +02:00
Bastien Chanot
3b8ffb17b1 feat(install): auto-enable gstack browser on Ubuntu 24.04+
Two OS-too-new layers blocked gstack's browser on Ubuntu 26.04; handle both
from the installer so a fresh `make install` works without manual steps:

1. Playwright version — gstack pins 1.58.x which has no browser build for
   ubuntu>24.04 ("does not support chromium on ubuntu26.04"). New
   gstack_bump_playwright_if_unsupported() runs before ./setup: if the
   pinned Playwright's support list lacks the running distro, `bun add
   playwright@latest` in the submodule (1.61 supports 26.04), then ./setup's
   frozen-lockfile install picks it up and rebuilds the browse binary against
   it. Idempotent (skips when already supported). Edits the submodule locally
   — goes dirty, reset by `git submodule update`, re-applied next install.

2. Chromium sandbox — Ubuntu 24.04+ restricts unprivileged user namespaces
   via AppArmor, so Chromium aborts "No usable sandbox". Persist gstack's
   documented opt-out GSTACK_CHROMIUM_NO_SANDBOX=1 to the shell profile, gated
   on the exact sysctl (kernel.apparmor_restrict_unprivileged_userns=1) so it
   only triggers where the restriction is active.
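
The support-list check in layer 1 can be sketched roughly as follows. Both
helper names are hypothetical, and the assumption that the distro tag is
`ID` + `VERSION_ID` from os-release is mine; the body of
gstack_bump_playwright_if_unsupported() is not shown in this commit.

```shell
# Hypothetical helpers, not the code from install-plugins.sh.
# distro_tag [OS_RELEASE_FILE]: print a tag such as ubuntu26.04, built
# from the ID and VERSION_ID fields. Sourced in a subshell so the
# os-release variables do not leak into the caller.
distro_tag() {
  ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# playwright_lists_distro DIR TAG: succeed when any file under DIR
# mentions TAG as a fixed string.
playwright_lists_distro() {
  grep -rqF -- "$2" "$1"
}

# Usage, run inside the submodule:
#   playwright_lists_distro node_modules/playwright-core/lib "$(distro_tag)" \
#     || bun add playwright@latest
```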

Verified end-to-end on Ubuntu 26.04: gstack browse drives a real page
(Navigated 200). See BDR-029 / LRN-040 / BLK-008.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 16:58:30 +02:00
Bastien Chanot
637b8379b1 chore(memory): correct BLK-008 + LRN-038 — Playwright override reverted (hangs)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 16:24:44 +02:00
Bastien Chanot
b9c3937cd0 revert(install): drop Playwright host-platform override — it hangs on 26.04
The PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64 pin (211c7d4) made
Playwright 1.58.2 stop erroring and instead download a Chrome-for-Testing
fallback build — but that download reaches 100% and then HANGS at extraction
on Ubuntu 26.04 (reproduced on a real machine + here: chrome binary never
materializes, no headless-shell download starts). Net effect: the override
turned a 0.5s fast-fail into an indefinite hang that blocks `make install` /
`make plugin` (user had to Ctrl+C).

Reverting restores the original behavior: gstack's ./setup fast-fails the
browser install (non-fatal — gstack is OFF by default, browser only needed
for /browse, /qa, screenshots) and the install completes. Replaced the code
with a NOTE explaining the dead end. Real fix is upstream: gstack bumping
Playwright to a version that supports the OS. See BLK-008.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 16:23:52 +02:00
Bastien Chanot
cba0672749 chore(memory): BDR-028 + LRN-039 (installer config drift guard + de-vendor)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 15:29:47 +02:00
Bastien Chanot
7de8761836 chore(repo): stop tracking installer-managed files
Three paths are (re)generated by every install/update and should never be
committed:
- skills-external/frontend-design/ — install-plugins.sh Step 8b and
  update-all.sh cp the latest SKILL.md from the example-skills plugin cache
  over it, so it churned a diff each time Anthropic shipped an update. The
  source is always re-synced (example-skills is always installed), so no
  vendored copy is needed.
- .agents/ and skills-lock.json — `npx skills add` (darwin-skill) installs
  at project scope into the repo. Our own agents live in agents/ (no dot)
  and stay tracked; the dotted pollution dir is anchored-ignored (/.agents/).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 15:29:01 +02:00
Bastien Chanot
51afe9bd19 build(install): auto-revert curated config after install
graphify's installer rewrites CLAUDE.md + .claude/settings.json (clobbers
the curated graphify section, drops the "This repo only" header, injects
aggressive MANDATORY pre-tool hooks) and `claude plugin install` flips
enable-states in settings.json. These 3 files are hand-curated, never
installer-owned.

Snapshot them at the top of install-plugins.sh and restore on EXIT (trap)
so `make install` / `make plugin` leaves them exactly as found. Pre-existing
local edits are preserved; only installer drift is undone. Verified with an
isolated drift→restore test. update-all.sh needs no guard — it only runs
`claude plugin update` (no enable flips) and never re-runs graphify's
CLAUDE.md/settings integration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 15:29:01 +02:00
Bastien Chanot
4e178dc393 chore(memory): BDR-027 + LRN-038 + BLK-008 (install revert + Ubuntu 26.04 chromium)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 14:11:52 +02:00
Bastien Chanot
2194b11329 fix(install): install jq prerequisite (active hooks require it)
jq is used 18+ times in always-on hooks (statusline.sh, rtk-rewrite.sh)
but was never installed by any script — it only worked because dev
machines happened to have it; a bare machine breaks at hook-run time.
Add it to Step 1 (same inline pattern as the other prereqs) and to
doctor.sh at fail level.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 14:09:55 +02:00
Bastien Chanot
211c7d4594 fix(install): pin Playwright host-platform on Ubuntu >24.04
Ubuntu 26.04 (and any release newer than 24.04) isn't in Playwright
1.58.2's supported-build list, so gstack's ./setup aborted with
"Playwright does not support chromium on ubuntu26.04-x64".

Add gated helper playwright_platform_override() (ubuntu >24.04 → echoes
ubuntu24.04-<arch>, else nothing). Export PLAYWRIGHT_HOST_PLATFORM_OVERRIDE
before gstack ./setup (install-time download) and persist it to the shell
profile (runtime browser launch). Playwright then pulls a compatible
Chrome-for-Testing fallback build instead of erroring.

Verified on Ubuntu 26.04: override emitted correctly (no var leak), CfT
build resolves all shared libs (ldd) and renders headless. No submodule
edits — purely an env pin from the wrapper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 14:09:12 +02:00
Bastien Chanot
b6cc8b1a86 fix(install): nvm fallback when node/npm missing
Fresh machine had no npm → install.sh err-exited before the Claude Code
CLI install could run. Instead of aborting, bootstrap the current LTS via
nvm (v0.39.7) → `nvm install --lts` when node or npm is absent. Keeps the
>=18 floor + friendly messages on hard failure.

Replaces the reverted lib/install-prereqs.sh centralization with the
minimal targeted fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UyNYwD4UccVw9ZCFZyJX55
2026-06-23 14:09:12 +02:00
25 changed files with 1285 additions and 767 deletions

@ -94,3 +94,15 @@ rules:
- **Solution applied** (NOT full `./setup` — surgical, no side effects): (1) Linked `spec` only — `mkdir skills/spec` + `ln -snf <abs>/skills-external/gstack/spec/SKILL.md skills/spec/SKILL.md`, matching gstack setup:440-476 (per-skill real dir + SKILL.md symlink, name from frontmatter). (2) Added `spec` to `full.profile` + `web-full.profile` planning sections (must be in active profile `full` else `set full` re-disables it). (3) iOS 5 skills deliberately NOT linked — Linux host, device-farm needs Mac daemon + Tailscale + iOS devices = dead skills + token cost. (4) Completed `.gitignore` gstack allowlist: added all 12 missing (`spec`, 5 `ios-*`, 6 parked `document-generate/landing-report/scrape/setup-gbrain/skillify/sync-gbrain`), removed stale `checkpoint` (BLK-005 rename). Reason: `gstack on` (BDR-018) moves parked skills into `skills/` — any gstack skill missing from allowlist = untracked git noise on enable.
- **Verified**: `profile show full`+`web-full` → spec enabled; allowlist drift recheck EMPTY; spec skill now visible to Claude.
- **Status**: resolved. iOS = intentional exclusion (re-linkable via gstack `./setup` on a Mac). See [[gstack-gitignore-allowlist-completeness]] (LRN-025).
## BLK-008 — gstack ./setup fails on Ubuntu 26.04 — Playwright chromium unsupported
- **Date**: 2026-06-23
- **Friction**: fresh Ubuntu 26.04, `make install` / `make plugin` → "Failed to install browsers / ERROR: Playwright does not support chromium on ubuntu26.04-x64" → "GStack ./setup failed". Non-fatal in our wrapper (warn only) but gstack's browser (`/browse`, `/qa`, design screenshots) is silently dead once gstack is enabled.
- **Real cause**: Playwright 1.58.2 (pinned in the gstack submodule) registry lists `ubuntu20.04/22.04/24.04` only; 26.04 released later → not in list → `getHostPlatform` errors. Pure OS-newness, not an install bug.
- **Solution**: gated `export PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64` (ubuntu >24.04 only) before gstack setup + persisted to `.bashrc` for runtime. Playwright then pulls a Chrome-for-Testing fallback build for ubuntu24.04. Verified on 26.04: `ldd` resolves all libs + real headless render OK.
- **Status**: resolved (commit 211c7d4). Residual: exact rev 1208 launch not in-session-tested (sandbox download hung at extraction); proved via sibling rev 1228 same-platform CfT build. Confirm on next real `make plugin`. Proper upstream fix = gstack bumps Playwright to a version that lists ubuntu26.04. See [[LRN-038]].
- **2026-06-23 UPDATE — Solution REVERTED, status downgraded to UPSTREAM/open** (commit b9c3937): the `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE` solution above does NOT work on 26.04. The fallback build downloads to 100% then HANGS at extraction (chrome binary never appears, no headless-shell download starts; reproduced on real machine + sandbox) → turned a 0.5s fast-fail into an install-blocking hang (user Ctrl+C). Reverted to the fast-fail (non-fatal; gstack OFF by default, browser only for /browse,/qa,screenshots). The earlier "verified ldd + headless render" was an isolated test on a sibling already-extracted build (rev 1228) — it masked the rev-1208 install-path hang. **Real fix = upstream**: gstack bumps Playwright to a version that lists ubuntu26.04. Until then gstack's browser is unavailable on 26.04, install completes cleanly. See [[LRN-038]] correction.
- **2026-06-23 FINAL — RESOLVED** (commit 3b8ffb1): gstack browser now works on Ubuntu 26.04. Two layers fixed: (1) bumped gstack's pinned Playwright 1.58.2 → 1.61 (`bun add playwright@latest` in the submodule; 1.61 ships a native ubuntu26.04 build — chromium rev 1228), automated in the installer (`gstack_bump_playwright_if_unsupported`, idempotent, OS-gated); (2) `GSTACK_CHROMIUM_NO_SANDBOX=1` to work around the AppArmor userns restriction (`sysctl kernel.apparmor_restrict_unprivileged_userns=1`), persisted to `.bashrc` + installer Step 9 (sysctl-gated). Verified end-to-end: `browse goto https://example.com` → "Navigated (200)". Caveat: the Playwright bump is a local submodule edit, reset by `git submodule update`, re-applied by the next install. See [[BDR-029]], [[LRN-040]].

@ -451,3 +451,50 @@ rules:
- Secret in `repo/.env`, gitignored (status quo) — one `git add -f` or a `.gitignore` slip leaks it; the secret physically sits in the tree.
- Scripts read `~/.claude/.env` directly — makes the symlink redundant but rewrites every read path and loses repo-local visibility.
- **Reference**: `link.sh` `link_env()`, `.gitignore`, `lib/toggle-external.sh`, `install-plugins.sh`, `.env.example`, commits 131d0bc / f9cc866. Linked to [[BDR-025]] (magic's `MAGIC_API_KEY`, consumed by the gate's required-but-manual class).
---
## BDR-027 — Minimal npm-via-nvm bootstrap over centralized prereq lib (reverses the reverted approach)
- **Date**: 2026-06-23
- **Status**: accepted (supersedes the reverted `lib/install-prereqs.sh` centralization, commit 1ddeed1 removed from history)
- **Decision**: the only real bootstrap blocker = `npm` absent on fresh machine. `install.sh` now installs current LTS via nvm (`v0.39.7` → `nvm install --lts`) ONLY when node/npm missing (`install_node_via_nvm`). Keep the inline per-tool prereq blocks in `install-plugins.sh` (no shared `ensure_*` lib). Re-add `jq` inline (Step 1) + `doctor.sh` fail-level — `jq` is an active-hook dep that was never installed.
- **Why**: a 1-function fallback fixes the actual blocker. Folding 9 prereqs into a 245-line lib was scope-creep for "npm missing"; user reverted it. Inline blocks stay readable + co-located with their step.
- **Alternatives rejected**: centralized `lib/install-prereqs.sh` (commit 1ddeed1 — over-engineered for the real blocker, reverted); leave `npm` as a hard `err` (the original bug — aborts before the CLI install).
- **Reference**: `install.sh` `install_node_via_nvm`, `install-plugins.sh` Step 1 jq, `doctor.sh`, commits b6cc8b1 / 2194b11. Linked to [[BLK-008]] (the chromium half of the same fresh-Ubuntu-26.04 session).
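
A rough sketch of the fallback's decision logic, under stated assumptions: `node_major` and `needs_node_bootstrap` are hypothetical names (the body of `install_node_via_nvm` is not shown here), and the commented usage assumes nvm's default `$HOME/.nvm` install location.

```shell
# Hypothetical helpers, not the code from install.sh.
# node_major VERSION: print the major number of a `node --version`
# style string such as v20.11.1.
node_major() {
  v="${1#v}"
  printf '%s\n' "${v%%.*}"
}

# needs_node_bootstrap: succeed when node or npm is absent from PATH.
needs_node_bootstrap() {
  ! command -v node >/dev/null 2>&1 || ! command -v npm >/dev/null 2>&1
}

# Usage sketch (needs network access, so left commented out):
#   if needs_node_bootstrap; then
#     curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
#     . "$HOME/.nvm/nvm.sh" && nvm install --lts
#   fi
#   [ "$(node_major "$(node --version)")" -ge 18 ] || echo "node >= 18 required" >&2
```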
---
## BDR-028 — Hand-curated config is install-immutable (auto-revert guard) + de-vendor installer-managed skills
- **Date**: 2026-06-23
- **Status**: accepted
- **Decision**: `install-plugins.sh` snapshots `CLAUDE.md` + `settings.json` + `.claude/settings.json` at start, restores them on EXIT (trap) → installer never mutates hand-curated config. `frontend-design` un-tracked (`git rm --cached` + gitignore `skills-external/frontend-design/`) — re-synced from the example-skills plugin cache every run, so vendoring = pure churn. npx-skills pollution (`/.agents/`, `/skills-lock.json`) gitignored, anchored so our `agents/` stays tracked.
- **Why**: a fresh `make install` drifted all 4: graphify clobbered `CLAUDE.md` (deleted the `# This repo only` header) + injected aggressive MANDATORY pre-tool hooks; `claude plugin install` flipped `example-skills`→true + added `plugin-dev`; frontend-design diffed on every upstream update; darwin-skill polluted repo `.agents/` at project scope. Guard = these files maintained by hand+commit only; gitignore = generated artifacts never tracked.
- **Caveat**: guard makes the 3 config files install-immutable — anything the installer SHOULD add must be committed by hand. Safe today: committed `settings.json` already carries the rtk hook (install skips init). `update-all.sh` needs no guard (only `claude plugin update`, no enable flips, no graphify reconfig).
- **Alternatives rejected**: `git checkout` post-install (nukes legit uncommitted edits, depends on git state); surgical JSON/markdown patching (fragile); accept graphify's generic CLAUDE.md (loses curation).
- **Reference**: `install-plugins.sh` guard block + `restore_curated_configs` trap, `.gitignore`, commits 51afe9b / 7de8761. Linked to [[LRN-039]].
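
The snapshot-and-restore guard can be sketched as below. The function names are hypothetical (the repo's guard block and its `restore_curated_configs` trap are not shown in this diff), and the sketch assumes relative file paths resolved from the repo root.

```shell
# Hypothetical names, not the guard block from install-plugins.sh.
# snapshot_configs DIR FILE...: copy each existing FILE into DIR,
# keeping its relative path. Uncommitted local edits are captured too,
# because the working copy is what gets snapshotted.
snapshot_configs() {
  snap_dir="$1"; shift
  for f in "$@"; do
    [ -f "$f" ] || continue
    mkdir -p "$snap_dir/$(dirname "$f")"
    cp -p "$f" "$snap_dir/$f"
  done
}

# restore_configs DIR FILE...: copy each snapshotted FILE back over the
# working copy, undoing whatever changed after the snapshot.
restore_configs() {
  snap_dir="$1"; shift
  for f in "$@"; do
    [ -f "$snap_dir/$f" ] && cp -p "$snap_dir/$f" "$f"
  done
  return 0
}

# Usage at the top of an installer script:
#   SNAP="$(mktemp -d)"
#   snapshot_configs "$SNAP" CLAUDE.md settings.json .claude/settings.json
#   trap 'restore_configs "$SNAP" CLAUDE.md settings.json .claude/settings.json' EXIT
```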
---
## BDR-029 — Installer auto-fixes gstack browser on OS newer than its pinned Playwright supports
- **Date**: 2026-06-23
- **Status**: accepted
- **Decision**: `install-plugins.sh` makes gstack's browser work on too-new distros without manual steps. (1) `gstack_bump_playwright_if_unsupported()` runs before `./setup`: if the pinned Playwright's support list lacks the running distro (grep `node_modules/playwright-core/lib` for the `ubuntuXX.04` tag), `bun add playwright@latest` in the submodule, then `./setup`'s frozen-lockfile install picks it up + rebuilds the browse binary. Idempotent (skips when already supported). (2) Persist `GSTACK_CHROMIUM_NO_SANDBOX=1` to the shell profile, gated on `sysctl kernel.apparmor_restrict_unprivileged_userns=1`.
- **Why**: fresh `make install` on Ubuntu 26.04 must yield a working gstack browser. Submodule pins Playwright 1.58.2; upstream hasn't bumped; can't wait. Local bump in the installer = "just works" + self-heals after a `git submodule update` (re-applies next run).
- **Caveats**: the installer EDITS the submodule (goes dirty each run on a too-new OS) — invasive, but the user chose it over waiting upstream. `bun add playwright@latest` could pull a Playwright that breaks gstack's build → non-fatal (`./setup` fail warns, install continues). The local bump is reset by `git submodule update`. The `.bashrc` env can be wiped if the user restores a hand-managed `.bashrc` (theirs is managed — the first install's lines were already lost that way).
- **Alternatives rejected**: `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE` (fallback build HANGS at extraction — [[BLK-008]]); wait for gstack upstream Playwright bump (no ETA); leave browser unavailable (user wanted it); system chromium + executablePath (needs gstack code change).
- **Reference**: `install-plugins.sh` `gstack_bump_playwright_if_unsupported()` + Step 9 sysctl-gated env, commit 3b8ffb1. Linked to [[LRN-040]], [[BLK-008]].
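
The sysctl-gated, idempotent persistence in part (2) might be sketched like this. `persist_no_sandbox` is a hypothetical name and the optional second argument exists only to make the sketch testable; Step 9's actual code is not shown in this diff.

```shell
# Hypothetical helper, not the code from install-plugins.sh Step 9.
# persist_no_sandbox PROFILE [VALUE]
#   VALUE defaults to the live sysctl reading. When it is 1, append the
#   export line to PROFILE unless an identical line is already present.
persist_no_sandbox() {
  profile="$1"
  restricted="${2:-$(sysctl -n kernel.apparmor_restrict_unprivileged_userns 2>/dev/null || true)}"
  line='export GSTACK_CHROMIUM_NO_SANDBOX=1'
  [ "$restricted" = "1" ] || return 0
  touch "$profile"
  # -x matches whole lines, -F treats the line as a fixed string.
  grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Usage:
#   persist_no_sandbox "$HOME/.bashrc"
```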
---
## BDR-030 — gstack skills activated ON-DEMAND per profile, not pre-installed; OFF by default stays
- **Date**: 2026-06-23
- **Status**: accepted
- **Decision**: gstack stays OFF by default (no per-skill symlink in `skills/`, zero context cost) — but `profile.sh set <profile>` that LISTS a gstack skill activates it for that profile. `enable_skill gstack` gained a branch: skill not in `skills/` and not parked in `skills-disabled/` but present in the `skills-external/gstack/<name>` submodule → `ln -s` it into `skills/`. `disable_gstack_not_in()` parks it again when an unrelated profile is set. The gstack/bin + browse/dist infra those skills need is created independently by `link.sh`.
- **Why**: user wanted `make install` self-sufficient AND `set full` (lists 35 gstack skills) to work without 35 `missing — try: bash link.sh` warnings, WITHOUT abandoning gstack's OFF-by-default context-cost policy ([[BDR-029]] install comment). On-demand-per-profile threads both: gstack invisible until a profile needs it, then auto-on for exactly that profile. Source of truth = the submodule (`gstack_skills()` already reads `skills-external/gstack/*/SKILL.md`), so activation needs no gstack `./setup` skill-registration (which this gstack version writes to the WRONG dir anyway — [[LRN-042]]).
- **Caveats**: the symlink form (`skills/<name> -> skills-external/gstack/<name>`) differs from what gstack `./setup` would create (real dir + symlinked SKILL.md) — fine here because `./setup` never populates `skills/` in this layout, so no mixed-form collision. Browse RUNTIME still needs the built binary + sandbox env ([[BDR-029]]) — on-demand makes the skill DISCOVERABLE, not the browser functional on an unsupported OS. The old "try: bash link.sh" message was wrong (link.sh never creates gstack skills) → replaced with submodule-aware messages.
- **Alternatives rejected**: full gstack integration (make `./setup` install into `skills/`) — user picked option 1, too invasive/version-fragile; leave `full` broken with honest 1-line warning — worse UX; pre-symlink all gstack at install — violates OFF-by-default context policy.
- **Reference**: `lib/profile.sh` `GSTACK_SRC` + `enable_skill` gstack branch. Verified: `set full` → 0 missing, 35 on-demand; `minimal`↔`full` cycle re-parks/restores; git clean (gstack symlinks gitignored, [[LRN-025]]). Linked to [[LRN-042]], [[LRN-022]], [[BDR-018]] (gstack on/off verb).
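
The on-demand branch described in the decision could look roughly like the sketch below. `enable_gstack_skill` and its status strings are hypothetical; the real `enable_skill` in `lib/profile.sh` is not shown in this diff.

```shell
# Hypothetical sketch, not the enable_skill code from lib/profile.sh.
# enable_gstack_skill ROOT NAME
#   Links ROOT/skills/NAME to the submodule copy when the skill is
#   neither enabled nor parked but exists in the gstack submodule.
#   Prints what it found or did.
enable_gstack_skill() {
  root="$1"; name="$2"
  src="$root/skills-external/gstack/$name"
  if [ -e "$root/skills/$name" ]; then
    echo "already-enabled"
  elif [ -e "$root/skills-disabled/$name" ]; then
    echo "parked"   # left to the regular un-park path
  elif [ -f "$src/SKILL.md" ]; then
    mkdir -p "$root/skills"
    ln -s "$src" "$root/skills/$name"
    echo "linked"
  else
    echo "missing"
    return 1
  fi
}
```

The sketch uses the whole-directory symlink form noted in the caveats, not the real-dir-plus-symlinked-SKILL.md form that gstack's `./setup` would create.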

@ -64,3 +64,13 @@ rules:
- **Method**: 5 parallel structure judges (shared rubric file, calibration anchor, lower-score-when-hesitating rule) + 5 behavior tests on fixtures (hotfix, geo, commit-change, status, analyze) + geo fix validated by re-test (0 source edits, `?? .claude/` only) + 2/2 counterbalanced blind judges (safety 3→9).
- **Anomalies**: (1) KEY: stub skills (analyze 33.5, hotfix 36.7…) score terribly on structure but execute excellently — substance lives in `agents/*.md`; rubric must judge SKILL.md+agent.md as system, else misleading. (2) geo confirmed live: 2 HTML source files edited unsupervised pre-fix. (3) Self-inflicted: overwrote 5 pre-existing test-prompts.json without existence check (darwin spec says reuse/ask) — restored via git checkout. (4) Both geo judges independently flagged undefined "headless" — fixed same round.
- **Action**: keep — bugs real, fixes verified. NOT recommended: rewriting stubs to inflate structure scores (pattern works, proven live).
---
## EVAL-005 — Obsolete `claude --effort max` alias missed across repeated Step 9 edits
- **Date**: 2026-06-23
- **Output checked**: install-plugins.sh Step 9 kept `alias claude='claude --effort max'` while `settings.json` sets `"effortLevel": "xhigh"` (the source of truth). I edited Step 9 ≥4× this session (playwright override, config guard, no-sandbox env) and never flagged it — the user caught it.
- **Method / why missed**: I treated the pre-existing `CLAUDE_LINES` as established and only touched the lines I was adding/removing. Spotting the redundancy needs cross-referencing TWO config layers (shell alias vs settings.json) — a semantic check I never ran. Masked further: the user's `.bashrc` is hand-managed and the alias line wasn't even present, so it looked inert.
- **Anomaly**: not just dead config — a CLI flag (`--effort max`) silently OVERRIDES the settings.json value (`xhigh`). Real correctness bug.
- **Action**: when editing installer shell-config, audit EACH existing line against the current settings.json / CLAUDE.md source of truth, not only the lines being changed. Removed the alias + added cleanup. General rule: reconcile config to ONE source of truth across env/alias/settings layers.
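
The cleanup half of that action can be sketched as below. `strip_effort_overrides` is a hypothetical name, and the `export CLAUDE_EFFORT=` form of the older env line is an assumption; the sketch also leaves out the orphaned-comment cleanup the real Step 9 performs.

```shell
# Hypothetical helper, not the cleanup code from install-plugins.sh.
# strip_effort_overrides PROFILE
#   Rewrites PROFILE without the `claude --effort max` alias and without
#   any CLAUDE_EFFORT export, leaving every other line untouched.
strip_effort_overrides() {
  profile="$1"
  [ -f "$profile" ] || return 0
  tmp="$(mktemp)"
  # grep -v exits 1 when nothing is left to print, hence the || true.
  grep -Ev "^[[:space:]]*(alias claude=.*--effort max|export CLAUDE_EFFORT=)" "$profile" > "$tmp" || true
  # Write through the existing file so permissions and symlinks survive.
  cat "$tmp" > "$profile"
  rm -f "$tmp"
}

# Usage:
#   strip_effort_overrides "$HOME/.bashrc"
```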

@ -172,3 +172,21 @@ rules:
- Built via superpowers:writing-skills TDD: RED v1 baseline too easy (passed) → strengthened to RED v2 (pressured) which failed on anti-noise + invented subtask + no gate → GREEN passed. Gate STOP itself untested (non-interactive harness) — flagged as skill Red flag.
- LRN-031: skill value = gate + anti-noise + determinism, NOT re-coding what a capable agent does free; if RED baseline passes, harden the fixture before writing.
- Docs routing synced (CLAUDE.md table + README + USAGE) in separate commit; caveman-purge WIP in those files left unstaged. Commits 9dc2b83, be0f047, 765e9d7.
## 2026-06-23
- Reverted commit 1ddeed1 (centralized `lib/install-prereqs.sh`) — over-engineered for the real blocker. Replaced with minimal npm-via-nvm fallback in `install.sh` (b6cc8b1). Re-added `jq` prereq inline + `doctor.sh` fail-level (2194b11). BDR-027.
- Diagnosed gstack chromium fail on Ubuntu 26.04: Playwright 1.58.2 doesn't list 26.04. Fix = gated `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64`, wrapper-only (no submodule edit), install + runtime (211c7d4). Verified ldd + headless render on 26.04. BLK-008, LRN-038.
- Fresh-install audit: `make install` drifted 4 repo files. Root-caused each: graphify installer clobbers `CLAUDE.md` (deletes `# This repo only` header) + injects MANDATORY hooks in `.claude/settings.json`; `claude plugin install` flips `example-skills`→true + adds `plugin-dev` in `settings.json`; example-skills `cp` churns `frontend-design`; `npx skills add` pollutes repo `.agents/` + `skills-lock.json`.
- Fix: reverted current drift (`git checkout` 3 configs); added snapshot+trap-restore guard in `install-plugins.sh` (curated config now install-immutable); de-vendored frontend-design + gitignored `/.agents/` + `/skills-lock.json` (anchored so `agents/` stays tracked). Guard tested drift→restore. Commits 51afe9b / 7de8761. BDR-028, LRN-039.
- gstack chromium fix BACKFIRED: the `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64` pin made `make plugin` HANG at extraction on real 26.04 (download hits 100%, chrome never extracts) — worse than the original 0.5s fast-fail. Reverted (b9c3937). Root: isolated `ldd`+render proof used a sibling already-extracted build (rev 1228), masking the rev-1208 install-path hang. gstack browser stays unavailable on 26.04 (OFF by default); real fix upstream. Corrected BLK-008 + LRN-038.
- gstack browser FIXED on Ubuntu 26.04 (full saga). `git submodule update` would NOT help (latest gstack still pins playwright 1.58.2). Two layers: (1) bumped Playwright→1.61 in submodule (native 26.04 build), (2) GSTACK_CHROMIUM_NO_SANDBOX=1 for AppArmor userns block. Both automated in install-plugins.sh (auto-bump gated on dep support-list grep; env gated on apparmor sysctl) + env to .bashrc. Verified browse drives a real page (200). Discovered user's .bashrc is hand-managed (installer's env lines had been wiped by a restore). Commit 3b8ffb1. BDR-029, LRN-040, BLK-008 resolved.
- Fixed MAGIC_API_KEY false-negative: check grep'd `repo/.env` (symlink), never created because `~/.claude/.env` was made AFTER link.sh on the fresh machine (and `make plugin` skips link.sh). install-plugins.sh now self-heals the symlink + both scripts use a tolerant regex (export/whitespace/non-empty). Immediate fix: `make link`. Sandbox blocked all `.env*` reads → diagnosed via dir listing + synthetic-line regex tests. Commit 1b028cb. LRN-041.
- Removed obsolete `alias claude='claude --effort max'` from install Step 9 — settings.json `effortLevel: xhigh` is the source of truth and the CLI alias would override it (forcing max over xhigh). Step 9 now also strips the alias + old CLAUDE_EFFORT from the profile if present. A dtach `cc` launcher was prototyped then dropped — deferred to a later sprint (per user). Why missed earlier = EVAL-005 (never cross-audited existing Step 9 lines vs settings.json).
- Made install self-sufficient + gstack on-demand per profile (user request: "make install must install EVERYTHING"). 3 root causes via install log: (A) install.sh ran link.sh BEFORE install-plugins.sh which never re-linked → npx-skill symlinks never created on fresh run; (B) `npx skills add` + gstack `./setup` resolve target relative to CWD → darwin-skill landed in `$REPO/.agents/skills`+`$REPO/.claude/skills`, not `$HOME/.agents/skills` (self-reinforcing once `$REPO/.agents` exists); (C) `profile.sh set full` → 35 "missing — try bash link.sh" (wrong remedy) because gstack OFF + skills never in `skills/`. Fixes: install-plugins.sh runs npx from `$HOME` + cleans stray repo-local copies + Step 10 final re-link; update-all.sh same npx fix; profile.sh `enable_skill gstack` symlinks on-demand from submodule (gstack OFF default, ON per profile). Verified live: link.sh → darwin OK; `set full` → 0 missing / 35 on-demand; minimal↔full cycle re-parks/restores; git clean. Residual: `$REPO/.claude/skills/darwin-skill` rm blocked by `.claude/` permission guard → auto-cleaned next `make plugin`. BDR-030, LRN-042.


@ -519,3 +519,54 @@ rules:
- **Pattern**: for the load-bearing scenario, run it on the REAL subject in the REAL invocation context (prod path `$HOME/.claude/lib/...`, prod-like PATH), not a stub or a "the code path is correct" argument. A stub proves branch coverage; only the real subject proves the integration. Always add a DISCRIMINATING case — force the failure state; the check must REPORT it, not pass by default (a check that only ever passes proves nothing).
- **Future application**: any "fixed/works" claim on a critical path → produce the real run output (command + lines + exit code) before capitalizing or shipping; don't summarize ("condition met") in place of the output. Stub/logic = necessary for branch coverage, never sufficient for the integration claim. Highest-payoff discipline of the whole segment: every refutation came from execution, none from reasoning.
- **Reference**: design-gate workstream, the `PATH=/usr/bin:/bin` matrix (magic-on → READY/0, magic-off → INCOMPLETE/10), commits 4d19135 / f963318. Linked to [[LRN-036]] (the concrete instance: the PATH cause surfaced only by the real run), [[LRN-034]] (its twin — 034 = don't trust a narrated *claim*; 037 = don't trust a *stub/logic argument* as proof; both demand execution against ground truth).
---
## LRN-038 — Playwright host-platform override for distros newer than its hardcoded support list
- **Date**: 2026-06-23
- **Context**: fresh Ubuntu 26.04. gstack `./setup` aborted: "Playwright does not support chromium on ubuntu26.04-x64". Playwright 1.58.2's registry hardcodes `ubuntu20.04/22.04/24.04` only; a newer release → no matching build → hard error. gstack is a pinned submodule (must not edit).
- **Pattern**: `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntuXX.04-<arch>` forces a fallback build. MUST include arch (`x64`/`arm64`) — bare `ubuntu24.04` fails ("does not support … ubuntu24.04"). Set it from the WRAPPER: `export` before the submodule's setup (install-time download) AND persist to the shell profile (runtime launch) — both paths call `getHostPlatform`. No submodule edit. Gate on real OS version (`sort -V` compare) so supported distros are untouched. Test with the LOCAL `./node_modules/.bin/playwright`, not `bunx playwright`, which pulls the LATEST playwright (a different browser revision than the local import) and masks the result.
- **Future application**: any pinned tool that hardcodes an OS allowlist breaks on a fresh OS upgrade. Look for a host-platform override env before bumping/forking the dep. Prove the fallback binary actually runs (`ldd` = no missing libs + a real headless render), not just that the download resolves.
- **Reference**: `install-plugins.sh` `playwright_platform_override()`, commit 211c7d4. Linked to [[BLK-008]].
- **2026-06-23 CORRECTION (override REVERTED, commit b9c3937)**: the override is NOT a usable fix on Ubuntu 26.04. It makes `playwright install` switch to the ubuntu24.04 fallback build, which downloads to 100% then HANGS at extraction (chrome binary never materializes; real machine + sandbox). Turned a 0.5s fast-fail into an install-blocking hang. The isolated proof (`ldd` + headless render) PASSED but used an already-extracted sibling build (rev 1228) — it masked the install-path hang in the real flow (rev 1208). **Sharpened lesson**: proving the binary launches in isolation is NOT proving the install path works — run the ACTUAL install command end-to-end (it must COMPLETE, not just "download resolves" nor "a binary launches"). The override technique stays valid in general, but the EXTRACTION/COMPLETE step is part of "does it work".
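
The `sort -V` gate mentioned in the pattern can be sketched as follows. This is a minimal illustration of the version compare only, not the reverted override itself; `version_ge` and `needs_override` are hypothetical helper names, not functions from `install-plugins.sh`, and the sketch assumes a `sort` that supports `-V` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# sort -V orders version strings numerically per component,
# so 9.10 sorts before 24.04 (a plain string compare would get this wrong).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical gate: only a distro strictly NEWER than the last supported
# release qualifies, so supported distros are left untouched.
needs_override() {
  local running="$1" last_supported="$2"
  [ "$running" != "$last_supported" ] && version_ge "$running" "$last_supported"
}

if needs_override "26.04" "24.04"; then
  echo "running release is newer than the support list"
fi
```

The strict "newer than" check matters: an equal version means the tool already supports the OS, so the gate stays closed.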
---
## LRN-039 — Installers drift hand-curated config → snapshot+trap-restore guard; anchor gitignore for pollution
- **Date**: 2026-06-23
- **Context**: fresh Ubuntu `make install`. 3rd-party installers mutated repo files: graphify rewrote `CLAUDE.md`+hooks (every `graphify install`, Step 7), `claude plugin install` flipped `enabledPlugins`, the example-skills `cp` churned `frontend-design`, `npx skills add` wrote project-scope `.agents/` + `skills-lock.json`.
- **Pattern**: file an installer rewrites but YOU curate → snapshot to a `mktemp -d` at start + `trap restore EXIT` (`cmp -s` before `cp`, revert only real diffs). Preserves pre-existing edits, no git dependency, idempotent, survives early-exit. Pure generated pollution → gitignore. ANCHOR the ignore (`/.agents/`, NOT `.agents/` and NOT `agents`) so it can't catch a legit sibling — our agents live in `agents/` (no dot). Verify with `git check-ignore -v <legit-dir>` that the pattern doesn't over-match.
- **Future application**: audit a fresh install = `git status` right after `make install`; classify every drift as (a) curated → guard, or (b) pollution → anchored gitignore. Never `git checkout` to clean drift (destroys uncommitted work). Prove the guard with an isolated drift→restore test before trusting it.
- **Reference**: `install-plugins.sh` `restore_curated_configs` + EXIT trap, `.gitignore` `/.agents/`, commits 51afe9b / 7de8761. Linked to [[BDR-028]].
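
The isolated drift→restore test recommended above might look like the sketch below. It is an assumption-laden miniature: the temp dirs stand in for the repo and the snapshot, the `restore` function is a simplified stand-in for `restore_curated_configs`, and the EXIT trap wiring is left out so each step can be inspected.

```bash
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"   # stands in for the repo
snap="$(mktemp -d)"   # stands in for the snapshot dir
printf 'curated\n' > "$work/CLAUDE.md"

# Snapshot the curated file before any "installer" runs.
cp "$work/CLAUDE.md" "$snap/CLAUDE.md"

# Simplified guard: copy back only when the content really differs.
restore() {
  if ! cmp -s "$snap/CLAUDE.md" "$work/CLAUDE.md"; then
    cp "$snap/CLAUDE.md" "$work/CLAUDE.md"
    echo "restored"
  fi
}

# Discriminating case: force the drift, then run the guard.
printf 'clobbered by installer\n' > "$work/CLAUDE.md"
restore
result="$(cat "$work/CLAUDE.md")"
echo "$result"

# Idempotence: with no drift left, a second run reports nothing.
second="$(restore)"
rm -rf "$work" "$snap"
```

Forcing the drift first is the point of the test: a guard that is only ever run against an unchanged file proves nothing about the restore path.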
---
## LRN-040 — OS newer than a pinned tool supports = TWO distinct layers (version build + security policy)
- **Date**: 2026-06-23
- **Context**: gstack browser on fresh Ubuntu 26.04. Layer 1 = Playwright 1.58.2 ships no browser build for 26.04 → install errors (the host-platform override "fixes" the error but its fallback build HANGS at extraction — dead end, [[BLK-008]]). Layer 2 = even with Playwright 1.61 (native 26.04 build that launches fine in isolation), the real browse path aborts "No usable sandbox" because Ubuntu 24.04+ restricts unprivileged user namespaces via AppArmor.
- **Pattern**: (a) bump the tool PAST the OS-support threshold — don't force the OS to look older (overrides/fallbacks are fragile; prove the install COMPLETES, not just that a binary launches). For a pinned submodule dep: `bun add X@latest` in the submodule, automatable in the installer, idempotent by grepping the dep's support list for the running OS tag before bumping. (b) SEPARATELY handle OS security hardening: Chromium needs `--no-sandbox` where `sysctl kernel.apparmor_restrict_unprivileged_userns=1`; gstack exposes `GSTACK_CHROMIUM_NO_SANDBOX=1` (#1562). Gate persistence on the sysctl, not an OS-version guess.
- **Future application**: "tool X broke after an OS upgrade" → check BOTH (1) does X ship a build / support entry for the new OS (bump if not), and (2) does the new OS's hardening (userns/AppArmor/SELinux) block X at runtime (needs an opt-out flag). Fix one without the other and it still fails. Verify the FULL runtime path (drive a real page) — here the isolated `chromium.launch()` PASSED while the real `browse` path failed on the sandbox.
- **Reference**: `install-plugins.sh`, `.bashrc` `GSTACK_CHROMIUM_NO_SANDBOX=1`, gstack `browse/src/browser-manager.ts` `shouldEnableChromiumSandbox()`, commit 3b8ffb1. Linked to [[BDR-029]], [[BLK-008]], [[LRN-038]].
---
## LRN-041 — A check reading a symlink an EARLIER install step makes → false negative if that step's precondition wasn't met
- **Date**: 2026-06-23
- **Context**: install warned "MAGIC_API_KEY not found in ~/.claude/.env" though the key WAS set there. Root: the check grep'd `$REPO/.env` — a symlink → `~/.claude/.env` ([[BDR-026]]) created by `link.sh`'s `link_env`. On a fresh machine `~/.claude/.env` is created AFTER `link.sh` runs (install first warns "create it"), so the symlink was never made and the key was unreachable via `$REPO/.env`. `make plugin` also never runs `link.sh`. The warning misleadingly blamed `~/.claude/.env`.
- **Pattern**: a check that reads a path PRODUCED by an earlier setup step silently fails when that step's precondition wasn't met yet (target absent → symlink skipped). Fix: read the CANONICAL source and/or self-heal (create the missing symlink when the canonical exists). Env-key greps must tolerate `export `/leading whitespace and require a non-empty value: `^[[:space:]]*(export[[:space:]]+)?KEY=.` — and the message must name the real gap (symlink missing vs key absent), with an actionable hint (`run make link`).
- **Future application**: any "X not found in FILE" where FILE is a symlink/derived path → verify the producing step ran with its precondition, prefer the canonical source, self-heal or give an actionable message. Sandbox note: `.env*` reads were blocked — diagnosed via directory listing + regex tests on SYNTHETIC lines, never reading the secret.
- **Reference**: `install-plugins.sh` magic check (self-heal symlink + tolerant regex), `link.sh` `link_env`, commit 1b028cb. Linked to [[BDR-026]].
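
The tolerant regex can be exercised on synthetic lines without ever reading a real `.env`, in the spirit of the sandbox note above. A minimal sketch, assuming a made-up key name `DEMO_API_KEY` and a hypothetical helper `key_is_set`:

```bash
#!/usr/bin/env bash
# Same shape as the pattern above, with KEY replaced by a made-up name.
KEY_RE='^[[:space:]]*(export[[:space:]]+)?DEMO_API_KEY=.'

# Hypothetical helper: succeeds when the given line counts as "key is set".
key_is_set() {
  printf '%s\n' "$1" | grep -qE "$KEY_RE"
}

# Synthetic lines only: plain, exported with leading whitespace,
# empty value, and commented out.
for line in 'DEMO_API_KEY=abc' '  export DEMO_API_KEY=abc' 'DEMO_API_KEY=' '# DEMO_API_KEY=abc'; do
  if key_is_set "$line"; then
    echo "set:     [$line]"
  else
    echo "not set: [$line]"
  fi
done
```

One limitation worth knowing: a quoted empty value such as `DEMO_API_KEY=""` still matches, because the trailing `.` is satisfied by the quote character.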
---
## LRN-042 — `npx skills add` / gstack `./setup` resolve install target RELATIVE TO CWD — run from repo = wrong dir, breaks `$HOME` symlink assumptions
- **Date**: 2026-06-23
- **Context**: darwin-skill `npx -y skills add` (Step 8.5) + gstack `./setup` (Step 2) both ran with CWD=repo. The `skills` CLI writes to `<cwd>/.agents/skills`; gstack `./setup` likewise wrote per-skill dirs into repo-local `.agents/skills`/`.claude/skills`. So darwin landed in `$REPO/.agents/skills/darwin-skill` + `$REPO/.claude/skills/darwin-skill`, NOT `$HOME/.agents/skills/darwin-skill` where `link.sh` (NPX_EXTERNAL_SKILLS) + `install-plugins.sh` (`_dst`) look → symlink never created, "darwin-skill not installed — run make plugin" though it WAS installed. SELF-REINFORCING: once `$REPO/.agents` exists, every later `skills add` targets it. `find-skills` only worked because an earlier run (before `$REPO/.agents` existed) wrote it to `$HOME`. BDR-028/LRN-039 had already gitignored repo `.agents/`+`skills-lock.json` as "drift noise" — masked the symptom, never saw the install was landing in the WRONG PLACE.
- **Pattern**: a per-user installer that resolves its target relative to CWD (walks up for / creates `.<tool>/` in CWD) silently installs into the project tree when run from a repo that already carries such a dir. Gitignoring the junk hides it but the artifact is unreachable from `$HOME`-based consumers. Fix: run the installer from `$HOME` (`(cd "$HOME" && npx -y skills add …)`) so it targets `$HOME/.agents/skills`; clean up the repo-local copies (gitignored → safe `rm -rf`). Also fix the ordering twin: `link.sh` must re-run AFTER the install steps that produce what it symlinks (install.sh ran link FIRST; install-plugins never re-linked) — added a final `link.sh` step so `make plugin`/`make install` finish self-sufficient.
- **Future application**: before running any `npx <x> add` / `<tool> init` / `setup` that materializes a dotfile dir, set CWD to where the artifact MUST live (usually `$HOME`), don't trust the script's default resolution. When a "X not installed" warning contradicts a "successfully installed" log line → diff the EXPECTED path vs where the log says it wrote (here log line showed `~/Documents/claude/.agents/skills/darwin-skill`). When an installer A produces inputs for symlinker B, B must run after A in the same invocation.
- **Reference**: `install-plugins.sh` Step 8.5 (`cd "$HOME"` + parasite cleanup) + Step 10 (final `link.sh`), `update-all.sh` Step 7.5, log `install-20260623-181416.log:1399`. Extends [[LRN-039]] (BDR-028 — gitignored the symptom) + [[LRN-007]] (toggle-external source-only state) + [[LRN-041]] (install-ordering false-negative). gstack on-demand consumer = [[BDR-030]].
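
The subshell `cd` fix can be illustrated in isolation. In this sketch `cwd_relative_install` is a hypothetical stand-in for a CWD-relative installer (it is not the `skills` CLI), and the two temp dirs stand in for `$HOME` and the repo checkout.

```bash
#!/usr/bin/env bash
set -euo pipefail

fake_home="$(mktemp -d)"   # stands in for $HOME
fake_repo="$(mktemp -d)"   # stands in for the repo checkout

# Hypothetical stand-in for an installer that writes under whatever
# directory it is started from.
cwd_relative_install() {
  mkdir -p ".agents/skills/$1"
}

cd "$fake_repo"
# The parentheses open a subshell: the cd applies to the installer call
# only, and the calling script keeps its own working directory.
( cd "$fake_home" && cwd_relative_install demo-skill )

after="$(pwd -P)"
[ -d "$fake_home/.agents/skills/demo-skill" ] && echo "landed under fake home"
[ ! -e "$fake_repo/.agents" ] && echo "repo tree untouched"
```

Using a subshell rather than a bare `cd` keeps the rest of the script's relative paths valid, which is presumably why the fix is written as `(cd "$HOME" && npx -y skills add …)`.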


@ -1,5 +1,33 @@
# TODO
## 2026-06-23: install self-sufficient + gstack on-demand per profile
Goal: `make install`/`make plugin`/`make update` install EVERYTHING with no manual
step. Plus profile-driven gstack on-demand (user's option 1: gstack OFF by
default, but a `set <profile>` that needs gstack enables it for that profile).
Root causes found (logs install-20260623-181416.log):
- Bug A: install.sh runs link.sh (step 5) BEFORE install-plugins.sh (step 6),
which never re-ran link.sh → npx/external symlinks never created on the first run
(LRN-022 already documented the gap). update-all.sh already re-links (L364).
- Bug B: `npx skills add` + gstack ./setup resolve their target relative to the
CWD (repo) → darwin-skill lands in $REPO/.agents/skills + $REPO/.claude/skills
instead of $HOME/.agents/skills. Self-reinforcing once $REPO/.agents exists.
- Bug C: profile.sh's "missing — try: bash link.sh" is misleading (link.sh does not
create gstack skills); full.profile lists 35 gstack skills never placed in skills/.
- [x] Edit 1: install-plugins.sh Step 8.5: `npx skills add` from $HOME (subshell cd)
- [x] Edit 2: install-plugins.sh: clean up stray $REPO/.agents/skills + $REPO/.claude/skills (gitignored)
- [x] Edit 3: install-plugins.sh: final Step 10 re-runs `bash "$REPO/link.sh"` (idempotent)
- [x] Edit 4: update-all.sh Step 7.5: `npx skills add` from $HOME (same Bug B)
- [x] Edit 5: lib/profile.sh: GSTACK_SRC var + enable_skill gstack on-demand branch
(symlink skills/<name> → skills-external/gstack/<name>) + honest message
- [x] Verify: shellcheck/bash -n clean; migrated darwin → $HOME/.agents/skills + `bash link.sh`
(skills/darwin-skill OK); `profile.sh set full` → 0 "missing", 35 gstack on-demand;
minimal↔full cycle OK; git clean (gstack symlinks gitignored); full profile restored
- [~] Cleanup on the current machine: $REPO/.claude/skills/darwin-skill + an EMPTY .agents/skills
remain (rm blocked by the .claude/ permission guard) → auto-cleaned on the next `make plugin`
- [x] Capitalize: LRN-042 (Bug B, CWD-relative) + BDR-030 (gstack on-demand per profile) + journal 2026-06-23
- [ ] Commit (via /commit-change)
## profile.sh: `gstack on|off` verb
- [x] Extract helper `enable_all_gstack()` (the cmd_reset loop), to avoid duplication
- [x] Extract helper `disable_gstack_not_in(prof)` (the gstack loop of cmd_set), to avoid duplication

.gitignore

@ -119,3 +119,16 @@ desktop.ini
# Profile cache — written by lib/profile.sh, read by hooks/statusline.sh
.active-profile
.gstack/
# Frontend Design (Anthropic) — installed/refreshed from the example-skills
# plugin cache by install-plugins.sh (Step 8b) and update-all.sh on every run.
# Not vendored: tracking it produced a repo diff each time Anthropic shipped
# an update. The source is always re-synced, so no offline copy is needed.
skills-external/frontend-design/
# npx `skills add` project-scope artifacts — darwin-skill copies itself into
# the repo's .agents/ and writes skills-lock.json at root. Our own agents live
# in agents/ (no dot) and stay tracked. Anchored to root so only the dotted
# pollution dir is ignored.
/.agents/
/skills-lock.json


@ -127,6 +127,12 @@ else
fail "Node.js not found"
fi
if command -v jq &>/dev/null; then
pass "jq $(jq --version 2>/dev/null | sed 's/^jq-//')"
else
fail "jq not found — statusline & rtk-rewrite hooks require it"
fi
if command -v cargo &>/dev/null; then
pass "Cargo $(cargo --version | awk '{print $2}')"
else


@ -29,6 +29,42 @@ fi
# shellcheck source=lib/detect-plugins.sh
source "$REPO/lib/detect-plugins.sh"
# ── Guard hand-curated config against installer drift ────────
# graphify's installer (Step 7) rewrites CLAUDE.md + .claude/settings.json
# (clobbers the curated graphify section + injects aggressive MANDATORY
# hooks), and `claude plugin install` (Step 5) flips enable-states in
# settings.json. These 3 files are maintained by hand + commit, never by
# the installer. Snapshot them now and restore on exit so a run leaves them
# exactly as it found them. Pre-existing local edits are preserved; only the
# installer's drift is undone. NOTE: this makes these files install-immutable
# — anything the installer should add to them must be committed by hand.
GUARDED_CONFIGS=("CLAUDE.md" ".claude/settings.json" "settings.json")
CFG_SNAPSHOT="$(mktemp -d 2>/dev/null || true)"
restore_curated_configs() {
[ -n "$CFG_SNAPSHOT" ] || return 0
local f
for f in "${GUARDED_CONFIGS[@]}"; do
if [ -f "$CFG_SNAPSHOT/$f" ] && ! cmp -s "$CFG_SNAPSHOT/$f" "$REPO/$f"; then
cp "$CFG_SNAPSHOT/$f" "$REPO/$f"
info "Reverted installer drift in $f (curated config kept as committed)"
fi
done
rm -rf "$CFG_SNAPSHOT"
}
if [ -n "$CFG_SNAPSHOT" ]; then
for _cfg in "${GUARDED_CONFIGS[@]}"; do
if [ -f "$REPO/$_cfg" ]; then
mkdir -p "$CFG_SNAPSHOT/$(dirname "$_cfg")"
cp "$REPO/$_cfg" "$CFG_SNAPSHOT/$_cfg"
fi
done
trap restore_curated_configs EXIT
else
warn "Config guard disabled (mktemp failed) — CLAUDE.md/settings may drift"
fi
# Read pinned version from plugins.lock.json
# Usage: pinned_version "rtk" → prints version string or "latest"
pinned_version() {
@ -193,6 +229,25 @@ else
fi
fi
# --- jq (required by active hooks: statusline.sh, rtk-rewrite.sh) ---
if command -v jq &>/dev/null; then
ok "jq $(jq --version 2>/dev/null | sed 's/^jq-//')"
else
info "Installing jq..."
case $OS in
macos) brew install jq ;;
linux-apt) sudo apt-get install -y jq ;;
linux-dnf) sudo dnf install -y jq ;;
linux-pacman) sudo pacman -S --noconfirm jq ;;
*) warn "Cannot auto-install jq on $OS — statusline/rtk hooks need it" ;;
esac
if command -v jq &>/dev/null; then
ok "jq installed"
else
warn "jq install failed — statusline & rtk-rewrite hooks require it"
fi
fi
# --- Claude Code CLI ---
if command -v claude &>/dev/null; then
ok "Claude Code $(claude --version 2>/dev/null | head -1)"
@ -203,6 +258,35 @@ fi
echo ""
# gstack pins Playwright (1.58.x) which only ships browser builds for
# ubuntu<=24.04. On a newer distro the browser install fails ("does not
# support chromium on ubuntuXX.04"). Bump gstack's Playwright to a version
# that supports this OS so ./setup builds the browse binary against it and
# installs a native browser. Fires only when the pinned version genuinely
# lacks support — idempotent across runs. Edits the submodule locally (goes
# dirty); a `git submodule update` resets it and the next install re-applies.
# See BLK-008 / LRN-040.
gstack_bump_playwright_if_unsupported() {
[ -d "$GSTACK_DIR" ] && [ -r /etc/os-release ] || return 0
local ostag pwlib
# shellcheck disable=SC1091
ostag="$(. /etc/os-release 2>/dev/null; [ "${ID:-}" = ubuntu ] && printf 'ubuntu%s' "${VERSION_ID:-}")" || true # non-Ubuntu yields an empty tag; must not abort the script if errexit is on
[ -n "$ostag" ] || return 0 # only the known Ubuntu case
pwlib="$GSTACK_DIR/node_modules/playwright-core/lib"
# populate node_modules at the pinned version so we can read its support list
( cd "$GSTACK_DIR" && { bun install --frozen-lockfile >/dev/null 2>&1 || bun install >/dev/null 2>&1; } ) || return 0
if grep -rqs "$ostag" "$pwlib" 2>/dev/null; then
return 0 # pinned Playwright already supports this OS
fi
info "gstack's Playwright lacks $ostag support — bumping to latest (local submodule edit)..."
( cd "$GSTACK_DIR" && bun add playwright@latest >/dev/null 2>&1 )
if grep -rqs "$ostag" "$pwlib" 2>/dev/null; then
ok "gstack Playwright bumped — now supports $ostag (browse binary rebuilt by ./setup)"
else
warn "Playwright bump didn't add $ostag support — gstack browser may stay unavailable"
fi
}
# ============================================================
# STEP 2 — GSTACK SUBMODULE
# ============================================================
@ -246,6 +330,12 @@ if [ -d "$GSTACK_DIR" ]; then
ok "bun $(bun --version)"
fi
# On a distro newer than gstack's pinned Playwright supports, bump Playwright
# BEFORE ./setup so its frozen-lockfile install picks up the new version and
# the browse binary is rebuilt against it (avoids the "does not support
# chromium" fail). Non-fatal if it can't — gstack is OFF by default.
gstack_bump_playwright_if_unsupported
info "Running GStack setup..."
if [ -x "$GSTACK_DIR/setup" ]; then
if (cd "$GSTACK_DIR" && ./setup); then
@ -569,6 +659,11 @@ NPX_SKILLS=(
"alchaincyf/find-skills"
)
# `skills add` resolves its target (.agents/skills/, skills-lock.json) RELATIVE
# TO THE CWD. Running it from the repo (which carries gitignored .agents/ and
# .claude/ dirs) makes skills land in $REPO/.agents/skills instead of
# $HOME/.agents/skills — where link.sh expects them — and the bug is
# self-reinforcing once $REPO/.agents exists. Always install from $HOME.
if ! command -v npx &>/dev/null; then
warn "npx not available — skipping external skills"
else
@ -579,18 +674,28 @@ else
ok "$_name already installed ($_dst)"
continue
fi
-info "Installing $_name via: npx -y skills add $_src"
info "Installing $_name via: npx -y skills add $_src (from \$HOME)"
-if npx -y skills add "$_src" 2>/dev/null; then
if (cd "$HOME" && npx -y skills add "$_src" 2>/dev/null); then
if [ -d "$_dst" ]; then
ok "$_name installed"
else
warn "$_name installed but not at expected path $_dst"
fi
else
-err "$_name install failed — run manually: npx -y skills add $_src"
err "$_name install failed — run manually: (cd \"\$HOME\" && npx -y skills add $_src)"
fi
done
fi
# Earlier runs (before this CWD fix) scattered skills into the repo's gitignored
# .agents/skills and .claude/skills. They shadow the canonical $HOME copies and
# confuse skill discovery — remove them. Both are gitignored, so this is safe.
for _stray in "$REPO/.agents/skills" "$REPO/.claude/skills"; do
if [ -d "$_stray" ]; then
rm -rf "$_stray"
info "Removed stray repo-local skills dir: $_stray"
fi
done
echo ""
# ============================================================
@ -617,8 +722,19 @@ if [ -x "$REPO/lib/toggle-external.sh" ]; then
else
ok "magic MCP disabled (default)"
fi
-if [ ! -f "$REPO/.env" ] || ! grep -q '^MAGIC_API_KEY=' "$REPO/.env" 2>/dev/null; then
-warn "MAGIC_API_KEY not found in ~/.claude/.env — copy .env.example there and set your key before enabling"
# The key lives in ~/.claude/.env (canonical, BDR-026), reached via the
# repo/.env symlink that toggle-external.sh sources. Self-heal the common
# fresh-machine case: ~/.claude/.env was created AFTER link.sh ran, so the
# symlink is missing and the key looks absent though it's set.
HOME_ENV="$HOME/.claude/.env"
if [ ! -e "$REPO/.env" ] && [ -f "$HOME_ENV" ]; then
ln -sf "$HOME_ENV" "$REPO/.env" 2>/dev/null \
&& info "Linked repo/.env → ~/.claude/.env (was missing)"
fi
# Tolerate optional `export ` and leading whitespace; require a value.
MAGIC_KEY_RE='^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.'
if [ ! -f "$REPO/.env" ] || ! grep -qE "$MAGIC_KEY_RE" "$REPO/.env" 2>/dev/null; then
warn "MAGIC_API_KEY not set in ~/.claude/.env — add it (and run 'make link') before enabling magic"
fi
else
warn "lib/toggle-external.sh not found or not executable — skipping"
@ -642,16 +758,31 @@ fi
[ -z "$SHELL_PROFILE" ] && SHELL_PROFILE="$HOME/.profile"
CLAUDE_LINES=(
-"alias claude='claude --effort max'"
'export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1'
)
-# Clean up old CLAUDE_EFFORT env var if present (replaced by alias)
# Ubuntu 24.04+ (and other distros) restrict unprivileged user namespaces via
# AppArmor, which breaks Chromium's sandbox → gstack's browser (/browse, /qa)
# crashes with "No usable sandbox". Persist gstack's documented opt-out, but
# only where the restriction is actually active (precise, distro-agnostic).
if [ "$(sysctl -n kernel.apparmor_restrict_unprivileged_userns 2>/dev/null)" = "1" ]; then
CLAUDE_LINES+=('export GSTACK_CHROMIUM_NO_SANDBOX=1')
fi
# Remove obsolete effort config — effort is now set in settings.json
# ("effortLevel"), which supersedes both the old CLAUDE_EFFORT env var and the
# `claude --effort max` alias (the alias would even override settings.json).
EFFORT_CLEANED=0
if grep -qF 'export CLAUDE_EFFORT=max' "$SHELL_PROFILE" 2>/dev/null; then
-sed -i '/export CLAUDE_EFFORT=max/d' "$SHELL_PROFILE"
sed -i '/export CLAUDE_EFFORT=max/d' "$SHELL_PROFILE"; EFFORT_CLEANED=1
-# Also remove orphaned comment lines left by previous installs
fi
if grep -qF "alias claude='claude --effort max'" "$SHELL_PROFILE" 2>/dev/null; then
sed -i "\#alias claude='claude --effort max'#d" "$SHELL_PROFILE"; EFFORT_CLEANED=1
fi
if [ "$EFFORT_CLEANED" -eq 1 ]; then
# Remove orphaned comment lines left before the deleted entries
sed -i '/^# Claude Code — added by install-plugins.sh$/{ N; /^\n$/d; }' "$SHELL_PROFILE"
-info "Removed old CLAUDE_EFFORT=max from $SHELL_PROFILE (replaced by alias)"
info "Removed obsolete effort alias/env from $SHELL_PROFILE (effort set in settings.json)"
fi
ADDED=0
@ -674,6 +805,23 @@ if [ "$ADDED" -eq 1 ]; then
fi
echo ""
# ============================================================
# STEP 10 — REFRESH SYMLINKS (final, so this script is self-sufficient)
# ============================================================
# Steps 2/8/8.5 INSTALL skills (gstack submodule, emil/frontend/motion, npx
# darwin/find-skills) that link.sh must symlink into ~/.claude/skills/. Since
# link.sh runs BEFORE this script in install.sh, those symlinks would be missing
# on a fresh run until link.sh is run again by hand. Re-run it here so
# `make plugin` (and `make install`) finish complete — nothing left to do.
echo "── Step 10: Refreshing symlinks (link.sh) ─────────────────"
echo ""
if [ -f "$REPO/link.sh" ]; then
bash "$REPO/link.sh"
else
warn "link.sh not found — run it manually to create skill symlinks"
fi
echo ""
# ============================================================ # ============================================================
# SUMMARY # SUMMARY
# ============================================================ # ============================================================
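
Step 10 depends on `link.sh` being safe to run a second time. The body of `link.sh` is not shown in this diff, so the following is only a minimal sketch of that idempotency property, using hypothetical temporary directories (`SRC`, `DEST`) in place of the repo skills tree and `~/.claude/skills/`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the repo skills directory and ~/.claude/skills.
SRC="$(mktemp -d)"
DEST="$(mktemp -d)"
mkdir -p "$SRC/alpha" "$SRC/beta"

# Link every skill directory into DEST; safe to run repeatedly.
refresh_links() {
  local dir name
  for dir in "$SRC"/*/; do
    name="$(basename "$dir")"
    # -n: replace an existing symlink-to-directory instead of
    # creating a nested link inside its target.
    ln -sfn "${dir%/}" "$DEST/$name"
  done
}

refresh_links            # first pass (link.sh before install-plugins.sh)
mkdir -p "$SRC/gamma"    # a skill installed later
refresh_links            # second pass picks it up, existing links untouched
ls "$DEST"
```

Running the function twice leaves one link per skill, which is the behavior the final re-run relies on.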


@@ -22,8 +22,23 @@ echo ""
# ── 1. Check prerequisites ── # ── 1. Check prerequisites ──
echo "── Checking prerequisites..." echo "── Checking prerequisites..."
# node + npm drive the Claude Code CLI install below. On a fresh machine
# they may be absent — install the current LTS via nvm instead of aborting.
install_node_via_nvm() {
info "Node.js/npm missing — installing LTS via nvm..."
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
export NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
# shellcheck source=/dev/null
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
nvm install --lts
}
if ! command -v node &>/dev/null || ! command -v npm &>/dev/null; then
install_node_via_nvm
fi
if ! command -v node &>/dev/null; then if ! command -v node &>/dev/null; then
err "Node.js not found. Install it first: https://nodejs.org" err "Node.js install failed — install it manually: https://nodejs.org"
fi fi
NODE_MAJOR=$(node -v | sed 's/v//' | cut -d. -f1) NODE_MAJOR=$(node -v | sed 's/v//' | cut -d. -f1)
@@ -33,7 +48,7 @@ fi
ok "Node.js $(node -v)" ok "Node.js $(node -v)"
if ! command -v npm &>/dev/null; then if ! command -v npm &>/dev/null; then
err "npm not found" err "npm not found (expected alongside Node.js)"
fi fi
ok "npm $(npm -v)" ok "npm $(npm -v)"
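
The `NODE_MAJOR` line above extracts the major version with `sed` and `cut`. As an illustration only, the same extraction can be done with shell parameter expansion and no subprocesses; `node_major` is a hypothetical helper name, not part of the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: major version from a `node -v` style string.
node_major() {
  local v="${1#v}"   # drop the leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

node_major "v20.11.1"   # prints 20
```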


@@ -45,6 +45,7 @@ set -euo pipefail
REPO="$(cd -P "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" REPO="$(cd -P "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
SKILLS_DIR="$REPO/skills" SKILLS_DIR="$REPO/skills"
DISABLED_DIR="$REPO/skills-disabled" DISABLED_DIR="$REPO/skills-disabled"
GSTACK_SRC="$REPO/skills-external/gstack" # gstack submodule — source of truth for gstack skills
PROFILES_DIR="$REPO/lib/profiles" PROFILES_DIR="$REPO/lib/profiles"
TOGGLE_EXTERNAL="$REPO/lib/toggle-external.sh" TOGGLE_EXTERNAL="$REPO/lib/toggle-external.sh"
ACTIVE_CACHE="$REPO/.active-profile" # statusline reads this — keep fast (single-line file, profile name only) ACTIVE_CACHE="$REPO/.active-profile" # statusline reads this — keep fast (single-line file, profile name only)
@@ -247,8 +248,19 @@ enable_skill() {
ok "enabled: $skill" ok "enabled: $skill"
elif [ -e "$SKILLS_DIR/$skill" ]; then elif [ -e "$SKILLS_DIR/$skill" ]; then
: # already enabled — silent : # already enabled — silent
elif [ -d "$GSTACK_SRC/$skill" ]; then
# gstack is OFF by default: its skills live only in the submodule,
# never pre-symlinked into skills/. A profile that lists this gstack
# skill activates it on demand by symlinking the submodule skill dir
# in. disable_gstack_not_in() parks it again when an unrelated profile
# is set. The gstack/bin + browse/dist infra it relies on is created
# by link.sh, independent of this.
ln -sf "$GSTACK_SRC/$skill" "$SKILLS_DIR/$skill"
ok "enabled: $skill (gstack on-demand)"
elif [ ! -d "$GSTACK_SRC" ]; then
warn "missing: $skill — gstack submodule absent, run: git submodule update --init"
else else
warn "missing: $skill — try: bash link.sh" warn "missing: $skill not found in gstack submodule ($GSTACK_SRC)"
fi fi
;; ;;
external|personal) external|personal)
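
The resolution order added to `enable_skill` can be sketched in isolation. This is a simplified illustration under stated assumptions: it runs in a temporary sandbox rather than the real `$REPO`, omits the `skills-disabled/` branch that precedes these checks, and uses `review` as a hypothetical gstack skill name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox layout standing in for $REPO.
REPO="$(mktemp -d)"
SKILLS_DIR="$REPO/skills"
GSTACK_SRC="$REPO/skills-external/gstack"
mkdir -p "$SKILLS_DIR" "$GSTACK_SRC/review"

# Simplified resolution order for one skill name.
enable_skill_sketch() {
  local skill="$1"
  if [ -e "$SKILLS_DIR/$skill" ]; then
    echo "already enabled: $skill"
  elif [ -d "$GSTACK_SRC/$skill" ]; then
    # On-demand activation: link the submodule skill dir into skills/.
    ln -s "$GSTACK_SRC/$skill" "$SKILLS_DIR/$skill"
    echo "enabled: $skill (gstack on-demand)"
  elif [ ! -d "$GSTACK_SRC" ]; then
    echo "missing: $skill (gstack submodule absent)"
  else
    echo "missing: $skill (not in gstack submodule)"
  fi
}

enable_skill_sketch review   # enabled: review (gstack on-demand)
enable_skill_sketch review   # already enabled: review
enable_skill_sketch nope     # missing: nope (not in gstack submodule)
```

The `-e` check has to come first: with GNU `ln`, calling `ln -sf` on a name that is already a symlink to a directory creates a nested link inside the target instead of replacing it.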


@@ -117,7 +117,7 @@ link_env() {
echo " cp \"$REPO/.env.example\" \"$home_env\" && \"\${EDITOR:-nano}\" \"$home_env\"" echo " cp \"$REPO/.env.example\" \"$home_env\" && \"\${EDITOR:-nano}\" \"$home_env\""
return return
fi fi
grep -q '^MAGIC_API_KEY=' "$home_env" 2>/dev/null \ grep -qE '^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.' "$home_env" 2>/dev/null \
|| echo "⚠️ $home_env has no MAGIC_API_KEY line — magic won't enable until added." || echo "⚠️ $home_env has no MAGIC_API_KEY line — magic won't enable until added."
if [ -L "$repo_env" ]; then if [ -L "$repo_env" ]; then
[ "$(readlink "$repo_env")" = "$home_env" ] && return [ "$(readlink "$repo_env")" = "$home_env" ] && return
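
The tightened `MAGIC_API_KEY` pattern accepts leading whitespace and an optional `export`, and requires at least one character after `=`. A small check against hypothetical env files (the file names and the `has_magic_key` helper are illustrative only) shows which lines it accepts:

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the new grep pattern from link_env().
has_magic_key() {
  grep -qE '^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.' "$1" 2>/dev/null
}

tmp="$(mktemp -d)"
printf 'MAGIC_API_KEY=abc\n'          > "$tmp/plain.env"
printf '  export MAGIC_API_KEY=abc\n' > "$tmp/export.env"
printf 'MAGIC_API_KEY=\n'             > "$tmp/empty.env"
printf '# MAGIC_API_KEY=abc\n'        > "$tmp/commented.env"

for f in plain export empty commented; do
  if has_magic_key "$tmp/$f.env"; then echo "$f: set"; else echo "$f: missing"; fi
done
# plain: set
# export: set
# empty: missing
# commented: missing
```

A quoted empty value such as `MAGIC_API_KEY=""` still matches, because the trailing `.` only requires one character of any kind after the `=`.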


@@ -1,177 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS


@@ -1,42 +0,0 @@
---
name: frontend-design
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
---
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
## Design Thinking
Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail
## Frontend Aesthetics Guidelines
Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.


@@ -1 +1 @@
0.8.13 0.8.45


@@ -1,7 +1,6 @@
--- ---
name: graphify name: graphify
description: "any input (code, docs, papers, images, videos) to knowledge graph. Use when user asks any question about a codebase, documents, or project content - especially if graphify-out/ exists, treat the question as a /graphify query." description: "Use for any question about a codebase, its architecture, file relationships, or project content — especially when graphify-out/ exists, where the question should be treated as a graphify query first. Turns any input (code, docs, papers, images, videos) into a persistent knowledge graph with god nodes, community detection, and query/path/explain tools."
trigger: /graphify
--- ---
# /graphify # /graphify
@@ -27,6 +26,8 @@ Turn any folder of files into a navigable knowledge graph with community detecti
/graphify <path> --graphml # export graph.graphml (Gephi, yEd) /graphify <path> --graphml # export graph.graphml (Gephi, yEd)
/graphify <path> --neo4j # generate graphify-out/cypher.txt for Neo4j /graphify <path> --neo4j # generate graphify-out/cypher.txt for Neo4j
/graphify <path> --neo4j-push bolt://localhost:7687 # push directly to Neo4j /graphify <path> --neo4j-push bolt://localhost:7687 # push directly to Neo4j
/graphify <path> --falkordb # generate graphify-out/cypher.txt for FalkorDB
/graphify <path> --falkordb-push falkordb://localhost:6379 # push directly to FalkorDB
/graphify <path> --mcp # start MCP stdio server for agent access /graphify <path> --mcp # start MCP stdio server for agent access
/graphify <path> --watch # watch folder, auto-rebuild on code changes (no LLM needed) /graphify <path> --watch # watch folder, auto-rebuild on code changes (no LLM needed)
/graphify <path> --wiki # build agent-crawlable wiki (index.md + one article per community) /graphify <path> --wiki # build agent-crawlable wiki (index.md + one article per community)
@@ -57,48 +58,9 @@ If the path argument starts with `https://github.com/` or `http://github.com/`,
Follow these steps in order. Do not skip steps. Follow these steps in order. Do not skip steps.
### Step 0 - Clone GitHub repo(s) (only if a GitHub URL was given) ### Step 0 - GitHub repos and multi-path merge (only if a URL or several paths)
**Single repo:** Only when the path is one or more `https://github.com/...` URLs, or several local subfolders to merge. See `references/github-and-merge.md` for the clone, cross-repo merge, and monorepo flow, then continue with the resolved local path. A plain local path skips this step.
```bash
LOCAL_PATH=$(graphify clone <github-url> [--branch <branch>])
# Use LOCAL_PATH as the target for all subsequent steps
```
**Multiple repos (cross-repo graph):**
```bash
# Clone each repo, run the full pipeline on each, then merge
graphify clone <url1> # → ~/.graphify/repos/<owner1>/<repo1>
graphify clone <url2> # → ~/.graphify/repos/<owner2>/<repo2>
# Run /graphify on each local path to produce their graph.json files
# Then merge:
graphify merge-graphs \
~/.graphify/repos/<owner1>/<repo1>/graphify-out/graph.json \
~/.graphify/repos/<owner2>/<repo2>/graphify-out/graph.json \
--out graphify-out/cross-repo-graph.json
```
Graphify clones into `~/.graphify/repos/<owner>/<repo>` and reuses existing clones on repeat runs. Each node in the merged graph carries a `repo` attribute so you can filter by origin.
**Multiple local subfolders (monorepo or multi-service layout):**
The skill pipeline writes all intermediate and final outputs to `graphify-out/` in the current working directory. Running the skill on each subfolder separately will clobber the same output dir. Instead, use the CLI directly for each subfolder — it places `graphify-out/` *inside* the scanned path:
```bash
graphify extract ./core/ # → ./core/graphify-out/graph.json
graphify extract ./service/ # → ./service/graphify-out/graph.json
graphify extract ./platform/ # → ./platform/graphify-out/graph.json
# Add --backend gemini|kimi|openai|deepseek|claude-cli depending on which API key you have set
# Then merge at the project root:
graphify merge-graphs \
./core/graphify-out/graph.json \
./service/graphify-out/graph.json \
./platform/graphify-out/graph.json \
--out graphify-out/graph.json
```
Once `graphify-out/graph.json` exists, the fast path above takes over: any codebase question runs `graphify query` directly on the merged graph — no re-extraction, no size gate.
### Step 1 - Ensure graphify is installed ### Step 1 - Ensure graphify is installed
@@ -179,50 +141,9 @@ Then act on it:
- Otherwise rank by count, show the top 5 with file counts, then ask which subfolder to run on. Wait for the user's answer before proceeding. - Otherwise rank by count, show the top 5 with file counts, then ask which subfolder to run on. Wait for the user's answer before proceeding.
- Otherwise: proceed directly to Step 2.5 if video files were detected, or Step 3 if not. - Otherwise: proceed directly to Step 2.5 if video files were detected, or Step 3 if not.
### Step 2.5 - Transcribe video / audio files (only if video files detected) ### Step 2.5 - Video and audio (only if video files detected)
Skip this step entirely if `detect` returned zero `video` files. Skip this step entirely if `detect` returned zero `video` files. When the corpus has video or audio, see `references/transcribe.md` to transcribe them to text first, then treat the transcripts as doc files in Step 3.
Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.
**Strategy:** Read the god nodes from `graphify-out/.graphify_detect.json` (or the analysis file if it exists from a previous run). You are already a language model — write a one-sentence domain hint yourself from those labels. Then pass it to Whisper as the initial prompt. No separate API call needed.
**However**, if the corpus has *only* video files and no other docs/code, use the generic fallback prompt: `"Use proper punctuation and paragraph breaks."`
**Step 1 - Write the Whisper prompt yourself.**
Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:
- Labels: `transformer, attention, encoder, decoder` → `"Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."`
- Labels: `kubernetes, deployment, pod, helm` → `"DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."`
Set it as `WHISPER_PROMPT` to use in the next command.
**Step 2 - Transcribe:**
```bash
GRAPHIFY_WHISPER_MODEL=base # or whatever --whisper-model the user passed
$(cat graphify-out/.graphify_python) -c "
import json, os
from pathlib import Path
from graphify.transcribe import transcribe_all
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
video_files = detect.get('files', {}).get('video', [])
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')
transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
print(json.dumps(transcript_paths, ensure_ascii=False))
" > graphify-out/.graphify_transcripts.json
```
After transcription:
- Read the transcript paths from `graphify-out/.graphify_transcripts.json`
- Add them to the docs list before dispatching semantic subagents in Step 3B
- Print how many transcripts were created: `Transcribed N video file(s) -> treating as docs`
- If transcription fails for a file, print a warning and continue with the rest
**Whisper model:** Default is `base`. If the user passed `--whisper-model <name>`, set `GRAPHIFY_WHISPER_MODEL=<name>` in the environment before running the command above.
### Step 3 - Extract entities and relationships ### Step 3 - Extract entities and relationships
@@ -269,7 +190,15 @@ else:
#### Part B - Semantic extraction (parallel subagents) #### Part B - Semantic extraction (parallel subagents)
**Fast path:** If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code - there is nothing for semantic subagents to do. **Fast path:** If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code - there is nothing for semantic subagents to do. **First write an empty semantic file** so Part C's merge has its input (it reads `.graphify_semantic.json` unconditionally; without this a code-only run hits `FileNotFoundError`):
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
Path('graphify-out/.graphify_semantic.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
"
```
**MANDATORY: You MUST use the Agent tool here. Reading files yourself one-by-one is forbidden - it is 5-10x slower. If you do not use the Agent tool you are doing this wrong.** **MANDATORY: You MUST use the Agent tool here. Reading files yourself one-by-one is forbidden - it is 5-10x slower. If you do not use the Agent tool you are doing this wrong.**
@@ -290,12 +219,19 @@ from graphify.cache import check_semantic_cache
from pathlib import Path from pathlib import Path
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\")) detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
all_files = [f for files in detect['files'].values() for f in files] # Only content files go to semantic extraction. Code is already covered structurally
# by the AST pass (Part A); flattening every category here makes subagents re-read
# every source file (#1392). Video is transcribed to a document in Step 2.5 first.
all_files = [f for cat in ('document', 'paper', 'image') for f in detect['files'].get(cat, [])]
cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(all_files) cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(all_files)
# Always (re)write the cache file: write hits, else DELETE any leftover from a prior
# run so Part C never merges a stale .graphify_cached.json (#1392).
if cached_nodes or cached_edges or cached_hyperedges: if cached_nodes or cached_edges or cached_hyperedges:
Path('graphify-out/.graphify_cached.json').write_text(json.dumps({'nodes': cached_nodes, 'edges': cached_edges, 'hyperedges': cached_hyperedges}, ensure_ascii=False), encoding=\"utf-8\") Path('graphify-out/.graphify_cached.json').write_text(json.dumps({'nodes': cached_nodes, 'edges': cached_edges, 'hyperedges': cached_hyperedges}, ensure_ascii=False), encoding=\"utf-8\")
else:
Path('graphify-out/.graphify_cached.json').unlink(missing_ok=True)
Path('graphify-out/.graphify_uncached.txt').write_text('\n'.join(uncached), encoding=\"utf-8\") Path('graphify-out/.graphify_uncached.txt').write_text('\n'.join(uncached), encoding=\"utf-8\")
print(f'Cache: {len(all_files)-len(uncached)} files hit, {len(uncached)} files need extraction') print(f'Cache: {len(all_files)-len(uncached)} files hit, {len(uncached)} files need extraction')
" "
@@ -325,76 +261,13 @@ Each subagent receives this exact prompt (substitute FILE_LIST, CHUNK_NUM, TOTAL
CHUNK_PATH must be an **absolute** path — derive it before dispatching: CHUNK_PATH must be an **absolute** path — derive it before dispatching:
```bash ```bash
PROJECT_ROOT=$(cat graphify-out/.graphify_root) PROJECT_ROOT=$(pwd) # cwd — where Part C globs graphify-out/ (NOT .graphify_root/scan dir, #1392)
# Then for chunk N: CHUNK_PATH="${PROJECT_ROOT}/graphify-out/.graphify_chunk_0N.json" # Then for chunk N: CHUNK_PATH="${PROJECT_ROOT}/graphify-out/.graphify_chunk_0N.json"
``` ```
Subagent prompt template: Subagent prompt template:
``` See `references/extraction-spec.md` for the exact subagent prompt (JSON schema, node-ID rules, confidence rubric, frontmatter, hyperedge, and vision rules). Load it only here, only when at least one chunk holds a doc, paper, or image; a pure-code corpus has skipped Part B and never reads it. Pass each subagent that prompt verbatim with FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH substituted, and have it write the result to CHUNK_PATH.
You are a graphify extraction subagent. Read the files listed and extract a knowledge graph fragment.
Output ONLY valid JSON matching the schema below - no explanation, no markdown fences, no preamble.
Files (chunk CHUNK_NUM of TOTAL_CHUNKS):
FILE_LIST
Rules:
- EXTRACTED: relationship explicit in source (import, call, citation, "see §3.2")
- INFERRED: reasonable inference (shared data structure, implied dependency)
- AMBIGUOUS: uncertain - flag for review, do not omit
Code files: focus on semantic edges AST cannot find (call relationships, shared data, arch patterns).
Do not re-extract imports - AST already has those.
Doc/paper files: extract named concepts, entities, citations. For rationale (WHY decisions were made, trade-offs, design intent): store as a `rationale` attribute on the relevant concept node — do NOT create a separate rationale node or fragment node. Only create a node for something that is itself a named entity or concept. Use `file_type:"rationale"` for concept-like nodes (ideas, principles, mechanisms, design patterns). `file_type` MUST be one of exactly these six values: `code`, `document`, `paper`, `image`, `rationale`, `concept`. Any other value is invalid and will be rejected.
Code files: when adding `calls` edges, source MUST be the caller (the function/class doing the calling), target MUST be the callee. Never reverse this direction.
Image files: use vision to understand what the image IS - do not just OCR.
UI screenshot: layout patterns, design decisions, key elements, purpose.
Chart: metric, trend/insight, data source.
Tweet/post: claim as node, author, concepts mentioned.
Diagram: components and connections.
Research figure: what it demonstrates, method, result.
Handwritten/whiteboard: ideas and arrows, mark uncertain readings AMBIGUOUS.
DEEP_MODE (if --mode deep was given): be aggressive with INFERRED edges - indirect deps,
shared assumptions, latent couplings. Mark uncertain ones AMBIGUOUS instead of omitting.
Semantic similarity: if two concepts in this chunk solve the same problem or represent the same idea without any structural link (no import, no call, no citation), add a `semantically_similar_to` edge marked INFERRED with a confidence_score reflecting how similar they are (0.6-0.95). Examples:
- Two functions that both validate user input but never call each other
- A class in code and a concept in a paper that describe the same algorithm
- Two error types that handle the same failure mode differently
Only add these when the similarity is genuinely non-obvious and cross-cutting. Do not add them for trivially similar things.
Hyperedges: if 3 or more nodes clearly participate together in a shared concept, flow, or pattern that is not captured by pairwise edges alone, add a hyperedge to a top-level `hyperedges` array. Examples:
- All classes that implement a common protocol or interface
- All functions in an authentication flow (even if they don't all call each other)
- All concepts from a paper section that form one coherent idea
Use sparingly — only when the group relationship adds information beyond the pairwise edges. Maximum 3 hyperedges per chunk.
If a file has YAML frontmatter (--- ... ---), copy source_url, captured_at, author,
contributor onto every node from that file.
confidence_score is REQUIRED on every edge - never omit it, never use 0.5 as a default:
- EXTRACTED edges: confidence_score = 1.0 always
- INFERRED edges: pick exactly ONE value from this set — never 0.5:
0.95 direct structural evidence (shared data structure, named cross-file reference).
0.85 strong inference (clear functional alignment, no direct symbol link).
0.75 reasonable inference (shared problem domain + similar shape, requires interpretation).
0.65 weak inference (thematically related, no shape evidence).
0.55 speculative but plausible (surface-level co-occurrence only).
Models follow discrete rubrics better than continuous ranges; the bimodal
distribution observed in production (>50% at 0.5, >40% at 0.85+) shows the
range guidance is being collapsed to a binary. If no value above fits, mark
the edge AMBIGUOUS rather than picking 0.4 or below.
- AMBIGUOUS edges: 0.1-0.3
Node ID format: lowercase, only `[a-z0-9_]`, no dots or slashes. Format: `{stem}_{entity}` where stem is `{parent_dir}_{filename_without_ext}` (the **immediate** parent directory name + the filename stem, both lowercased with non-alphanumeric chars replaced by `_`) and entity is the symbol name similarly normalized. Only one level of parent is used, not the full path. Examples: `src/auth/session.py` + `ValidateToken` → `auth_session_validatetoken`; `lib/utils/helpers.py` + `parse_url` → `utils_helpers_parse_url`; `tests/test_foo.py` + `_helper` → `tests_test_foo_helper`. Top-level files (no parent dir, e.g. `setup.py`) use just the filename stem: `setup_my_func`. This must match the ID the AST extractor generates: using just the filename (e.g., `session_validatetoken`) or the full path (e.g., `src_auth_session_validatetoken`) will create orphan ghost-duplicate nodes. If you are re-extracting a project that had ghost duplicates under the old format, the user should run `graphify extract --force` to rebuild cleanly. CRITICAL: never append chunk numbers, sequence numbers, or any suffix to an ID (no `_c1`, `_c2`, `_chunk2`, etc.). IDs must be deterministic from the label alone; the same entity must always produce the same ID regardless of which chunk processes it.
Generate the extraction JSON matching this schema exactly:
{"nodes":[{"id":"session_validatetoken","label":"Human Readable Name","file_type":"code|document|paper|image|rationale|concept","source_file":"relative/path","source_location":null,"source_url":null,"captured_at":null,"author":null,"contributor":null}],"edges":[{"source":"node_id","target":"node_id","relation":"calls|implements|references|cites|conceptually_related_to|shares_data_with|semantically_similar_to|rationale_for","confidence":"EXTRACTED|INFERRED|AMBIGUOUS","confidence_score":1.0,"source_file":"relative/path","source_location":null,"weight":1.0}],"hyperedges":[{"id":"snake_case_id","label":"Human Readable Label","nodes":["node_id1","node_id2","node_id3"],"relation":"participate_in|implement|form","confidence":"EXTRACTED|INFERRED","confidence_score":0.75,"source_file":"relative/path"}],"input_tokens":0,"output_tokens":0}
Then write the JSON to disk using the Write tool at this exact absolute path (no relative paths — Write resolves relative paths against an undefined cwd and the file will be silently lost):
CHUNK_PATH
```
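The node ID rule in the prompt above can be sketched in a few lines of Python. This is only an illustration, not the AST extractor's actual code: `make_node_id` and `_norm` are hypothetical names, and collapsing runs of separators and trimming edge underscores is an assumption inferred from the `_helper` → `tests_test_foo_helper` example.

```python
import re
from pathlib import PurePosixPath


def _norm(text: str) -> str:
    """Lowercase, map runs of non-alphanumerics to '_', trim edge underscores.

    Assumption: trimming and collapsing are inferred from the documented
    examples, not taken from the real extractor.
    """
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_node_id(source_file: str, entity: str) -> str:
    """Build `{parent_dir}_{filename_stem}_{entity}` from a relative path."""
    path = PurePosixPath(source_file)
    # Only the immediate parent directory is used; top-level files have none,
    # so their parent name is empty and gets filtered out below.
    parts = [path.parent.name, path.stem, entity]
    return "_".join(p for p in (_norm(part) for part in parts) if p)


print(make_node_id("src/auth/session.py", "ValidateToken"))
print(make_node_id("lib/utils/helpers.py", "parse_url"))
print(make_node_id("tests/test_foo.py", "_helper"))
print(make_node_id("setup.py", "my_func"))
```

Because the ID depends only on the path and the symbol name, two chunks that see the same entity produce the same ID, which is the property the prompt asks for.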
**Step B3 - Collect, cache, and merge**
@ -511,7 +384,7 @@ print(f'Merged: {total} nodes, {edges} edges ({len(ast[\"nodes\"])} AST + {len(s
### Step 4 - Build graph, cluster, analyze, generate outputs
**Before starting:** note whether `--directed` was given. If so, pass `directed=True` to `build_from_json()` in the code block below. This builds a `DiGraph` that preserves edge direction (source→target) instead of the default undirected `Graph`. **Before starting:** the code blocks below pass `directed=IS_DIRECTED` to `build_from_json()`. Replace `IS_DIRECTED` with `True` if `--directed` was given (builds a `DiGraph` preserving edge direction source→target), otherwise `False` (the default undirected `Graph`). Substitute it the same way you substitute `INPUT_PATH` — do not leave the literal `IS_DIRECTED` in the code.
```bash ```bash
mkdir -p graphify-out mkdir -p graphify-out
@ -527,7 +400,15 @@ from pathlib import Path
extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\")) extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\")) detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
G = build_from_json(extraction) # root= mirrors the --update runbook (#1361): relativize source_file to the same
# base so the full build and incremental --update never drift apart on re-extract.
G = build_from_json(extraction, root='INPUT_PATH', directed=IS_DIRECTED)
# Guard BEFORE any write: an empty extraction must not clobber a good graph.json /
# GRAPH_REPORT.md / analysis sidecar. Check immediately after build (#1392).
if G.number_of_nodes() == 0:
print('ERROR: Graph is empty - extraction produced no nodes.')
print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
raise SystemExit(1)
communities = cluster(G) communities = cluster(G)
cohesion = score_all(G, communities) cohesion = score_all(G, communities)
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)} tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
@ -537,10 +418,17 @@ labels = {cid: 'Community ' + str(cid) for cid in communities}
# Placeholder questions - regenerated with real labels in Step 5 # Placeholder questions - regenerated with real labels in Step 5
questions = suggest_questions(G, communities, labels) questions = suggest_questions(G, communities, labels)
# Export FIRST and honor the #479 shrink-guard: to_json returns False (writing
# nothing) when the new graph is smaller than the existing graph.json. Only write
# GRAPH_REPORT.md + the analysis sidecar when the graph was actually written, so
# they never describe a graph that graph.json doesn't contain (#1392).
wrote = to_json(G, communities, 'graphify-out/graph.json')
if not wrote:
print('ERROR: refused to shrink graphify-out/graph.json (existing graph has more nodes; #479).')
print('If this shrink is intentional (you deleted files), re-run a full build with --force.')
raise SystemExit(1)
report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, 'INPUT_PATH', suggested_questions=questions) report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, 'INPUT_PATH', suggested_questions=questions)
Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding=\"utf-8\") Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding=\"utf-8\")
to_json(G, communities, 'graphify-out/graph.json')
analysis = { analysis = {
'communities': {str(k): v for k, v in communities.items()}, 'communities': {str(k): v for k, v in communities.items()},
'cohesion': {str(k): v for k, v in cohesion.items()}, 'cohesion': {str(k): v for k, v in cohesion.items()},
@ -549,10 +437,6 @@ analysis = {
'questions': questions, 'questions': questions,
} }
Path('graphify-out/.graphify_analysis.json').write_text(json.dumps(analysis, indent=2, ensure_ascii=False), encoding=\"utf-8\") Path('graphify-out/.graphify_analysis.json').write_text(json.dumps(analysis, indent=2, ensure_ascii=False), encoding=\"utf-8\")
if G.number_of_nodes() == 0:
print('ERROR: Graph is empty - extraction produced no nodes.')
print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
raise SystemExit(1)
print(f'Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges, {len(communities)} communities') print(f'Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges, {len(communities)} communities')
" "
``` ```
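The ordering this hunk introduces (check for an empty graph before any write, export `graph.json` first, and write the report only if the export happened) can be sketched with the standard library alone. The names `write_graph_guarded`, `build_outputs`, and the `force` parameter are hypothetical stand-ins, not graphify's `to_json` or `generate`; the sketch raises `ValueError` where the runbook raises `SystemExit(1)`.

```python
import json
from pathlib import Path


def write_graph_guarded(nodes, path, force=False):
    """Write nodes to path; return False (writing nothing) on a shrink."""
    path = Path(path)
    if path.exists() and not force:
        existing = json.loads(path.read_text(encoding="utf-8"))
        if len(nodes) < len(existing.get("nodes", [])):
            return False  # shrink refused, existing file left untouched
    path.write_text(json.dumps({"nodes": nodes}, ensure_ascii=False), encoding="utf-8")
    return True


def build_outputs(nodes, out_dir, force=False):
    """Guard first, export second, report last."""
    # Empty-graph guard runs BEFORE any write so good outputs are never clobbered.
    if not nodes:
        raise ValueError("Graph is empty - extraction produced no nodes.")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The report is written only when graph.json was actually written,
    # so it never describes a graph that graph.json does not contain.
    if not write_graph_guarded(nodes, out / "graph.json", force=force):
        return False
    (out / "GRAPH_REPORT.md").write_text(f"{len(nodes)} nodes\n", encoding="utf-8")
    return True
```

With this ordering, a refused shrink leaves both `graph.json` and the report describing the same, larger graph.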
@ -580,7 +464,8 @@ extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(en
detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\")) detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
analysis = json.loads(Path('graphify-out/.graphify_analysis.json').read_text(encoding=\"utf-8\")) analysis = json.loads(Path('graphify-out/.graphify_analysis.json').read_text(encoding=\"utf-8\"))
G = build_from_json(extraction) # root= as in Step 4 / the --update runbook (#1361) — same base for node-key parity.
G = build_from_json(extraction, root='INPUT_PATH', directed=IS_DIRECTED)
communities = {int(k): v for k, v in analysis['communities'].items()} communities = {int(k): v for k, v in analysis['communities'].items()}
cohesion = {int(k): v for k, v in analysis['cohesion'].items()} cohesion = {int(k): v for k, v in analysis['cohesion'].items()}
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)} tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
@ -621,73 +506,9 @@ graphify export html # auto-aggregates to community view if graph > 5000 nodes
# or: graphify export html --no-viz # or: graphify export html --no-viz
``` ```
### Step 6b - Wiki (only if --wiki flag) ### Steps 6b-8 - Wiki, Neo4j, FalkorDB, SVG, GraphML, MCP, benchmark (only on their flags)
**Only run this step if `--wiki` was explicitly given in the original command.** These run only when their flag is present (`--wiki`, `--neo4j`/`--neo4j-push`, `--falkordb`/`--falkordb-push`, `--svg`, `--graphml`, `--mcp`) or, for the token-reduction benchmark, when `total_words` exceeds 5,000. A default run with no export flags skips all of them. See `references/exports.md` for each one. Run any `--wiki` export before Step 9 cleanup so `.graphify_labels.json` is still available.
Run this before Step 9 (cleanup) so `.graphify_labels.json` is still available.
```bash
graphify export wiki
```
### Step 7 - Neo4j export (only if --neo4j or --neo4j-push flag)
**If `--neo4j`** - generate a Cypher file for manual import:
```bash
graphify export neo4j
```
**If `--neo4j-push <uri>`** - push directly to a running Neo4j instance. Ask the user for credentials if not provided:
```bash
graphify export neo4j --push bolt://localhost:7687 --user neo4j --password PASSWORD
```
Default URI is `bolt://localhost:7687`, default user is `neo4j`. Uses MERGE - safe to re-run without creating duplicates.
### Step 7b - SVG export (only if --svg flag)
```bash
graphify export svg
```
### Step 7c - GraphML export (only if --graphml flag)
```bash
graphify export graphml
```
### Step 7d - MCP server (only if --mcp flag)
```bash
python3 -m graphify.serve graphify-out/graph.json
```
This starts a stdio MCP server that exposes tools: `query_graph`, `get_node`, `get_neighbors`, `get_community`, `god_nodes`, `graph_stats`, `shortest_path`. Add to Claude Desktop or any MCP-compatible agent orchestrator so other agents can query the graph live.
To configure in Claude Desktop, add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"graphify": {
"command": "python3",
"args": ["-m", "graphify.serve", "/absolute/path/to/graphify-out/graph.json"]
}
}
}
```
### Step 8 - Token reduction benchmark (only if total_words > 5000)
If `total_words` from `graphify-out/.graphify_detect.json` is greater than 5,000, run:
```bash
graphify benchmark
```
Print the output directly in chat. If `total_words <= 5000`, skip silently - the graph value is structural clarity, not token compression, for small corpora.
--- ---
@ -704,7 +525,10 @@ from graphify.detect import save_manifest
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\")) detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
# In --update mode, 'all_files' carries the full corpus; 'files' is the changed # In --update mode, 'all_files' carries the full corpus; 'files' is the changed
# subset. Full-rebuild mode populates only 'files', so the fallback handles that. # subset. Full-rebuild mode populates only 'files', so the fallback handles that.
save_manifest(detect.get('all_files') or detect['files']) # root= relativizes the manifest keys to the scan root (same base as the build),
# so the on-disk manifest is portable across clones/machines and a later --update
# matches cached files instead of missing every one (#1417).
save_manifest(detect.get('all_files') or detect['files'], root='INPUT_PATH')
# Update cumulative cost tracker # Update cumulative cost tracker
extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\")) extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
@ -730,10 +554,13 @@ cost_path.write_text(json.dumps(cost, indent=2, ensure_ascii=False), encoding=\"
print(f'This run: {input_tok:,} input tokens, {output_tok:,} output tokens') print(f'This run: {input_tok:,} input tokens, {output_tok:,} output tokens')
print(f'All time: {cost[\"total_input_tokens\"]:,} input, {cost[\"total_output_tokens\"]:,} output ({len(cost[\"runs\"])} runs)') print(f'All time: {cost[\"total_input_tokens\"]:,} input, {cost[\"total_output_tokens\"]:,} output ({len(cost[\"runs\"])} runs)')
" "
rm -f graphify-out/.graphify_detect.json graphify-out/.graphify_extract.json graphify-out/.graphify_ast.json graphify-out/.graphify_semantic.json graphify-out/.graphify_analysis.json graphify-out/.graphify_chunk_*.json rm -f graphify-out/.graphify_detect.json graphify-out/.graphify_extract.json graphify-out/.graphify_ast.json graphify-out/.graphify_semantic.json graphify-out/.graphify_analysis.json
find graphify-out -maxdepth 1 -name '.graphify_chunk_*.json' -delete 2>/dev/null
rm -f graphify-out/.needs_update 2>/dev/null || true rm -f graphify-out/.needs_update 2>/dev/null || true
``` ```
Replace INPUT_PATH with the actual path (same value used in Steps 4-5) so the manifest is relativized to the scan root.
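To illustrate why relativizing manifest keys makes the manifest portable, here is a minimal sketch. It assumes a manifest shaped as a mapping from file path to some fingerprint; that shape, and the helper name `relativize_keys`, are hypothetical and not taken from `save_manifest`.

```python
from pathlib import PurePosixPath


def relativize_keys(manifest: dict, root: str) -> dict:
    """Return a copy of manifest whose keys are relative to root."""
    root_path = PurePosixPath(root)
    result = {}
    for key, value in manifest.items():
        try:
            result[str(PurePosixPath(key).relative_to(root_path))] = value
        except ValueError:
            # Path lies outside the scan root: keep the key unchanged.
            result[key] = value
    return result


# Two clones of the same repo at different absolute locations.
clone_a = relativize_keys({"/home/alice/proj/src/app.py": "h1"}, "/home/alice/proj")
clone_b = relativize_keys({"/srv/ci/proj/src/app.py": "h1"}, "/srv/ci/proj")
print(clone_a == clone_b)
```

Both clones end up with the key `src/app.py`, so a later incremental run on either machine can match the cached entry.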
Tell the user (omit the obsidian line unless --obsidian was given): Tell the user (omit the obsidian line unless --obsidian was given):
``` ```
Graph complete. Outputs in PATH_TO_DIR/graphify-out/ Graph complete. Outputs in PATH_TO_DIR/graphify-out/
@ -783,325 +610,33 @@ if [ ! -f graphify-out/.graphify_python ]; then
fi fi
``` ```
## For --update (incremental re-extraction) ## For --update and --cluster-only
Use when you've added or modified files since the last run. Only re-extracts changed files - saves tokens and time. Both are non-default subcommands. `--update` re-extracts only new or changed files; `--cluster-only` reruns clustering on the existing graph. See `references/update.md` for both flows.
```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from graphify.detect import detect_incremental, save_manifest
from pathlib import Path
result = detect_incremental(Path('INPUT_PATH'))
new_total = result.get('new_total', 0)
print(json.dumps(result, indent=2, ensure_ascii=False))
Path('graphify-out/.graphify_incremental.json').write_text(json.dumps(result, ensure_ascii=False), encoding=\"utf-8\")
deleted = list(result.get('deleted_files', []))
if new_total == 0 and not deleted:
print('No files changed since last run. Nothing to update.')
raise SystemExit(0)
if deleted:
print(f'{len(deleted)} deleted file(s) to prune.')
if new_total > 0:
print(f'{new_total} new/changed file(s) to re-extract.')
"
```
Then populate `.graphify_detect.json` so Steps 3A-6 (which read it unconditionally) see the right state for an incremental run. `files` carries the changed subset (drives Step 3A AST + Step 3B0 cache check on only what changed); `all_files` carries the full corpus for any step that needs corpus-wide context:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
r = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
Path('graphify-out/.graphify_detect.json').write_text(json.dumps({
'files': r.get('new_files', {}),
'all_files': r.get('files', {}),
'total_files': r.get('new_total', 0),
'total_words': r.get('total_words', 0),
'skipped_sensitive': r.get('skipped_sensitive', []),
'needs_graph': True,
}, ensure_ascii=False), encoding=\"utf-8\")
"
```
If new files exist, first check whether all changed files are code files:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
result = json.loads(open('graphify-out/.graphify_incremental.json', encoding='utf-8').read()) if Path('graphify-out/.graphify_incremental.json').exists() else {}
code_exts = {'.py','.ts','.js','.go','.rs','.java','.cpp','.c','.rb','.swift','.kt','.cs','.scala','.php','.cc','.cxx','.hpp','.h','.kts','.lua','.toc','.f','.F','.f90','.F90','.f95','.F95','.f03','.F03','.f08','.F08'}
new_files = result.get('new_files', {})
all_changed = [f for files in new_files.values() for f in files]
code_only = all(Path(f).suffix.lower() in code_exts for f in all_changed)
print('code_only:', code_only)
"
```
If `code_only` is True: print `[graphify update] Code-only changes detected - skipping semantic extraction (no LLM needed)`, run only Step 3A (AST) on the changed files, skip Step 3B entirely (no subagents), then go straight to merge and Steps 4-8.
If `code_only` is False (any changed file is a doc/paper/image): run the full Steps 3A-3C pipeline as normal.
If no new files exist (only deletions), create an empty extraction so the merge step can prune:
```bash
if [ ! -f graphify-out/.graphify_extract.json ]; then
echo '[graphify update] Only deletions -- creating empty extraction for merge.'
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
Path('graphify-out/.graphify_extract.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
"
fi
```
Then:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
from graphify.build import build_merge
from graphify.detect import save_manifest
# Load new extraction and incremental state
new_extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
incremental = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
deleted = list(incremental.get('deleted_files', []))
# Use build_merge() — reads graph.json directly without NetworkX round-trip
# so edge direction (calls, implements, imports) is always preserved (#801).
G = build_merge(
[new_extraction],
graph_path='graphify-out/graph.json',
prune_sources=deleted or None,
)
print(f'[graphify update] Merged: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges')
# Write merged result back to .graphify_extract.json so Step 4 sees the full graph
merged_out = {
'nodes': [{'id': n, **d} for n, d in G.nodes(data=True)],
'edges': [
# Explicit source/target last so they win over any stale attrs in d.
{**{k: val for k, val in d.items() if k not in ('_src', '_tgt', 'source', 'target')},
'source': d.get('_src', u), 'target': d.get('_tgt', v)}
for u, v, d in G.edges(data=True)
],
# G.graph["hyperedges"] holds hyperedges from both existing graph.json
# and new_extraction (build_merge combines them). Falling back to
# new_extraction only would silently drop prior-run hyperedges (#801).
'hyperedges': list(G.graph.get('hyperedges', [])),
'input_tokens': new_extraction.get('input_tokens', 0),
'output_tokens': new_extraction.get('output_tokens', 0),
}
Path('graphify-out/.graphify_extract.json').write_text(json.dumps(merged_out, ensure_ascii=False), encoding=\"utf-8\")
print(f'[graphify update] Merged extraction written ({len(merged_out[\"nodes\"])} nodes, {len(merged_out[\"edges\"])} edges)')
# Save manifest so next --update diffs against today's state, not the
# prior run's baseline (prevents ghost-node reports on subsequent updates).
save_manifest(incremental['files'])
print('[graphify update] Manifest saved.')
"
```
Then run Steps 4-8 on the merged graph as normal.
After Step 4, show the graph diff:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from graphify.analyze import graph_diff
from graphify.build import build_from_json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path
# Load old graph (before update) from backup written before merge
old_data = json.loads(Path('graphify-out/.graphify_old.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_old.json').exists() else None
new_extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
G_new = build_from_json(new_extract)
if old_data:
G_old = json_graph.node_link_graph(old_data, edges='links')
diff = graph_diff(G_old, G_new)
print(diff['summary'])
if diff['new_nodes']:
print('New nodes:', ', '.join(n['label'] for n in diff['new_nodes'][:5]))
if diff['new_edges']:
print('New edges:', len(diff['new_edges']))
"
```
Before the merge step, save the old graph: `cp graphify-out/graph.json graphify-out/.graphify_old.json`
Clean up after: `rm -f graphify-out/.graphify_old.json`
---
## For --cluster-only
Skip Steps 1-3. Re-run clustering on the existing graph:
```bash
graphify cluster-only .
```
Then run Steps 5-9 as normal (label communities, generate viz, benchmark, clean up, report).
--- ---
## For /graphify query
Two traversal modes - choose based on the question: When `graphify-out/graph.json` already exists and the user asks a question about the corpus, answer from the graph rather than rebuilding it:
| Mode | Flag | Best for |
|------|------|----------|
| BFS (default) | _(none)_ | "What is X connected to?" - broad context, nearest neighbors first |
| DFS | `--dfs` | "How does X reach Y?" - trace a specific chain or dependency path |
```bash ```bash
graphify query "QUESTION" graphify query "<question>"
# or: graphify query "QUESTION" --dfs --budget 3000
``` ```
Replace `QUESTION` with the user's actual question. Answer using **only** what the graph output contains. Quote `source_location` when citing a specific fact. If the graph lacks enough information, say so - do not hallucinate edges. Before traversal, expand the question against the graph's own vocabulary so a wording mismatch does not collapse the answer to noise. If the `graphify query` CLI is unavailable, fall back to an inline NetworkX traversal of `graphify-out/graph.json`. Answer using only what the graph output contains, and quote `source_location` when citing a specific fact. For that vocab-expansion step, the BFS/DFS traversal modes, the `--budget` cap, the NetworkX fallback, `save-result` feedback, and the `/graphify path` and `/graphify explain` flows, see `references/query.md`.
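The difference between the BFS and DFS traversal modes mentioned above can be sketched without NetworkX. This is a hedged illustration, not the CLI's implementation: it assumes `graph.json` is node-link JSON with `nodes` and `links` keys, treats edges as undirected, and uses a hypothetical `max_nodes` cap as a simple stand-in for `--budget`, whose exact semantics are not shown here.

```python
from collections import deque


def traverse(graph: dict, start: str, mode: str = "bfs", max_nodes: int = 50) -> list:
    """Return node ids reachable from start, in BFS or DFS visit order."""
    adj = {node["id"]: [] for node in graph["nodes"]}
    for link in graph["links"]:
        adj.setdefault(link["source"], []).append(link["target"])
        adj.setdefault(link["target"], []).append(link["source"])
    if start not in adj:
        return []
    visited, order = set(), []
    frontier = deque([start])
    while frontier and len(order) < max_nodes:
        # BFS takes the oldest entry (queue); DFS takes the newest (stack).
        node = frontier.popleft() if mode == "bfs" else frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        neighbors = adj[node]
        if mode != "bfs":
            # Reverse so the first-listed neighbor is explored first.
            neighbors = reversed(neighbors)
        frontier.extend(n for n in neighbors if n not in visited)
    return order


sample = {
    "nodes": [{"id": n} for n in "abcde"],
    "links": [
        {"source": "a", "target": "b"},
        {"source": "a", "target": "c"},
        {"source": "b", "target": "d"},
        {"source": "c", "target": "e"},
    ],
}
print(traverse(sample, "a", "bfs"))  # nearest neighbors first
print(traverse(sample, "a", "dfs"))  # follows one chain to its end first
```

BFS surfaces everything near the start node before going further out, which suits "what is X connected to?"; DFS follows a single chain as far as it goes, which suits "how does X reach Y?".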
After writing the answer, save it back into the graph so it improves future queries:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
```
Replace `QUESTION` with the question, `ANSWER` with your full answer text, and `NODE1 NODE2` with the labels of the nodes you cited. This closes the feedback loop: the next `--update` will extract this Q&A as a node in the graph.
--- ---
## For /graphify path ## For /graphify add and --watch
Find the shortest path between two named concepts in the graph. Neither is part of the default build. When the user runs `/graphify add <url>` to fetch a URL into the corpus, or passes `--watch` to auto-rebuild on file changes, see `references/add-watch.md`.
```bash
graphify path "NODE_A" "NODE_B"
```
Replace `NODE_A` and `NODE_B` with the actual concept names. Then explain the path in plain language - what each hop means, why it's significant.
After writing the explanation, save it back:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
```
--- ---
## For /graphify explain ## For the commit hook and native CLAUDE.md integration
Give a plain-language explanation of a single node - everything connected to it. When the user asks to install the post-commit auto-rebuild hook or wire graphify into a project's CLAUDE.md, see `references/hooks.md`.
```bash
graphify explain "NODE_NAME"
```
Replace `NODE_NAME` with the concept the user asked about. Then write a 3-5 sentence explanation of what this node is, what it connects to, and why those connections are significant. Use the source locations as citations.
After writing the explanation, save it back:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Explain NODE_NAME" --answer "ANSWER" --type explain --nodes NODE_NAME
```
---
## For /graphify add
Fetch a URL and add it to the corpus, then update the graph.
```bash
$(cat graphify-out/.graphify_python) -c "
import sys
from graphify.ingest import ingest
from pathlib import Path
try:
out = ingest('URL', Path('./raw'), author='AUTHOR', contributor='CONTRIBUTOR')
print(f'Saved to {out}')
except ValueError as e:
print(f'error: {e}', file=sys.stderr)
sys.exit(1)
except RuntimeError as e:
print(f'error: {e}', file=sys.stderr)
sys.exit(1)
"
```
Replace `URL` with the actual URL, `AUTHOR` with the user's name if provided, `CONTRIBUTOR` likewise. If the command exits with an error, tell the user what went wrong - do not silently continue. After a successful save, automatically run the `--update` pipeline on `./raw` to merge the new file into the existing graph.
Supported URL types (auto-detected):
- YouTube / any video URL → audio downloaded via yt-dlp, transcribed to `.txt` on next run (requires `pip install 'graphifyy[video]'`)
- Twitter/X → fetched via oEmbed, saved as `.md` with tweet text and author
- arXiv → abstract + metadata saved as `.md`
- PDF → downloaded as `.pdf`
- Images (.png/.jpg/.webp) → downloaded, Claude vision extracts on next run
- Any webpage → converted to markdown via html2text
---
## For --watch
Start a background watcher that monitors a folder and auto-updates the graph when files change.
```bash
python3 -m graphify.watch INPUT_PATH --debounce 3
```
Replace INPUT_PATH with the folder to watch. Behavior depends on what changed:
- **Code files only (.py, .ts, .go, etc.):** re-runs AST extraction + rebuild + cluster immediately, no LLM needed. `graph.json` and `GRAPH_REPORT.md` are updated automatically.
- **Docs, papers, or images:** writes a `graphify-out/needs_update` flag and prints a notification to run `/graphify --update` (LLM semantic re-extraction required).
Debounce (default 3s): waits until file activity stops before triggering, so a wave of parallel agent writes doesn't trigger a rebuild per file.
Press Ctrl+C to stop.
For agentic workflows: run `--watch` in a background terminal. Code changes from agent waves are picked up automatically between waves. If agents are also writing docs or notes, you'll need a manual `/graphify --update` after those waves.
---
## For git commit hook
Install a post-commit hook that auto-rebuilds the graph after every commit. No background process needed - triggers once per commit, works with any editor.
```bash
graphify hook install # install
graphify hook uninstall # remove
graphify hook status # check
```
After every `git commit`, the hook detects which code files changed (via `git diff HEAD~1`), re-runs AST extraction on those files, and rebuilds `graph.json` and `GRAPH_REPORT.md`. Doc/image changes are ignored by the hook - run `/graphify --update` manually for those.
If a post-commit hook already exists, graphify appends to it rather than replacing it.
---
## For native CLAUDE.md integration
Run once per project to make graphify always-on in Claude Code sessions:
```bash
graphify claude install
```
This writes a `## graphify` section to the local `CLAUDE.md` that instructs Claude to check the graph before answering codebase questions and rebuild it after code changes. No manual `/graphify` needed in future sessions.
```bash
graphify claude uninstall # remove the section
```
--- ---


@ -0,0 +1,56 @@
# graphify reference: add a URL and watch a folder
Load this when the user ran `/graphify add <url>` or passed `--watch`. Neither is part of the default build.
## For /graphify add
Fetch a URL and add it to the corpus, then update the graph.
```bash
$(cat graphify-out/.graphify_python) -c "
import sys
from graphify.ingest import ingest
from pathlib import Path
try:
out = ingest('URL', Path('./raw'), author='AUTHOR', contributor='CONTRIBUTOR')
print(f'Saved to {out}')
except ValueError as e:
print(f'error: {e}', file=sys.stderr)
sys.exit(1)
except RuntimeError as e:
print(f'error: {e}', file=sys.stderr)
sys.exit(1)
"
```
Replace `URL` with the actual URL, `AUTHOR` with the user's name if provided, `CONTRIBUTOR` likewise. If the command exits with an error, tell the user what went wrong - do not silently continue. After a successful save, automatically run the `--update` pipeline on `./raw` to merge the new file into the existing graph.
Supported URL types (auto-detected):
- YouTube / any video URL → audio downloaded via yt-dlp, transcribed to `.txt` on next run (requires `pip install 'graphifyy[video]'`)
- Twitter/X → fetched via oEmbed, saved as `.md` with tweet text and author
- arXiv → abstract + metadata saved as `.md`
- PDF → downloaded as `.pdf`
- Images (.png/.jpg/.webp) → downloaded, Claude vision extracts on next run
- Any webpage → converted to markdown via html2text
---
## For --watch
Start a background watcher that monitors a folder and auto-updates the graph when files change.
```bash
$(cat graphify-out/.graphify_python) -m graphify.watch INPUT_PATH --debounce 3
```
Replace INPUT_PATH with the folder to watch. Behavior depends on what changed:
- **Code files only (.py, .ts, .go, etc.):** re-runs AST extraction + rebuild + cluster immediately, no LLM needed. `graph.json` and `GRAPH_REPORT.md` are updated automatically.
- **Docs, papers, or images:** writes a `graphify-out/needs_update` flag and prints a notification to run `/graphify --update` (LLM semantic re-extraction required).
Debounce (default 3s): waits until file activity stops before triggering, so a wave of parallel agent writes doesn't trigger a rebuild per file.
Press Ctrl+C to stop.
For agentic workflows: run `--watch` in a background terminal. Code changes from agent waves are picked up automatically between waves. If agents are also writing docs or notes, you'll need a manual `/graphify --update` after those waves.
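The debounce behavior described above (wait until file activity stops, then trigger once) can be sketched as a small state holder. This is an assumption-laden illustration, not the code behind `graphify.watch`: the `Debouncer` class and its method names are hypothetical, and time is passed in explicitly so the logic is easy to follow and test.

```python
class Debouncer:
    """Fire once after `quiet_seconds` have passed with no new events."""

    def __init__(self, quiet_seconds: float = 3.0):
        self.quiet_seconds = quiet_seconds
        self.last_event = None  # timestamp of the most recent file event

    def record_event(self, now: float) -> None:
        # Every new event restarts the quiet period.
        self.last_event = now

    def should_fire(self, now: float) -> bool:
        if self.last_event is None:
            return False  # nothing pending
        if now - self.last_event < self.quiet_seconds:
            return False  # activity is still too recent
        self.last_event = None  # consume the pending burst
        return True


debouncer = Debouncer(quiet_seconds=3.0)
for t in (0.0, 1.0, 2.0):  # a burst of three file writes
    debouncer.record_event(t)
print(debouncer.should_fire(4.0))  # only 2s since the last write
print(debouncer.should_fire(5.0))  # 3s of quiet: one rebuild for the burst
```

A burst of writes from parallel agents therefore results in a single rebuild rather than one per file.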


@ -0,0 +1,87 @@
# graphify reference: extra exports and benchmark
Load this when the user passed one of the export flags (`--wiki`, `--neo4j`, `--neo4j-push`, `--falkordb`, `--falkordb-push`, `--svg`, `--graphml`, `--mcp`), or when the corpus is large enough for the token-reduction benchmark. Each step runs only for its own flag.
### Step 6b - Wiki (only if --wiki flag)
**Only run this step if `--wiki` was explicitly given in the original command.**
Run this before Step 9 (cleanup) so `.graphify_labels.json` is still available.
```bash
graphify export wiki
```
### Step 7 - Neo4j export (only if --neo4j or --neo4j-push flag)
**If `--neo4j`** - generate a Cypher file for manual import:
```bash
graphify export neo4j
```
**If `--neo4j-push <uri>`** - push directly to a running Neo4j instance. Ask the user for credentials if not provided:
```bash
graphify export neo4j --push bolt://localhost:7687 --user neo4j --password PASSWORD
```
Default URI is `bolt://localhost:7687`, default user is `neo4j`. Uses MERGE - safe to re-run without creating duplicates.
### Step 7a - FalkorDB export (only if --falkordb or --falkordb-push flag)
**If `--falkordb`** - generate a Cypher file. The statements are OpenCypher, but FalkorDB's `GRAPH.QUERY` runs one statement at a time (no bulk script import like Neo4j's `cypher-shell`), so prefer `--falkordb-push` to load a graph. Use this only when you want the portable `cypher.txt` artifact:
```bash
graphify export falkordb
```
**If `--falkordb-push <uri>`** - push directly to a running FalkorDB instance. Credentials are optional; ask the user only if the instance requires auth:
```bash
graphify export falkordb --push falkordb://localhost:6379
```
Default URI is `falkordb://localhost:6379` (the scheme is informational - `redis://` or a bare `host:port` work too), auth is optional, and the target graph defaults to `graphify`. Uses MERGE - safe to re-run without creating duplicates.
### Step 7b - SVG export (only if --svg flag)
```bash
graphify export svg
```
### Step 7c - GraphML export (only if --graphml flag)
```bash
graphify export graphml
```
### Step 7d - MCP server (only if --mcp flag)
```bash
$(cat graphify-out/.graphify_python) -m graphify.serve graphify-out/graph.json
```
This starts a stdio MCP server that exposes tools: `query_graph`, `get_node`, `get_neighbors`, `get_community`, `god_nodes`, `graph_stats`, `shortest_path`. Add to Claude Desktop or any MCP-compatible agent orchestrator so other agents can query the graph live.
To configure in Claude Desktop, add to `claude_desktop_config.json`. Claude Desktop can't run `$(...)`, and under `uv tool install` the system `python3` can't import graphify — so set `command` to the **absolute interpreter path** printed by `cat graphify-out/.graphify_python`:
```json
{
  "mcpServers": {
    "graphify": {
      "command": "<absolute path from: cat graphify-out/.graphify_python>",
      "args": ["-m", "graphify.serve", "/absolute/path/to/graphify-out/graph.json"]
    }
  }
}
```
### Step 8 - Token reduction benchmark (only if total_words > 5000)
If `total_words` from `graphify-out/.graphify_detect.json` is greater than 5,000, run:
```bash
graphify benchmark
```
Print the output directly in chat. If `total_words <= 5000`, skip silently - the graph value is structural clarity, not token compression, for small corpora.

@ -0,0 +1,70 @@
# graphify reference: extraction subagent prompt
Load this in Step 3 Part B when the corpus has at least one doc, paper, or image chunk. A pure-code corpus skips Part B and never reads this file. Each semantic subagent receives the prompt below verbatim (substitute FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH).
```
You are a graphify extraction subagent. Read the files listed and extract a knowledge graph fragment.
Output ONLY valid JSON matching the schema below - no explanation, no markdown fences, no preamble.
Files (chunk CHUNK_NUM of TOTAL_CHUNKS):
FILE_LIST
Rules:
- EXTRACTED: relationship explicit in source (import, call, citation, "see §3.2")
- INFERRED: reasonable inference (shared data structure, implied dependency)
- AMBIGUOUS: uncertain - flag for review, do not omit
Code files: focus on semantic edges AST cannot find (call relationships, shared data, arch patterns).
Do not re-extract imports - AST already has those.
Doc/paper files: extract named concepts, entities, citations. For rationale (WHY decisions were made, trade-offs, design intent): store as a `rationale` attribute on the relevant concept node — do NOT create a separate rationale node or fragment node. Only create a node for something that is itself a named entity or concept. Use `file_type:"rationale"` for concept-like nodes (ideas, principles, mechanisms, design patterns). `file_type` MUST be one of exactly these six values: `code`, `document`, `paper`, `image`, `rationale`, `concept`. Any other value is invalid and will be rejected.
Code files: when adding `calls` edges, source MUST be the caller (the function/class doing the calling), target MUST be the callee. Never reverse this direction. `calls` edges MUST stay within one language: a Python function cannot `calls` a JS/TS/Go/Rust/Java symbol and vice versa — cross-language call edges are phantom artifacts, never emit them.
Image files: use vision to understand what the image IS - do not just OCR.
UI screenshot: layout patterns, design decisions, key elements, purpose.
Chart: metric, trend/insight, data source.
Tweet/post: claim as node, author, concepts mentioned.
Diagram: components and connections.
Research figure: what it demonstrates, method, result.
Handwritten/whiteboard: ideas and arrows, mark uncertain readings AMBIGUOUS.
DEEP_MODE (if --mode deep was given): be aggressive with INFERRED edges - indirect deps,
shared assumptions, latent couplings. Mark uncertain ones AMBIGUOUS instead of omitting.
Semantic similarity: if two concepts in this chunk solve the same problem or represent the same idea without any structural link (no import, no call, no citation), add a `semantically_similar_to` edge marked INFERRED with a confidence_score reflecting how similar they are (0.6-0.95). Examples:
- Two functions that both validate user input but never call each other
- A class in code and a concept in a paper that describe the same algorithm
- Two error types that handle the same failure mode differently
Only add these when the similarity is genuinely non-obvious and cross-cutting. Do not add them for trivially similar things.
Hyperedges: if 3 or more nodes clearly participate together in a shared concept, flow, or pattern that is not captured by pairwise edges alone, add a hyperedge to a top-level `hyperedges` array. Examples:
- All classes that implement a common protocol or interface
- All functions in an authentication flow (even if they don't all call each other)
- All concepts from a paper section that form one coherent idea
Use sparingly — only when the group relationship adds information beyond the pairwise edges. Maximum 3 hyperedges per chunk.
If a file has YAML frontmatter (--- ... ---), copy source_url, captured_at, author,
contributor onto every node from that file.
confidence_score is REQUIRED on every edge - never omit it, never use 0.5 as a default:
- EXTRACTED edges: confidence_score = 1.0 always
- INFERRED edges: pick exactly ONE value from this set — never 0.5:
0.95 direct structural evidence (shared data structure, named cross-file reference).
0.85 strong inference (clear functional alignment, no direct symbol link).
0.75 reasonable inference (shared problem domain + similar shape, requires interpretation).
0.65 weak inference (thematically related, no shape evidence).
0.55 speculative but plausible (surface-level co-occurrence only).
Models follow discrete rubrics better than continuous ranges; the bimodal
distribution observed in production (>50% at 0.5, >40% at 0.85+) shows the
range guidance is being collapsed to a binary. If no value above fits, mark
the edge AMBIGUOUS rather than picking 0.4 or below.
- AMBIGUOUS edges: 0.1-0.3
Node ID format: lowercase, only `[a-z0-9_]`, no dots or slashes. Format: `{stem}_{entity}` where stem is `{parent_dir}_{filename_without_ext}` (the **immediate** parent directory name + the filename stem, both lowercased with non-alphanumeric chars replaced by `_`) and entity is the symbol name similarly normalized. Only one level of parent is used, not the full path. Examples: `src/auth/session.py` + `ValidateToken` → `auth_session_validatetoken`; `lib/utils/helpers.py` + `parse_url` → `utils_helpers_parse_url`; `tests/test_foo.py` + `_helper` → `tests_test_foo_helper`. Top-level files (no parent dir, e.g. `setup.py`) use just the filename stem: `setup_my_func`. This must match the ID the AST extractor generates: using just the filename (e.g., `session_validatetoken`) or the full path (e.g., `src_auth_session_validatetoken`) will create orphan ghost-duplicate nodes. If you are re-extracting a project that had ghost duplicates under the old format, the user should run `graphify extract --force` to rebuild cleanly. CRITICAL: never append chunk numbers, sequence numbers, or any suffix to an ID (no `_c1`, `_c2`, `_chunk2`, etc.). IDs must be deterministic from the label alone: the same entity must always produce the same ID regardless of which chunk processes it.
Generate the extraction JSON matching this schema exactly:
{"nodes":[{"id":"auth_session_validatetoken","label":"Human Readable Name","file_type":"code|document|paper|image|rationale|concept","source_file":"<FILE_LIST path verbatim>","source_location":null,"source_url":null,"captured_at":null,"author":null,"contributor":null}],"edges":[{"source":"node_id","target":"node_id","relation":"calls|implements|references|cites|conceptually_related_to|shares_data_with|semantically_similar_to|rationale_for","confidence":"EXTRACTED|INFERRED|AMBIGUOUS","confidence_score":1.0,"source_file":"<FILE_LIST path verbatim>","source_location":null,"weight":1.0}],"hyperedges":[{"id":"snake_case_id","label":"Human Readable Label","nodes":["node_id1","node_id2","node_id3"],"relation":"participate_in|implement|form","confidence":"EXTRACTED|INFERRED","confidence_score":0.75,"source_file":"<FILE_LIST path verbatim>"}],"input_tokens":0,"output_tokens":0}
source_file RULE (every node, edge, and hyperedge): set source_file to the path of the originating file EXACTLY as it appears in FILE_LIST — verbatim and absolute. Do NOT shorten to a basename, do NOT re-relativize, do NOT strip any directory prefix, and do NOT change separators (the engine canonicalizes separators and relativizes against the build root downstream). Copy the FILE_LIST entry character-for-character. This keeps the full build and incremental --update on the same base, so build_merge's replace-on-re-extract matches the existing node instead of accumulating a duplicate.
Then write the JSON to disk using the Write tool at this exact absolute path (no relative paths — Write resolves relative paths against an undefined cwd and the file will be silently lost):
CHUNK_PATH
```
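The node ID rule stated in the prompt can be sketched in a few lines. This is a rough approximation for checking IDs by hand, not the AST extractor's real code: `node_id` and `_norm` are hypothetical names, and stripping leading/trailing underscores is an assumption inferred from the `_helper` example.

```python
import re
from pathlib import PurePosixPath


def _norm(text):
    """Lowercase and replace each run of non-alphanumeric chars with '_' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def node_id(source_path, entity):
    """Build `{parent_dir}_{filename_stem}_{entity}` using only the immediate parent."""
    path = PurePosixPath(source_path)
    parent = path.parent.name  # '' for top-level files such as setup.py
    parts = [_norm(parent), _norm(path.stem), _norm(entity)]
    return "_".join(p for p in parts if p)


print(node_id("src/auth/session.py", "ValidateToken"))
print(node_id("setup.py", "my_func"))
```

Because the function depends only on the path and the symbol name, the same entity gets the same ID regardless of which chunk handles it, which is the property the prompt requires.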

@ -0,0 +1,46 @@
# graphify reference: GitHub clone and cross-repo merge
Load this when the user passed one or more `https://github.com/...` URLs, or named several local subfolders to merge into one graph.
### Step 0 - Clone GitHub repo(s) (only if a GitHub URL was given)
**Single repo:**
```bash
LOCAL_PATH=$(graphify clone <github-url> [--branch <branch>])
# Use LOCAL_PATH as the target for all subsequent steps
```
**Multiple repos (cross-repo graph):**
```bash
# Clone each repo, run the full pipeline on each, then merge
graphify clone <url1> # → ~/.graphify/repos/<owner1>/<repo1>
graphify clone <url2> # → ~/.graphify/repos/<owner2>/<repo2>
# Run /graphify on each local path to produce their graph.json files
# Then merge:
graphify merge-graphs \
~/.graphify/repos/<owner1>/<repo1>/graphify-out/graph.json \
~/.graphify/repos/<owner2>/<repo2>/graphify-out/graph.json \
--out graphify-out/cross-repo-graph.json
```
Graphify clones into `~/.graphify/repos/<owner>/<repo>` and reuses existing clones on repeat runs. Each node in the merged graph carries a `repo` attribute so you can filter by origin.
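As a rough illustration of filtering by origin, the sketch below keeps only the nodes of one repository and the edges between them. It assumes the merged file is node-link JSON with a `nodes` list and a `links` list (the key the query snippets in these references pass to NetworkX); the helper name `filter_by_repo` and the `owner/repo` value format are assumptions, not documented behavior.

```python
def filter_by_repo(graph, repo):
    """Return a copy of a node-link graph dict restricted to one `repo` value.

    Hypothetical helper: the exact value stored in `repo` is an assumption.
    """
    nodes = [n for n in graph.get("nodes", []) if n.get("repo") == repo]
    keep = {n["id"] for n in nodes}
    links = [
        e for e in graph.get("links", [])
        if e.get("source") in keep and e.get("target") in keep
    ]
    return {**graph, "nodes": nodes, "links": links}


merged = {
    "nodes": [
        {"id": "a", "repo": "owner1/repo1"},
        {"id": "b", "repo": "owner2/repo2"},
        {"id": "c", "repo": "owner1/repo1"},
    ],
    "links": [
        {"source": "a", "target": "c"},
        {"source": "a", "target": "b"},
    ],
}
print(filter_by_repo(merged, "owner1/repo1"))
```

Cross-repo edges are dropped here on purpose; keep them instead if the goal is to see how one repository touches the other.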
**Multiple local subfolders (monorepo or multi-service layout):**
The skill pipeline writes all intermediate and final outputs to `graphify-out/` in the current working directory. Running the skill on each subfolder separately will clobber the same output dir. Instead, use the CLI directly for each subfolder — it places `graphify-out/` *inside* the scanned path:
```bash
graphify extract ./core/ # → ./core/graphify-out/graph.json
graphify extract ./service/ # → ./service/graphify-out/graph.json
graphify extract ./platform/ # → ./platform/graphify-out/graph.json
# Add --backend gemini|kimi|openai|deepseek|claude-cli depending on which API key you have set
# Then merge at the project root:
graphify merge-graphs \
./core/graphify-out/graph.json \
./service/graphify-out/graph.json \
./platform/graphify-out/graph.json \
--out graphify-out/graph.json
```
Once `graphify-out/graph.json` exists, the fast path above takes over: any codebase question runs `graphify query` directly on the merged graph — no re-extraction, no size gate.

@ -0,0 +1,33 @@
# graphify reference: commit hook and native CLAUDE.md integration
Load this when the user asked to install the post-commit hook or wire graphify into a project's CLAUDE.md.
## For git commit hook
Install a post-commit hook that auto-rebuilds the graph after every commit. No background process needed - triggers once per commit, works with any editor.
```bash
graphify hook install # install
graphify hook uninstall # remove
graphify hook status # check
```
After every `git commit`, the hook detects which code files changed (via `git diff HEAD~1`), re-runs AST extraction on those files, and rebuilds `graph.json` and `GRAPH_REPORT.md`. Doc/image changes are ignored by the hook - run `/graphify --update` manually for those.
If a post-commit hook already exists, graphify appends to it rather than replacing it.
---
## For native CLAUDE.md integration
Run once per project to make graphify always-on in Claude Code sessions:
```bash
graphify claude install
```
This writes a `## graphify` section to the local `CLAUDE.md` that instructs Claude to check the graph before answering codebase questions and rebuild it after code changes. No manual `/graphify` needed in future sessions.
```bash
graphify claude uninstall # remove the section
```

@ -0,0 +1,303 @@
# graphify reference: query, path, explain
Load this when the user asks a question against an existing graph, or runs `/graphify path` or `/graphify explain`. The core's query stub points here for the full traversal flow. These flows use the `graphify query` CLI when it is available and fall back to an inline NetworkX traversal otherwise.
Two traversal modes - choose based on the question:
| Mode | Flag | Best for |
|------|------|----------|
| BFS (default) | _(none)_ | "What is X connected to?" - broad context, nearest neighbors first |
| DFS | `--dfs` | "How does X reach Y?" - trace a specific chain or dependency path |
First check the graph exists:
```bash
$(cat graphify-out/.graphify_python) -c "
from pathlib import Path
if not Path('graphify-out/graph.json').exists():
    print('ERROR: No graph found. Run /graphify <path> first to build the graph.')
    raise SystemExit(1)
"
```
If it fails, stop and tell the user to run `/graphify <path>` first.
### Step 0 — Constrained query expansion (REQUIRED before traversal)
graphify's `query` CLI matches nodes via case-folded substring + IDF — there is **no stemming, no synonyms, no cross-language match** inside the binary, and the inline fallback below matches the same way. If the user's question uses different language or different domain vocabulary than the graph's labels (user says "обработчик" / graph says "handler"; user says "authentication" / graph says "Guardian"), the literal matcher returns 0 hits and the answer collapses to noise.
Fix this **without inventing tokens** by expanding the query against the actual graph vocabulary first:
1. Extract the token vocabulary from node labels:
```bash
$(cat graphify-out/.graphify_python) -c "
import json, re
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
vocab = set()
for n in data['nodes']:
    for c in re.findall(r'[^\W\d_]+', n.get('label','') or '', re.UNICODE):
        parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+', c) or [c]
        for p in parts:
            t = p.lower()
            if 3 <= len(t) <= 30:
                vocab.add(t)
Path('graphify-out/.vocab.txt').write_text('\n'.join(sorted(vocab)))
print(f'vocab: {len(vocab)} tokens')
"
```
2. Read `graphify-out/.vocab.txt`. Then for the user's question, select **up to 12 tokens from this exact list** that semantically match the query intent. Hard constraints:
- You MUST pick only tokens present in the vocabulary file. Do NOT invent tokens.
- If a query concept has no plausible token in the vocab, skip it — do not substitute a near-synonym from training memory.
- If **no** vocab tokens match the query at all, output an empty list and tell the user the corpus has no relevant vocabulary for this question. Do not fabricate a search.
- Translate cross-language: Russian "аутентификация" → look for `auth`, `credential`, `token`, `security` IFF present in vocab.
- Morphology: "handlers" maps to `handler` IFF present; "todos" maps to `todo` IFF present.
3. Print the selection explicitly to the user before running the query, so the expansion is auditable:
```
Query expanded to (from graph vocab, N tokens): [token1, token2, ...]
```
If the list is empty, say so plainly and stop — do not proceed to traversal.
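The hard constraints in step 2 can also be enforced mechanically after the selection is made. The sketch below is one possible guard, assuming the candidates arrive as a plain list of strings; `constrain_to_vocab` is a hypothetical helper, not part of the graphify CLI.

```python
def constrain_to_vocab(candidates, vocab, limit=12):
    """Keep only candidate tokens that exist in the graph vocabulary.

    Order is preserved, duplicates are dropped, and at most `limit` tokens survive.
    """
    selected = []
    seen = set()
    for token in candidates:
        t = token.strip().lower()
        if t in vocab and t not in seen:
            seen.add(t)
            selected.append(t)
        if len(selected) == limit:
            break
    return selected


vocab = {"auth", "token", "handler"}
tokens = constrain_to_vocab(["Auth", "login", "token", "auth"], vocab)
print(f"Query expanded to (from graph vocab, {len(tokens)} tokens): {tokens}")
```

An empty result is a valid outcome and should stop the flow, exactly as the step above requires.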
### Step 1 — Traversal
Build the **expanded query string** by joining the selected tokens with spaces. Use this string as `QUESTION` below — NOT the original user question. (The original question is preserved only for `save-result` at the end.)
Prefer the CLI when it is installed:
```bash
graphify query "QUESTION"
# or: graphify query "QUESTION" --dfs --budget 3000
```
If the CLI is unavailable, load `graphify-out/graph.json` and run the traversal inline:
1. Find the 1-3 nodes whose label best matches the expanded tokens.
2. Run the appropriate traversal from each starting node.
3. Read the subgraph - node labels, edge relations, confidence tags, source locations.
4. Answer using **only** what the graph contains. Quote `source_location` when citing a specific fact.
5. If the graph lacks enough information, say so - do not hallucinate edges.
```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
question = 'QUESTION'
mode = 'MODE' # 'bfs' or 'dfs'
terms = [t.lower() for t in question.split() if len(t) >= 3] # match the vocab threshold; keeps api/jwt/ios (#1392)
# Find best-matching start nodes
scored = []
for nid, ndata in G.nodes(data=True):
    label = ndata.get('label', '').lower()
    score = sum(1 for t in terms if t in label)
    if score > 0:
        scored.append((score, nid))
scored.sort(reverse=True)
start_nodes = [nid for _, nid in scored[:3]]
if not start_nodes:
    print('No matching nodes found for query terms:', terms)
    sys.exit(0)
subgraph_nodes = set()
subgraph_edges = []
if mode == 'dfs':
    # DFS: follow one path as deep as possible before backtracking.
    # Depth-limited to 6 to avoid traversing the whole graph.
    visited = set()
    stack = [(n, 0) for n in reversed(start_nodes)]
    while stack:
        node, depth = stack.pop()
        if node in visited or depth > 6:
            continue
        visited.add(node)
        subgraph_nodes.add(node)
        for neighbor in G.neighbors(node):
            if neighbor not in visited:
                stack.append((neighbor, depth + 1))
                subgraph_edges.append((node, neighbor))
else:
    # BFS: explore all neighbors layer by layer up to depth 3.
    frontier = set(start_nodes)
    subgraph_nodes = set(start_nodes)
    for _ in range(3):
        next_frontier = set()
        for n in frontier:
            for neighbor in G.neighbors(n):
                if neighbor not in subgraph_nodes:
                    next_frontier.add(neighbor)
                    subgraph_edges.append((n, neighbor))
        subgraph_nodes.update(next_frontier)
        frontier = next_frontier
# Token-budget aware output: rank by relevance, cut at budget (~4 chars/token)
token_budget = BUDGET # default 2000
char_budget = token_budget * 4
# Score each node by term overlap for ranked output
def relevance(nid):
    label = G.nodes[nid].get('label', '').lower()
    return sum(1 for t in terms if t in label)
ranked_nodes = sorted(subgraph_nodes, key=relevance, reverse=True)
lines = [f'Traversal: {mode.upper()} | Start: {[G.nodes[n].get(\"label\",n) for n in start_nodes]} | {len(subgraph_nodes)} nodes']
for nid in ranked_nodes:
    d = G.nodes[nid]
    lines.append(f' NODE {d.get(\"label\", nid)} [src={d.get(\"source_file\",\"\")} loc={d.get(\"source_location\",\"\")}]')
for u, v in subgraph_edges:
    if u in subgraph_nodes and v in subgraph_nodes:
        _raw = G[u][v]; d = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
        lines.append(f' EDGE {G.nodes[u].get(\"label\",u)} --{d.get(\"relation\",\"\")} [{d.get(\"confidence\",\"\")}]--> {G.nodes[v].get(\"label\",v)}')
output = '\n'.join(lines)
if len(output) > char_budget:
    output = output[:char_budget] + f'\n... (truncated at ~{token_budget} token budget - use --budget N for more)'
print(output)
"
```
Replace `QUESTION` with the **expanded** query string, `MODE` with `bfs` or `dfs`, and `BUDGET` with the token budget (default `2000`, or whatever `--budget N` specifies). Then answer based on the subgraph output above, using only what the graph contains.
After writing the answer, save it back into the graph so it improves future queries. Include the expanded tokens inside the `--answer` text (e.g. `"Expanded from original query via vocab: [tokens]. Then traversed..."`) so the next `--update` extracts the expansion history as a graph node:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "ORIGINAL_QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
```
Replace `ORIGINAL_QUESTION` with the user's verbatim question, `ANSWER` with your full answer text (containing the expanded-token trace), `NODE1 NODE2` with the list of node labels you cited. This closes the feedback loop: the next `--update` will extract this Q&A as a node in the graph.
---
## For /graphify path
Find the shortest path between two named concepts in the graph. Prefer the CLI when installed:
```bash
graphify path "NODE_A" "NODE_B"
```
If the CLI is unavailable, run it inline:
```bash
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
a_term = 'NODE_A'
b_term = 'NODE_B'
def find_node(term):
    term = term.lower()
    scored = sorted(
        [(sum(1 for w in term.split() if w in G.nodes[n].get('label','').lower()), n)
         for n in G.nodes()],
        reverse=True
    )
    return scored[0][1] if scored and scored[0][0] > 0 else None
src = find_node(a_term)
tgt = find_node(b_term)
if not src or not tgt:
    print(f'Could not find nodes matching: {a_term!r} or {b_term!r}')
    sys.exit(0)
try:
    path = nx.shortest_path(G, src, tgt)
    print(f'Shortest path ({len(path)-1} hops):')
    for i, nid in enumerate(path):
        label = G.nodes[nid].get('label', nid)
        if i < len(path) - 1:
            _raw = G[nid][path[i+1]]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
            rel = edge.get('relation', '')
            conf = edge.get('confidence', '')
            print(f' {label} --{rel}--> [{conf}]')
        else:
            print(f' {label}')
except nx.NetworkXNoPath:
    print(f'No path found between {a_term!r} and {b_term!r}')
except nx.NodeNotFound as e:
    print(f'Node not found: {e}')
"
```
Replace `NODE_A` and `NODE_B` with the actual concept names from the user. Then explain the path in plain language - what each hop means, why it's significant.
After writing the explanation, save it back:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
```
---
## For /graphify explain
Give a plain-language explanation of a single node - everything connected to it. Prefer the CLI when installed:
```bash
graphify explain "NODE_NAME"
```
If the CLI is unavailable, run it inline:
```bash
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
term = 'NODE_NAME'
term_lower = term.lower()
# Find best matching node
scored = sorted(
    [(sum(1 for w in term_lower.split() if w in G.nodes[n].get('label','').lower()), n)
     for n in G.nodes()],
    reverse=True
)
if not scored or scored[0][0] == 0:
    print(f'No node matching {term!r}')
    sys.exit(0)
nid = scored[0][1]
data_n = G.nodes[nid]
print(f'NODE: {data_n.get(\"label\", nid)}')
print(f' source: {data_n.get(\"source_file\",\"unknown\")}')
print(f' type: {data_n.get(\"file_type\",\"unknown\")}')
print(f' degree: {G.degree(nid)}')
print()
print('CONNECTIONS:')
for neighbor in G.neighbors(nid):
    _raw = G[nid][neighbor]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
    nlabel = G.nodes[neighbor].get('label', neighbor)
    rel = edge.get('relation', '')
    conf = edge.get('confidence', '')
    src_file = G.nodes[neighbor].get('source_file', '')
    print(f' --{rel}--> {nlabel} [{conf}] ({src_file})')
"
```
Replace `NODE_NAME` with the concept the user asked about. Then write a 3-5 sentence explanation of what this node is, what it connects to, and why those connections are significant. Use the source locations as citations.
After writing the explanation, save it back:
```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Explain NODE_NAME" --answer "ANSWER" --type explain --nodes NODE_NAME
```

@ -0,0 +1,52 @@
# graphify reference: transcribe video and audio
Load this only when `detect` reported one or more `video` files. A corpus with no video never reads this.
### Step 2.5 - Transcribe video / audio files (only if video files detected)
Skip this step entirely if `detect` returned zero `video` files.
Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.
**Strategy:** Read the god nodes from `graphify-out/.graphify_detect.json` (or the analysis file if it exists from a previous run). You are already a language model — write a one-sentence domain hint yourself from those labels. Then pass it to Whisper as the initial prompt. No separate API call needed.
**However**, if the corpus has *only* video files and no other docs/code, use the generic fallback prompt: `"Use proper punctuation and paragraph breaks."`
**Step 1 - Write the Whisper prompt yourself.**
Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:
- Labels: `transformer, attention, encoder, decoder` → `"Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."`
- Labels: `kubernetes, deployment, pod, helm` → `"DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."`
**Export** it as `GRAPHIFY_WHISPER_PROMPT` (the exact name the transcriber reads — and it must be `export`ed so the child Python process sees it) for the next command.
**Step 2 - Transcribe:**
```bash
export GRAPHIFY_WHISPER_MODEL=base # or whatever --whisper-model the user passed (must be exported)
export GRAPHIFY_WHISPER_PROMPT="<the one-sentence domain hint you composed in Step 1>"
$(cat graphify-out/.graphify_python) -c "
import json, os, sys
from pathlib import Path
from graphify.transcribe import transcribe_all
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
video_files = detect.get('files', {}).get('video', [])
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')
transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
# Write the JSON from Python (NOT a shell '>' redirect): transcribe_all/Whisper
# print progress to stdout, which would otherwise corrupt the JSON file (#1392).
Path('graphify-out/.graphify_transcripts.json').write_text(json.dumps(transcript_paths, ensure_ascii=False), encoding=\"utf-8\")
print(f'Transcribed {len(transcript_paths)} file(s)', file=sys.stderr)
"
```
After transcription:
- Read the transcript paths from `graphify-out/.graphify_transcripts.json`
- Add them to the docs list before dispatching semantic subagents in Step 3B
- Print how many transcripts were created: `Transcribed N video file(s) -> treating as docs`
- If transcription fails for a file, print a warning and continue with the rest
**Whisper model:** Default is `base`. If the user passed `--whisper-model <name>`, `export GRAPHIFY_WHISPER_MODEL=<name>` (it must be exported, not just assigned) before running the command above.

@ -0,0 +1,192 @@
# graphify reference: incremental update and cluster-only
Load this only when the user passed `--update` or `--cluster-only`. A first-time full build never reads this file.
## For --update (incremental re-extraction)
Use when you've added or modified files since the last run. Only re-extracts changed files - saves tokens and time.
```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from graphify.detect import detect_incremental, save_manifest
from pathlib import Path
result = detect_incremental(Path('INPUT_PATH'))
new_total = result.get('new_total', 0)
print(json.dumps(result, indent=2, ensure_ascii=False))
Path('graphify-out/.graphify_incremental.json').write_text(json.dumps(result, ensure_ascii=False), encoding=\"utf-8\")
deleted = list(result.get('deleted_files', []))
if new_total == 0 and not deleted:
    print('No files changed since last run. Nothing to update.')
    raise SystemExit(0)
if deleted:
    print(f'{len(deleted)} deleted file(s) to prune.')
if new_total > 0:
    print(f'{new_total} new/changed file(s) to re-extract.')
"
```
Then populate `.graphify_detect.json` so Steps 3A–6 (which read it unconditionally) see the right state for an incremental run. `files` carries the changed subset (drives Step 3A AST + Step 3B0 cache check on only what changed); `all_files` carries the full corpus for any step that needs corpus-wide context:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
r = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
Path('graphify-out/.graphify_detect.json').write_text(json.dumps({
'files': r.get('new_files', {}),
'all_files': r.get('files', {}),
'total_files': r.get('new_total', 0),
'total_words': r.get('total_words', 0),
'skipped_sensitive': r.get('skipped_sensitive', []),
'needs_graph': True,
}, ensure_ascii=False), encoding=\"utf-8\")
"
```
If new files exist, first check whether all changed files are code files:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
result = json.loads(open('graphify-out/.graphify_incremental.json', encoding='utf-8').read()) if Path('graphify-out/.graphify_incremental.json').exists() else {}
code_exts = {'.py','.ts','.js','.go','.rs','.java','.cpp','.c','.rb','.swift','.kt','.cs','.scala','.php','.cc','.cxx','.hpp','.h','.kts','.lua','.toc','.f','.F','.f90','.F90','.f95','.F95','.f03','.F03','.f08','.F08'}
new_files = result.get('new_files', {})
all_changed = [f for files in new_files.values() for f in files]
code_only = all(Path(f).suffix.lower() in code_exts for f in all_changed)
print('code_only:', code_only)
"
```
If `code_only` is True: print `[graphify update] Code-only changes detected - skipping semantic extraction (no LLM needed)`, run only Step 3A (AST) on the changed files, skip Step 3B entirely (no subagents), then go straight to merge and Steps 4–8.
If `code_only` is False (any changed file is a doc/paper/image/video): **first, if any changed file is in `new_files['video']`, run `references/transcribe.md` (Step 2.5) on those files, then rewrite `.graphify_detect.json` to move the resulting transcript paths into `files['document']` and drop `files['video']`**; otherwise raw `.mp4/.mp3` paths are fed to semantic subagents as unreadable media (#1392). Then run the full Steps 3A–3C pipeline as normal.
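The detect-file rewrite for changed videos is not spelled out in code, so here is a minimal sketch of it. It assumes `.graphify_detect.json` holds a `files` mapping of category to path list and that the transcript paths are a flat list, as written to `.graphify_transcripts.json` by the transcribe step; `fold_transcripts` is a hypothetical helper name.

```python
def fold_transcripts(detect, transcript_paths):
    """Return a copy of the detect state with transcripts listed as documents.

    The 'video' category is dropped so no raw media path reaches a subagent.
    """
    files = dict(detect.get("files", {}))
    files.pop("video", None)
    documents = list(files.get("document", []))
    for path in transcript_paths:
        if path not in documents:
            documents.append(path)
    files["document"] = documents
    return {**detect, "files": files}


detect = {"files": {"video": ["talk.mp4"], "document": ["notes.md"]}, "total_files": 2}
# The transcript location below is a made-up example path.
print(fold_transcripts(detect, ["transcripts/talk.txt"]))
```

In practice the result would be written back to `graphify-out/.graphify_detect.json` with `json.dumps(..., ensure_ascii=False)` and `encoding="utf-8"`, matching the other snippets in this file.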
If no new files exist (only deletions), create an empty extraction so the merge step can prune:
```bash
if [ ! -f graphify-out/.graphify_extract.json ]; then
echo '[graphify update] Only deletions -- creating empty extraction for merge.'
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
Path('graphify-out/.graphify_extract.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
"
fi
```
Then:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from pathlib import Path
from graphify.build import build_merge
from graphify.detect import save_manifest
# Load new extraction and incremental state
new_extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
incremental = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
deleted = list(incremental.get('deleted_files', []))
# prune_sources is ONLY for genuinely DELETED files. Changed/re-extracted files are
# handled by build_merge's replace-on-re-extract (#1344): every source_file in
# new_chunks is dropped from the base before merge, so old/stale nodes don't survive.
# Do NOT add `changed` here: with root= passed, prune_set relativizes to the same base
# as the freshly merged nodes and would DELETE the re-extracted content (#1178 is moot
# now that replace — not the dedup pass — reconciles changed files).
prune = list(deleted) or None
# Use build_merge() — reads graph.json directly without NetworkX round-trip
# so edge direction (calls, implements, imports) is always preserved (#801).
# Pass root= so prune_sources (absolute paths from detect_incremental) are
# relativized to match the graph's relative source_file values; without it
# nothing is pruned and stale nodes accumulate on every update (#1361).
# directed=IS_DIRECTED: replace IS_DIRECTED with True if --directed was given, else
# False. Without it a --directed --update silently rebuilds undirected and collapses
# reciprocal A<->B edges (#1392).
G = build_merge(
[new_extraction],
graph_path='graphify-out/graph.json',
prune_sources=prune,
root='INPUT_PATH',
directed=IS_DIRECTED,
)
print(f'[graphify update] Merged: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges')
# Write merged result back to .graphify_extract.json so Step 4 sees the full graph
merged_out = {
'nodes': [{'id': n, **d} for n, d in G.nodes(data=True)],
'edges': [
# Explicit source/target last so they win over any stale attrs in d.
{**{k: val for k, val in d.items() if k not in ('_src', '_tgt', 'source', 'target')},
'source': d.get('_src', u), 'target': d.get('_tgt', v)}
for u, v, d in G.edges(data=True)
],
# G.graph["hyperedges"] holds hyperedges from both existing graph.json
# and new_extraction (build_merge combines them). Falling back to
# new_extraction only would silently drop prior-run hyperedges (#801).
'hyperedges': list(G.graph.get('hyperedges', [])),
'input_tokens': new_extraction.get('input_tokens', 0),
'output_tokens': new_extraction.get('output_tokens', 0),
}
Path('graphify-out/.graphify_extract.json').write_text(json.dumps(merged_out, ensure_ascii=False), encoding=\"utf-8\")
print(f'[graphify update] Merged extraction written ({len(merged_out[\"nodes\"])} nodes, {len(merged_out[\"edges\"])} edges)')
# Save manifest so next --update diffs against today's state, not the
# prior run's baseline (prevents ghost-node reports on subsequent updates).
# root= matches the build_merge call above so the manifest keys stay relative to
# the scan root — portable across clones/machines, so --update keeps matching
# cached files instead of missing every one after a move (#1417).
save_manifest(incremental['files'], root='INPUT_PATH')
print('[graphify update] Manifest saved.')
"
```
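The comments in the merge snippet explain why `root=` matters: `prune_sources` holds absolute paths, while the graph stores relative `source_file` values. The toy example below illustrates that mismatch with plain sets. It is not graphify's implementation; `relativize` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath

def relativize(paths, root):
    """Make each path relative to root; leave paths outside root unchanged."""
    out = set()
    for p in paths:
        pp = PurePosixPath(p)
        try:
            out.add(str(pp.relative_to(root)))
        except ValueError:  # path is not located under root
            out.add(str(pp))
    return out

graph_sources = {'src/a.py', 'src/b.py'}  # relative, as stored in the graph
deleted = ['/repo/src/b.py']              # absolute, as reported for deletions

without_root = graph_sources - set(deleted)
with_root = graph_sources - relativize(deleted, '/repo')
print(sorted(without_root))  # the deleted file is still present
print(sorted(with_root))     # the deleted file has been pruned
```

Without relativization the set difference removes nothing, which matches the stale-node accumulation the comments attribute to a missing `root=`.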
Then run Steps 4–8 on the merged graph as normal.
After Step 4, show the graph diff:
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from graphify.analyze import graph_diff
from graphify.build import build_from_json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path
# Load old graph (before update) from backup written before merge
old_data = json.loads(Path('graphify-out/.graphify_old.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_old.json').exists() else None
new_extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
G_new = build_from_json(new_extract, directed=IS_DIRECTED)
if old_data:
G_old = json_graph.node_link_graph(old_data, edges='links')
diff = graph_diff(G_old, G_new)
print(diff['summary'])
if diff['new_nodes']:
print('New nodes:', ', '.join(n['label'] for n in diff['new_nodes'][:5]))
if diff['new_edges']:
print('New edges:', len(diff['new_edges']))
"
```
The diff needs the pre-update graph, so save it before the merge step: `cp graphify-out/graph.json graphify-out/.graphify_old.json`
Clean up after the diff: `rm -f graphify-out/.graphify_old.json`
---
## For --cluster-only
Skip Steps 1–3. Re-run clustering on the existing graph:
```bash
graphify cluster-only .
```
`graphify cluster-only .` is **self-contained**: it re-clusters, names communities, and regenerates `GRAPH_REPORT.md`, `graph.json`, and `graph.html` from the existing graph. **Do not re-run Steps 5–9**: they read intermediate files (`.graphify_extract.json`, `.graphify_detect.json`, `.graphify_analysis.json`) that a prior build's cleanup (Step 9) already deleted, so they raise `FileNotFoundError` (#1392). When it finishes, present the refreshed `GRAPH_REPORT.md` summary as usual.
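To see why re-running the later steps fails, one can check which intermediates are still present. This is a minimal sketch, assuming the three files live directly under the output directory (the snippets above show this for two of them; the third is an assumption); `missing_intermediates` is a hypothetical helper, not part of graphify.

```python
import tempfile
from pathlib import Path

INTERMEDIATES = (
    '.graphify_extract.json',
    '.graphify_detect.json',
    '.graphify_analysis.json',
)

def missing_intermediates(out_dir) -> list[str]:
    """List the intermediate files that are absent from out_dir."""
    base = Path(out_dir)
    return [name for name in INTERMEDIATES if not (base / name).exists()]

# Simulate an output directory where only the extraction file survived.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / '.graphify_extract.json').write_text('{}', encoding='utf-8')
    print(missing_intermediates(tmp))
```

A non-empty result means a step that reads those files would fail with `FileNotFoundError`.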

@@ -287,10 +287,13 @@ if command -v npx &>/dev/null; then
 fi
 # `skills add` is idempotent and pulls latest from the source repo,
 # which is the closest thing to an update operation the CLI exposes.
-if npx -y skills add "$_src" 2>/dev/null; then
+# Run from $HOME: the CLI resolves .agents/skills/ relative to the CWD, so
+# running from the repo would write into $REPO/.agents/skills (gitignored)
+# instead of $HOME/.agents/skills where link.sh expects it.
+if (cd "$HOME" && npx -y skills add "$_src" 2>/dev/null); then
 ok "$_name refreshed from $_src"
 else
-warn "$_name refresh failed — run manually: npx -y skills add $_src"
+warn "$_name refresh failed — run manually: (cd \"\$HOME\" && npx -y skills add $_src)"
 fi
 done
 else