Compare commits
19 Commits
937dd1d366...b03cb0b910
| Author | SHA1 | Date | |
|---|---|---|---|
| | b03cb0b910 | | |
| | 0b92935d6d | | |
| | 29c4c9ea67 | | |
| | ed5b54e87e | | |
| | 6516b85f0f | | |
| | d0a3740de5 | | |
| | 960f0f92ce | | |
| | 1b028cbc25 | | |
| | 735b62a002 | | |
| | 3b8ffb17b1 | | |
| | 637b8379b1 | | |
| | b9c3937cd0 | | |
| | cba0672749 | | |
| | 7de8761836 | | |
| | 51afe9bd19 | | |
| | 4e178dc393 | | |
| | 2194b11329 | | |
| | 211c7d4594 | | |
| | b6cc8b1a86 | | |
@@ -94,3 +94,15 @@ rules:
|
||||
- **Solution applied** (NOT full `./setup` — surgical, no side effects): (1) Linked `spec` only — `mkdir skills/spec` + `ln -snf <abs>/skills-external/gstack/spec/SKILL.md skills/spec/SKILL.md`, matching gstack setup:440-476 (per-skill real dir + SKILL.md symlink, name from frontmatter). (2) Added `spec` to `full.profile` + `web-full.profile` planning sections (must be in active profile `full` else `set full` re-disables it). (3) iOS 5 skills deliberately NOT linked — Linux host, device-farm needs Mac daemon + Tailscale + iOS devices = dead skills + token cost. (4) Completed `.gitignore` gstack allowlist: added all 12 missing (`spec`, 5 `ios-*`, 6 parked `document-generate/landing-report/scrape/setup-gbrain/skillify/sync-gbrain`), removed stale `checkpoint` (BLK-005 rename). Reason: `gstack on` (BDR-018) moves parked skills into `skills/` — any gstack skill missing from allowlist = untracked git noise on enable.
|
||||
- **Verified**: `profile show full`+`web-full` → spec enabled; allowlist drift recheck EMPTY; spec skill now visible to Claude.
|
||||
- **Status**: resolved. iOS = intentional exclusion (re-linkable via gstack `./setup` on a Mac). See [[gstack-gitignore-allowlist-completeness]] (LRN-025).
|
||||
|
||||
## BLK-008 — gstack ./setup fails on Ubuntu 26.04 — Playwright chromium unsupported
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Friction**: fresh Ubuntu 26.04, `make install` / `make plugin` → "Failed to install browsers / ERROR: Playwright does not support chromium on ubuntu26.04-x64" → "GStack ./setup failed". Non-fatal in our wrapper (warn only) but gstack's browser (`/browse`, `/qa`, design screenshots) is silently dead once gstack is enabled.
|
||||
- **Real cause**: Playwright 1.58.2 (pinned in the gstack submodule) registry lists `ubuntu20.04/22.04/24.04` only; 26.04 released later → not in list → `getHostPlatform` errors. Pure OS-newness, not an install bug.
|
||||
- **Solution**: gated `export PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64` (ubuntu >24.04 only) before gstack setup + persisted to `.bashrc` for runtime. Playwright then pulls a Chrome-for-Testing fallback build for ubuntu24.04. Verified on 26.04: `ldd` resolves all libs + real headless render OK.
|
||||
- **Status**: resolved (commit 211c7d4). Residual: exact rev 1208 launch not in-session-tested (sandbox download hung at extraction); proved via sibling rev 1228 same-platform CfT build. Confirm on next real `make plugin`. Proper upstream fix = gstack bumps Playwright to a version that lists ubuntu26.04. See [[LRN-038]].
|
||||
|
||||
- **2026-06-23 UPDATE — Solution REVERTED, status downgraded to UPSTREAM/open** (commit b9c3937): the `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE` solution above does NOT work on 26.04. The fallback build downloads to 100% then HANGS at extraction (chrome binary never appears, no headless-shell download starts; reproduced on real machine + sandbox) → turned a 0.5s fast-fail into an install-blocking hang (user Ctrl+C). Reverted to the fast-fail (non-fatal; gstack OFF by default, browser only for /browse,/qa,screenshots). The earlier "verified ldd + headless render" was an isolated test on a sibling already-extracted build (rev 1228) — it masked the rev-1208 install-path hang. **Real fix = upstream**: gstack bumps Playwright to a version that lists ubuntu26.04. Until then gstack's browser is unavailable on 26.04, install completes cleanly. See [[LRN-038]] correction.
|
||||
|
||||
- **2026-06-23 FINAL — RESOLVED** (commit 3b8ffb1): gstack browser now works on Ubuntu 26.04. Two layers fixed: (1) bumped gstack's pinned Playwright 1.58.2 → 1.61 (`bun add playwright@latest` in the submodule; 1.61 ships a native ubuntu26.04 build — chromium rev 1228), automated in the installer (`gstack_bump_playwright_if_unsupported`, idempotent, OS-gated); (2) `GSTACK_CHROMIUM_NO_SANDBOX=1` to work around the AppArmor userns restriction (`sysctl kernel.apparmor_restrict_unprivileged_userns=1`), persisted to `.bashrc` + installer Step 9 (sysctl-gated). Verified end-to-end: `browse goto https://example.com` → "Navigated (200)". Caveat: the Playwright bump is a local submodule edit, reset by `git submodule update`, re-applied by the next install. See [[BDR-029]], [[LRN-040]].
|
||||
|
||||
@@ -451,3 +451,50 @@ rules:
|
||||
- Secret in `repo/.env`, gitignored (status quo) — one `git add -f` or a `.gitignore` slip leaks it; the secret physically sits in the tree.
|
||||
- Scripts read `~/.claude/.env` directly — makes the symlink redundant but rewrites every read path and loses repo-local visibility.
|
||||
- **Reference**: `link.sh` `link_env()`, `.gitignore`, `lib/toggle-external.sh`, `install-plugins.sh`, `.env.example`, commits 131d0bc / f9cc866. Linked to [[BDR-025]] (magic's `MAGIC_API_KEY`, consumed by the gate's required-but-manual class).
|
||||
|
||||
---
|
||||
|
||||
## BDR-027 — Minimal npm-via-nvm bootstrap over centralized prereq lib (reverses the reverted approach)
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Status**: accepted (supersedes the reverted `lib/install-prereqs.sh` centralization, commit 1ddeed1 removed from history)
|
||||
- **Decision**: the only real bootstrap blocker = `npm` absent on fresh machine. `install.sh` now installs current LTS via nvm (`v0.39.7` → `nvm install --lts`) ONLY when node/npm missing (`install_node_via_nvm`). Keep the inline per-tool prereq blocks in `install-plugins.sh` (no shared `ensure_*` lib). Re-add `jq` inline (Step 1) + `doctor.sh` fail-level — `jq` is an active-hook dep that was never installed.
|
||||
- **Why**: a 1-function fallback fixes the actual blocker. Folding 9 prereqs into a 245-line lib was scope-creep for "npm missing"; user reverted it. Inline blocks stay readable + co-located with their step.
|
||||
- **Alternatives rejected**: centralized `lib/install-prereqs.sh` (commit 1ddeed1 — over-engineered for the real blocker, reverted); leave `npm` as a hard `err` (the original bug — aborts before the CLI install).
|
||||
- **Reference**: `install.sh` `install_node_via_nvm`, `install-plugins.sh` Step 1 jq, `doctor.sh`, commits b6cc8b1 / 2194b11. Linked to [[BLK-008]] (the chromium half of the same fresh-Ubuntu-26.04 session).
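
The "only when node/npm missing" gate above can be sketched as follows. This is a minimal illustration, not the repo's actual `install_node_via_nvm`: the function names `needs_node_bootstrap` and `install_node_if_missing` are hypothetical, and the wrapper assumes `curl` and `bash` are already present on the fresh machine.

```shell
# Succeeds (exit 0) when either node or npm is missing from PATH.
needs_node_bootstrap() {
    ! command -v node >/dev/null 2>&1 || ! command -v npm >/dev/null 2>&1
}

# Hypothetical wrapper: install nvm v0.39.7, then the current LTS,
# but only when the gate above says a bootstrap is needed.
install_node_if_missing() {
    needs_node_bootstrap || return 0
    curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
    NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
    # Load nvm into the current shell so `nvm` is callable right away.
    . "$NVM_DIR/nvm.sh"
    nvm install --lts
}
```

Keeping the gate in its own function means a machine that already has node and npm never touches the network, which matches the "1-function fallback" intent of the decision.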
|
||||
|
||||
---
|
||||
|
||||
## BDR-028 — Hand-curated config is install-immutable (auto-revert guard) + de-vendor installer-managed skills
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Status**: accepted
|
||||
- **Decision**: `install-plugins.sh` snapshots `CLAUDE.md` + `settings.json` + `.claude/settings.json` at start, restores them on EXIT (trap) → installer never mutates hand-curated config. `frontend-design` un-tracked (`git rm --cached` + gitignore `skills-external/frontend-design/`) — re-synced from the example-skills plugin cache every run, so vendoring = pure churn. npx-skills pollution (`/.agents/`, `/skills-lock.json`) gitignored, anchored so our `agents/` stays tracked.
|
||||
- **Why**: a fresh `make install` drifted all 4: graphify clobbered `CLAUDE.md` (deleted the `# This repo only` header) + injected aggressive MANDATORY pre-tool hooks; `claude plugin install` flipped `example-skills`→true + added `plugin-dev`; frontend-design diffed on every upstream update; darwin-skill polluted repo `.agents/` at project scope. Guard = these files maintained by hand+commit only; gitignore = generated artifacts never tracked.
|
||||
- **Caveat**: guard makes the 3 config files install-immutable — anything the installer SHOULD add must be committed by hand. Safe today: committed `settings.json` already carries the rtk hook (install skips init). `update-all.sh` needs no guard (only `claude plugin update`, no enable flips, no graphify reconfig).
|
||||
- **Alternatives rejected**: `git checkout` post-install (nukes legit uncommitted edits, depends on git state); surgical JSON/markdown patching (fragile); accept graphify's generic CLAUDE.md (loses curation).
|
||||
- **Reference**: `install-plugins.sh` guard block + `restore_curated_configs` trap, `.gitignore`, commits 51afe9b / 7de8761. Linked to [[LRN-039]].
|
||||
|
||||
---
|
||||
|
||||
## BDR-029 — Installer auto-fixes gstack browser on OS newer than its pinned Playwright supports
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Status**: accepted
|
||||
- **Decision**: `install-plugins.sh` makes gstack's browser work on too-new distros without manual steps. (1) `gstack_bump_playwright_if_unsupported()` runs before `./setup`: if the pinned Playwright's support list lacks the running distro (grep `node_modules/playwright-core/lib` for the `ubuntuXX.04` tag), `bun add playwright@latest` in the submodule, then `./setup`'s frozen-lockfile install picks it up + rebuilds the browse binary. Idempotent (skips when already supported). (2) Persist `GSTACK_CHROMIUM_NO_SANDBOX=1` to the shell profile, gated on `sysctl kernel.apparmor_restrict_unprivileged_userns=1`.
|
||||
- **Why**: fresh `make install` on Ubuntu 26.04 must yield a working gstack browser. Submodule pins Playwright 1.58.2; upstream hasn't bumped; can't wait. Local bump in the installer = "just works" + self-heals after a `git submodule update` (re-applies next run).
|
||||
- **Caveats**: the installer EDITS the submodule (goes dirty each run on a too-new OS) — invasive, but the user chose it over waiting upstream. `bun add playwright@latest` could pull a Playwright that breaks gstack's build → non-fatal (`./setup` fail warns, install continues). The local bump is reset by `git submodule update`. The `.bashrc` env can be wiped if the user restores a hand-managed `.bashrc` (theirs is managed — the first install's lines were already lost that way).
|
||||
- **Alternatives rejected**: `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE` (fallback build HANGS at extraction — [[BLK-008]]); wait for gstack upstream Playwright bump (no ETA); leave browser unavailable (user wanted it); system chromium + executablePath (needs gstack code change).
|
||||
- **Reference**: `install-plugins.sh` `gstack_bump_playwright_if_unsupported()` + Step 9 sysctl-gated env, commit 3b8ffb1. Linked to [[LRN-040]], [[BLK-008]].
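
The idempotency check in part (1) of the decision can be sketched as below. It is an illustration under assumptions, not the body of `gstack_bump_playwright_if_unsupported()`: the helper names `distro_tag` and `playwright_action` are hypothetical, and the sketch only decides between "bump" and "skip" instead of running the bump itself.

```shell
# distro_tag OS_RELEASE_FILE: print e.g. "ubuntu26.04" from an os-release file.
# Runs in a subshell so the sourced variables do not leak into the caller.
distro_tag() (
    . "$1" || exit 1
    printf '%s%s\n' "$ID" "$VERSION_ID"
)

# playwright_action SUBMODULE_DIR OS_RELEASE_FILE
# Prints "skip" when the installed playwright-core mentions the running
# distro tag anywhere under its lib/ directory, "bump" otherwise
# (a missing lib/ directory also yields "bump").
playwright_action() {
    tag=$(distro_tag "$2") || return 1
    if grep -rqF -- "$tag" "$1/node_modules/playwright-core/lib" 2>/dev/null; then
        echo skip
    else
        echo bump
    fi
}
```

A caller could then run `bun add playwright@latest` inside the submodule only when `playwright_action skills-external/gstack /etc/os-release` prints `bump`, so a second run on an already-bumped tree does nothing.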
|
||||
|
||||
---
|
||||
|
||||
## BDR-030 — gstack skills activated ON-DEMAND per profile, not pre-installed; OFF by default stays
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Status**: accepted
|
||||
- **Decision**: gstack stays OFF by default (no per-skill symlink in `skills/`, zero context cost) — but `profile.sh set <profile>` that LISTS a gstack skill activates it for that profile. `enable_skill gstack` gained a branch: skill not in `skills/` and not parked in `skills-disabled/` but present in the `skills-external/gstack/<name>` submodule → `ln -s` it into `skills/`. `disable_gstack_not_in()` parks it again when an unrelated profile is set. The gstack/bin + browse/dist infra those skills need is created independently by `link.sh`.
|
||||
- **Why**: user wanted `make install` self-sufficient AND `set full` (lists 35 gstack skills) to work without 35 `missing — try: bash link.sh` warnings, WITHOUT abandoning gstack's OFF-by-default context-cost policy ([[BDR-029]] install comment). On-demand-per-profile threads both: gstack invisible until a profile needs it, then auto-on for exactly that profile. Source of truth = the submodule (`gstack_skills()` already reads `skills-external/gstack/*/SKILL.md`), so activation needs no gstack `./setup` skill-registration (which this gstack version writes to the WRONG dir anyway — [[LRN-042]]).
|
||||
- **Caveats**: the symlink form (`skills/<name> -> skills-external/gstack/<name>`) differs from what gstack `./setup` would create (real dir + symlinked SKILL.md) — fine here because `./setup` never populates `skills/` in this layout, so no mixed-form collision. Browse RUNTIME still needs the built binary + sandbox env ([[BDR-029]]) — on-demand makes the skill DISCOVERABLE, not the browser functional on an unsupported OS. The old "try: bash link.sh" message was wrong (link.sh never creates gstack skills) → replaced with submodule-aware messages.
|
||||
- **Alternatives rejected**: full gstack integration (make `./setup` install into `skills/`) — user picked option 1, too invasive/version-fragile; leave `full` broken with honest 1-line warning — worse UX; pre-symlink all gstack at install — violates OFF-by-default context policy.
|
||||
- **Reference**: `lib/profile.sh` `GSTACK_SRC` + `enable_skill` gstack branch. Verified: `set full` → 0 missing, 35 on-demand; `minimal`↔`full` cycle re-parks/restores; git clean (gstack symlinks gitignored, [[LRN-025]]). Linked to [[LRN-042]], [[LRN-022]], [[BDR-018]] (gstack on/off verb).
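
The on-demand branch described in the decision can be sketched like this. It is a simplified stand-in for the `enable_skill` gstack branch in `lib/profile.sh`: the function name `enable_gstack_skill`, its status words, and the "restore a parked skill by moving it back" branch are assumptions made for illustration.

```shell
# enable_gstack_skill REPO NAME
# Prints one of: already-enabled, restored, on-demand, missing.
enable_gstack_skill() {
    repo=$1
    name=$2
    src="$repo/skills-external/gstack/$name"
    mkdir -p "$repo/skills"
    if [ -e "$repo/skills/$name" ] || [ -L "$repo/skills/$name" ]; then
        echo "already-enabled"
    elif [ -d "$repo/skills-disabled/$name" ]; then
        # Parked skill: move it back (assumed behaviour of the existing path).
        mv "$repo/skills-disabled/$name" "$repo/skills/$name" && echo "restored"
    elif [ -f "$src/SKILL.md" ]; then
        # Not linked, not parked, but present in the submodule: link on demand.
        ln -s "$src" "$repo/skills/$name" && echo "on-demand"
    else
        echo "missing"
        return 1
    fi
}
```

Because the submodule directory is the source of truth, the third branch needs no `./setup` skill registration; a skill absent from the submodule still reports `missing` with a non-zero exit so the caller can print a submodule-aware message.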
|
||||
|
||||
@@ -64,3 +64,13 @@ rules:
|
||||
- **Method**: 5 parallel structure judges (shared rubric file, calibration anchor, lower-score-when-hesitating rule) + 5 behavior tests on fixtures (hotfix, geo, commit-change, status, analyze) + geo fix validated by re-test (0 source edits, `?? .claude/` only) + 2/2 counterbalanced blind judges (safety 3→9).
|
||||
- **Anomalies**: (1) KEY: stub skills (analyze 33.5, hotfix 36.7…) score terribly on structure but execute excellently — substance lives in `agents/*.md`; rubric must judge SKILL.md+agent.md as system, else misleading. (2) geo confirmed live: 2 HTML source files edited unsupervised pre-fix. (3) Self-inflicted: overwrote 5 pre-existing test-prompts.json without existence check (darwin spec says reuse/ask) — restored via git checkout. (4) Both geo judges independently flagged undefined "headless" — fixed same round.
|
||||
- **Action**: keep — bugs real, fixes verified. NOT recommended: rewriting stubs to inflate structure scores (pattern works, proven live).
|
||||
|
||||
---
|
||||
|
||||
## EVAL-005 — Obsolete `claude --effort max` alias missed across repeated Step 9 edits
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Output checked**: install-plugins.sh Step 9 kept `alias claude='claude --effort max'` while `settings.json` sets `"effortLevel": "xhigh"` (the source of truth). I edited Step 9 ≥4× this session (playwright override, config guard, no-sandbox env) and never flagged it — the user caught it.
|
||||
- **Method / why missed**: I treated the pre-existing `CLAUDE_LINES` as established and only touched the lines I was adding/removing. Spotting the redundancy needs cross-referencing TWO config layers (shell alias vs settings.json) — a semantic check I never ran. Masked further: the user's `.bashrc` is hand-managed and the alias line wasn't even present, so it looked inert.
|
||||
- **Anomaly**: not just dead config — a CLI flag (`--effort max`) silently OVERRIDES the settings.json value (`xhigh`). Real correctness bug.
|
||||
- **Action**: when editing installer shell-config, audit EACH existing line against the current settings.json / CLAUDE.md source of truth, not only the lines being changed. Removed the alias + added cleanup. General rule: reconcile config to ONE source of truth across env/alias/settings layers.
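
The cleanup half of the action above could look roughly like this. The function name `strip_effort_alias` is hypothetical, and the sketch only removes `alias claude=... --effort ...` lines; it does not handle the old `CLAUDE_EFFORT` variable or any other Step 9 line.

```shell
# strip_effort_alias PROFILE_FILE
# Removes every `alias claude=... --effort ...` line so the CLI flag can no
# longer override the effortLevel set in settings.json.
strip_effort_alias() {
    [ -f "$1" ] || return 0
    tmp=$(mktemp) || return 1
    # grep -v exits 1 when nothing is left to print; that is not an error here.
    grep -vE "^[[:space:]]*alias[[:space:]]+claude=.*--effort" "$1" > "$tmp" || true
    # Write back in place so file permissions and symlinks are preserved.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Running it twice leaves the file unchanged the second time, so it is safe to call on every install.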
|
||||
|
||||
@@ -172,3 +172,21 @@ rules:
|
||||
- Built via superpowers:writing-skills TDD: RED v1 baseline too easy (passed) → strengthened to RED v2 (pressured) which failed on anti-noise + invented subtask + no gate → GREEN passed. Gate STOP itself untested (non-interactive harness) — flagged as skill Red flag.
|
||||
- LRN-031: skill value = gate + anti-noise + determinism, NOT re-coding what a capable agent does free; if RED baseline passes, harden the fixture before writing.
|
||||
- Docs routing synced (CLAUDE.md table + README + USAGE) in separate commit; caveman-purge WIP in those files left unstaged. Commits 9dc2b83, be0f047, 765e9d7.
|
||||
|
||||
## 2026-06-23
|
||||
|
||||
- Reverted commit 1ddeed1 (centralized `lib/install-prereqs.sh`) — over-engineered for the real blocker. Replaced with minimal npm-via-nvm fallback in `install.sh` (b6cc8b1). Re-added `jq` prereq inline + `doctor.sh` fail-level (2194b11). BDR-027.
|
||||
- Diagnosed gstack chromium fail on Ubuntu 26.04: Playwright 1.58.2 doesn't list 26.04. Fix = gated `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64`, wrapper-only (no submodule edit), install + runtime (211c7d4). Verified ldd + headless render on 26.04. BLK-008, LRN-038.
|
||||
|
||||
- Fresh-install audit: `make install` drifted 4 repo files. Root-caused each: graphify installer clobbers `CLAUDE.md` (deletes `# This repo only` header) + injects MANDATORY hooks in `.claude/settings.json`; `claude plugin install` flips `example-skills`→true + adds `plugin-dev` in `settings.json`; example-skills `cp` churns `frontend-design`; `npx skills add` pollutes repo `.agents/` + `skills-lock.json`.
|
||||
- Fix: reverted current drift (`git checkout` 3 configs); added snapshot+trap-restore guard in `install-plugins.sh` (curated config now install-immutable); de-vendored frontend-design + gitignored `/.agents/` + `/skills-lock.json` (anchored so `agents/` stays tracked). Guard tested drift→restore. Commits 51afe9b / 7de8761. BDR-028, LRN-039.
|
||||
|
||||
- gstack chromium fix BACKFIRED: the `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64` pin made `make plugin` HANG at extraction on real 26.04 (download hits 100%, chrome never extracts) — worse than the original 0.5s fast-fail. Reverted (b9c3937). Root: isolated `ldd`+render proof used a sibling already-extracted build (rev 1228), masking the rev-1208 install-path hang. gstack browser stays unavailable on 26.04 (OFF by default); real fix upstream. Corrected BLK-008 + LRN-038.
|
||||
|
||||
- gstack browser FIXED on Ubuntu 26.04 (full saga). `git submodule update` would NOT help (latest gstack still pins playwright 1.58.2). Two layers: (1) bumped Playwright→1.61 in submodule (native 26.04 build), (2) GSTACK_CHROMIUM_NO_SANDBOX=1 for AppArmor userns block. Both automated in install-plugins.sh (auto-bump gated on dep support-list grep; env gated on apparmor sysctl) + env to .bashrc. Verified browse drives a real page (200). Discovered user's .bashrc is hand-managed (installer's env lines had been wiped by a restore). Commit 3b8ffb1. BDR-029, LRN-040, BLK-008 resolved.
|
||||
|
||||
- Fixed MAGIC_API_KEY false-negative: check grep'd `repo/.env` (symlink), never created because `~/.claude/.env` was made AFTER link.sh on the fresh machine (and `make plugin` skips link.sh). install-plugins.sh now self-heals the symlink + both scripts use a tolerant regex (export/whitespace/non-empty). Immediate fix: `make link`. Sandbox blocked all `.env*` reads → diagnosed via dir listing + synthetic-line regex tests. Commit 1b028cb. LRN-041.
|
||||
|
||||
- Removed obsolete `alias claude='claude --effort max'` from install Step 9 — settings.json `effortLevel: xhigh` is the source of truth and the CLI alias would override it (forcing max over xhigh). Step 9 now also strips the alias + old CLAUDE_EFFORT from the profile if present. A dtach `cc` launcher was prototyped then dropped — deferred to a later sprint (per user). Why missed earlier = EVAL-005 (never cross-audited existing Step 9 lines vs settings.json).
|
||||
|
||||
- Made install self-sufficient + gstack on-demand per profile (user: "make install must install EVERYTHING"). 3 root causes via install log: (A) install.sh ran link.sh BEFORE install-plugins.sh which never re-linked → npx-skill symlinks never created on fresh run; (B) `npx skills add` + gstack `./setup` resolve target relative to CWD → darwin-skill landed in `$REPO/.agents/skills`+`$REPO/.claude/skills`, not `$HOME/.agents/skills` (self-reinforcing once `$REPO/.agents` exists); (C) `profile.sh set full` → 35 "missing — try bash link.sh" (wrong remedy) because gstack OFF + skills never in `skills/`. Fixes: install-plugins.sh runs npx from `$HOME` + cleans parasites + Step 10 final re-link; update-all.sh same npx fix; profile.sh `enable_skill gstack` symlinks on-demand from submodule (gstack OFF default, ON per profile). Verified live: link.sh → darwin OK; `set full` → 0 missing / 35 on-demand; minimal↔full cycle re-parks/restores; git clean. Residual: `$REPO/.claude/skills/darwin-skill` rm blocked by `.claude/` permission guard → auto-cleaned next `make plugin`. BDR-030, LRN-042.
|
||||
|
||||
@@ -519,3 +519,54 @@ rules:
|
||||
- **Pattern**: for the load-bearing scenario, run it on the REAL subject in the REAL invocation context (prod path `$HOME/.claude/lib/...`, prod-like PATH), not a stub or a "the code path is correct" argument. A stub proves branch coverage; only the real subject proves the integration. Always add a DISCRIMINATING case — force the failure state; the check must REPORT it, not pass by default (a check that only ever passes proves nothing).
|
||||
- **Future application**: any "fixed/works" claim on a critical path → produce the real run output (command + lines + exit code) before recording it as a lesson or shipping; don't summarize ("condition met") in place of the output. Stub/logic = necessary for branch coverage, never sufficient for the integration claim. Highest-payoff discipline of the whole segment: every refutation came from execution, none from reasoning.
|
||||
- **Reference**: design-gate workstream, the `PATH=/usr/bin:/bin` matrix (magic-on → READY/0, magic-off → INCOMPLETE/10), commits 4d19135 / f963318. Linked to [[LRN-036]] (the concrete instance: the PATH cause surfaced only by the real run), [[LRN-034]] (its twin: 034 = don't trust a narrated *claim*; 037 = don't trust a *stub/logic argument* as proof; both demand execution against ground truth).
|
||||
|
||||
---
|
||||
|
||||
## LRN-038 — Playwright host-platform override for distros newer than its hardcoded support list
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Context**: fresh Ubuntu 26.04. gstack `./setup` aborted: "Playwright does not support chromium on ubuntu26.04-x64". Playwright 1.58.2's registry hardcodes `ubuntu20.04/22.04/24.04` only; a newer release → no matching build → hard error. gstack is a pinned submodule (must not edit).
|
||||
- **Pattern**: `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntuXX.04-<arch>` forces a fallback build. MUST include arch (`x64`/`arm64`) — bare `ubuntu24.04` fails ("does not support … ubuntu24.04"). Set it from the WRAPPER: `export` before the submodule's setup (install-time download) AND persist to the shell profile (runtime launch) — both paths call `getHostPlatform`. No submodule edit. Gate on real OS version (`sort -V` compare) so supported distros are untouched. Test with the LOCAL `./node_modules/.bin/playwright` — `bunx playwright` pulls the LATEST playwright (different browser revision than the local import), which masks the result.
|
||||
- **Future application**: any pinned tool that hardcodes an OS allowlist breaks on a fresh OS upgrade. Look for a host-platform override env before bumping/forking the dep. Prove the fallback binary actually runs (`ldd` = no missing libs + a real headless render), not just that the download resolves.
|
||||
- **Reference**: `install-plugins.sh` `playwright_platform_override()`, commit 211c7d4. Linked to [[BLK-008]].
|
||||
- **2026-06-23 CORRECTION (override REVERTED, commit b9c3937)**: the override is NOT a usable fix on Ubuntu 26.04. It makes `playwright install` switch to the ubuntu24.04 fallback build, which downloads to 100% then HANGS at extraction (chrome binary never materializes; real machine + sandbox). Turned a 0.5s fast-fail into an install-blocking hang. The isolated proof (`ldd` + headless render) PASSED but used an already-extracted sibling build (rev 1228) — it masked the install-path hang in the real flow (rev 1208). **Sharpened lesson**: proving the binary launches in isolation is NOT proving the install path works — run the ACTUAL install command end-to-end (it must COMPLETE, not just "download resolves" nor "a binary launches"). The override technique stays valid in general, but the EXTRACTION/COMPLETE step is part of "does it work".
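
For the general technique only (the override itself was reverted on 26.04, as the correction states), the OS-version gate and arch suffix described in the pattern can be sketched as below. The helper names `version_gt`, `platform_arch` and `override_value` are hypothetical, and `sort -V` is assumed to be the GNU coreutils version sort.

```shell
# version_gt A B: succeeds when version A is strictly newer than B.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# platform_arch MACHINE: map `uname -m` output to the tag's arch suffix.
platform_arch() {
    case "$1" in
        x86_64) echo x64 ;;
        aarch64 | arm64) echo arm64 ;;
        *) return 1 ;;
    esac
}

# override_value VERSION_ID MACHINE
# Prints the override tag only for releases newer than 24.04; prints nothing
# (and succeeds) on supported releases so they stay untouched.
override_value() {
    version_gt "$1" "24.04" || return 0
    arch=$(platform_arch "$2") || return 1
    printf 'ubuntu24.04-%s\n' "$arch"
}
```

A wrapper would export `PLAYWRIGHT_HOST_PLATFORM_OVERRIDE` only when `override_value "$VERSION_ID" "$(uname -m)"` prints a non-empty tag. Passing the gate says nothing about whether the fallback build extracts and runs; that still has to be proven by the real install command completing.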
|
||||
|
||||
---
|
||||
|
||||
## LRN-039 — Installers drift hand-curated config → snapshot+trap-restore guard; anchor gitignore for pollution
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Context**: fresh Ubuntu `make install`. 3rd-party installers mutated repo files: graphify rewrote `CLAUDE.md`+hooks (every `graphify install`, Step 7), `claude plugin install` flipped `enabledPlugins`, the example-skills `cp` churned `frontend-design`, `npx skills add` wrote project-scope `.agents/` + `skills-lock.json`.
|
||||
- **Pattern**: file an installer rewrites but YOU curate → snapshot to a `mktemp -d` at start + `trap restore EXIT` (`cmp -s` before `cp`, revert only real diffs). Preserves pre-existing edits, no git dependency, idempotent, survives early-exit. Pure generated pollution → gitignore. ANCHOR the ignore (`/.agents/`, NOT `.agents/` and NOT `agents`) so it can't catch a legit sibling — our agents live in `agents/` (no dot). Verify with `git check-ignore -v <legit-dir>` that the pattern doesn't over-match.
|
||||
- **Future application**: audit a fresh install = `git status` right after `make install`; classify every drift as (a) curated → guard, or (b) pollution → anchored gitignore. Never `git checkout` to clean drift (destroys uncommitted work). Prove the guard with an isolated drift→restore test before trusting it.
|
||||
- **Reference**: `install-plugins.sh` `restore_curated_configs` + EXIT trap, `.gitignore` `/.agents/`, commits 51afe9b / 7de8761. Linked to [[BDR-028]].
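
A minimal sketch of the snapshot + trap-restore guard described in the pattern, assuming it runs from the repo root. The names `SNAP_DIR`, `CURATED_FILES`, `snapshot_curated`, `restore_curated` and `guard_curated` are illustrative and are not the identifiers used in `install-plugins.sh`.

```shell
SNAP_DIR=$(mktemp -d)
CURATED_FILES="CLAUDE.md settings.json .claude/settings.json"

# Copy each curated file that exists into the snapshot directory.
snapshot_curated() {
    for f in $CURATED_FILES; do
        [ -f "$f" ] || continue
        mkdir -p "$SNAP_DIR/$(dirname "$f")"
        cp -p "$f" "$SNAP_DIR/$f"
    done
}

# Put back only the files that actually differ from (or vanished since)
# the snapshot; untouched files are left alone.
restore_curated() {
    for f in $CURATED_FILES; do
        [ -f "$SNAP_DIR/$f" ] || continue
        cmp -s "$SNAP_DIR/$f" "$f" || cp -p "$SNAP_DIR/$f" "$f"
    done
}

# Call once at the top of the installer: snapshot now, restore on any exit.
guard_curated() {
    snapshot_curated
    trap restore_curated EXIT
}
```

Because the snapshot captures the working-tree state, uncommitted hand edits made before the install survive, which is the property a post-install `git checkout` would lose.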
|
||||
|
||||
---
|
||||
|
||||
## LRN-040 — OS newer than a pinned tool supports = TWO distinct layers (version build + security policy)
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Context**: gstack browser on fresh Ubuntu 26.04. Layer 1 = Playwright 1.58.2 ships no browser build for 26.04 → install errors (the host-platform override "fixes" the error but its fallback build HANGS at extraction — dead end, [[BLK-008]]). Layer 2 = even with Playwright 1.61 (native 26.04 build that launches fine in isolation), the real browse path aborts "No usable sandbox" because Ubuntu 24.04+ restricts unprivileged user namespaces via AppArmor.
|
||||
- **Pattern**: (a) bump the tool PAST the OS-support threshold — don't force the OS to look older (overrides/fallbacks are fragile; prove the install COMPLETES, not just that a binary launches). For a pinned submodule dep: `bun add X@latest` in the submodule, automatable in the installer, idempotent by grepping the dep's support list for the running OS tag before bumping. (b) SEPARATELY handle OS security hardening: Chromium needs `--no-sandbox` where `sysctl kernel.apparmor_restrict_unprivileged_userns=1`; gstack exposes `GSTACK_CHROMIUM_NO_SANDBOX=1` (#1562). Gate persistence on the sysctl, not an OS-version guess.
|
||||
- **Future application**: "tool X broke after an OS upgrade" → check BOTH (1) does X ship a build / support entry for the new OS (bump if not), and (2) does the new OS's hardening (userns/AppArmor/SELinux) block X at runtime (needs an opt-out flag). Fix one without the other and it still fails. Verify the FULL runtime path (drive a real page) — here the isolated `chromium.launch()` PASSED while the real `browse` path failed on the sandbox.
|
||||
- **Reference**: `install-plugins.sh`, `.bashrc` `GSTACK_CHROMIUM_NO_SANDBOX=1`, gstack `browse/src/browser-manager.ts` `shouldEnableChromiumSandbox()`, commit 3b8ffb1. Linked to [[BDR-029]], [[BLK-008]], [[LRN-038]].
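
Part (b) of the pattern, gating persistence on the sysctl instead of an OS-version guess, can be sketched as follows. The helper names `persist_line`, `userns_restricted` and `persist_no_sandbox` are hypothetical, and the optional second argument exists only so the gate can be exercised without the real sysctl.

```shell
# persist_line FILE LINE: append LINE unless an identical line already exists.
persist_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# userns_restricted [VALUE]: succeeds when AppArmor restricts unprivileged
# user namespaces. VALUE, when given, replaces the live sysctl read.
userns_restricted() {
    value=${1:-$(sysctl -n kernel.apparmor_restrict_unprivileged_userns 2>/dev/null)}
    [ "$value" = "1" ]
}

# persist_no_sandbox PROFILE_FILE [SYSCTL_VALUE]
persist_no_sandbox() {
    if userns_restricted "${2:-}"; then
        persist_line "$1" 'export GSTACK_CHROMIUM_NO_SANDBOX=1'
    fi
}
```

On a host where the sysctl is absent or set to 0, nothing is written, so the sandbox stays enabled wherever the kernel policy allows it.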
|
||||
|
||||
---
|
||||
|
||||
## LRN-041 — A check reading a symlink an EARLIER install step makes → false negative if that step's precondition wasn't met
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Context**: install warned "MAGIC_API_KEY not found in ~/.claude/.env" though the key WAS set there. Root: the check grep'd `$REPO/.env` — a symlink → `~/.claude/.env` ([[BDR-026]]) created by `link.sh`'s `link_env`. On a fresh machine `~/.claude/.env` is created AFTER `link.sh` runs (install first warns "create it"), so the symlink was never made and the key was unreachable via `$REPO/.env`. `make plugin` also never runs `link.sh`. The warning misleadingly blamed `~/.claude/.env`.
|
||||
- **Pattern**: a check that reads a path PRODUCED by an earlier setup step silently fails when that step's precondition wasn't met yet (target absent → symlink skipped). Fix: read the CANONICAL source and/or self-heal (create the missing symlink when the canonical exists). Env-key greps must tolerate `export `/leading whitespace and require a non-empty value: `^[[:space:]]*(export[[:space:]]+)?KEY=.` — and the message must name the real gap (symlink missing vs key absent), with an actionable hint (`run make link`).
|
||||
- **Future application**: any "X not found in FILE" where FILE is a symlink/derived path → verify the producing step ran with its precondition, prefer the canonical source, self-heal or give an actionable message. Sandbox note: `.env*` reads were blocked — diagnosed via directory listing + regex tests on SYNTHETIC lines, never reading the secret.
|
||||
- **Reference**: `install-plugins.sh` magic check (self-heal symlink + tolerant regex), `link.sh` `link_env`, commit 1b028cb. Linked to [[BDR-026]].
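
The tolerant regex and the self-heal step from the pattern can be sketched as below. The function names `env_has_key` and `heal_env_link` are hypothetical, and the key name is assumed to contain no regex metacharacters.

```shell
# env_has_key FILE KEY: succeeds when FILE defines KEY with a non-empty value,
# tolerating leading whitespace and an optional `export ` prefix.
env_has_key() {
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?$2=." "$1" 2>/dev/null
}

# heal_env_link CANONICAL LINK: create the symlink when the canonical file
# exists but the link is missing or dangling.
heal_env_link() {
    if [ -f "$1" ] && [ ! -e "$2" ]; then
        ln -snf "$1" "$2"
    fi
}
```

Checking the link first and the key second lets the warning name the real gap: "symlink missing" when `heal_env_link` had nothing to link to, "key absent" when the file is reachable but `env_has_key` fails.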
|
||||
|
||||
---
|
||||
|
||||
## LRN-042 — `npx skills add` / gstack `./setup` resolve install target RELATIVE TO CWD — run from repo = wrong dir, breaks `$HOME` symlink assumptions
|
||||
|
||||
- **Date**: 2026-06-23
|
||||
- **Context**: darwin-skill `npx -y skills add` (Step 8.5) + gstack `./setup` (Step 2) both ran with CWD=repo. The `skills` CLI writes to `<cwd>/.agents/skills`; gstack `./setup` likewise wrote per-skill dirs into repo-local `.agents/skills`/`.claude/skills`. So darwin landed in `$REPO/.agents/skills/darwin-skill` + `$REPO/.claude/skills/darwin-skill`, NOT `$HOME/.agents/skills/darwin-skill` where `link.sh` (NPX_EXTERNAL_SKILLS) + `install-plugins.sh` (`_dst`) look → symlink never created, "darwin-skill not installed — run make plugin" though it WAS installed. SELF-REINFORCING: once `$REPO/.agents` exists, every later `skills add` targets it. `find-skills` only worked because an earlier run (before `$REPO/.agents` existed) wrote it to `$HOME`. BDR-028/LRN-039 had already gitignored repo `.agents/`+`skills-lock.json` as "drift noise" — masked the symptom, never saw the install was landing in the WRONG PLACE.
|
||||
- **Pattern**: a per-user installer that resolves its target relative to CWD (walks up for / creates `.<tool>/` in CWD) silently installs into the project tree when run from a repo that already carries such a dir. Gitignoring the junk hides it but the artifact is unreachable from `$HOME`-based consumers. Fix: run the installer from `$HOME` (`(cd "$HOME" && npx -y skills add …)`) so it targets `$HOME/.agents/skills`; clean up the repo-local copies (gitignored → safe `rm -rf`). Also fix the ordering twin: `link.sh` must re-run AFTER the install steps that produce what it symlinks (install.sh ran link FIRST; install-plugins never re-linked) — added a final `link.sh` step so `make plugin`/`make install` finish self-sufficient.
|
||||
- **Future application**: before running any `npx <x> add` / `<tool> init` / `setup` that materializes a dotfile dir, set CWD to where the artifact MUST live (usually `$HOME`), don't trust the script's default resolution. When a "X not installed" warning contradicts a "successfully installed" log line → diff the EXPECTED path vs where the log says it wrote (here log line showed `~/Documents/claude/.agents/skills/darwin-skill`). When an installer A produces inputs for symlinker B, B must run after A in the same invocation.
|
||||
- **Reference**: `install-plugins.sh` Step 8.5 (`cd "$HOME"` + parasite cleanup) + Step 10 (final `link.sh`), `update-all.sh` Step 7.5, log `install-20260623-181416.log:1399`. Extends [[LRN-039]] (BDR-028 — gitignored the symptom) + [[LRN-007]] (toggle-external source-only state) + [[LRN-041]] (install-ordering false-negative). gstack on-demand consumer = [[BDR-030]].
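
The "run the installer from `$HOME`" fix can be sketched with a small wrapper. The name `run_from_home` is hypothetical; the point is only that the `cd` happens inside a subshell, so the caller's working directory is never changed.

```shell
# run_from_home CMD [ARGS...]
# Runs CMD with CWD=$HOME. The parentheses make the function body a subshell,
# so the cd does not leak into the calling script.
run_from_home() (
    cd "$HOME" || exit 1
    "$@"
)
```

An installer step would then wrap its `npx -y skills add` invocation in `run_from_home`, so a CWD-relative target resolves under `$HOME/.agents/skills` even when the script itself was started from the repo.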
|
||||
|
||||
@ -1,5 +1,33 @@
# TODO

## 2026-06-23 — self-sufficient install + per-profile gstack on-demand

Goal: `make install`/`make plugin`/`make update` install EVERYTHING with no manual
step. Plus profile-driven gstack on-demand (user's option 1: gstack OFF
by default, but `set <profile>` enables it for any profile that needs gstack).

Root causes found (logs install-20260623-181416.log):
- Bug A: install.sh runs link.sh (step 5) BEFORE install-plugins.sh (step 6),
  which never re-ran link.sh → npx/external symlinks never created on the first run
  (LRN-022 already documented the gap). update-all.sh already re-links (L364).
- Bug B: `npx skills add` + gstack ./setup resolve their target relative to the
  CWD (repo) → darwin-skill lands in $REPO/.agents/skills + $REPO/.claude/skills
  instead of $HOME/.agents/skills. Self-sustaining once $REPO/.agents exists.
- Bug C: profile.sh "missing — try: bash link.sh" is misleading (link.sh does not create
  the gstack skills); full.profile lists 35 gstack skills that were never placed in skills/.

- [x] Edit 1 — install-plugins.sh Step 8.5: `npx skills add` from $HOME (subshell cd)
- [x] Edit 2 — install-plugins.sh: clean up stray $REPO/.agents/skills + $REPO/.claude/skills (gitignored)
- [x] Edit 3 — install-plugins.sh: final Step 10 re-runs `bash "$REPO/link.sh"` (idempotent)
- [x] Edit 4 — update-all.sh Step 7.5: `npx skills add` from $HOME (same Bug B)
- [x] Edit 5 — lib/profile.sh: GSTACK_SRC var + on-demand gstack branch in enable_skill
  (symlink skills/<name> → skills-external/gstack/<name>) + honest message
- [x] Verify — shellcheck/bash -n clean; migrated darwin → $HOME/.agents/skills + `bash link.sh`
  (skills/darwin-skill OK); `profile.sh set full` → 0 "missing", 35 gstack on-demand;
  minimal↔full cycle OK; git clean (gstack symlinks gitignored); full profile restored
- [~] Cleanup on the current machine: $REPO/.claude/skills/darwin-skill + an EMPTY .agents/skills
  remain (rm blocked by the .claude/ permission guard) → auto-cleaned on the next `make plugin`
- [x] Capitalize — LRN-042 (Bug B, CWD-relative) + BDR-030 (per-profile gstack on-demand) + 2026-06-23 journal
- [ ] Commit (via /commit-change)

## profile.sh — `gstack on|off` verb
- [x] Extract helper `enable_all_gstack()` (the cmd_reset loop) — anti-duplication
- [x] Extract helper `disable_gstack_not_in(prof)` (the gstack loop of cmd_set) — anti-duplication
13
.gitignore
vendored
@ -119,3 +119,16 @@ desktop.ini
 # Profile cache — written by lib/profile.sh, read by hooks/statusline.sh
 .active-profile
 .gstack/
+
+# Frontend Design (Anthropic) — installed/refreshed from the example-skills
+# plugin cache by install-plugins.sh (Step 8b) and update-all.sh on every run.
+# Not vendored: tracking it produced a repo diff each time Anthropic shipped
+# an update. The source is always re-synced, so no offline copy is needed.
+skills-external/frontend-design/
+
+# npx `skills add` project-scope artifacts — darwin-skill copies itself into
+# the repo's .agents/ and writes skills-lock.json at root. Our own agents live
+# in agents/ (no dot) and stay tracked. Anchored to root so only the dotted
+# pollution dir is ignored.
+/.agents/
+/skills-lock.json
@ -127,6 +127,12 @@ else
   fail "Node.js not found"
 fi
 
+if command -v jq &>/dev/null; then
+  pass "jq $(jq --version 2>/dev/null | sed 's/^jq-//')"
+else
+  fail "jq not found — statusline & rtk-rewrite hooks require it"
+fi
+
 if command -v cargo &>/dev/null; then
   pass "Cargo $(cargo --version | awk '{print $2}')"
 else
@ -29,6 +29,42 @@ fi
 # shellcheck source=lib/detect-plugins.sh
 source "$REPO/lib/detect-plugins.sh"
 
+# ── Guard hand-curated config against installer drift ────────
+# graphify's installer (Step 7) rewrites CLAUDE.md + .claude/settings.json
+# (clobbers the curated graphify section + injects aggressive MANDATORY
+# hooks), and `claude plugin install` (Step 5) flips enable-states in
+# settings.json. These 3 files are maintained by hand + commit, never by
+# the installer. Snapshot them now and restore on exit so a run leaves them
+# exactly as it found them. Pre-existing local edits are preserved; only the
+# installer's drift is undone. NOTE: this makes these files install-immutable
+# — anything the installer should add to them must be committed by hand.
+GUARDED_CONFIGS=("CLAUDE.md" ".claude/settings.json" "settings.json")
+CFG_SNAPSHOT="$(mktemp -d 2>/dev/null || true)"
+
+restore_curated_configs() {
+  [ -n "$CFG_SNAPSHOT" ] || return 0
+  local f
+  for f in "${GUARDED_CONFIGS[@]}"; do
+    if [ -f "$CFG_SNAPSHOT/$f" ] && ! cmp -s "$CFG_SNAPSHOT/$f" "$REPO/$f"; then
+      cp "$CFG_SNAPSHOT/$f" "$REPO/$f"
+      info "Reverted installer drift in $f (curated config kept as committed)"
+    fi
+  done
+  rm -rf "$CFG_SNAPSHOT"
+}
+
+if [ -n "$CFG_SNAPSHOT" ]; then
+  for _cfg in "${GUARDED_CONFIGS[@]}"; do
+    if [ -f "$REPO/$_cfg" ]; then
+      mkdir -p "$CFG_SNAPSHOT/$(dirname "$_cfg")"
+      cp "$REPO/$_cfg" "$CFG_SNAPSHOT/$_cfg"
+    fi
+  done
+  trap restore_curated_configs EXIT
+else
+  warn "Config guard disabled (mktemp failed) — CLAUDE.md/settings may drift"
+fi
+
 # Read pinned version from plugins.lock.json
 # Usage: pinned_version "rtk" → prints version string or "latest"
 pinned_version() {
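
The config guard in this hunk follows a snapshot-then-restore-on-exit pattern. Below is a minimal sketch of that pattern on throwaway files. It assumes nothing about the real installer: `snapshot_file`, `restore_if_changed`, and the temp paths are hypothetical names used for illustration, and a subshell stands in for one installer run.

```bash
#!/usr/bin/env bash
# Sketch of the snapshot-then-restore-on-exit pattern, on throwaway files.

snapshot_file() {            # snapshot_file <file> <snapshot_dir>
  cp "$1" "$2/$(basename "$1")"
}

restore_if_changed() {       # restore_if_changed <file> <snapshot_dir>
  local snap
  snap="$2/$(basename "$1")"
  # Only copy back when the file differs from its snapshot.
  if [ -f "$snap" ] && ! cmp -s "$snap" "$1"; then
    cp "$snap" "$1"
    echo "reverted: $(basename "$1")"
  fi
}

work="$(mktemp -d)"
snap="$(mktemp -d)"
echo "curated" > "$work/settings.json"
snapshot_file "$work/settings.json" "$snap"

# The subshell stands in for one installer run: its EXIT trap fires when the
# subshell ends, after the simulated installer has rewritten the file.
(
  trap 'restore_if_changed "$work/settings.json" "$snap"' EXIT
  echo "installer drift" > "$work/settings.json"
)

cat "$work/settings.json"
```

The trade-off noted in the hunk's own comment applies to the sketch as well: anything the "installer" legitimately wanted to add to the guarded file is discarded along with the drift.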
@ -193,6 +229,25 @@ else
   fi
 fi
 
+# --- jq (required by active hooks: statusline.sh, rtk-rewrite.sh) ---
+if command -v jq &>/dev/null; then
+  ok "jq $(jq --version 2>/dev/null | sed 's/^jq-//')"
+else
+  info "Installing jq..."
+  case $OS in
+    macos) brew install jq ;;
+    linux-apt) sudo apt-get install -y jq ;;
+    linux-dnf) sudo dnf install -y jq ;;
+    linux-pacman) sudo pacman -S --noconfirm jq ;;
+    *) warn "Cannot auto-install jq on $OS — statusline/rtk hooks need it" ;;
+  esac
+  if command -v jq &>/dev/null; then
+    ok "jq installed"
+  else
+    warn "jq install failed — statusline & rtk-rewrite hooks require it"
+  fi
+fi
+
 # --- Claude Code CLI ---
 if command -v claude &>/dev/null; then
   ok "Claude Code $(claude --version 2>/dev/null | head -1)"
@ -203,6 +258,35 @@ fi
 
 echo ""
 
+# gstack pins Playwright (1.58.x) which only ships browser builds for
+# ubuntu<=24.04. On a newer distro the browser install fails ("does not
+# support chromium on ubuntuXX.04"). Bump gstack's Playwright to a version
+# that supports this OS so ./setup builds the browse binary against it and
+# installs a native browser. Fires only when the pinned version genuinely
+# lacks support — idempotent across runs. Edits the submodule locally (goes
+# dirty); a `git submodule update` resets it and the next install re-applies.
+# See BLK-008 / LRN-040.
+gstack_bump_playwright_if_unsupported() {
+  [ -d "$GSTACK_DIR" ] && [ -r /etc/os-release ] || return 0
+  local ostag pwlib
+  # shellcheck disable=SC1091
+  ostag="$(. /etc/os-release 2>/dev/null; [ "${ID:-}" = ubuntu ] && printf 'ubuntu%s' "${VERSION_ID:-}")"
+  [ -n "$ostag" ] || return 0  # only the known Ubuntu case
+  pwlib="$GSTACK_DIR/node_modules/playwright-core/lib"
+  # populate node_modules at the pinned version so we can read its support list
+  ( cd "$GSTACK_DIR" && { bun install --frozen-lockfile >/dev/null 2>&1 || bun install >/dev/null 2>&1; } ) || return 0
+  if grep -rqs "$ostag" "$pwlib" 2>/dev/null; then
+    return 0  # pinned Playwright already supports this OS
+  fi
+  info "gstack's Playwright lacks $ostag support — bumping to latest (local submodule edit)..."
+  ( cd "$GSTACK_DIR" && bun add playwright@latest >/dev/null 2>&1 )
+  if grep -rqs "$ostag" "$pwlib" 2>/dev/null; then
+    ok "gstack Playwright bumped — now supports $ostag (browse binary rebuilt by ./setup)"
+  else
+    warn "Playwright bump didn't add $ostag support — gstack browser may stay unavailable"
+  fi
+}
+
 # ============================================================
 # STEP 2 — GSTACK SUBMODULE
 # ============================================================
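
The detection step in `gstack_bump_playwright_if_unsupported` can be tried in isolation. The sketch below assumes a fake os-release file and a made-up support list. `os_tag`, `supports_os`, and `hosts.txt` are hypothetical names, and the real Playwright file layout is not reproduced here.

```bash
#!/usr/bin/env bash
# os_tag <os-release-file>: print "ubuntu<VERSION_ID>" for Ubuntu, nothing otherwise.
os_tag() {
  # Source in a subshell so ID/VERSION_ID do not leak into the caller.
  ( . "$1" 2>/dev/null; [ "${ID:-}" = ubuntu ] && printf 'ubuntu%s' "${VERSION_ID:-}" )
}

# supports_os <dir> <tag>: succeed if any file under <dir> mentions <tag>.
supports_os() {
  grep -rqs "$2" "$1" 2>/dev/null
}

demo="$(mktemp -d)"
printf 'ID=ubuntu\nVERSION_ID="26.04"\n' > "$demo/os-release"
printf 'ID=debian\nVERSION_ID="12"\n'    > "$demo/os-release-debian"
mkdir -p "$demo/lib"
# Made-up support list standing in for the installed package's files.
echo 'hosts: ubuntu22.04-x64 ubuntu24.04-x64' > "$demo/lib/hosts.txt"

tag="$(os_tag "$demo/os-release")"
if supports_os "$demo/lib" "$tag"; then
  echo "$tag supported"
else
  echo "$tag not supported, a bump would be attempted"
fi
```

Because the tag is passed to `grep` as a regular expression, the `.` in `ubuntu26.04` matches any character. That is harmless for this check but worth knowing if the helper is reused elsewhere.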
@ -246,6 +330,12 @@ if [ -d "$GSTACK_DIR" ]; then
     ok "bun $(bun --version)"
   fi
 
+  # On a distro newer than gstack's pinned Playwright supports, bump Playwright
+  # BEFORE ./setup so its frozen-lockfile install picks up the new version and
+  # the browse binary is rebuilt against it (avoids the "does not support
+  # chromium" fail). Non-fatal if it can't — gstack is OFF by default.
+  gstack_bump_playwright_if_unsupported
+
   info "Running GStack setup..."
   if [ -x "$GSTACK_DIR/setup" ]; then
     if (cd "$GSTACK_DIR" && ./setup); then
@ -569,6 +659,11 @@ NPX_SKILLS=(
   "alchaincyf/find-skills"
 )
 
+# `skills add` resolves its target (.agents/skills/, skills-lock.json) RELATIVE
+# TO THE CWD. Running it from the repo (which carries gitignored .agents/ and
+# .claude/ dirs) makes skills land in $REPO/.agents/skills instead of
+# $HOME/.agents/skills — where link.sh expects them — and the bug is
+# self-reinforcing once $REPO/.agents exists. Always install from $HOME.
 if ! command -v npx &>/dev/null; then
   warn "npx not available — skipping external skills"
 else
@ -579,18 +674,28 @@ else
       ok "$_name already installed ($_dst)"
       continue
     fi
-    info "Installing $_name via: npx -y skills add $_src"
-    if npx -y skills add "$_src" 2>/dev/null; then
+    info "Installing $_name via: npx -y skills add $_src (from \$HOME)"
+    if (cd "$HOME" && npx -y skills add "$_src" 2>/dev/null); then
       if [ -d "$_dst" ]; then
         ok "$_name installed"
       else
         warn "$_name installed but not at expected path $_dst"
       fi
     else
-      err "$_name install failed — run manually: npx -y skills add $_src"
+      err "$_name install failed — run manually: (cd \"\$HOME\" && npx -y skills add $_src)"
     fi
   done
 fi
+
+# Earlier runs (before this CWD fix) scattered skills into the repo's gitignored
+# .agents/skills and .claude/skills. They shadow the canonical $HOME copies and
+# confuse skill discovery — remove them. Both are gitignored, so this is safe.
+for _stray in "$REPO/.agents/skills" "$REPO/.claude/skills"; do
+  if [ -d "$_stray" ]; then
+    rm -rf "$_stray"
+    info "Removed stray repo-local skills dir: $_stray"
+  fi
+done
 echo ""
 
 # ============================================================
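
The `(cd "$HOME" && …)` subshell in this hunk changes where a CWD-relative tool writes without moving the calling script. Here is a small sketch of that effect, assuming a stand-in installer: `fake_installer` is hypothetical and only mimics "write to `./.agents/skills/<name>`", it is not the real `skills` CLI.

```bash
#!/usr/bin/env bash
# fake_installer stands in for any tool that writes to ./.agents/skills/<name>
# relative to the current directory (hypothetical, not the real `skills` CLI).
fake_installer() {
  mkdir -p ".agents/skills/$1"
}

demo="$(mktemp -d)"
fake_home="$demo/home"
fake_repo="$demo/repo"
mkdir -p "$fake_home" "$fake_repo"

cd "$fake_repo" || exit 1
fake_installer from-repo                          # lands in the repo tree
( cd "$fake_home" && fake_installer from-home )   # lands under the fake home

# The subshell's cd did not leak: the script is still in the repo directory.
ls -d "$demo"/*/.agents/skills/*
```

The listing shows one skill under `home/` and one under `repo/`, which is the same split the learning entry describes between `$HOME/.agents/skills` and `$REPO/.agents/skills`.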
@ -617,8 +722,19 @@ if [ -x "$REPO/lib/toggle-external.sh" ]; then
   else
     ok "magic MCP disabled (default)"
   fi
-  if [ ! -f "$REPO/.env" ] || ! grep -q '^MAGIC_API_KEY=' "$REPO/.env" 2>/dev/null; then
-    warn "MAGIC_API_KEY not found in ~/.claude/.env — copy .env.example there and set your key before enabling"
+  # The key lives in ~/.claude/.env (canonical, BDR-026), reached via the
+  # repo/.env symlink that toggle-external.sh sources. Self-heal the common
+  # fresh-machine case: ~/.claude/.env was created AFTER link.sh ran, so the
+  # symlink is missing and the key looks absent though it's set.
+  HOME_ENV="$HOME/.claude/.env"
+  if [ ! -e "$REPO/.env" ] && [ -f "$HOME_ENV" ]; then
+    ln -sf "$HOME_ENV" "$REPO/.env" 2>/dev/null \
+      && info "Linked repo/.env → ~/.claude/.env (was missing)"
+  fi
+  # Tolerate optional `export ` and leading whitespace; require a value.
+  MAGIC_KEY_RE='^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.'
+  if [ ! -f "$REPO/.env" ] || ! grep -qE "$MAGIC_KEY_RE" "$REPO/.env" 2>/dev/null; then
+    warn "MAGIC_API_KEY not set in ~/.claude/.env — add it (and run 'make link') before enabling magic"
   fi
 else
   warn "lib/toggle-external.sh not found or not executable — skipping"
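
To see what the new `MAGIC_KEY_RE` pattern accepts and rejects, the sketch below runs it against four sample `.env` files. The regex is copied from the hunk. The wrapper `has_magic_key` and the sample file names are hypothetical.

```bash
#!/usr/bin/env bash
# Pattern copied from the hunk: optional leading whitespace, optional
# `export `, then MAGIC_API_KEY= followed by at least one character.
MAGIC_KEY_RE='^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.'

# has_magic_key <env-file>: succeed if the file sets a non-empty MAGIC_API_KEY.
has_magic_key() {
  grep -qE "$MAGIC_KEY_RE" "$1" 2>/dev/null
}

demo="$(mktemp -d)"
printf 'MAGIC_API_KEY=abc123\n'          > "$demo/plain.env"
printf '  export MAGIC_API_KEY=abc123\n' > "$demo/export.env"
printf 'MAGIC_API_KEY=\n'                > "$demo/empty.env"
printf '# MAGIC_API_KEY=abc123\n'        > "$demo/commented.env"

for f in plain export empty commented; do
  if has_magic_key "$demo/$f.env"; then
    echo "$f: key set"
  else
    echo "$f: key missing"
  fi
done
```

`plain` and `export` report the key as set, while `empty` and `commented` report it missing. One limit of the pattern: a quoted empty value such as `MAGIC_API_KEY=""` still passes, because the regex only requires one character after `=`.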
@ -642,16 +758,31 @@ fi
 [ -z "$SHELL_PROFILE" ] && SHELL_PROFILE="$HOME/.profile"
 
 CLAUDE_LINES=(
-  "alias claude='claude --effort max'"
   'export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1'
 )
 
-# Clean up old CLAUDE_EFFORT env var if present (replaced by alias)
+# Ubuntu 24.04+ (and other distros) restrict unprivileged user namespaces via
+# AppArmor, which breaks Chromium's sandbox → gstack's browser (/browse, /qa)
+# crashes with "No usable sandbox". Persist gstack's documented opt-out, but
+# only where the restriction is actually active (precise, distro-agnostic).
+if [ "$(sysctl -n kernel.apparmor_restrict_unprivileged_userns 2>/dev/null)" = "1" ]; then
+  CLAUDE_LINES+=('export GSTACK_CHROMIUM_NO_SANDBOX=1')
+fi
+
+# Remove obsolete effort config — effort is now set in settings.json
+# ("effortLevel"), which supersedes both the old CLAUDE_EFFORT env var and the
+# `claude --effort max` alias (the alias would even override settings.json).
+EFFORT_CLEANED=0
 if grep -qF 'export CLAUDE_EFFORT=max' "$SHELL_PROFILE" 2>/dev/null; then
-  sed -i '/export CLAUDE_EFFORT=max/d' "$SHELL_PROFILE"
-  # Also remove orphaned comment lines left by previous installs
+  sed -i '/export CLAUDE_EFFORT=max/d' "$SHELL_PROFILE"; EFFORT_CLEANED=1
+fi
+if grep -qF "alias claude='claude --effort max'" "$SHELL_PROFILE" 2>/dev/null; then
+  sed -i "\#alias claude='claude --effort max'#d" "$SHELL_PROFILE"; EFFORT_CLEANED=1
+fi
+if [ "$EFFORT_CLEANED" -eq 1 ]; then
+  # Remove orphaned comment lines left before the deleted entries
   sed -i '/^# Claude Code — added by install-plugins.sh$/{ N; /^\n$/d; }' "$SHELL_PROFILE"
-  info "Removed old CLAUDE_EFFORT=max from $SHELL_PROFILE (replaced by alias)"
+  info "Removed obsolete effort alias/env from $SHELL_PROFILE (effort set in settings.json)"
 fi
 
 ADDED=0
@ -674,6 +805,23 @@ if [ "$ADDED" -eq 1 ]; then
 fi
 echo ""
 
+# ============================================================
+# STEP 10 — REFRESH SYMLINKS (final, so this script is self-sufficient)
+# ============================================================
+# Steps 2/8/8.5 INSTALL skills (gstack submodule, emil/frontend/motion, npx
+# darwin/find-skills) that link.sh must symlink into ~/.claude/skills/. Since
+# link.sh runs BEFORE this script in install.sh, those symlinks would be missing
+# on a fresh run until link.sh is run again by hand. Re-run it here so
+# `make plugin` (and `make install`) finish complete — nothing left to do.
+echo "── Step 10: Refreshing symlinks (link.sh) ─────────────────"
+echo ""
+if [ -f "$REPO/link.sh" ]; then
+  bash "$REPO/link.sh"
+else
+  warn "link.sh not found — run it manually to create skill symlinks"
+fi
+echo ""
+
 # ============================================================
 # SUMMARY
 # ============================================================
19
install.sh
@ -22,8 +22,23 @@ echo ""
 # ── 1. Check prerequisites ──
 echo "── Checking prerequisites..."
 
+# node + npm drive the Claude Code CLI install below. On a fresh machine
+# they may be absent — install the current LTS via nvm instead of aborting.
+install_node_via_nvm() {
+  info "Node.js/npm missing — installing LTS via nvm..."
+  curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
+  export NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
+  # shellcheck source=/dev/null
+  [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
+  nvm install --lts
+}
+
+if ! command -v node &>/dev/null || ! command -v npm &>/dev/null; then
+  install_node_via_nvm
+fi
+
 if ! command -v node &>/dev/null; then
-  err "Node.js not found. Install it first: https://nodejs.org"
+  err "Node.js install failed — install it manually: https://nodejs.org"
 fi
 
 NODE_MAJOR=$(node -v | sed 's/v//' | cut -d. -f1)
@ -33,7 +48,7 @@ fi
 ok "Node.js $(node -v)"
 
 if ! command -v npm &>/dev/null; then
-  err "npm not found"
+  err "npm not found (expected alongside Node.js)"
 fi
 ok "npm $(npm -v)"
 
@ -45,6 +45,7 @@ set -euo pipefail
 REPO="$(cd -P "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 SKILLS_DIR="$REPO/skills"
 DISABLED_DIR="$REPO/skills-disabled"
+GSTACK_SRC="$REPO/skills-external/gstack"  # gstack submodule — source of truth for gstack skills
 PROFILES_DIR="$REPO/lib/profiles"
 TOGGLE_EXTERNAL="$REPO/lib/toggle-external.sh"
 ACTIVE_CACHE="$REPO/.active-profile"  # statusline reads this — keep fast (single-line file, profile name only)
@ -247,8 +248,19 @@ enable_skill() {
         ok "enabled: $skill"
       elif [ -e "$SKILLS_DIR/$skill" ]; then
         : # already enabled — silent
+      elif [ -d "$GSTACK_SRC/$skill" ]; then
+        # gstack is OFF by default: its skills live only in the submodule,
+        # never pre-symlinked into skills/. A profile that lists this gstack
+        # skill activates it on demand by symlinking the submodule skill dir
+        # in. disable_gstack_not_in() parks it again when an unrelated profile
+        # is set. The gstack/bin + browse/dist infra it relies on is created
+        # by link.sh, independent of this.
+        ln -sf "$GSTACK_SRC/$skill" "$SKILLS_DIR/$skill"
+        ok "enabled: $skill (gstack on-demand)"
+      elif [ ! -d "$GSTACK_SRC" ]; then
+        warn "missing: $skill — gstack submodule absent, run: git submodule update --init"
       else
-        warn "missing: $skill — try: bash link.sh"
+        warn "missing: $skill — not found in gstack submodule ($GSTACK_SRC)"
       fi
       ;;
     external|personal)
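
The new `enable_skill` branches form a four-way decision: already enabled, available in the submodule, submodule absent, or skill unknown. The sketch below runs the same decision against temporary directories. It is an illustration under assumptions, not the function from `lib/profile.sh`: `enable_on_demand`, its arguments, and the sample skill names are hypothetical.

```bash
#!/usr/bin/env bash
# enable_on_demand <src_root> <skills_dir> <skill>
# Mirrors the branch order shown in the hunk, with echo instead of ok/warn.
enable_on_demand() {
  local src_root="$1" skills_dir="$2" skill="$3"
  if [ -e "$skills_dir/$skill" ]; then
    echo "already enabled: $skill"
  elif [ -d "$src_root/$skill" ]; then
    ln -sf "$src_root/$skill" "$skills_dir/$skill"
    echo "enabled: $skill (on-demand)"
  elif [ ! -d "$src_root" ]; then
    echo "missing: $skill (source tree absent)"
  else
    echo "missing: $skill (not in source tree)"
  fi
}

demo="$(mktemp -d)"
mkdir -p "$demo/gstack/qa" "$demo/skills"
enable_on_demand "$demo/gstack" "$demo/skills" qa      # creates the symlink
enable_on_demand "$demo/gstack" "$demo/skills" qa      # second call is a no-op
enable_on_demand "$demo/gstack" "$demo/skills" nope    # not in the source tree
```

Checking for an existing entry first keeps the call idempotent, which matters because `set <profile>` may enable the same skill on every run.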
2
link.sh
@ -117,7 +117,7 @@ link_env() {
     echo " cp \"$REPO/.env.example\" \"$home_env\" && \"\${EDITOR:-nano}\" \"$home_env\""
     return
   fi
-  grep -q '^MAGIC_API_KEY=' "$home_env" 2>/dev/null \
+  grep -qE '^[[:space:]]*(export[[:space:]]+)?MAGIC_API_KEY=.' "$home_env" 2>/dev/null \
     || echo "⚠️ $home_env has no MAGIC_API_KEY line — magic won't enable until added."
   if [ -L "$repo_env" ]; then
     [ "$(readlink "$repo_env")" = "$home_env" ] && return
@ -1,177 +0,0 @@
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
@ -1,42 +0,0 @@
|
||||
---
|
||||
name: frontend-design
|
||||
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
|
||||
|
||||
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
|
||||
|
||||
## Design Thinking
|
||||
|
||||
Before coding, understand the context and commit to a BOLD aesthetic direction:
|
||||
- **Purpose**: What problem does this interface solve? Who uses it?
|
||||
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
|
||||
- **Constraints**: Technical requirements (framework, performance, accessibility).
|
||||
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
|
||||
|
||||
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
|
||||
|
||||
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
|
||||
- Production-grade and functional
|
||||
- Visually striking and memorable
|
||||
- Cohesive with a clear aesthetic point-of-view
|
||||
- Meticulously refined in every detail
|
||||
|
||||
## Frontend Aesthetics Guidelines
|
||||
|
||||
Focus on:
|
||||
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
|
||||
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
|
||||
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
|
||||
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
|
||||
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
|
||||
|
||||
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
|
||||
|
||||
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
|
||||
|
||||
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
|
||||
|
||||
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
|
||||
@ -1 +1 @@
|
||||
0.8.13
|
||||
0.8.45
|
||||
@ -1,7 +1,6 @@
|
||||
---
|
||||
name: graphify
|
||||
description: "any input (code, docs, papers, images, videos) to knowledge graph. Use when user asks any question about a codebase, documents, or project content - especially if graphify-out/ exists, treat the question as a /graphify query."
|
||||
trigger: /graphify
|
||||
description: "Use for any question about a codebase, its architecture, file relationships, or project content — especially when graphify-out/ exists, where the question should be treated as a graphify query first. Turns any input (code, docs, papers, images, videos) into a persistent knowledge graph with god nodes, community detection, and query/path/explain tools."
|
||||
---
|
||||
|
||||
# /graphify
|
||||
@ -27,6 +26,8 @@ Turn any folder of files into a navigable knowledge graph with community detecti
|
||||
/graphify <path> --graphml # export graph.graphml (Gephi, yEd)
|
||||
/graphify <path> --neo4j # generate graphify-out/cypher.txt for Neo4j
|
||||
/graphify <path> --neo4j-push bolt://localhost:7687 # push directly to Neo4j
|
||||
/graphify <path> --falkordb # generate graphify-out/cypher.txt for FalkorDB
|
||||
/graphify <path> --falkordb-push falkordb://localhost:6379 # push directly to FalkorDB
|
||||
/graphify <path> --mcp # start MCP stdio server for agent access
|
||||
/graphify <path> --watch # watch folder, auto-rebuild on code changes (no LLM needed)
|
||||
/graphify <path> --wiki # build agent-crawlable wiki (index.md + one article per community)
|
||||
@ -57,48 +58,9 @@ If the path argument starts with `https://github.com/` or `http://github.com/`,
|
||||
|
||||
Follow these steps in order. Do not skip steps.
|
||||
|
||||
### Step 0 - Clone GitHub repo(s) (only if a GitHub URL was given)
|
||||
### Step 0 - GitHub repos and multi-path merge (only if a URL or several paths)
|
||||
|
||||
**Single repo:**
|
||||
```bash
|
||||
LOCAL_PATH=$(graphify clone <github-url> [--branch <branch>])
|
||||
# Use LOCAL_PATH as the target for all subsequent steps
|
||||
```
|
||||
|
||||
**Multiple repos (cross-repo graph):**
|
||||
```bash
|
||||
# Clone each repo, run the full pipeline on each, then merge
|
||||
graphify clone <url1> # → ~/.graphify/repos/<owner1>/<repo1>
|
||||
graphify clone <url2> # → ~/.graphify/repos/<owner2>/<repo2>
|
||||
# Run /graphify on each local path to produce their graph.json files
|
||||
# Then merge:
|
||||
graphify merge-graphs \
|
||||
~/.graphify/repos/<owner1>/<repo1>/graphify-out/graph.json \
|
||||
~/.graphify/repos/<owner2>/<repo2>/graphify-out/graph.json \
|
||||
--out graphify-out/cross-repo-graph.json
|
||||
```
|
||||
|
||||
Graphify clones into `~/.graphify/repos/<owner>/<repo>` and reuses existing clones on repeat runs. Each node in the merged graph carries a `repo` attribute so you can filter by origin.
|
||||
|
||||
**Multiple local subfolders (monorepo or multi-service layout):**
|
||||
|
||||
The skill pipeline writes all intermediate and final outputs to `graphify-out/` in the current working directory. Running the skill on each subfolder separately will clobber the same output dir. Instead, use the CLI directly for each subfolder — it places `graphify-out/` *inside* the scanned path:
|
||||
|
||||
```bash
|
||||
graphify extract ./core/ # → ./core/graphify-out/graph.json
|
||||
graphify extract ./service/ # → ./service/graphify-out/graph.json
|
||||
graphify extract ./platform/ # → ./platform/graphify-out/graph.json
|
||||
# Add --backend gemini|kimi|openai|deepseek|claude-cli depending on which API key you have set
|
||||
|
||||
# Then merge at the project root:
|
||||
graphify merge-graphs \
|
||||
./core/graphify-out/graph.json \
|
||||
./service/graphify-out/graph.json \
|
||||
./platform/graphify-out/graph.json \
|
||||
--out graphify-out/graph.json
|
||||
```
|
||||
|
||||
Once `graphify-out/graph.json` exists, the fast path above takes over: any codebase question runs `graphify query` directly on the merged graph — no re-extraction, no size gate.
|
||||
Only when the path is one or more `https://github.com/...` URLs, or several local subfolders to merge. See `references/github-and-merge.md` for the clone, cross-repo merge, and monorepo flow, then continue with the resolved local path. A plain local path skips this step.
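
The gate for Step 0 can be sketched as a small predicate. This is only an illustration: `needs_step0` is a hypothetical helper and not part of graphify. It assumes the command's path arguments arrive as a list of strings and uses the two GitHub URL prefixes the skill checks for.

```python
def needs_step0(paths):
    """Return True when Step 0 applies: a GitHub URL, or several paths to merge.

    Hypothetical helper for illustration only; graphify does not ship it.
    """
    github_prefixes = ("https://github.com/", "http://github.com/")
    # str.startswith accepts a tuple, so one call covers both URL schemes.
    has_github_url = any(p.startswith(github_prefixes) for p in paths)
    # Several local subfolders also go through Step 0 (the merge flow).
    return has_github_url or len(paths) > 1
```

A single plain local path returns `False`, which matches the rule that such a run skips this step.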
|
||||
|
||||
### Step 1 - Ensure graphify is installed
|
||||
|
||||
@ -179,50 +141,9 @@ Then act on it:
|
||||
- Otherwise rank by count, show the top 5 with file counts, then ask which subfolder to run on. Wait for the user's answer before proceeding.
|
||||
- Otherwise: proceed directly to Step 2.5 if video files were detected, or Step 3 if not.
|
||||
|
||||
### Step 2.5 - Transcribe video / audio files (only if video files detected)
|
||||
### Step 2.5 - Video and audio (only if video files detected)
|
||||
|
||||
Skip this step entirely if `detect` returned zero `video` files.
|
||||
|
||||
Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.
|
||||
|
||||
**Strategy:** Read the god nodes from `graphify-out/.graphify_detect.json` (or the analysis file if it exists from a previous run). You are already a language model — write a one-sentence domain hint yourself from those labels. Then pass it to Whisper as the initial prompt. No separate API call needed.
|
||||
|
||||
**However**, if the corpus has *only* video files and no other docs/code, use the generic fallback prompt: `"Use proper punctuation and paragraph breaks."`
|
||||
|
||||
**Step 1 - Write the Whisper prompt yourself.**
|
||||
|
||||
Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:
|
||||
|
||||
- Labels: `transformer, attention, encoder, decoder` → `"Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."`
|
||||
- Labels: `kubernetes, deployment, pod, helm` → `"DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."`
|
||||
|
||||
Set it as `WHISPER_PROMPT` to use in the next command.
|
||||
|
||||
**Step 2 - Transcribe:**
|
||||
|
||||
```bash
|
||||
GRAPHIFY_WHISPER_MODEL=base # or whatever --whisper-model the user passed
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json, os
|
||||
from pathlib import Path
|
||||
from graphify.transcribe import transcribe_all
|
||||
|
||||
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
video_files = detect.get('files', {}).get('video', [])
|
||||
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')
|
||||
|
||||
transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
|
||||
print(json.dumps(transcript_paths, ensure_ascii=False))
|
||||
" > graphify-out/.graphify_transcripts.json
|
||||
```
|
||||
|
||||
After transcription:
|
||||
- Read the transcript paths from `graphify-out/.graphify_transcripts.json`
|
||||
- Add them to the docs list before dispatching semantic subagents in Step 3B
|
||||
- Print how many transcripts were created: `Transcribed N video file(s) -> treating as docs`
|
||||
- If transcription fails for a file, print a warning and continue with the rest
|
||||
|
||||
**Whisper model:** Default is `base`. If the user passed `--whisper-model <name>`, set `GRAPHIFY_WHISPER_MODEL=<name>` in the environment before running the command above.
|
||||
Skip this step entirely if `detect` returned zero `video` files. When the corpus has video or audio, see `references/transcribe.md` to transcribe them to text first, then treat the transcripts as doc files in Step 3.
|
||||
|
||||
### Step 3 - Extract entities and relationships
|
||||
|
||||
@ -269,7 +190,15 @@ else:
|
||||
|
||||
#### Part B - Semantic extraction (parallel subagents)
|
||||
|
||||
**Fast path:** If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code - there is nothing for semantic subagents to do.
|
||||
**Fast path:** If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code - there is nothing for semantic subagents to do. **First write an empty semantic file** so Part C's merge has its input (it reads `.graphify_semantic.json` unconditionally; without this a code-only run hits `FileNotFoundError`):
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
Path('graphify-out/.graphify_semantic.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
|
||||
"
|
||||
```
|
||||
|
||||
**MANDATORY: You MUST use the Agent tool here. Reading files yourself one-by-one is forbidden - it is 5-10x slower. If you do not use the Agent tool you are doing this wrong.**
|
||||
|
||||
@ -290,12 +219,19 @@ from graphify.cache import check_semantic_cache
|
||||
from pathlib import Path
|
||||
|
||||
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
all_files = [f for files in detect['files'].values() for f in files]
|
||||
# Only content files go to semantic extraction. Code is already covered structurally
|
||||
# by the AST pass (Part A); flattening every category here makes subagents re-read
|
||||
# every source file (#1392). Video is transcribed to a document in Step 2.5 first.
|
||||
all_files = [f for cat in ('document', 'paper', 'image') for f in detect['files'].get(cat, [])]
|
||||
|
||||
cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(all_files)
|
||||
|
||||
# Always (re)write the cache file: write hits, else DELETE any leftover from a prior
|
||||
# run so Part C never merges a stale .graphify_cached.json (#1392).
|
||||
if cached_nodes or cached_edges or cached_hyperedges:
|
||||
Path('graphify-out/.graphify_cached.json').write_text(json.dumps({'nodes': cached_nodes, 'edges': cached_edges, 'hyperedges': cached_hyperedges}, ensure_ascii=False), encoding=\"utf-8\")
|
||||
else:
|
||||
Path('graphify-out/.graphify_cached.json').unlink(missing_ok=True)
|
||||
Path('graphify-out/.graphify_uncached.txt').write_text('\n'.join(uncached), encoding=\"utf-8\")
|
||||
print(f'Cache: {len(all_files)-len(uncached)} files hit, {len(uncached)} files need extraction')
|
||||
"
|
||||
@ -325,76 +261,13 @@ Each subagent receives this exact prompt (substitute FILE_LIST, CHUNK_NUM, TOTAL
|
||||
|
||||
CHUNK_PATH must be an **absolute** path — derive it before dispatching:
|
||||
```bash
|
||||
PROJECT_ROOT=$(cat graphify-out/.graphify_root)
|
||||
PROJECT_ROOT=$(pwd) # cwd — where Part C globs graphify-out/ (NOT .graphify_root/scan dir, #1392)
|
||||
# Then for chunk N: CHUNK_PATH="${PROJECT_ROOT}/graphify-out/.graphify_chunk_0N.json"
|
||||
```
|
||||
|
||||
Subagent prompt template:
|
||||
|
||||
```
|
||||
You are a graphify extraction subagent. Read the files listed and extract a knowledge graph fragment.
|
||||
Output ONLY valid JSON matching the schema below - no explanation, no markdown fences, no preamble.
|
||||
|
||||
Files (chunk CHUNK_NUM of TOTAL_CHUNKS):
|
||||
FILE_LIST
|
||||
|
||||
Rules:
|
||||
- EXTRACTED: relationship explicit in source (import, call, citation, "see §3.2")
|
||||
- INFERRED: reasonable inference (shared data structure, implied dependency)
|
||||
- AMBIGUOUS: uncertain - flag for review, do not omit
|
||||
|
||||
Code files: focus on semantic edges AST cannot find (call relationships, shared data, arch patterns).
|
||||
Do not re-extract imports - AST already has those.
|
||||
Doc/paper files: extract named concepts, entities, citations. For rationale (WHY decisions were made, trade-offs, design intent): store as a `rationale` attribute on the relevant concept node — do NOT create a separate rationale node or fragment node. Only create a node for something that is itself a named entity or concept. Use `file_type:"rationale"` for concept-like nodes (ideas, principles, mechanisms, design patterns). `file_type` MUST be one of exactly these six values: `code`, `document`, `paper`, `image`, `rationale`, `concept`. Any other value is invalid and will be rejected.
|
||||
Code files: when adding `calls` edges, source MUST be the caller (the function/class doing the calling), target MUST be the callee. Never reverse this direction.
|
||||
Image files: use vision to understand what the image IS - do not just OCR.
|
||||
UI screenshot: layout patterns, design decisions, key elements, purpose.
|
||||
Chart: metric, trend/insight, data source.
|
||||
Tweet/post: claim as node, author, concepts mentioned.
|
||||
Diagram: components and connections.
|
||||
Research figure: what it demonstrates, method, result.
|
||||
Handwritten/whiteboard: ideas and arrows, mark uncertain readings AMBIGUOUS.
|
||||
|
||||
DEEP_MODE (if --mode deep was given): be aggressive with INFERRED edges - indirect deps,
|
||||
shared assumptions, latent couplings. Mark uncertain ones AMBIGUOUS instead of omitting.
|
||||
|
||||
Semantic similarity: if two concepts in this chunk solve the same problem or represent the same idea without any structural link (no import, no call, no citation), add a `semantically_similar_to` edge marked INFERRED with a confidence_score reflecting how similar they are (0.6-0.95). Examples:
|
||||
- Two functions that both validate user input but never call each other
|
||||
- A class in code and a concept in a paper that describe the same algorithm
|
||||
- Two error types that handle the same failure mode differently
|
||||
Only add these when the similarity is genuinely non-obvious and cross-cutting. Do not add them for trivially similar things.
|
||||
|
||||
Hyperedges: if 3 or more nodes clearly participate together in a shared concept, flow, or pattern that is not captured by pairwise edges alone, add a hyperedge to a top-level `hyperedges` array. Examples:
|
||||
- All classes that implement a common protocol or interface
|
||||
- All functions in an authentication flow (even if they don't all call each other)
|
||||
- All concepts from a paper section that form one coherent idea
|
||||
Use sparingly — only when the group relationship adds information beyond the pairwise edges. Maximum 3 hyperedges per chunk.
|
||||
|
||||
If a file has YAML frontmatter (--- ... ---), copy source_url, captured_at, author,
|
||||
contributor onto every node from that file.
|
||||
|
||||
confidence_score is REQUIRED on every edge - never omit it, never use 0.5 as a default:
|
||||
- EXTRACTED edges: confidence_score = 1.0 always
|
||||
- INFERRED edges: pick exactly ONE value from this set — never 0.5:
|
||||
0.95 direct structural evidence (shared data structure, named cross-file reference).
|
||||
0.85 strong inference (clear functional alignment, no direct symbol link).
|
||||
0.75 reasonable inference (shared problem domain + similar shape, requires interpretation).
|
||||
0.65 weak inference (thematically related, no shape evidence).
|
||||
0.55 speculative but plausible (surface-level co-occurrence only).
|
||||
Models follow discrete rubrics better than continuous ranges; the bimodal
|
||||
distribution observed in production (>50% at 0.5, >40% at 0.85+) shows the
|
||||
range guidance is being collapsed to a binary. If no value above fits, mark
|
||||
the edge AMBIGUOUS rather than picking 0.4 or below.
|
||||
- AMBIGUOUS edges: 0.1-0.3
|
||||
|
||||
Node ID format: lowercase, only `[a-z0-9_]`, no dots or slashes. Format: `{stem}_{entity}` where stem is `{parent_dir}_{filename_without_ext}` (the **immediate** parent directory name + the filename stem, both lowercased with non-alphanumeric chars replaced by `_`) and entity is the symbol name similarly normalized. Only one level of parent is used — not the full path. Examples: `src/auth/session.py` + `ValidateToken` → `auth_session_validatetoken`; `lib/utils/helpers.py` + `parse_url` → `utils_helpers_parse_url`; `tests/test_foo.py` + `_helper` → `tests_test_foo_helper`. Top-level files (no parent dir, e.g. `setup.py`) use just the filename stem: `setup_my_func`. This must match the ID the AST extractor generates — using just the filename (e.g., `session_validatetoken`) or the full path (e.g., `src_auth_session_validatetoken`) will create orphan ghost-duplicate nodes. If you are re-extracting a project that had ghost duplicates under the old format, the user should run `graphify extract --force` to rebuild cleanly. CRITICAL: never append chunk numbers, sequence numbers, or any suffix to an ID (no `_c1`, `_c2`, `_chunk2`, etc.). IDs must be deterministic from the label alone — the same entity must always produce the same ID regardless of which chunk processes it.
|
||||
|
||||
Generate the extraction JSON matching this schema exactly:
|
||||
{"nodes":[{"id":"session_validatetoken","label":"Human Readable Name","file_type":"code|document|paper|image|rationale|concept","source_file":"relative/path","source_location":null,"source_url":null,"captured_at":null,"author":null,"contributor":null}],"edges":[{"source":"node_id","target":"node_id","relation":"calls|implements|references|cites|conceptually_related_to|shares_data_with|semantically_similar_to|rationale_for","confidence":"EXTRACTED|INFERRED|AMBIGUOUS","confidence_score":1.0,"source_file":"relative/path","source_location":null,"weight":1.0}],"hyperedges":[{"id":"snake_case_id","label":"Human Readable Label","nodes":["node_id1","node_id2","node_id3"],"relation":"participate_in|implement|form","confidence":"EXTRACTED|INFERRED","confidence_score":0.75,"source_file":"relative/path"}],"input_tokens":0,"output_tokens":0}
|
||||
|
||||
Then write the JSON to disk using the Write tool at this exact absolute path (no relative paths — Write resolves relative paths against an undefined cwd and the file will be silently lost):
|
||||
CHUNK_PATH
|
||||
```
|
||||
See `references/extraction-spec.md` for the exact subagent prompt (JSON schema, node-ID rules, confidence rubric, frontmatter, hyperedge, and vision rules). Load it only here, only when at least one chunk holds a doc, paper, or image; a pure-code corpus has skipped Part B and never reads it. Pass each subagent that prompt verbatim with FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH substituted, and have it write the result to CHUNK_PATH.
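
For orientation, the node-ID rule that the extraction spec describes (immediate parent directory + filename stem + entity, lowercased, non-alphanumeric characters replaced by `_`) can be sketched as below. This is a minimal sketch that reproduces the documented examples, not graphify's actual AST-extractor code. `make_node_id` and `_normalize` are hypothetical names, and collapsing repeated underscores is an assumption inferred from the `tests_test_foo_helper` example.

```python
import re
from pathlib import PurePosixPath


def _normalize(part):
    # Lowercase, turn every run of non-[a-z0-9] characters into one "_",
    # and drop leading/trailing underscores (assumption, see lead-in).
    return re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")


def make_node_id(source_file, entity):
    """Build `{parent_dir}_{filename_stem}_{entity}` using one parent level only."""
    path = PurePosixPath(source_file)
    # parent.name is "" for top-level files such as setup.py, so it is skipped.
    parts = [path.parent.name, path.stem, entity]
    return "_".join(n for n in (_normalize(p) for p in parts) if n)
```

Because the ID depends only on the path and the symbol name, the same entity maps to the same ID no matter which chunk processes it, which is the property the spec requires.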
|
||||
|
||||
**Step B3 - Collect, cache, and merge**
|
||||
|
||||
@ -511,7 +384,7 @@ print(f'Merged: {total} nodes, {edges} edges ({len(ast[\"nodes\"])} AST + {len(s
|
||||
|
||||
### Step 4 - Build graph, cluster, analyze, generate outputs
|
||||
|
||||
**Before starting:** note whether `--directed` was given. If so, pass `directed=True` to `build_from_json()` in the code block below. This builds a `DiGraph` that preserves edge direction (source→target) instead of the default undirected `Graph`.
|
||||
**Before starting:** the code blocks below pass `directed=IS_DIRECTED` to `build_from_json()`. Replace `IS_DIRECTED` with `True` if `--directed` was given (builds a `DiGraph` preserving edge direction source→target), otherwise `False` (the default undirected `Graph`). Substitute it the same way you substitute `INPUT_PATH` — do not leave the literal `IS_DIRECTED` in the code.
|
||||
|
||||
```bash
|
||||
mkdir -p graphify-out
|
||||
@ -527,7 +400,15 @@ from pathlib import Path
|
||||
extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
|
||||
detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
|
||||
G = build_from_json(extraction)
|
||||
# root= mirrors the --update runbook (#1361): relativize source_file to the same
|
||||
# base so the full build and incremental --update never drift apart on re-extract.
|
||||
G = build_from_json(extraction, root='INPUT_PATH', directed=IS_DIRECTED)
|
||||
# Guard BEFORE any write: an empty extraction must not clobber a good graph.json /
|
||||
# GRAPH_REPORT.md / analysis sidecar. Check immediately after build (#1392).
|
||||
if G.number_of_nodes() == 0:
|
||||
print('ERROR: Graph is empty - extraction produced no nodes.')
|
||||
print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
|
||||
raise SystemExit(1)
|
||||
communities = cluster(G)
|
||||
cohesion = score_all(G, communities)
|
||||
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
|
||||
@ -537,10 +418,17 @@ labels = {cid: 'Community ' + str(cid) for cid in communities}
|
||||
# Placeholder questions - regenerated with real labels in Step 5
|
||||
questions = suggest_questions(G, communities, labels)
|
||||
|
||||
# Export FIRST and honor the #479 shrink-guard: to_json returns False (writing
|
||||
# nothing) when the new graph is smaller than the existing graph.json. Only write
|
||||
# GRAPH_REPORT.md + the analysis sidecar when the graph was actually written, so
|
||||
# they never describe a graph that graph.json doesn't contain (#1392).
|
||||
wrote = to_json(G, communities, 'graphify-out/graph.json')
|
||||
if not wrote:
|
||||
print('ERROR: refused to shrink graphify-out/graph.json (existing graph has more nodes; #479).')
|
||||
print('If this shrink is intentional (you deleted files), re-run a full build with --force.')
|
||||
raise SystemExit(1)
|
||||
report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, 'INPUT_PATH', suggested_questions=questions)
|
||||
Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding=\"utf-8\")
|
||||
to_json(G, communities, 'graphify-out/graph.json')
|
||||
|
||||
analysis = {
|
||||
'communities': {str(k): v for k, v in communities.items()},
|
||||
'cohesion': {str(k): v for k, v in cohesion.items()},
|
||||
@ -549,10 +437,6 @@ analysis = {
|
||||
'questions': questions,
|
||||
}
|
||||
Path('graphify-out/.graphify_analysis.json').write_text(json.dumps(analysis, indent=2, ensure_ascii=False), encoding=\"utf-8\")
|
||||
if G.number_of_nodes() == 0:
|
||||
print('ERROR: Graph is empty - extraction produced no nodes.')
|
||||
print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
|
||||
raise SystemExit(1)
|
||||
print(f'Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges, {len(communities)} communities')
|
||||
"
|
||||
```
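
The shrink-guard behaviour that the block above relies on (export returns `False` and writes nothing when the new graph is smaller than the one on disk) can be illustrated with a stand-alone sketch. It is not graphify's `to_json`: the function name `write_unless_smaller` and the `{"nodes": [...]}` file layout are assumptions made for the example.

```python
import json
from pathlib import Path


def write_unless_smaller(new_graph, out_path):
    """Write new_graph as JSON unless it has fewer nodes than the existing file.

    Hypothetical stand-in for the shrink guard; returns True only if it wrote.
    """
    out = Path(out_path)
    if out.exists():
        existing = json.loads(out.read_text(encoding="utf-8"))
        # Refuse to replace a larger graph with a smaller one.
        if len(new_graph["nodes"]) < len(existing.get("nodes", [])):
            return False
    out.write_text(json.dumps(new_graph, ensure_ascii=False), encoding="utf-8")
    return True
```

Checking the return value before writing the report and analysis sidecar keeps those files from describing a graph that was never written.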
|
||||
@ -580,7 +464,8 @@ extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(en
|
||||
detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
analysis = json.loads(Path('graphify-out/.graphify_analysis.json').read_text(encoding=\"utf-8\"))
|
||||
|
||||
G = build_from_json(extraction)
|
||||
# root= as in Step 4 / the --update runbook (#1361) — same base for node-key parity.
|
||||
G = build_from_json(extraction, root='INPUT_PATH', directed=IS_DIRECTED)
|
||||
communities = {int(k): v for k, v in analysis['communities'].items()}
|
||||
cohesion = {int(k): v for k, v in analysis['cohesion'].items()}
|
||||
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
|
||||
@ -621,73 +506,9 @@ graphify export html # auto-aggregates to community view if graph > 5000 nodes
|
||||
# or: graphify export html --no-viz
|
||||
```
|
||||
|
||||
### Step 6b - Wiki (only if --wiki flag)
|
||||
### Steps 6b-8 - Wiki, Neo4j, FalkorDB, SVG, GraphML, MCP, benchmark (only on their flags)
|
||||
|
||||
**Only run this step if `--wiki` was explicitly given in the original command.**
|
||||
|
||||
Run this before Step 9 (cleanup) so `.graphify_labels.json` is still available.
|
||||
|
||||
```bash
|
||||
graphify export wiki
|
||||
```
|
||||
|
||||
### Step 7 - Neo4j export (only if --neo4j or --neo4j-push flag)
|
||||
|
||||
**If `--neo4j`** - generate a Cypher file for manual import:
|
||||
|
||||
```bash
|
||||
graphify export neo4j
|
||||
```
|
||||
|
||||
**If `--neo4j-push <uri>`** - push directly to a running Neo4j instance. Ask the user for credentials if not provided:
|
||||
|
||||
```bash
|
||||
graphify export neo4j --push bolt://localhost:7687 --user neo4j --password PASSWORD
|
||||
```
|
||||
|
||||
Default URI is `bolt://localhost:7687`, default user is `neo4j`. Uses MERGE - safe to re-run without creating duplicates.
|
||||
|
||||
### Step 7b - SVG export (only if --svg flag)
|
||||
|
||||
```bash
|
||||
graphify export svg
|
||||
```
|
||||
|
||||
### Step 7c - GraphML export (only if --graphml flag)
|
||||
|
||||
```bash
|
||||
graphify export graphml
|
||||
```
|
||||
|
||||
### Step 7d - MCP server (only if --mcp flag)
|
||||
|
||||
```bash
|
||||
python3 -m graphify.serve graphify-out/graph.json
|
||||
```
|
||||
|
||||
This starts a stdio MCP server that exposes tools: `query_graph`, `get_node`, `get_neighbors`, `get_community`, `god_nodes`, `graph_stats`, `shortest_path`. Add to Claude Desktop or any MCP-compatible agent orchestrator so other agents can query the graph live.
|
||||
|
||||
To configure in Claude Desktop, add to `claude_desktop_config.json`:
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"graphify": {
|
||||
"command": "python3",
|
||||
"args": ["-m", "graphify.serve", "/absolute/path/to/graphify-out/graph.json"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 8 - Token reduction benchmark (only if total_words > 5000)
|
||||
|
||||
If `total_words` from `graphify-out/.graphify_detect.json` is greater than 5,000, run:
|
||||
|
||||
```bash
|
||||
graphify benchmark
|
||||
```
|
||||
|
||||
Print the output directly in chat. If `total_words <= 5000`, skip silently - the graph value is structural clarity, not token compression, for small corpora.
|
||||
These run only when their flag is present (`--wiki`, `--neo4j`/`--neo4j-push`, `--falkordb`/`--falkordb-push`, `--svg`, `--graphml`, `--mcp`) or, for the token-reduction benchmark, when `total_words` exceeds 5,000. A default run with no export flags skips all of them. See `references/exports.md` for each one. Run any `--wiki` export before Step 9 cleanup so `.graphify_labels.json` is still available.
|
||||
|
||||
---
|
||||
|
||||
@ -704,7 +525,10 @@ from graphify.detect import save_manifest
|
||||
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
# In --update mode, 'all_files' carries the full corpus; 'files' is the changed
|
||||
# subset. Full-rebuild mode populates only 'files', so the fallback handles that.
|
||||
save_manifest(detect.get('all_files') or detect['files'])
|
||||
# root= relativizes the manifest keys to the scan root (same base as the build),
|
||||
# so the on-disk manifest is portable across clones/machines and a later --update
|
||||
# matches cached files instead of missing every one (#1417).
|
||||
save_manifest(detect.get('all_files') or detect['files'], root='INPUT_PATH')
|
||||
|
||||
# Update cumulative cost tracker
|
||||
extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
|
||||
@ -730,10 +554,13 @@ cost_path.write_text(json.dumps(cost, indent=2, ensure_ascii=False), encoding=\"
|
||||
print(f'This run: {input_tok:,} input tokens, {output_tok:,} output tokens')
|
||||
print(f'All time: {cost[\"total_input_tokens\"]:,} input, {cost[\"total_output_tokens\"]:,} output ({len(cost[\"runs\"])} runs)')
|
||||
"
|
||||
rm -f graphify-out/.graphify_detect.json graphify-out/.graphify_extract.json graphify-out/.graphify_ast.json graphify-out/.graphify_semantic.json graphify-out/.graphify_analysis.json graphify-out/.graphify_chunk_*.json
|
||||
rm -f graphify-out/.graphify_detect.json graphify-out/.graphify_extract.json graphify-out/.graphify_ast.json graphify-out/.graphify_semantic.json graphify-out/.graphify_analysis.json
|
||||
find graphify-out -maxdepth 1 -name '.graphify_chunk_*.json' -delete 2>/dev/null
|
||||
rm -f graphify-out/.needs_update 2>/dev/null || true
|
||||
```
|
||||
|
||||
Replace INPUT_PATH with the actual path (same value used in Steps 4-5) so the manifest is relativized to the scan root.
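
To show why relativizing matters, here is a minimal sketch of rewriting manifest keys relative to the scan root. It is an illustration under assumptions, not `save_manifest` itself: `relativize_keys` is a hypothetical helper, and the manifest is assumed to be a plain dict keyed by file path.

```python
import os


def relativize_keys(manifest, root):
    """Return a copy of manifest whose path keys are relative to root.

    Hypothetical helper; keys use forward slashes so they compare equal
    across machines and clones.
    """
    root_abs = os.path.abspath(root)
    result = {}
    for path, value in manifest.items():
        rel = os.path.relpath(os.path.abspath(path), root_abs)
        result[rel.replace(os.sep, "/")] = value
    return result
```

With absolute keys, a clone at a different location would miss every cached entry on a later `--update`. Relative keys stay the same wherever the project lives.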
|
||||
|
||||
Tell the user (omit the obsidian line unless --obsidian was given):
|
||||
```
|
||||
Graph complete. Outputs in PATH_TO_DIR/graphify-out/
|
||||
@ -783,325 +610,33 @@ if [ ! -f graphify-out/.graphify_python ]; then
|
||||
fi
|
||||
```
|
||||
|
||||
## For --update (incremental re-extraction)
|
||||
## For --update and --cluster-only
|
||||
|
||||
Use when you've added or modified files since the last run. Only re-extracts changed files - saves tokens and time.
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import sys, json
|
||||
from graphify.detect import detect_incremental, save_manifest
|
||||
from pathlib import Path
|
||||
|
||||
result = detect_incremental(Path('INPUT_PATH'))
|
||||
new_total = result.get('new_total', 0)
|
||||
print(json.dumps(result, indent=2, ensure_ascii=False))
|
||||
Path('graphify-out/.graphify_incremental.json').write_text(json.dumps(result, ensure_ascii=False), encoding=\"utf-8\")
|
||||
deleted = list(result.get('deleted_files', []))
|
||||
if new_total == 0 and not deleted:
|
||||
print('No files changed since last run. Nothing to update.')
|
||||
raise SystemExit(0)
|
||||
if deleted:
|
||||
print(f'{len(deleted)} deleted file(s) to prune.')
|
||||
if new_total > 0:
|
||||
print(f'{new_total} new/changed file(s) to re-extract.')
|
||||
"
|
||||
```
|
||||
|
||||
Then populate `.graphify_detect.json` so Steps 3A–6 (which read it unconditionally) see the right state for an incremental run. `files` carries the changed subset (drives Step 3A AST + Step 3B0 cache check on only what changed); `all_files` carries the full corpus for any step that needs corpus-wide context:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
r = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
|
||||
Path('graphify-out/.graphify_detect.json').write_text(json.dumps({
|
||||
'files': r.get('new_files', {}),
|
||||
'all_files': r.get('files', {}),
|
||||
'total_files': r.get('new_total', 0),
|
||||
'total_words': r.get('total_words', 0),
|
||||
'skipped_sensitive': r.get('skipped_sensitive', []),
|
||||
'needs_graph': True,
|
||||
}, ensure_ascii=False), encoding=\"utf-8\")
|
||||
"
|
||||
```
|
||||
|
||||
If new files exist, first check whether all changed files are code files:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
result = json.loads(open('graphify-out/.graphify_incremental.json', encoding='utf-8').read()) if Path('graphify-out/.graphify_incremental.json').exists() else {}
|
||||
code_exts = {'.py','.ts','.js','.go','.rs','.java','.cpp','.c','.rb','.swift','.kt','.cs','.scala','.php','.cc','.cxx','.hpp','.h','.kts','.lua','.toc','.f','.F','.f90','.F90','.f95','.F95','.f03','.F03','.f08','.F08'}
|
||||
new_files = result.get('new_files', {})
|
||||
all_changed = [f for files in new_files.values() for f in files]
|
||||
code_only = all(Path(f).suffix.lower() in code_exts for f in all_changed)
|
||||
print('code_only:', code_only)
|
||||
"
|
||||
```
|
||||
|
||||
If `code_only` is True: print `[graphify update] Code-only changes detected - skipping semantic extraction (no LLM needed)`, run only Step 3A (AST) on the changed files, skip Step 3B entirely (no subagents), then go straight to merge and Steps 4–8.
|
||||
|
||||
If `code_only` is False (any changed file is a doc/paper/image): run the full Steps 3A–3C pipeline as normal.
|
||||
|
||||
|
||||
If no new files exist (only deletions), create an empty extraction so the merge step can prune:
|
||||
|
||||
```bash
|
||||
if [ ! -f graphify-out/.graphify_extract.json ]; then
|
||||
echo '[graphify update] Only deletions -- creating empty extraction for merge.'
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
Path('graphify-out/.graphify_extract.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
|
||||
"
|
||||
fi
|
||||
```
|
||||
|
||||
|
||||
Then:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
from graphify.build import build_merge
|
||||
from graphify.detect import save_manifest
|
||||
|
||||
# Load new extraction and incremental state
|
||||
new_extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
|
||||
incremental = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
|
||||
deleted = list(incremental.get('deleted_files', []))
|
||||
|
||||
# Use build_merge() — reads graph.json directly without NetworkX round-trip
|
||||
# so edge direction (calls, implements, imports) is always preserved (#801).
|
||||
G = build_merge(
|
||||
[new_extraction],
|
||||
graph_path='graphify-out/graph.json',
|
||||
prune_sources=deleted or None,
|
||||
)
|
||||
print(f'[graphify update] Merged: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges')
|
||||
|
||||
# Write merged result back to .graphify_extract.json so Step 4 sees the full graph
|
||||
merged_out = {
|
||||
'nodes': [{'id': n, **d} for n, d in G.nodes(data=True)],
|
||||
'edges': [
|
||||
# Explicit source/target last so they win over any stale attrs in d.
|
||||
{**{k: val for k, val in d.items() if k not in ('_src', '_tgt', 'source', 'target')},
|
||||
'source': d.get('_src', u), 'target': d.get('_tgt', v)}
|
||||
for u, v, d in G.edges(data=True)
|
||||
],
|
||||
# G.graph["hyperedges"] holds hyperedges from both existing graph.json
|
||||
# and new_extraction (build_merge combines them). Falling back to
|
||||
# new_extraction only would silently drop prior-run hyperedges (#801).
|
||||
'hyperedges': list(G.graph.get('hyperedges', [])),
|
||||
'input_tokens': new_extraction.get('input_tokens', 0),
|
||||
'output_tokens': new_extraction.get('output_tokens', 0),
|
||||
}
|
||||
Path('graphify-out/.graphify_extract.json').write_text(json.dumps(merged_out, ensure_ascii=False), encoding=\"utf-8\")
|
||||
print(f'[graphify update] Merged extraction written ({len(merged_out[\"nodes\"])} nodes, {len(merged_out[\"edges\"])} edges)')
|
||||
|
||||
# Save manifest so next --update diffs against today's state, not the
|
||||
# prior run's baseline (prevents ghost-node reports on subsequent updates).
|
||||
save_manifest(incremental['files'])
|
||||
print('[graphify update] Manifest saved.')
|
||||
"
|
||||
```
|
||||
|
||||
Then run Steps 4–8 on the merged graph as normal.
|
||||
|
||||
After Step 4, show the graph diff:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from graphify.analyze import graph_diff
|
||||
from graphify.build import build_from_json
|
||||
from networkx.readwrite import json_graph
|
||||
import networkx as nx
|
||||
from pathlib import Path
|
||||
|
||||
# Load old graph (before update) from backup written before merge
|
||||
old_data = json.loads(Path('graphify-out/.graphify_old.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_old.json').exists() else None
|
||||
new_extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
|
||||
G_new = build_from_json(new_extract)
|
||||
|
||||
if old_data:
|
||||
G_old = json_graph.node_link_graph(old_data, edges='links')
|
||||
diff = graph_diff(G_old, G_new)
|
||||
print(diff['summary'])
|
||||
if diff['new_nodes']:
|
||||
print('New nodes:', ', '.join(n['label'] for n in diff['new_nodes'][:5]))
|
||||
if diff['new_edges']:
|
||||
print('New edges:', len(diff['new_edges']))
|
||||
"
|
||||
```
|
||||
|
||||
Before the merge step, save the old graph: `cp graphify-out/graph.json graphify-out/.graphify_old.json`
|
||||
Clean up after: `rm -f graphify-out/.graphify_old.json`
|
||||
|
||||
---
|
||||
|
||||
## For --cluster-only
|
||||
|
||||
Skip Steps 1–3. Re-run clustering on the existing graph:
|
||||
|
||||
```bash
|
||||
graphify cluster-only .
|
||||
```
|
||||
|
||||
Then run Steps 5–9 as normal (label communities, generate viz, benchmark, clean up, report).
|
||||
Both are non-default subcommands. `--update` re-extracts only new or changed files; `--cluster-only` reruns clustering on the existing graph. See `references/update.md` for both flows.
|
||||
|
||||
---
|
||||
|
||||
## For /graphify query
|
||||
|
||||
Two traversal modes - choose based on the question:
|
||||
|
||||
| Mode | Flag | Best for |
|
||||
|------|------|----------|
|
||||
| BFS (default) | _(none)_ | "What is X connected to?" - broad context, nearest neighbors first |
|
||||
| DFS | `--dfs` | "How does X reach Y?" - trace a specific chain or dependency path |
|
||||
When `graphify-out/graph.json` already exists and the user asks a question about the corpus, answer from the graph rather than rebuilding it:
|
||||
|
||||
```bash
|
||||
graphify query "QUESTION"
|
||||
# or: graphify query "QUESTION" --dfs --budget 3000
|
||||
graphify query "<question>"
|
||||
```
|
||||
|
||||
Replace `QUESTION` with the user's actual question. Answer using **only** what the graph output contains. Quote `source_location` when citing a specific fact. If the graph lacks enough information, say so - do not hallucinate edges.
|
||||
|
||||
After writing the answer, save it back into the graph so it improves future queries:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
|
||||
```
|
||||
|
||||
Replace `QUESTION` with the question, `ANSWER` with your full answer text, and `NODE1 NODE2` with the labels of the nodes you cited. This closes the feedback loop: the next `--update` will extract this Q&A as a node in the graph.
|
||||
Before traversal, expand the question against the graph's own vocabulary so a wording mismatch does not collapse the answer to noise. If the `graphify query` CLI is unavailable, fall back to an inline NetworkX traversal of `graphify-out/graph.json`. Answer using only what the graph output contains, and quote `source_location` when citing a specific fact. For that vocab-expansion step, the BFS/DFS traversal modes, the `--budget` cap, the NetworkX fallback, `save-result` feedback, and the `/graphify path` and `/graphify explain` flows, see `references/query.md`.
|
||||
|
||||
---
|
||||
|
||||
## For /graphify path
|
||||
## For /graphify add and --watch
|
||||
|
||||
Find the shortest path between two named concepts in the graph.
|
||||
|
||||
```bash
|
||||
graphify path "NODE_A" "NODE_B"
|
||||
```
|
||||
|
||||
Replace `NODE_A` and `NODE_B` with the actual concept names. Then explain the path in plain language - what each hop means, why it's significant.
|
||||
|
||||
After writing the explanation, save it back:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
|
||||
```
|
||||
Neither is part of the default build. When the user runs `/graphify add <url>` to fetch a URL into the corpus, or passes `--watch` to auto-rebuild on file changes, see `references/add-watch.md`.
|
||||
|
||||
---
|
||||
|
||||
## For /graphify explain
|
||||
## For the commit hook and native CLAUDE.md integration
|
||||
|
||||
Give a plain-language explanation of a single node - everything connected to it.
|
||||
|
||||
```bash
|
||||
graphify explain "NODE_NAME"
|
||||
```
|
||||
|
||||
Replace `NODE_NAME` with the concept the user asked about. Then write a 3-5 sentence explanation of what this node is, what it connects to, and why those connections are significant. Use the source locations as citations.
|
||||
|
||||
After writing the explanation, save it back:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Explain NODE_NAME" --answer "ANSWER" --type explain --nodes NODE_NAME
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## For /graphify add
|
||||
|
||||
Fetch a URL and add it to the corpus, then update the graph.
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import sys
|
||||
from graphify.ingest import ingest
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
out = ingest('URL', Path('./raw'), author='AUTHOR', contributor='CONTRIBUTOR')
|
||||
print(f'Saved to {out}')
|
||||
except ValueError as e:
|
||||
print(f'error: {e}', file=sys.stderr)
|
||||
sys.exit(1)
|
||||
except RuntimeError as e:
|
||||
print(f'error: {e}', file=sys.stderr)
|
||||
sys.exit(1)
|
||||
"
|
||||
```
|
||||
|
||||
Replace `URL` with the actual URL, `AUTHOR` with the user's name if provided, `CONTRIBUTOR` likewise. If the command exits with an error, tell the user what went wrong - do not silently continue. After a successful save, automatically run the `--update` pipeline on `./raw` to merge the new file into the existing graph.
|
||||
|
||||
Supported URL types (auto-detected):
|
||||
- YouTube / any video URL → audio downloaded via yt-dlp, transcribed to `.txt` on next run (requires `pip install 'graphifyy[video]'`)
|
||||
- Twitter/X → fetched via oEmbed, saved as `.md` with tweet text and author
|
||||
- arXiv → abstract + metadata saved as `.md`
|
||||
- PDF → downloaded as `.pdf`
|
||||
- Images (.png/.jpg/.webp) → downloaded, Claude vision extracts on next run
|
||||
- Any webpage → converted to markdown via html2text
|
||||
|
||||
---
|
||||
|
||||
## For --watch
|
||||
|
||||
Start a background watcher that monitors a folder and auto-updates the graph when files change.
|
||||
|
||||
```bash
|
||||
python3 -m graphify.watch INPUT_PATH --debounce 3
|
||||
```
|
||||
|
||||
Replace INPUT_PATH with the folder to watch. Behavior depends on what changed:
|
||||
|
||||
- **Code files only (.py, .ts, .go, etc.):** re-runs AST extraction + rebuild + cluster immediately, no LLM needed. `graph.json` and `GRAPH_REPORT.md` are updated automatically.
|
||||
- **Docs, papers, or images:** writes a `graphify-out/needs_update` flag and prints a notification to run `/graphify --update` (LLM semantic re-extraction required).
|
||||
|
||||
Debounce (default 3s): waits until file activity stops before triggering, so a wave of parallel agent writes doesn't trigger a rebuild per file.
|
||||
|
||||
Press Ctrl+C to stop.
|
||||
|
||||
For agentic workflows: run `--watch` in a background terminal. Code changes from agent waves are picked up automatically between waves. If agents are also writing docs or notes, you'll need a manual `/graphify --update` after those waves.
|
||||
|
||||
---
|
||||
|
||||
## For git commit hook
|
||||
|
||||
Install a post-commit hook that auto-rebuilds the graph after every commit. No background process needed - triggers once per commit, works with any editor.
|
||||
|
||||
```bash
|
||||
graphify hook install # install
|
||||
graphify hook uninstall # remove
|
||||
graphify hook status # check
|
||||
```
|
||||
|
||||
After every `git commit`, the hook detects which code files changed (via `git diff HEAD~1`), re-runs AST extraction on those files, and rebuilds `graph.json` and `GRAPH_REPORT.md`. Doc/image changes are ignored by the hook - run `/graphify --update` manually for those.
|
||||
|
||||
If a post-commit hook already exists, graphify appends to it rather than replacing it.
|
||||
|
||||
---
|
||||
|
||||
## For native CLAUDE.md integration
|
||||
|
||||
Run once per project to make graphify always-on in Claude Code sessions:
|
||||
|
||||
```bash
|
||||
graphify claude install
|
||||
```
|
||||
|
||||
This writes a `## graphify` section to the local `CLAUDE.md` that instructs Claude to check the graph before answering codebase questions and rebuild it after code changes. No manual `/graphify` needed in future sessions.
|
||||
|
||||
```bash
|
||||
graphify claude uninstall # remove the section
|
||||
```
|
||||
When the user asks to install the post-commit auto-rebuild hook or wire graphify into a project's CLAUDE.md, see `references/hooks.md`.
|
||||
|
||||
---
|
||||
|
||||
|
||||
56
skills/graphify/references/add-watch.md
Normal file
@ -0,0 +1,56 @@
|
||||
# graphify reference: add a URL and watch a folder
|
||||
|
||||
Load this when the user ran `/graphify add <url>` or passed `--watch`. Neither is part of the default build.
|
||||
|
||||
## For /graphify add
|
||||
|
||||
Fetch a URL and add it to the corpus, then update the graph.
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import sys
|
||||
from graphify.ingest import ingest
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
out = ingest('URL', Path('./raw'), author='AUTHOR', contributor='CONTRIBUTOR')
|
||||
print(f'Saved to {out}')
|
||||
except ValueError as e:
|
||||
print(f'error: {e}', file=sys.stderr)
|
||||
sys.exit(1)
|
||||
except RuntimeError as e:
|
||||
print(f'error: {e}', file=sys.stderr)
|
||||
sys.exit(1)
|
||||
"
|
||||
```
|
||||
|
||||
Replace `URL` with the actual URL, `AUTHOR` with the user's name if provided, `CONTRIBUTOR` likewise. If the command exits with an error, tell the user what went wrong - do not silently continue. After a successful save, automatically run the `--update` pipeline on `./raw` to merge the new file into the existing graph.
|
||||
|
||||
Supported URL types (auto-detected):
|
||||
- YouTube / any video URL → audio downloaded via yt-dlp, transcribed to `.txt` on next run (requires `pip install 'graphifyy[video]'`)
|
||||
- Twitter/X → fetched via oEmbed, saved as `.md` with tweet text and author
|
||||
- arXiv → abstract + metadata saved as `.md`
|
||||
- PDF → downloaded as `.pdf`
|
||||
- Images (.png/.jpg/.webp) → downloaded, Claude vision extracts on next run
|
||||
- Any webpage → converted to markdown via html2text
|
||||
|
||||
---
|
||||
|
||||
## For --watch
|
||||
|
||||
Start a background watcher that monitors a folder and auto-updates the graph when files change.
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify.watch INPUT_PATH --debounce 3
|
||||
```
|
||||
|
||||
Replace INPUT_PATH with the folder to watch. Behavior depends on what changed:
|
||||
|
||||
- **Code files only (.py, .ts, .go, etc.):** re-runs AST extraction + rebuild + cluster immediately, no LLM needed. `graph.json` and `GRAPH_REPORT.md` are updated automatically.
|
||||
- **Docs, papers, or images:** writes a `graphify-out/needs_update` flag and prints a notification to run `/graphify --update` (LLM semantic re-extraction required).
|
||||
|
||||
Debounce (default 3s): waits until file activity stops before triggering, so a wave of parallel agent writes doesn't trigger a rebuild per file.
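The debounce behaviour described above can be pictured with a small sketch. This is not graphify's watcher code: `Debouncer`, `touch`, and `poll` are hypothetical names, and the clock is injected so the logic can be exercised without sleeping.

```python
import time
from typing import Callable, Optional


class Debouncer:
    """Fire once after a quiet period, however many events arrived before it."""

    def __init__(self, delay: float = 3.0, clock: Callable[[], float] = time.monotonic):
        self.delay = delay
        self.clock = clock
        self.last_event: Optional[float] = None  # None means nothing is pending

    def touch(self) -> None:
        """Record a file event; every new event restarts the quiet period."""
        self.last_event = self.clock()

    def poll(self) -> bool:
        """Return True exactly once, after `delay` seconds with no new events."""
        if self.last_event is None:
            return False
        if self.clock() - self.last_event >= self.delay:
            self.last_event = None  # reset so the rebuild is not triggered twice
            return True
        return False
```

A watcher loop under these assumptions would call `touch()` for every file event and trigger a rebuild whenever `poll()` returns `True`, so a burst of writes collapses into a single rebuild.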
|
||||
|
||||
Press Ctrl+C to stop.
|
||||
|
||||
For agentic workflows: run `--watch` in a background terminal. Code changes from agent waves are picked up automatically between waves. If agents are also writing docs or notes, you'll need a manual `/graphify --update` after those waves.
|
||||
87
skills/graphify/references/exports.md
Normal file
@ -0,0 +1,87 @@
|
||||
# graphify reference: extra exports and benchmark
|
||||
|
||||
Load this when the user passed one of the export flags (`--wiki`, `--neo4j`, `--neo4j-push`, `--falkordb`, `--falkordb-push`, `--svg`, `--graphml`, `--mcp`), or when the corpus is large enough for the token-reduction benchmark. Each step runs only for its own flag.
|
||||
|
||||
### Step 6b - Wiki (only if --wiki flag)
|
||||
|
||||
**Only run this step if `--wiki` was explicitly given in the original command.**
|
||||
|
||||
Run this before Step 9 (cleanup) so `.graphify_labels.json` is still available.
|
||||
|
||||
```bash
|
||||
graphify export wiki
|
||||
```
|
||||
|
||||
### Step 7 - Neo4j export (only if --neo4j or --neo4j-push flag)
|
||||
|
||||
**If `--neo4j`** - generate a Cypher file for manual import:
|
||||
|
||||
```bash
|
||||
graphify export neo4j
|
||||
```
|
||||
|
||||
**If `--neo4j-push <uri>`** - push directly to a running Neo4j instance. Ask the user for credentials if not provided:
|
||||
|
||||
```bash
|
||||
graphify export neo4j --push bolt://localhost:7687 --user neo4j --password PASSWORD
|
||||
```
|
||||
|
||||
Default URI is `bolt://localhost:7687`, default user is `neo4j`. Uses MERGE - safe to re-run without creating duplicates.
|
||||
|
||||
### Step 7a - FalkorDB export (only if --falkordb or --falkordb-push flag)
|
||||
|
||||
**If `--falkordb`** - generate a Cypher file. The statements are OpenCypher, but FalkorDB's `GRAPH.QUERY` runs one statement at a time (no bulk script import like Neo4j's `cypher-shell`), so prefer `--falkordb-push` to load a graph. Use this only when you want the portable `cypher.txt` artifact:
|
||||
|
||||
```bash
|
||||
graphify export falkordb
|
||||
```
|
||||
|
||||
**If `--falkordb-push <uri>`** - push directly to a running FalkorDB instance. Credentials are optional; ask the user only if the instance requires auth:
|
||||
|
||||
```bash
|
||||
graphify export falkordb --push falkordb://localhost:6379
|
||||
```
|
||||
|
||||
Default URI is `falkordb://localhost:6379` (the scheme is informational - `redis://` or a bare `host:port` work too), auth is optional, and the target graph defaults to `graphify`. Uses MERGE - safe to re-run without creating duplicates.
|
||||
|
||||
### Step 7b - SVG export (only if --svg flag)
|
||||
|
||||
```bash
|
||||
graphify export svg
|
||||
```
|
||||
|
||||
### Step 7c - GraphML export (only if --graphml flag)
|
||||
|
||||
```bash
|
||||
graphify export graphml
|
||||
```
|
||||
|
||||
### Step 7d - MCP server (only if --mcp flag)
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify.serve graphify-out/graph.json
|
||||
```
|
||||
|
||||
This starts a stdio MCP server that exposes tools: `query_graph`, `get_node`, `get_neighbors`, `get_community`, `god_nodes`, `graph_stats`, `shortest_path`. Add to Claude Desktop or any MCP-compatible agent orchestrator so other agents can query the graph live.
|
||||
|
||||
To configure in Claude Desktop, add to `claude_desktop_config.json`. Claude Desktop can't run `$(...)`, and under `uv tool install` the system `python3` can't import graphify — so set `command` to the **absolute interpreter path** printed by `cat graphify-out/.graphify_python`:
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"graphify": {
|
||||
"command": "<absolute path from: cat graphify-out/.graphify_python>",
|
||||
"args": ["-m", "graphify.serve", "/absolute/path/to/graphify-out/graph.json"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
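Because both paths in that config must be absolute, it may help to generate the entry rather than type it by hand. The following is a minimal sketch, not part of graphify: `build_mcp_entry` is a hypothetical helper, and it assumes `graphify-out/.graphify_python` holds a single interpreter path, as the `cat` usage above suggests.

```python
import json
from pathlib import Path


def build_mcp_entry(interpreter: str, graph_json: str) -> dict:
    """Build the mcpServers entry, refusing relative paths."""
    if not Path(interpreter).is_absolute():
        raise ValueError("interpreter must be an absolute path")
    if not Path(graph_json).is_absolute():
        raise ValueError("graph_json must be an absolute path")
    return {
        "mcpServers": {
            "graphify": {
                "command": interpreter,
                "args": ["-m", "graphify.serve", graph_json],
            }
        }
    }


if __name__ == "__main__":
    out = Path("graphify-out")
    marker = out / ".graphify_python"
    if marker.exists():
        # Assumption: the marker file contains only the interpreter path.
        interpreter = marker.read_text(encoding="utf-8").strip()
        graph = str((out / "graph.json").resolve())
        print(json.dumps(build_mcp_entry(interpreter, graph), indent=2))
```

The printed JSON can then be merged into `claude_desktop_config.json` by hand; the sketch deliberately does not write to that file.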
|
||||
|
||||
### Step 8 - Token reduction benchmark (only if total_words > 5000)
|
||||
|
||||
If `total_words` from `graphify-out/.graphify_detect.json` is greater than 5,000, run:
|
||||
|
||||
```bash
|
||||
graphify benchmark
|
||||
```
|
||||
|
||||
Print the output directly in chat. If `total_words <= 5000`, skip silently - the graph value is structural clarity, not token compression, for small corpora.
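The gate in Step 8 is a strict greater-than comparison, which a short sketch makes explicit. `should_benchmark` and `load_detect` are hypothetical helper names; only the `total_words` key and the 5,000 threshold come from the text above.

```python
import json
from pathlib import Path

BENCHMARK_MIN_WORDS = 5000  # threshold stated in Step 8


def should_benchmark(detect: dict) -> bool:
    """True only when total_words is strictly greater than 5,000."""
    return detect.get("total_words", 0) > BENCHMARK_MIN_WORDS


def load_detect(path: str = "graphify-out/.graphify_detect.json") -> dict:
    """Read the detect file, treating a missing file as an empty corpus."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
```

A corpus of exactly 5,000 words therefore skips the benchmark, which matches the `total_words <= 5000` wording.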
|
||||
70
skills/graphify/references/extraction-spec.md
Normal file
@ -0,0 +1,70 @@
|
||||
# graphify reference: extraction subagent prompt
|
||||
|
||||
Load this in Step 3 Part B when the corpus has at least one doc, paper, or image chunk. A pure-code corpus skips Part B and never reads this file. Each semantic subagent receives the prompt below verbatim (substitute FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH).
|
||||
|
||||
```
|
||||
You are a graphify extraction subagent. Read the files listed and extract a knowledge graph fragment.
|
||||
Output ONLY valid JSON matching the schema below - no explanation, no markdown fences, no preamble.
|
||||
|
||||
Files (chunk CHUNK_NUM of TOTAL_CHUNKS):
|
||||
FILE_LIST
|
||||
|
||||
Rules:
|
||||
- EXTRACTED: relationship explicit in source (import, call, citation, "see §3.2")
|
||||
- INFERRED: reasonable inference (shared data structure, implied dependency)
|
||||
- AMBIGUOUS: uncertain - flag for review, do not omit
|
||||
|
||||
Code files: focus on semantic edges AST cannot find (call relationships, shared data, arch patterns).
|
||||
Do not re-extract imports - AST already has those.
|
||||
Doc/paper files: extract named concepts, entities, citations. For rationale (WHY decisions were made, trade-offs, design intent): store as a `rationale` attribute on the relevant concept node — do NOT create a separate rationale node or fragment node. Only create a node for something that is itself a named entity or concept. Use `file_type:"rationale"` for concept-like nodes (ideas, principles, mechanisms, design patterns). `file_type` MUST be one of exactly these six values: `code`, `document`, `paper`, `image`, `rationale`, `concept`. Any other value is invalid and will be rejected.
|
||||
Code files: when adding `calls` edges, source MUST be the caller (the function/class doing the calling), target MUST be the callee. Never reverse this direction. `calls` edges MUST stay within one language: a Python function cannot `calls` a JS/TS/Go/Rust/Java symbol and vice versa — cross-language call edges are phantom artifacts, never emit them.
|
||||
Image files: use vision to understand what the image IS - do not just OCR.
|
||||
UI screenshot: layout patterns, design decisions, key elements, purpose.
|
||||
Chart: metric, trend/insight, data source.
|
||||
Tweet/post: claim as node, author, concepts mentioned.
|
||||
Diagram: components and connections.
|
||||
Research figure: what it demonstrates, method, result.
|
||||
Handwritten/whiteboard: ideas and arrows, mark uncertain readings AMBIGUOUS.
|
||||
|
||||
DEEP_MODE (if --mode deep was given): be aggressive with INFERRED edges - indirect deps,
|
||||
shared assumptions, latent couplings. Mark uncertain ones AMBIGUOUS instead of omitting.
|
||||
|
||||
Semantic similarity: if two concepts in this chunk solve the same problem or represent the same idea without any structural link (no import, no call, no citation), add a `semantically_similar_to` edge marked INFERRED with a confidence_score reflecting how similar they are (0.6-0.95). Examples:
|
||||
- Two functions that both validate user input but never call each other
|
||||
- A class in code and a concept in a paper that describe the same algorithm
|
||||
- Two error types that handle the same failure mode differently
|
||||
Only add these when the similarity is genuinely non-obvious and cross-cutting. Do not add them for trivially similar things.
|
||||
|
||||
Hyperedges: if 3 or more nodes clearly participate together in a shared concept, flow, or pattern that is not captured by pairwise edges alone, add a hyperedge to a top-level `hyperedges` array. Examples:
|
||||
- All classes that implement a common protocol or interface
|
||||
- All functions in an authentication flow (even if they don't all call each other)
|
||||
- All concepts from a paper section that form one coherent idea
|
||||
Use sparingly — only when the group relationship adds information beyond the pairwise edges. Maximum 3 hyperedges per chunk.
|
||||
|
||||
If a file has YAML frontmatter (--- ... ---), copy source_url, captured_at, author,
|
||||
contributor onto every node from that file.
|
||||
|
||||
confidence_score is REQUIRED on every edge - never omit it, never use 0.5 as a default:
|
||||
- EXTRACTED edges: confidence_score = 1.0 always
|
||||
- INFERRED edges: pick exactly ONE value from this set — never 0.5:
|
||||
0.95 direct structural evidence (shared data structure, named cross-file reference).
|
||||
0.85 strong inference (clear functional alignment, no direct symbol link).
|
||||
0.75 reasonable inference (shared problem domain + similar shape, requires interpretation).
|
||||
0.65 weak inference (thematically related, no shape evidence).
|
||||
0.55 speculative but plausible (surface-level co-occurrence only).
|
||||
Models follow discrete rubrics better than continuous ranges; the bimodal
|
||||
distribution observed in production (>50% at 0.5, >40% at 0.85+) shows the
|
||||
range guidance is being collapsed to a binary. If no value above fits, mark
|
||||
the edge AMBIGUOUS rather than picking 0.4 or below.
|
||||
- AMBIGUOUS edges: 0.1-0.3
|
||||
|
||||
Node ID format: lowercase, only `[a-z0-9_]`, no dots or slashes. Format: `{stem}_{entity}` where stem is `{parent_dir}_{filename_without_ext}` (the **immediate** parent directory name + the filename stem, both lowercased with non-alphanumeric chars replaced by `_`) and entity is the symbol name similarly normalized. Only one level of parent is used — not the full path. Examples: `src/auth/session.py` + `ValidateToken` → `auth_session_validatetoken`; `lib/utils/helpers.py` + `parse_url` → `utils_helpers_parse_url`; `tests/test_foo.py` + `_helper` → `tests_test_foo_helper`. Top-level files (no parent dir, e.g. `setup.py`) use just the filename stem: `setup_my_func`. This must match the ID the AST extractor generates — using just the filename (e.g., `session_validatetoken`) or the full path (e.g., `src_auth_session_validatetoken`) will create orphan ghost-duplicate nodes. If you are re-extracting a project that had ghost duplicates under the old format, the user should run `graphify extract --force` to rebuild cleanly. CRITICAL: never append chunk numbers, sequence numbers, or any suffix to an ID (no `_c1`, `_c2`, `_chunk2`, etc.). IDs must be deterministic from the label alone — the same entity must always produce the same ID regardless of which chunk processes it.
|
||||
|
||||
Generate the extraction JSON matching this schema exactly:
|
||||
{"nodes":[{"id":"auth_session_validatetoken","label":"Human Readable Name","file_type":"code|document|paper|image|rationale|concept","source_file":"<FILE_LIST path verbatim>","source_location":null,"source_url":null,"captured_at":null,"author":null,"contributor":null}],"edges":[{"source":"node_id","target":"node_id","relation":"calls|implements|references|cites|conceptually_related_to|shares_data_with|semantically_similar_to|rationale_for","confidence":"EXTRACTED|INFERRED|AMBIGUOUS","confidence_score":1.0,"source_file":"<FILE_LIST path verbatim>","source_location":null,"weight":1.0}],"hyperedges":[{"id":"snake_case_id","label":"Human Readable Label","nodes":["node_id1","node_id2","node_id3"],"relation":"participate_in|implement|form","confidence":"EXTRACTED|INFERRED","confidence_score":0.75,"source_file":"<FILE_LIST path verbatim>"}],"input_tokens":0,"output_tokens":0}
|
||||
|
||||
source_file RULE (every node, edge, and hyperedge): set source_file to the path of the originating file EXACTLY as it appears in FILE_LIST — verbatim and absolute. Do NOT shorten to a basename, do NOT re-relativize, do NOT strip any directory prefix, and do NOT change separators (the engine canonicalizes separators and relativizes against the build root downstream). Copy the FILE_LIST entry character-for-character. This keeps the full build and incremental --update on the same base, so build_merge's replace-on-re-extract matches the existing node instead of accumulating a duplicate.
|
||||
|
||||
Then write the JSON to disk using the Write tool at this exact absolute path (no relative paths — Write resolves relative paths against an undefined cwd and the file will be silently lost):
|
||||
CHUNK_PATH
|
||||
```
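The node ID rule in the prompt above can be checked with a short sketch. This is not the AST extractor's implementation: `node_id` and `_norm` are hypothetical names, and trimming leading and trailing underscores is an assumption made so that the `_helper` example from the prompt comes out as `tests_test_foo_helper`.

```python
import re
from pathlib import PurePosixPath


def _norm(text: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into '_', trim edge underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def node_id(source_file: str, entity: str) -> str:
    """Join immediate parent dir, filename stem, and entity into one ID."""
    path = PurePosixPath(source_file)
    # Only the immediate parent is used; for a top-level file it is empty.
    parts = [path.parent.name, path.stem, entity]
    return "_".join(p for p in (_norm(x) for x in parts) if p)
```

Under these assumptions the function reproduces the four examples given in the prompt, including the top-level `setup.py` case where no parent directory contributes to the ID.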
|
||||
46
skills/graphify/references/github-and-merge.md
Normal file
@ -0,0 +1,46 @@
|
||||
# graphify reference: GitHub clone and cross-repo merge
|
||||
|
||||
Load this when the user passed one or more `https://github.com/...` URLs, or named several local subfolders to merge into one graph.
|
||||
|
||||
### Step 0 - Clone GitHub repo(s) (only if a GitHub URL was given)
|
||||
|
||||
**Single repo:**
|
||||
```bash
|
||||
LOCAL_PATH=$(graphify clone <github-url> [--branch <branch>])
|
||||
# Use LOCAL_PATH as the target for all subsequent steps
|
||||
```
|
||||
|
||||
**Multiple repos (cross-repo graph):**
|
||||
```bash
|
||||
# Clone each repo, run the full pipeline on each, then merge
|
||||
graphify clone <url1> # → ~/.graphify/repos/<owner1>/<repo1>
|
||||
graphify clone <url2> # → ~/.graphify/repos/<owner2>/<repo2>
|
||||
# Run /graphify on each local path to produce their graph.json files
|
||||
# Then merge:
|
||||
graphify merge-graphs \
|
||||
~/.graphify/repos/<owner1>/<repo1>/graphify-out/graph.json \
|
||||
~/.graphify/repos/<owner2>/<repo2>/graphify-out/graph.json \
|
||||
--out graphify-out/cross-repo-graph.json
|
||||
```
|
||||
|
||||
Graphify clones into `~/.graphify/repos/<owner>/<repo>` and reuses existing clones on repeat runs. Each node in the merged graph carries a `repo` attribute so you can filter by origin.
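Filtering by origin could look like the sketch below. It assumes the merged file is node-link JSON with a top-level `nodes` list whose entries carry the `repo` attribute; the exact shape of the `repo` value is not documented here, so the values used with these helpers are hypothetical, as are the function names.

```python
import json
from collections import Counter
from pathlib import Path


def nodes_by_repo(graph: dict) -> Counter:
    """Count nodes per `repo` attribute; nodes without one are grouped under None."""
    return Counter(node.get("repo") for node in graph.get("nodes", []))


def filter_repo(graph: dict, repo: str) -> list:
    """Return only the nodes whose `repo` attribute equals `repo`."""
    return [node for node in graph.get("nodes", []) if node.get("repo") == repo]


if __name__ == "__main__":
    merged = Path("graphify-out/cross-repo-graph.json")
    if merged.exists():
        data = json.loads(merged.read_text(encoding="utf-8"))
        print(nodes_by_repo(data))
```

Printing the counter first is a cheap way to see which `repo` values are actually present before filtering on one of them.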
|
||||
|
||||
**Multiple local subfolders (monorepo or multi-service layout):**
|
||||
|
||||
The skill pipeline writes all intermediate and final outputs to `graphify-out/` in the current working directory. Running the skill on each subfolder separately will clobber the same output dir. Instead, use the CLI directly for each subfolder — it places `graphify-out/` *inside* the scanned path:
|
||||
|
||||
```bash
|
||||
graphify extract ./core/ # → ./core/graphify-out/graph.json
|
||||
graphify extract ./service/ # → ./service/graphify-out/graph.json
|
||||
graphify extract ./platform/ # → ./platform/graphify-out/graph.json
|
||||
# Add --backend gemini|kimi|openai|deepseek|claude-cli depending on which API key you have set
|
||||
|
||||
# Then merge at the project root:
|
||||
graphify merge-graphs \
|
||||
./core/graphify-out/graph.json \
|
||||
./service/graphify-out/graph.json \
|
||||
./platform/graphify-out/graph.json \
|
||||
--out graphify-out/graph.json
|
||||
```
|
||||
|
||||
Once `graphify-out/graph.json` exists, the fast path above takes over: any codebase question runs `graphify query` directly on the merged graph — no re-extraction, no size gate.
|
||||
33
skills/graphify/references/hooks.md
Normal file
@ -0,0 +1,33 @@
|
||||
# graphify reference: commit hook and native CLAUDE.md integration
|
||||
|
||||
Load this when the user asked to install the post-commit hook or wire graphify into a project's CLAUDE.md.
|
||||
|
||||
## For git commit hook
|
||||
|
||||
Install a post-commit hook that auto-rebuilds the graph after every commit. No background process needed - triggers once per commit, works with any editor.
|
||||
|
||||
```bash
|
||||
graphify hook install # install
|
||||
graphify hook uninstall # remove
|
||||
graphify hook status # check
|
||||
```
|
||||
|
||||
After every `git commit`, the hook detects which code files changed (via `git diff HEAD~1`), re-runs AST extraction on those files, and rebuilds `graph.json` and `GRAPH_REPORT.md`. Doc/image changes are ignored by the hook - run `/graphify --update` manually for those.
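
The code-versus-doc split described above can be pictured as follows. This is only a sketch of the idea, not the hook's implementation: the extension set is an illustrative subset, `split_changed` is a hypothetical helper, and the sample paths are made up.

```python
from pathlib import PurePosixPath

# Illustrative subset of code extensions (assumption, not the hook's real list).
CODE_EXTS = {".py", ".ts", ".js", ".go", ".rs", ".java"}


def split_changed(paths):
    """Split changed paths into (code, other) by file extension."""
    code, other = [], []
    for p in paths:
        bucket = code if PurePosixPath(p).suffix.lower() in CODE_EXTS else other
        bucket.append(p)
    return code, other


# In a real repository the list could come from: git diff --name-only HEAD~1
changed = ["src/app.py", "docs/guide.md", "web/index.ts", "assets/logo.png"]
code, other = split_changed(changed)
print("re-extract:", code)
print("left for a manual --update:", other)
```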
|
||||
|
||||
If a post-commit hook already exists, graphify appends to it rather than replacing it.
|
||||
|
||||
---
|
||||
|
||||
## For native CLAUDE.md integration
|
||||
|
||||
Run once per project to make graphify always-on in Claude Code sessions:
|
||||
|
||||
```bash
|
||||
graphify claude install
|
||||
```
|
||||
|
||||
This writes a `## graphify` section to the local `CLAUDE.md` that instructs Claude to check the graph before answering codebase questions and rebuild it after code changes. No manual `/graphify` needed in future sessions.
|
||||
|
||||
```bash
|
||||
graphify claude uninstall # remove the section
|
||||
```
|
||||
303
skills/graphify/references/query.md
Normal file
@ -0,0 +1,303 @@
|
||||
# graphify reference: query, path, explain
|
||||
|
||||
Load this when the user asks a question against an existing graph, or runs `/graphify path` or `/graphify explain`. The core's query stub points here for the full traversal flow. These flows use the `graphify query` CLI when it is available and fall back to an inline NetworkX traversal otherwise.
|
||||
|
||||
Two traversal modes - choose based on the question:
|
||||
|
||||
| Mode | Flag | Best for |
|
||||
|------|------|----------|
|
||||
| BFS (default) | _(none)_ | "What is X connected to?" - broad context, nearest neighbors first |
|
||||
| DFS | `--dfs` | "How does X reach Y?" - trace a specific chain or dependency path |
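
To see why the two modes suit different questions, here is a small standard-library sketch that compares visit order on a toy graph. The adjacency list and the function names `bfs_order` and `dfs_order` are hypothetical and independent of graphify's own traversal code.

```python
from collections import deque

# Toy adjacency list (hypothetical graph, not graphify output).
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}


def bfs_order(graph, start):
    """Visit nodes layer by layer: nearest neighbors first."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order


def dfs_order(graph, start):
    """Follow one chain as deep as possible before backtracking."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is explored first.
        stack.extend(reversed(graph[node]))
    return order


print(bfs_order(GRAPH, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs_order(GRAPH, "A"))  # ['A', 'B', 'D', 'C', 'E']
```

BFS reaches both direct neighbors of `A` before going deeper, while DFS finishes the `A -> B -> D` chain first.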
|
||||
|
||||
First check the graph exists:
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
from pathlib import Path
if not Path('graphify-out/graph.json').exists():
    print('ERROR: No graph found. Run /graphify <path> first to build the graph.')
    raise SystemExit(1)
"
```
|
||||
If it fails, stop and tell the user to run `/graphify <path>` first.
|
||||
|
||||
### Step 0 — Constrained query expansion (REQUIRED before traversal)
|
||||
|
||||
graphify's `query` CLI matches nodes via case-folded substring + IDF — there is **no stemming, no synonyms, no cross-language match** inside the binary, and the inline fallback below matches the same way. If the user's question uses different language or different domain vocabulary than the graph's labels (user says "обработчик" / graph says "handler"; user says "authentication" / graph says "Guardian"), the literal matcher returns 0 hits and the answer collapses to noise.
|
||||
|
||||
Fix this **without inventing tokens** by expanding the query against the actual graph vocabulary first:
|
||||
|
||||
1. Extract the token vocabulary from node labels:
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import json, re
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
vocab = set()
for n in data['nodes']:
    for c in re.findall(r'[^\W\d_]+', n.get('label','') or '', re.UNICODE):
        parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+', c) or [c]
        for p in parts:
            t = p.lower()
            if 3 <= len(t) <= 30:
                vocab.add(t)
Path('graphify-out/.vocab.txt').write_text('\n'.join(sorted(vocab)))
print(f'vocab: {len(vocab)} tokens')
"
```
|
||||
|
||||
2. Read `graphify-out/.vocab.txt`. Then for the user's question, select **up to 12 tokens from this exact list** that semantically match the query intent. Hard constraints:
|
||||
- You MUST pick only tokens present in the vocabulary file. Do NOT invent tokens.
|
||||
- If a query concept has no plausible token in the vocab, skip it — do not substitute a near-synonym from training memory.
|
||||
- If **no** vocab tokens match the query at all, output an empty list and tell the user the corpus has no relevant vocabulary for this question. Do not fabricate a search.
|
||||
- Translate cross-language: Russian "аутентификация" → look for `auth`, `credential`, `token`, `security` IFF present in vocab.
|
||||
- Morphology: "handlers" maps to `handler` IFF present; "todos" maps to `todo` IFF present.
|
||||
|
||||
3. Print the selection explicitly to the user before running the query, so the expansion is auditable:
|
||||
```
|
||||
Query expanded to (from graph vocab, N tokens): [token1, token2, ...]
|
||||
```
|
||||
If the list is empty, say so plainly and stop — do not proceed to traversal.
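
The hard constraints above can also be enforced mechanically after the selection is made. The sketch below assumes the vocabulary has already been loaded into a set; `constrain_to_vocab` is a hypothetical helper, and the sample vocabulary and candidates are made up.

```python
def constrain_to_vocab(candidates, vocab, limit=12):
    """Keep only candidates present in vocab: lowercased, deduplicated, order kept."""
    selected = []
    for token in candidates:
        token = token.lower()
        if token in vocab and token not in selected:
            selected.append(token)
    return selected[:limit]


# Stand-in for the contents of graphify-out/.vocab.txt (hypothetical tokens).
vocab = {"handler", "auth", "token", "todo"}
# 'login' is not in the vocab, so it must be dropped rather than substituted.
candidates = ["Handler", "auth", "login", "token", "auth"]

selected = constrain_to_vocab(candidates, vocab)
print(f"Query expanded to (from graph vocab, {len(selected)} tokens): {selected}")

# The expanded query string is the selected tokens joined with spaces.
query = " ".join(selected)
print(query)
```

With a real graph, the set could be loaded with `set(Path('graphify-out/.vocab.txt').read_text().split())`. An empty `selected` list means the query should stop before traversal.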
|
||||
|
||||
### Step 1 — Traversal
|
||||
|
||||
Build the **expanded query string** by joining the selected tokens with spaces. Use this string as `QUESTION` below — NOT the original user question. (The original question is preserved only for `save-result` at the end.)
|
||||
|
||||
Prefer the CLI when it is installed:
|
||||
```bash
|
||||
graphify query "QUESTION"
|
||||
# or: graphify query "QUESTION" --dfs --budget 3000
|
||||
```
|
||||
|
||||
If the CLI is unavailable, load `graphify-out/graph.json` and run the traversal inline:
|
||||
|
||||
1. Find the 1-3 nodes whose label best matches the expanded tokens.
|
||||
2. Run the appropriate traversal from each starting node.
|
||||
3. Read the subgraph - node labels, edge relations, confidence tags, source locations.
|
||||
4. Answer using **only** what the graph contains. Quote `source_location` when citing a specific fact.
|
||||
5. If the graph lacks enough information, say so - do not hallucinate edges.
|
||||
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path

data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')

question = 'QUESTION'
mode = 'MODE'  # 'bfs' or 'dfs'
terms = [t.lower() for t in question.split() if len(t) >= 3]  # match the vocab threshold; keeps api/jwt/ios (#1392)

# Find best-matching start nodes
scored = []
for nid, ndata in G.nodes(data=True):
    label = ndata.get('label', '').lower()
    score = sum(1 for t in terms if t in label)
    if score > 0:
        scored.append((score, nid))
scored.sort(reverse=True)
start_nodes = [nid for _, nid in scored[:3]]

if not start_nodes:
    print('No matching nodes found for query terms:', terms)
    sys.exit(0)

subgraph_nodes = set()
subgraph_edges = []

if mode == 'dfs':
    # DFS: follow one path as deep as possible before backtracking.
    # Depth-limited to 6 to avoid traversing the whole graph.
    visited = set()
    stack = [(n, 0) for n in reversed(start_nodes)]
    while stack:
        node, depth = stack.pop()
        if node in visited or depth > 6:
            continue
        visited.add(node)
        subgraph_nodes.add(node)
        for neighbor in G.neighbors(node):
            if neighbor not in visited:
                stack.append((neighbor, depth + 1))
                subgraph_edges.append((node, neighbor))
else:
    # BFS: explore all neighbors layer by layer up to depth 3.
    frontier = set(start_nodes)
    subgraph_nodes = set(start_nodes)
    for _ in range(3):
        next_frontier = set()
        for n in frontier:
            for neighbor in G.neighbors(n):
                if neighbor not in subgraph_nodes:
                    next_frontier.add(neighbor)
                    subgraph_edges.append((n, neighbor))
        subgraph_nodes.update(next_frontier)
        frontier = next_frontier

# Token-budget aware output: rank by relevance, cut at budget (~4 chars/token)
token_budget = BUDGET  # default 2000
char_budget = token_budget * 4

# Score each node by term overlap for ranked output
def relevance(nid):
    label = G.nodes[nid].get('label', '').lower()
    return sum(1 for t in terms if t in label)

ranked_nodes = sorted(subgraph_nodes, key=relevance, reverse=True)

lines = [f'Traversal: {mode.upper()} | Start: {[G.nodes[n].get(\"label\",n) for n in start_nodes]} | {len(subgraph_nodes)} nodes']
for nid in ranked_nodes:
    d = G.nodes[nid]
    lines.append(f' NODE {d.get(\"label\", nid)} [src={d.get(\"source_file\",\"\")} loc={d.get(\"source_location\",\"\")}]')
for u, v in subgraph_edges:
    if u in subgraph_nodes and v in subgraph_nodes:
        _raw = G[u][v]; d = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
        lines.append(f' EDGE {G.nodes[u].get(\"label\",u)} --{d.get(\"relation\",\"\")} [{d.get(\"confidence\",\"\")}]--> {G.nodes[v].get(\"label\",v)}')

output = '\n'.join(lines)
if len(output) > char_budget:
    output = output[:char_budget] + f'\n... (truncated at ~{token_budget} token budget - use --budget N for more)'
print(output)
"
```
|
||||
|
||||
Replace `QUESTION` with the **expanded** query string, `MODE` with `bfs` or `dfs`, and `BUDGET` with the token budget (default `2000`, or whatever `--budget N` specifies). Then answer based on the subgraph output above, using only what the graph contains.
|
||||
|
||||
After writing the answer, save it back into the graph so it improves future queries. Include the expanded tokens inside the `--answer` text (e.g. `"Expanded from original query via vocab: [tokens]. Then traversed..."`) so the next `--update` extracts the expansion history as a graph node:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "ORIGINAL_QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
|
||||
```
|
||||
|
||||
Replace `ORIGINAL_QUESTION` with the user's verbatim question, `ANSWER` with your full answer text (containing the expanded-token trace), `NODE1 NODE2` with the list of node labels you cited. This closes the feedback loop: the next `--update` will extract this Q&A as a node in the graph.
|
||||
|
||||
---
|
||||
|
||||
## For /graphify path
|
||||
|
||||
Find the shortest path between two named concepts in the graph. Prefer the CLI when installed:
|
||||
|
||||
```bash
|
||||
graphify path "NODE_A" "NODE_B"
|
||||
```
|
||||
|
||||
If the CLI is unavailable, run it inline:
|
||||
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path

data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')

a_term = 'NODE_A'
b_term = 'NODE_B'

def find_node(term):
    term = term.lower()
    scored = sorted(
        [(sum(1 for w in term.split() if w in G.nodes[n].get('label','').lower()), n)
         for n in G.nodes()],
        reverse=True
    )
    return scored[0][1] if scored and scored[0][0] > 0 else None

src = find_node(a_term)
tgt = find_node(b_term)

if not src or not tgt:
    print(f'Could not find nodes matching: {a_term!r} or {b_term!r}')
    sys.exit(0)

try:
    path = nx.shortest_path(G, src, tgt)
    print(f'Shortest path ({len(path)-1} hops):')
    for i, nid in enumerate(path):
        label = G.nodes[nid].get('label', nid)
        if i < len(path) - 1:
            _raw = G[nid][path[i+1]]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
            rel = edge.get('relation', '')
            conf = edge.get('confidence', '')
            print(f' {label} --{rel}--> [{conf}]')
        else:
            print(f' {label}')
except nx.NetworkXNoPath:
    print(f'No path found between {a_term!r} and {b_term!r}')
except nx.NodeNotFound as e:
    print(f'Node not found: {e}')
"
```
|
||||
|
||||
Replace `NODE_A` and `NODE_B` with the actual concept names from the user. Then explain the path in plain language - what each hop means, why it's significant.
|
||||
|
||||
After writing the explanation, save it back:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## For /graphify explain
|
||||
|
||||
Give a plain-language explanation of a single node - everything connected to it. Prefer the CLI when installed:
|
||||
|
||||
```bash
|
||||
graphify explain "NODE_NAME"
|
||||
```
|
||||
|
||||
If the CLI is unavailable, run it inline:
|
||||
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path

data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')

term = 'NODE_NAME'
term_lower = term.lower()

# Find best matching node
scored = sorted(
    [(sum(1 for w in term_lower.split() if w in G.nodes[n].get('label','').lower()), n)
     for n in G.nodes()],
    reverse=True
)
if not scored or scored[0][0] == 0:
    print(f'No node matching {term!r}')
    sys.exit(0)

nid = scored[0][1]
data_n = G.nodes[nid]
print(f'NODE: {data_n.get(\"label\", nid)}')
print(f' source: {data_n.get(\"source_file\",\"unknown\")}')
print(f' type: {data_n.get(\"file_type\",\"unknown\")}')
print(f' degree: {G.degree(nid)}')
print()
print('CONNECTIONS:')
for neighbor in G.neighbors(nid):
    _raw = G[nid][neighbor]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
    nlabel = G.nodes[neighbor].get('label', neighbor)
    rel = edge.get('relation', '')
    conf = edge.get('confidence', '')
    src_file = G.nodes[neighbor].get('source_file', '')
    print(f' --{rel}--> {nlabel} [{conf}] ({src_file})')
"
```
|
||||
|
||||
Replace `NODE_NAME` with the concept the user asked about. Then write a 3-5 sentence explanation of what this node is, what it connects to, and why those connections are significant. Use the source locations as citations.
|
||||
|
||||
After writing the explanation, save it back:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Explain NODE_NAME" --answer "ANSWER" --type explain --nodes NODE_NAME
|
||||
```
|
||||
52
skills/graphify/references/transcribe.md
Normal file
@ -0,0 +1,52 @@
|
||||
# graphify reference: transcribe video and audio
|
||||
|
||||
Load this only when `detect` reported one or more `video` files. A corpus with no video never reads this.
|
||||
|
||||
### Step 2.5 - Transcribe video / audio files (only if video files detected)
|
||||
|
||||
Skip this step entirely if `detect` returned zero `video` files.
|
||||
|
||||
Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.
|
||||
|
||||
**Strategy:** Read the god nodes from `graphify-out/.graphify_detect.json` (or the analysis file if it exists from a previous run). You are already a language model — write a one-sentence domain hint yourself from those labels. Then pass it to Whisper as the initial prompt. No separate API call needed.
|
||||
|
||||
**However**, if the corpus has *only* video files and no other docs/code, use the generic fallback prompt: `"Use proper punctuation and paragraph breaks."`
|
||||
|
||||
**Step 1 - Write the Whisper prompt yourself.**
|
||||
|
||||
Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:
|
||||
|
||||
- Labels: `transformer, attention, encoder, decoder` → `"Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."`
|
||||
- Labels: `kubernetes, deployment, pod, helm` → `"DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."`
|
||||
|
||||
**Export** it as `GRAPHIFY_WHISPER_PROMPT` (the exact name the transcriber reads — and it must be `export`ed so the child Python process sees it) for the next command.
|
||||
|
||||
**Step 2 - Transcribe:**
|
||||
|
||||
```bash
|
||||
export GRAPHIFY_WHISPER_MODEL=base # or whatever --whisper-model the user passed (must be exported)
|
||||
export GRAPHIFY_WHISPER_PROMPT="<the one-sentence domain hint you composed in Step 1>"
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json, os, sys
|
||||
from pathlib import Path
|
||||
from graphify.transcribe import transcribe_all
|
||||
|
||||
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
|
||||
video_files = detect.get('files', {}).get('video', [])
|
||||
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')
|
||||
|
||||
transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
|
||||
# Write the JSON from Python (NOT a shell '>' redirect): transcribe_all/Whisper
|
||||
# print progress to stdout, which would otherwise corrupt the JSON file (#1392).
|
||||
Path('graphify-out/.graphify_transcripts.json').write_text(json.dumps(transcript_paths, ensure_ascii=False), encoding=\"utf-8\")
|
||||
print(f'Transcribed {len(transcript_paths)} file(s)', file=sys.stderr)
|
||||
"
|
||||
```
|
||||
|
||||
After transcription:
|
||||
- Read the transcript paths from `graphify-out/.graphify_transcripts.json`
|
||||
- Add them to the docs list before dispatching semantic subagents in Step 3B
|
||||
- Print how many transcripts were created: `Transcribed N video file(s) -> treating as docs`
|
||||
- If transcription fails for a file, print a warning and continue with the rest
|
||||
|
||||
**Whisper model:** Default is `base`. If the user passed `--whisper-model <name>`, `export GRAPHIFY_WHISPER_MODEL=<name>` (it must be exported, not just assigned) before running the command above.
|
||||
192
skills/graphify/references/update.md
Normal file
@ -0,0 +1,192 @@
|
||||
# graphify reference: incremental update and cluster-only
|
||||
|
||||
Load this only when the user passed `--update` or `--cluster-only`. A first-time full build never reads this file.
|
||||
|
||||
## For --update (incremental re-extraction)
|
||||
|
||||
Use when you've added or modified files since the last run. Only re-extracts changed files - saves tokens and time.
|
||||
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from graphify.detect import detect_incremental, save_manifest
from pathlib import Path

result = detect_incremental(Path('INPUT_PATH'))
new_total = result.get('new_total', 0)
print(json.dumps(result, indent=2, ensure_ascii=False))
Path('graphify-out/.graphify_incremental.json').write_text(json.dumps(result, ensure_ascii=False), encoding=\"utf-8\")
deleted = list(result.get('deleted_files', []))
if new_total == 0 and not deleted:
    print('No files changed since last run. Nothing to update.')
    raise SystemExit(0)
if deleted:
    print(f'{len(deleted)} deleted file(s) to prune.')
if new_total > 0:
    print(f'{new_total} new/changed file(s) to re-extract.')
"
```
|
||||
|
||||
Then populate `.graphify_detect.json` so Steps 3A–6 (which read it unconditionally) see the right state for an incremental run. `files` carries the changed subset (drives Step 3A AST + Step 3B0 cache check on only what changed); `all_files` carries the full corpus for any step that needs corpus-wide context:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
r = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
|
||||
Path('graphify-out/.graphify_detect.json').write_text(json.dumps({
|
||||
'files': r.get('new_files', {}),
|
||||
'all_files': r.get('files', {}),
|
||||
'total_files': r.get('new_total', 0),
|
||||
'total_words': r.get('total_words', 0),
|
||||
'skipped_sensitive': r.get('skipped_sensitive', []),
|
||||
'needs_graph': True,
|
||||
}, ensure_ascii=False), encoding=\"utf-8\")
|
||||
"
|
||||
```
|
||||
|
||||
If new files exist, first check whether all changed files are code files:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
result = json.loads(open('graphify-out/.graphify_incremental.json', encoding='utf-8').read()) if Path('graphify-out/.graphify_incremental.json').exists() else {}
|
||||
code_exts = {'.py','.ts','.js','.go','.rs','.java','.cpp','.c','.rb','.swift','.kt','.cs','.scala','.php','.cc','.cxx','.hpp','.h','.kts','.lua','.toc','.f','.F','.f90','.F90','.f95','.F95','.f03','.F03','.f08','.F08'}
|
||||
new_files = result.get('new_files', {})
|
||||
all_changed = [f for files in new_files.values() for f in files]
|
||||
code_only = all(Path(f).suffix.lower() in code_exts for f in all_changed)
|
||||
print('code_only:', code_only)
|
||||
"
|
||||
```
|
||||
|
||||
If `code_only` is True: print `[graphify update] Code-only changes detected - skipping semantic extraction (no LLM needed)`, run only Step 3A (AST) on the changed files, skip Step 3B entirely (no subagents), then go straight to merge and Steps 4–8.
|
||||
|
||||
If `code_only` is False (any changed file is a doc/paper/image/video): **first, if any changed file is in `new_files['video']`, run `references/transcribe.md` (Step 2.5) on those files, then rewrite `.graphify_detect.json` to move the resulting transcript paths into `files['document']` and drop `files['video']`** — otherwise raw `.mp4/.mp3` paths are fed to semantic subagents as unreadable media (#1392). Then run the full Steps 3A–3C pipeline as normal.
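
The rewrite of `.graphify_detect.json` described above amounts to a small dictionary transformation. The sketch below shows it on an in-memory dict. `fold_transcripts` is a hypothetical helper, the sample paths are made up, and whether `total_files` should also be adjusted is not stated in this reference, so the sketch leaves it untouched.

```python
import copy


def fold_transcripts(detect: dict, transcript_paths: list) -> dict:
    """Return a copy of the detect state with video entries replaced by transcripts.

    Transcript paths are appended to files['document'] and files['video'] is dropped.
    """
    out = copy.deepcopy(detect)
    files = out.setdefault("files", {})
    files.pop("video", None)
    docs = files.setdefault("document", [])
    for path in transcript_paths:
        if path not in docs:
            docs.append(path)
    return out


# Hypothetical incremental detect state and transcript output path.
detect = {"files": {"video": ["talks/intro.mp4"], "document": ["README.md"]}, "total_files": 2}
updated = fold_transcripts(detect, ["graphify-out/transcripts/intro.txt"])
print(updated["files"])
```

Working on a copy keeps the original state available if transcription has to be retried.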
|
||||
|
||||
|
||||
If no new files exist (only deletions), create an empty extraction so the merge step can prune:
|
||||
|
||||
```bash
|
||||
if [ ! -f graphify-out/.graphify_extract.json ]; then
|
||||
echo '[graphify update] Only deletions -- creating empty extraction for merge.'
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
Path('graphify-out/.graphify_extract.json').write_text(json.dumps({'nodes':[],'edges':[],'hyperedges':[],'input_tokens':0,'output_tokens':0}), encoding='utf-8')
|
||||
"
|
||||
fi
|
||||
```
|
||||
|
||||
|
||||
Then:
|
||||
|
||||
```bash
|
||||
$(cat graphify-out/.graphify_python) -c "
|
||||
import json
|
||||
from pathlib import Path
|
||||
from graphify.build import build_merge
|
||||
from graphify.detect import save_manifest
|
||||
|
||||
# Load new extraction and incremental state
|
||||
new_extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
|
||||
incremental = json.loads(Path('graphify-out/.graphify_incremental.json').read_text(encoding=\"utf-8\"))
|
||||
deleted = list(incremental.get('deleted_files', []))
|
||||
# prune_sources is ONLY for genuinely DELETED files. Changed/re-extracted files are
|
||||
# handled by build_merge's replace-on-re-extract (#1344): every source_file in
|
||||
# new_chunks is dropped from the base before merge, so old/stale nodes don't survive.
|
||||
# Do NOT add `changed` here: with root= passed, prune_set relativizes to the same base
|
||||
# as the freshly merged nodes and would DELETE the re-extracted content (#1178 is moot
|
||||
# now that replace — not the dedup pass — reconciles changed files).
|
||||
prune = list(deleted) or None
|
||||
|
||||
# Use build_merge() — reads graph.json directly without NetworkX round-trip
|
||||
# so edge direction (calls, implements, imports) is always preserved (#801).
|
||||
# Pass root= so prune_sources (absolute paths from detect_incremental) are
|
||||
# relativized to match the graph's relative source_file values; without it
|
||||
# nothing is pruned and stale nodes accumulate on every update (#1361).
|
||||
# directed=IS_DIRECTED: replace IS_DIRECTED with True if --directed was given, else
|
||||
# False. Without it a --directed --update silently rebuilds undirected and collapses
|
||||
# reciprocal A<->B edges (#1392).
|
||||
G = build_merge(
|
||||
[new_extraction],
|
||||
graph_path='graphify-out/graph.json',
|
||||
prune_sources=prune,
|
||||
root='INPUT_PATH',
|
||||
directed=IS_DIRECTED,
|
||||
)
|
||||
print(f'[graphify update] Merged: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges')
|
||||
|
||||
# Write merged result back to .graphify_extract.json so Step 4 sees the full graph
|
||||
merged_out = {
|
||||
'nodes': [{'id': n, **d} for n, d in G.nodes(data=True)],
|
||||
'edges': [
|
||||
# Explicit source/target last so they win over any stale attrs in d.
|
||||
{**{k: val for k, val in d.items() if k not in ('_src', '_tgt', 'source', 'target')},
|
||||
'source': d.get('_src', u), 'target': d.get('_tgt', v)}
|
||||
for u, v, d in G.edges(data=True)
|
||||
],
|
||||
# G.graph["hyperedges"] holds hyperedges from both existing graph.json
|
||||
# and new_extraction (build_merge combines them). Falling back to
|
||||
# new_extraction only would silently drop prior-run hyperedges (#801).
|
||||
'hyperedges': list(G.graph.get('hyperedges', [])),
|
||||
'input_tokens': new_extraction.get('input_tokens', 0),
|
||||
'output_tokens': new_extraction.get('output_tokens', 0),
|
||||
}
|
||||
Path('graphify-out/.graphify_extract.json').write_text(json.dumps(merged_out, ensure_ascii=False), encoding=\"utf-8\")
|
||||
print(f'[graphify update] Merged extraction written ({len(merged_out[\"nodes\"])} nodes, {len(merged_out[\"edges\"])} edges)')
|
||||
|
||||
# Save manifest so next --update diffs against today's state, not the
|
||||
# prior run's baseline (prevents ghost-node reports on subsequent updates).
|
||||
# root= matches the build_merge call above so the manifest keys stay relative to
|
||||
# the scan root — portable across clones/machines, so --update keeps matching
|
||||
# cached files instead of missing every one after a move (#1417).
|
||||
save_manifest(incremental['files'], root='INPUT_PATH')
|
||||
print('[graphify update] Manifest saved.')
|
||||
"
|
||||
```
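
The comments in the merge script say that `root=` lets the absolute `prune_sources` paths be relativized so they match the graph's relative `source_file` values. The following is a rough sketch of that idea only. `relativize` is a hypothetical helper and the paths are made up; how `build_merge` actually does this is not shown in this reference.

```python
from pathlib import PurePosixPath


def relativize(paths, root):
    """Make absolute paths relative to root; leave paths outside root unchanged."""
    root = PurePosixPath(root)
    result = []
    for p in paths:
        path = PurePosixPath(p)
        try:
            result.append(str(path.relative_to(root)))
        except ValueError:
            # Not under root: keep the path as given.
            result.append(str(path))
    return result


deleted = ["/work/project/src/old.py", "/elsewhere/notes.md"]
print(relativize(deleted, "/work/project"))
```

If the deleted paths stayed absolute while the graph stored `src/old.py`, a plain string comparison would never match, which is the stale-node accumulation the comments warn about.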
|
||||
|
||||
Then run Steps 4–8 on the merged graph as normal.
|
||||
|
||||
After Step 4, show the graph diff:
|
||||
|
||||
```bash
$(cat graphify-out/.graphify_python) -c "
import json
from graphify.analyze import graph_diff
from graphify.build import build_from_json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path

# Load old graph (before update) from backup written before merge
old_data = json.loads(Path('graphify-out/.graphify_old.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_old.json').exists() else None
new_extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
# Replace IS_DIRECTED with True if --directed was given, else False (same as the merge step).
G_new = build_from_json(new_extract, directed=IS_DIRECTED)

if old_data:
    G_old = json_graph.node_link_graph(old_data, edges='links')
    diff = graph_diff(G_old, G_new)
    print(diff['summary'])
    if diff['new_nodes']:
        print('New nodes:', ', '.join(n['label'] for n in diff['new_nodes'][:5]))
    if diff['new_edges']:
        print('New edges:', len(diff['new_edges']))
"
```
|
||||
|
||||
The diff needs a backup of the pre-update graph, so save it before the merge step: `cp graphify-out/graph.json graphify-out/.graphify_old.json`

Clean up after the diff has been shown: `rm -f graphify-out/.graphify_old.json`
|
||||
|
||||
---
|
||||
|
||||
## For --cluster-only
|
||||
|
||||
Skip Steps 1–3. Re-run clustering on the existing graph:
|
||||
|
||||
```bash
|
||||
graphify cluster-only .
|
||||
```
|
||||
|
||||
`graphify cluster-only .` is **self-contained**: it re-clusters, names communities, and regenerates `GRAPH_REPORT.md`, `graph.json`, and `graph.html` from the existing graph. **Do not re-run Steps 5–9** — they read intermediate files (`.graphify_extract.json`, `.graphify_detect.json`, `.graphify_analysis.json`) that a prior build's cleanup (Step 9) already deleted, so they raise `FileNotFoundError` (#1392). When it finishes, present the refreshed `GRAPH_REPORT.md` summary as usual.
|
||||
@ -287,10 +287,13 @@ if command -v npx &>/dev/null; then
|
||||
 fi
 # `skills add` is idempotent and pulls latest from the source repo,
 # which is the closest thing to an update operation the CLI exposes.
-if npx -y skills add "$_src" 2>/dev/null; then
+# Run from $HOME: the CLI resolves .agents/skills/ relative to the CWD, so
+# running from the repo would write into $REPO/.agents/skills (gitignored)
+# instead of $HOME/.agents/skills where link.sh expects it.
+if (cd "$HOME" && npx -y skills add "$_src" 2>/dev/null); then
 ok "$_name refreshed from $_src"
 else
-warn "$_name refresh failed — run manually: npx -y skills add $_src"
+warn "$_name refresh failed — run manually: (cd \"\$HOME\" && npx -y skills add $_src)"
 fi
 done
 else
|
||||
|
||||