Bump 0.8.13 -> 0.8.45. Extract the SKILL.md monolith (~530 lines) into references/ for progressive disclosure: github-and-merge, transcribe, extraction-spec, exports, update, query, add-watch, hooks. SKILL.md now points to each reference and loads it only on the path that needs it. Inline fixes carried by the new version: empty-extraction guard before any write (#1392), shrink-guard ordering so GRAPH_REPORT/analysis never describe a graph.json that was refused (#479), root= relativization for build/manifest parity across clones (#1361/#1417), stale-cache cleanup and code-only semantic pre-write (#1392), edge-direction preserving merge (#801). Adds FalkorDB export (--falkordb/--falkordb-push) and rewrites the frontmatter description (drops the obsolete trigger: field). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
graphify reference: query, path, explain
Load this when the user asks a question against an existing graph, or runs /graphify path or /graphify explain. The core's query stub points here for the full traversal flow. These flows use the graphify query CLI when it is available and fall back to an inline NetworkX traversal otherwise.
Two traversal modes - choose based on the question:
| Mode | Flag | Best for |
|---|---|---|
| BFS (default) | (none) | "What is X connected to?" - broad context, nearest neighbors first |
| DFS | --dfs | "How does X reach Y?" - trace a specific chain or dependency path |
First check the graph exists:
$(cat graphify-out/.graphify_python) -c "
from pathlib import Path
if not Path('graphify-out/graph.json').exists():
    print('ERROR: No graph found. Run /graphify <path> first to build the graph.')
    raise SystemExit(1)
"
If it fails, stop and tell the user to run /graphify <path> first.
Step 0 — Constrained query expansion (REQUIRED before traversal)
graphify's query CLI matches nodes via case-folded substring + IDF — there is no stemming, no synonyms, no cross-language match inside the binary, and the inline fallback below matches the same way. If the user's question uses different language or different domain vocabulary than the graph's labels (user says "обработчик" / graph says "handler"; user says "authentication" / graph says "Guardian"), the literal matcher returns 0 hits and the answer collapses to noise.
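The failure mode is easy to reproduce. The sketch below imitates only the case-folded substring part of the matching; it leaves out IDF weighting entirely. The `count_hits` helper and the label list are hypothetical and are not graphify's code.

```python
# Hypothetical labels standing in for a graph's node labels.
LABELS = ["Guardian", "RequestHandler", "TokenStore"]

def count_hits(query_terms, labels):
    """Count labels that contain at least one term as a case-folded substring."""
    folded = [label.casefold() for label in labels]
    return sum(
        1 for label in folded
        if any(term.casefold() in label for term in query_terms)
    )

# Vocabulary mismatch: no term is a substring of any label.
print(count_hits(["authentication"], LABELS))        # 0
print(count_hits(["обработчик"], LABELS))            # 0
# No stemming: the plural is not a substring of "requesthandler".
print(count_hits(["handlers"], LABELS))              # 0
# Terms taken from the graph's own vocabulary do match.
print(count_hits(["guardian", "handler"], LABELS))   # 2
```

This is why the expansion step restricts itself to tokens that already occur in the labels: only those can produce a substring hit.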
Fix this without inventing tokens by expanding the query against the actual graph vocabulary first:
- Extract the token vocabulary from node labels:
$(cat graphify-out/.graphify_python) -c "
import json, re
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
vocab = set()
for n in data['nodes']:
    for c in re.findall(r'[^\W\d_]+', n.get('label','') or '', re.UNICODE):
        parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+', c) or [c]
        for p in parts:
            t = p.lower()
            if 3 <= len(t) <= 30:
                vocab.add(t)
Path('graphify-out/.vocab.txt').write_text('\n'.join(sorted(vocab)))
print(f'vocab: {len(vocab)} tokens')
"
- Read graphify-out/.vocab.txt. Then, for the user's question, select up to 12 tokens from this exact list that semantically match the query intent. Hard constraints:
  - You MUST pick only tokens present in the vocabulary file. Do NOT invent tokens.
  - If a query concept has no plausible token in the vocab, skip it; do not substitute a near-synonym from training memory.
  - If no vocab tokens match the query at all, output an empty list and tell the user the corpus has no relevant vocabulary for this question. Do not fabricate a search.
  - Translate cross-language: Russian "аутентификация" → look for auth, credential, token, security IFF present in vocab.
  - Morphology: "handlers" maps to handler IFF present; "todos" maps to todo IFF present.
- Print the selection explicitly to the user before running the query, so the expansion is auditable:
  Query expanded to (from graph vocab, N tokens): [token1, token2, ...]
  If the list is empty, say so plainly and stop; do not proceed to traversal.
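The hard constraints can also be checked mechanically before the expansion is shown. This is a minimal sketch, assuming the vocabulary file holds one lowercase token per line as written by the extraction step. `validate_selection` and `audit_line` are hypothetical helpers, not part of graphify, and the vocabulary here is made up.

```python
def validate_selection(selected, vocab, limit=12):
    """Keep only tokens that exist in vocab, de-duplicated, capped at limit."""
    kept = []
    for token in selected:
        t = token.lower()
        if t in vocab and t not in kept:
            kept.append(t)
    return kept[:limit]

def audit_line(tokens):
    """Format the line printed to the user before the query runs."""
    return (
        f"Query expanded to (from graph vocab, {len(tokens)} tokens): "
        f"[{', '.join(tokens)}]"
    )

# In real use the set would come from the file, e.g.
#   vocab = set(Path('graphify-out/.vocab.txt').read_text().split())
vocab = {"auth", "handler", "token", "todo"}  # hypothetical vocabulary

picked = validate_selection(["auth", "login", "Handler", "auth"], vocab)
print(audit_line(picked))
if not picked:
    print("No relevant vocabulary for this question; stopping before traversal.")
```

Here "login" is dropped because it is not in the vocabulary, and the duplicate "auth" is collapsed, so the audit line reports exactly the tokens that will be searched.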
Step 1 — Traversal
Build the expanded query string by joining the selected tokens with spaces. Use this string as QUESTION below — NOT the original user question. (The original question is preserved only for save-result at the end.)
Prefer the CLI when it is installed:
graphify query "QUESTION"
# or: graphify query "QUESTION" --dfs --budget 3000
If the CLI is unavailable, load graphify-out/graph.json and run the traversal inline:
- Find the 1-3 nodes whose label best matches the expanded tokens.
- Run the appropriate traversal from each starting node.
- Read the subgraph - node labels, edge relations, confidence tags, source locations.
- Answer using only what the graph contains. Quote source_location when citing a specific fact.
- If the graph lacks enough information, say so - do not hallucinate edges.
$(cat graphify-out/.graphify_python) -c "
import sys, json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
question = 'QUESTION'
mode = 'MODE' # 'bfs' or 'dfs'
terms = [t.lower() for t in question.split() if len(t) >= 3] # match the vocab threshold; keeps api/jwt/ios (#1392)
# Find best-matching start nodes
scored = []
for nid, ndata in G.nodes(data=True):
    label = ndata.get('label', '').lower()
    score = sum(1 for t in terms if t in label)
    if score > 0:
        scored.append((score, nid))
scored.sort(reverse=True)
start_nodes = [nid for _, nid in scored[:3]]
if not start_nodes:
    print('No matching nodes found for query terms:', terms)
    sys.exit(0)
subgraph_nodes = set()
subgraph_edges = []
if mode == 'dfs':
    # DFS: follow one path as deep as possible before backtracking.
    # Depth-limited to 6 to avoid traversing the whole graph.
    visited = set()
    stack = [(n, 0) for n in reversed(start_nodes)]
    while stack:
        node, depth = stack.pop()
        if node in visited or depth > 6:
            continue
        visited.add(node)
        subgraph_nodes.add(node)
        for neighbor in G.neighbors(node):
            if neighbor not in visited:
                stack.append((neighbor, depth + 1))
                subgraph_edges.append((node, neighbor))
else:
    # BFS: explore all neighbors layer by layer up to depth 3.
    frontier = set(start_nodes)
    subgraph_nodes = set(start_nodes)
    for _ in range(3):
        next_frontier = set()
        for n in frontier:
            for neighbor in G.neighbors(n):
                if neighbor not in subgraph_nodes:
                    next_frontier.add(neighbor)
                    subgraph_edges.append((n, neighbor))
        subgraph_nodes.update(next_frontier)
        frontier = next_frontier
# Token-budget aware output: rank by relevance, cut at budget (~4 chars/token)
token_budget = BUDGET # default 2000
char_budget = token_budget * 4
# Score each node by term overlap for ranked output
def relevance(nid):
    label = G.nodes[nid].get('label', '').lower()
    return sum(1 for t in terms if t in label)
ranked_nodes = sorted(subgraph_nodes, key=relevance, reverse=True)
lines = [f'Traversal: {mode.upper()} | Start: {[G.nodes[n].get(\"label\",n) for n in start_nodes]} | {len(subgraph_nodes)} nodes']
for nid in ranked_nodes:
    d = G.nodes[nid]
    lines.append(f' NODE {d.get(\"label\", nid)} [src={d.get(\"source_file\",\"\")} loc={d.get(\"source_location\",\"\")}]')
for u, v in subgraph_edges:
    if u in subgraph_nodes and v in subgraph_nodes:
        _raw = G[u][v]; d = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
        lines.append(f' EDGE {G.nodes[u].get(\"label\",u)} --{d.get(\"relation\",\"\")} [{d.get(\"confidence\",\"\")}]--> {G.nodes[v].get(\"label\",v)}')
output = '\n'.join(lines)
if len(output) > char_budget:
    output = output[:char_budget] + f'\n... (truncated at ~{token_budget} token budget - use --budget N for more)'
print(output)
"
Replace QUESTION with the expanded query string, MODE with bfs or dfs, and BUDGET with the token budget (default 2000, or whatever --budget N specifies). Then answer based on the subgraph output above, using only what the graph contains.
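When choosing MODE, it may help to see how the two traversals differ on the same input. The sketch below uses a hypothetical toy adjacency dict (the node names are invented) and no NetworkX; it mirrors the inline script's depth limits of 3 for BFS and 6 for DFS but is otherwise a simplified illustration, not graphify's implementation.

```python
from collections import deque

# Hypothetical toy graph, not taken from any real graph.json.
GRAPH = {
    "api": ["auth", "router"],
    "auth": ["token"],
    "router": ["handler"],
    "token": ["store"],
    "store": ["disk"],
    "handler": [],
    "disk": [],
}

def bfs(start, max_depth=3):
    """Visit nodes nearest-first, up to max_depth hops from start."""
    order, seen = [start], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order

def dfs(start, max_depth=6):
    """Follow one chain as deep as possible before backtracking."""
    order, seen = [], set()
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if node in seen or depth > max_depth:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is explored first.
        for nxt in reversed(GRAPH[node]):
            if nxt not in seen:
                stack.append((nxt, depth + 1))
    return order

print("BFS:", bfs("api"))
print("DFS:", dfs("api"))
```

BFS returns the neighborhood in layers and stops at three hops, so the four-hop "disk" node is left out. DFS walks the api → auth → token → store → disk chain to its end before it returns to "router".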
After writing the answer, save it back into the graph so it improves future queries. Include the expanded tokens inside the --answer text (e.g. "Expanded from original query via vocab: [tokens]. Then traversed...") so the next --update extracts the expansion history as a graph node:
$(cat graphify-out/.graphify_python) -m graphify save-result --question "ORIGINAL_QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
Replace ORIGINAL_QUESTION with the user's verbatim question, ANSWER with your full answer text (containing the expanded-token trace), NODE1 NODE2 with the list of node labels you cited. This closes the feedback loop: the next --update will extract this Q&A as a node in the graph.
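Answers often contain quotes or newlines, which makes pasting them into a shell string fragile. One way around that, sketched here, is to build the argument list in Python and avoid shell quoting altogether. The flags are the ones shown in the command above; `build_save_result_argv` is a hypothetical helper, and the `subprocess.run` call is left commented out because it needs graphify installed.

```python
import sys

def build_save_result_argv(python_exe, question, answer, nodes, result_type="query"):
    """Return an argv list for `python -m graphify save-result`."""
    return [
        python_exe, "-m", "graphify", "save-result",
        "--question", question,
        "--answer", answer,
        "--type", result_type,
        "--nodes", *nodes,
    ]

argv = build_save_result_argv(
    sys.executable,  # assumption: in real use, the interpreter path read from graphify-out/.graphify_python
    'How does "auth" work?',
    "Expanded from original query via vocab: [auth, token]. Then traversed...",
    ["Guardian", "TokenStore"],
)
print(argv)
# import subprocess
# subprocess.run(argv, check=True)  # requires graphify to be installed
```

Because each value is its own list element, embedded double quotes in the question or answer reach the command unchanged.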
For /graphify path
Find the shortest path between two named concepts in the graph. Prefer the CLI when installed:
graphify path "NODE_A" "NODE_B"
If the CLI is unavailable, run it inline:
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
a_term = 'NODE_A'
b_term = 'NODE_B'
def find_node(term):
    term = term.lower()
    scored = sorted(
        [(sum(1 for w in term.split() if w in G.nodes[n].get('label','').lower()), n)
         for n in G.nodes()],
        reverse=True
    )
    return scored[0][1] if scored and scored[0][0] > 0 else None
src = find_node(a_term)
tgt = find_node(b_term)
if not src or not tgt:
    print(f'Could not find nodes matching: {a_term!r} or {b_term!r}')
    sys.exit(0)
try:
    path = nx.shortest_path(G, src, tgt)
    print(f'Shortest path ({len(path)-1} hops):')
    for i, nid in enumerate(path):
        label = G.nodes[nid].get('label', nid)
        if i < len(path) - 1:
            _raw = G[nid][path[i+1]]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
            rel = edge.get('relation', '')
            conf = edge.get('confidence', '')
            print(f' {label} --{rel}--> [{conf}]')
        else:
            print(f' {label}')
except nx.NetworkXNoPath:
    print(f'No path found between {a_term!r} and {b_term!r}')
except nx.NodeNotFound as e:
    print(f'Node not found: {e}')
"
Replace NODE_A and NODE_B with the actual concept names from the user. Then explain the path in plain language - what each hop means, why it's significant.
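For intuition about what the hop count means, here is a dependency-free sketch of an unweighted shortest path found by breadth-first search with parent pointers. It is a stand-in for illustration only: the inline script relies on nx.shortest_path, and the adjacency dict and node names below are hypothetical.

```python
from collections import deque

def shortest_path(adj, src, tgt):
    """Return the nodes on a fewest-hops path from src to tgt, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == tgt:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical directed adjacency, invented for this example.
adj = {
    "Router": ["Handler", "Guardian"],
    "Guardian": ["TokenStore"],
    "Handler": ["TokenStore"],
    "TokenStore": ["Database"],
}

path = shortest_path(adj, "Router", "Database")
print(f"Shortest path ({len(path) - 1} hops): {' -> '.join(path)}")
```

A path of four nodes is three hops, which matches the `len(path)-1` count the inline script prints. In this directed toy graph the reverse query from "Database" to "Router" finds nothing, the case the script reports as "No path found".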
After writing the explanation, save it back:
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
For /graphify explain
Give a plain-language explanation of a single node - everything connected to it. Prefer the CLI when installed:
graphify explain "NODE_NAME"
If the CLI is unavailable, run it inline:
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')
term = 'NODE_NAME'
term_lower = term.lower()
# Find best matching node
scored = sorted(
    [(sum(1 for w in term_lower.split() if w in G.nodes[n].get('label','').lower()), n)
     for n in G.nodes()],
    reverse=True
)
if not scored or scored[0][0] == 0:
    print(f'No node matching {term!r}')
    sys.exit(0)
nid = scored[0][1]
data_n = G.nodes[nid]
print(f'NODE: {data_n.get(\"label\", nid)}')
print(f' source: {data_n.get(\"source_file\",\"unknown\")}')
print(f' type: {data_n.get(\"file_type\",\"unknown\")}')
print(f' degree: {G.degree(nid)}')
print()
print('CONNECTIONS:')
for neighbor in G.neighbors(nid):
    _raw = G[nid][neighbor]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
    nlabel = G.nodes[neighbor].get('label', neighbor)
    rel = edge.get('relation', '')
    conf = edge.get('confidence', '')
    src_file = G.nodes[neighbor].get('source_file', '')
    print(f' --{rel}--> {nlabel} [{conf}] ({src_file})')
"
Replace NODE_NAME with the concept the user asked about. Then write a 3-5 sentence explanation of what this node is, what it connects to, and why those connections are significant. Use the source locations as citations.
After writing the explanation, save it back:
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Explain NODE_NAME" --answer "ANSWER" --type explain --nodes NODE_NAME