Bastien Chanot ed5b54e87e chore(graphify): update skill to v0.8.45

Bump 0.8.13 -> 0.8.45. Extract the SKILL.md monolith (~530 lines) into
references/ for progressive disclosure: github-and-merge, transcribe,
extraction-spec, exports, update, query, add-watch, hooks. SKILL.md now
points to each reference and loads it only on the path that needs it.

Inline fixes carried by the new version: empty-extraction guard before
any write (#1392), shrink-guard ordering so GRAPH_REPORT/analysis never
describe a graph.json that was refused (#479), root= relativization for
build/manifest parity across clones (#1361/#1417), stale-cache cleanup
and code-only semantic pre-write (#1392), edge-direction preserving
merge (#801). Adds FalkorDB export (--falkordb/--falkordb-push) and
rewrites the frontmatter description (drops the obsolete trigger: field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw

2026-06-24 14:22:14 +02:00


graphify reference: query, path, explain

Load this when the user asks a question against an existing graph, or runs /graphify path or /graphify explain. The core's query stub points here for the full traversal flow. Each flow uses the matching graphify CLI subcommand (query, path, explain) when it is available and falls back to an inline NetworkX traversal otherwise.

Two traversal modes - choose based on the question:

| Mode | Flag | Best for |
|---|---|---|
| BFS (default) | (none) | "What is X connected to?" - broad context, nearest neighbors first |
| DFS | `--dfs` | "How does X reach Y?" - trace a specific chain or dependency path |
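The following minimal pure-Python sketch makes the difference in the table concrete. It runs on a hypothetical six-node graph with made-up node names. It mirrors the inline fallback's limits (3 hops for BFS, depth 6 for DFS), but it is an illustration, not graphify's implementation. In particular, it pushes neighbors in reverse so that DFS explores them in listed order, which the inline fallback does not do.

```python
from collections import deque

# Hypothetical toy graph as adjacency lists (not from a real graph.json).
GRAPH = {
    "Auth": ["Token", "Session"],
    "Token": ["JWT"],
    "Session": ["Store"],
    "JWT": ["Crypto"],
    "Store": [],
    "Crypto": [],
}


def bfs_order(graph, start, max_depth=3):
    """Visit nodes layer by layer, up to max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # reached the hop limit: keep the node, do not expand it
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                queue.append((nb, depth + 1))
    return order


def dfs_order(graph, start, max_depth=6):
    """Follow one chain as deep as possible (up to max_depth) before backtracking."""
    seen = set()
    order = []
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if node in seen or depth > max_depth:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is popped (explored) first.
        for nb in reversed(graph.get(node, [])):
            if nb not in seen:
                stack.append((nb, depth + 1))
    return order


print("BFS:", bfs_order(GRAPH, "Auth"))
print("DFS:", dfs_order(GRAPH, "Auth"))
```

BFS reaches both direct neighbors (Token, Session) before anything two hops out. DFS runs down Auth → Token → JWT → Crypto before backtracking to Session.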

First, check that the graph exists:

```bash
$(cat graphify-out/.graphify_python) -c "
from pathlib import Path
if not Path('graphify-out/graph.json').exists():
    print('ERROR: No graph found. Run /graphify <path> first to build the graph.')
    raise SystemExit(1)
"
```

If it fails, stop and tell the user to run /graphify <path> first.

Step 0 — Constrained query expansion (REQUIRED before traversal)

graphify's query CLI matches nodes via case-folded substring + IDF — there is no stemming, no synonyms, no cross-language match inside the binary, and the inline fallback below matches the same way. If the user's question uses different language or different domain vocabulary than the graph's labels (user says "обработчик" / graph says "handler"; user says "authentication" / graph says "Guardian"), the literal matcher returns 0 hits and the answer collapses to noise.
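The following minimal sketch shows why a vocabulary mismatch yields zero hits. It assumes a simple scoring rule: case-folded substring containment weighted by a smoothed IDF, log(1 + N/df). The exact formula inside graphify's query CLI is not given here, so treat the weights as illustrative. The labels are hypothetical.

```python
import math

# Hypothetical node labels; a real graph.json has its own.
LABELS = ["Guardian", "GuardianPipeline", "RequestHandler", "TokenStore"]


def idf(term, labels):
    """Smoothed inverse document frequency: rarer substrings weigh more."""
    df = sum(1 for lab in labels if term in lab.casefold())
    return math.log(1 + len(labels) / df) if df else 0.0


def score_labels(query, labels):
    """Score labels by the summed IDF of query terms they contain as substrings."""
    terms = [t.casefold() for t in query.split()]
    scores = {}
    for lab in labels:
        folded = lab.casefold()
        score = sum(idf(t, labels) for t in terms if t in folded)
        if score > 0:
            scores[lab] = score
    return scores


print(score_labels("authentication", LABELS))  # {}: no label contains the string
print(score_labels("handlers", LABELS))        # {}: no stemming, so the plural misses
print(score_labels("handler", LABELS))         # only RequestHandler matches
```

Because matching is substring-only, `handler` finds `RequestHandler` but the plural `handlers` does not, and `authentication` never reaches `Guardian`.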

Fix this without inventing tokens by expanding the query against the actual graph vocabulary first:

Extract the token vocabulary from node labels:

```bash
$(cat graphify-out/.graphify_python) -c "
import json, re
from pathlib import Path
data = json.loads(Path('graphify-out/graph.json').read_text())
vocab = set()
for n in data['nodes']:
    for c in re.findall(r'[^\W\d_]+', n.get('label','') or '', re.UNICODE):
        parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+', c) or [c]
        for p in parts:
            t = p.lower()
            if 3 <= len(t) <= 30:
                vocab.add(t)
Path('graphify-out/.vocab.txt').write_text('\n'.join(sorted(vocab)))
print(f'vocab: {len(vocab)} tokens')
"
```
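The function below wraps the same two regular expressions from the extraction snippet, so you can check what a given label contributes to `.vocab.txt`. The function name `label_tokens` is hypothetical. The splitting rules are copied from the snippet, but this version returns tokens in order rather than collecting them into a set.

```python
import re

# The same two patterns as the vocab-extraction snippet.
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, any script
CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")  # ASCII camelCase pieces


def label_tokens(label):
    """Return the lowercase tokens a node label contributes to .vocab.txt."""
    tokens = []
    for chunk in WORD_RE.findall(label or ""):
        # Chunks with no ASCII letters (e.g. Cyrillic) get no camelCase match and stay whole.
        for part in CAMEL_RE.findall(chunk) or [chunk]:
            t = part.lower()
            if 3 <= len(t) <= 30:  # same length filter as the snippet
                tokens.append(t)
    return tokens


for label in ["parseHTTPResponse", "OAuth2_handler", "обработчик"]:
    print(label, "->", label_tokens(label))
```

Note the side effects of the length filter: single letters such as the `O` in `OAuth` are dropped. Non-ASCII labels survive as whole words because the camelCase pattern only matches ASCII letters.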

Read graphify-out/.vocab.txt. Then for the user's question, select up to 12 tokens from this exact list that semantically match the query intent. Hard constraints:
- You MUST pick only tokens present in the vocabulary file. Do NOT invent tokens.
- If a query concept has no plausible token in the vocab, skip it — do not substitute a near-synonym from training memory.
- If no vocab tokens match the query at all, output an empty list and tell the user the corpus has no relevant vocabulary for this question. Do not fabricate a search.
- Translate cross-language: Russian "аутентификация" ("authentication") → look for auth, credential, token, security IFF present in vocab.
- Morphology: "handlers" maps to handler IFF present; "todos" maps to todo IFF present.
Print the selection explicitly to the user before running the query, so the expansion is auditable:

Query expanded to (from graph vocab, N tokens): [token1, token2, ...]

If the list is empty, say so plainly and stop — do not proceed to traversal.

Step 1 — Traversal

Build the expanded query string by joining the selected tokens with spaces. Use this string as QUESTION below — NOT the original user question. (The original question is preserved only for save-result at the end.)
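The hard constraints from Step 0 can be enforced mechanically before anything is run. This is a minimal sketch, assuming the vocab file holds one lowercase token per line, as written by the extraction snippet. `expand_query` and `load_vocab` are hypothetical helpers, and the `selected` list stands in for whatever tokens were picked.

```python
from pathlib import Path

MAX_TOKENS = 12  # the cap from Step 0


def load_vocab(path="graphify-out/.vocab.txt"):
    """Read the vocab file written by the extraction snippet (one token per line)."""
    return set(Path(path).read_text().split())


def expand_query(selected, vocab):
    """Keep selected tokens that exist in the vocab, deduplicated, in order, capped."""
    kept = []
    for tok in selected:
        t = tok.lower()
        if t in vocab and t not in kept:  # never let an invented token through
            kept.append(t)
    return kept[:MAX_TOKENS]


# Hypothetical vocab and selection; in practice: vocab = load_vocab()
vocab = {"auth", "credential", "guardian", "handler", "token"}
selected = ["auth", "token", "authentication", "Token", "guardian"]

tokens = expand_query(selected, vocab)
if not tokens:
    print("No graph vocabulary matches this question; stopping before traversal.")
else:
    print(f"Query expanded to (from graph vocab, {len(tokens)} tokens): [{', '.join(tokens)}]")
question = " ".join(tokens)  # this string replaces QUESTION in Step 1
```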

Prefer the CLI when it is installed:

```bash
graphify query "QUESTION"
# or: graphify query "QUESTION" --dfs --budget 3000
```

If the CLI is unavailable, load graphify-out/graph.json and run the traversal inline:

1. Find the 1-3 nodes whose labels best match the expanded tokens.
2. Run the appropriate traversal (BFS or DFS) from each starting node.
3. Read the subgraph: node labels, edge relations, confidence tags, source locations.
4. Answer using only what the graph contains. Quote source_location when citing a specific fact.
5. If the graph lacks enough information, say so; do not hallucinate edges.

```bash
$(cat graphify-out/.graphify_python) -c "
import sys, json
from networkx.readwrite import json_graph
import networkx as nx
from pathlib import Path

data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')

question = 'QUESTION'
mode = 'MODE'  # 'bfs' or 'dfs'
terms = [t.lower() for t in question.split() if len(t) >= 3]  # match the vocab threshold; keeps api/jwt/ios (#1392)

# Find best-matching start nodes
scored = []
for nid, ndata in G.nodes(data=True):
    label = (ndata.get('label') or '').lower()  # tolerate missing or null labels
    score = sum(1 for t in terms if t in label)
    if score > 0:
        scored.append((score, nid))
scored.sort(reverse=True)
start_nodes = [nid for _, nid in scored[:3]]

if not start_nodes:
    print('No matching nodes found for query terms:', terms)
    sys.exit(0)

subgraph_nodes = set()
subgraph_edges = []

if mode == 'dfs':
    # DFS: follow one path as deep as possible before backtracking.
    # Depth-limited to 6 to avoid traversing the whole graph.
    visited = set()
    stack = [(n, 0) for n in reversed(start_nodes)]
    while stack:
        node, depth = stack.pop()
        if node in visited or depth > 6:
            continue
        visited.add(node)
        subgraph_nodes.add(node)
        for neighbor in G.neighbors(node):
            if neighbor not in visited:
                stack.append((neighbor, depth + 1))
                subgraph_edges.append((node, neighbor))
else:
    # BFS: explore all neighbors layer by layer up to depth 3.
    frontier = set(start_nodes)
    subgraph_nodes = set(start_nodes)
    for _ in range(3):
        next_frontier = set()
        for n in frontier:
            for neighbor in G.neighbors(n):
                if neighbor not in subgraph_nodes:
                    next_frontier.add(neighbor)
                    subgraph_edges.append((n, neighbor))
        subgraph_nodes.update(next_frontier)
        frontier = next_frontier

# Token-budget aware output: rank by relevance, cut at budget (~4 chars/token)
token_budget = BUDGET  # default 2000
char_budget = token_budget * 4

# Score each node by term overlap for ranked output
def relevance(nid):
    label = (G.nodes[nid].get('label') or '').lower()  # tolerate missing or null labels
    return sum(1 for t in terms if t in label)

ranked_nodes = sorted(subgraph_nodes, key=relevance, reverse=True)

lines = [f'Traversal: {mode.upper()} | Start: {[G.nodes[n].get(\"label\",n) for n in start_nodes]} | {len(subgraph_nodes)} nodes']
for nid in ranked_nodes:
    d = G.nodes[nid]
    lines.append(f'  NODE {d.get(\"label\", nid)} [src={d.get(\"source_file\",\"\")} loc={d.get(\"source_location\",\"\")}]')
for u, v in subgraph_edges:
    if u in subgraph_nodes and v in subgraph_nodes:
        _raw = G[u][v]; d = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
        lines.append(f'  EDGE {G.nodes[u].get(\"label\",u)} --{d.get(\"relation\",\"\")} [{d.get(\"confidence\",\"\")}]--> {G.nodes[v].get(\"label\",v)}')

output = '\n'.join(lines)
if len(output) > char_budget:
    output = output[:char_budget] + f'\n... (truncated at ~{token_budget} token budget - use --budget N for more)'
print(output)
"
```

Replace QUESTION with the expanded query string, MODE with bfs or dfs, and BUDGET with the token budget (default 2000, or whatever --budget N specifies). Then answer based on the subgraph output above, using only what the graph contains.
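The budget arithmetic in the inline script uses a rough heuristic of 4 characters per token. The default of 2000 tokens therefore allows 8000 characters, and `--budget 3000` allows 12000. The following standalone sketch performs the same cut; the function name is hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic used by the inline traversal


def truncate_to_budget(text, token_budget=2000):
    """Cut text to about token_budget tokens, appending the script's notice if cut."""
    char_budget = token_budget * CHARS_PER_TOKEN
    if len(text) <= char_budget:
        return text
    return (text[:char_budget]
            + f"\n... (truncated at ~{token_budget} token budget - use --budget N for more)")


print(len(truncate_to_budget("x" * 10000)))  # cut to 8000 chars plus the notice
```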

After writing the answer, save it back into the graph so it improves future queries. Include the expanded tokens inside the --answer text (e.g. "Expanded from original query via vocab: [tokens]. Then traversed...") so the next --update extracts the expansion history as a graph node:

```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "ORIGINAL_QUESTION" --answer "ANSWER" --type query --nodes NODE1 NODE2
```

Replace ORIGINAL_QUESTION with the user's verbatim question, ANSWER with your full answer text (containing the expanded-token trace), NODE1 NODE2 with the list of node labels you cited. This closes the feedback loop: the next --update will extract this Q&A as a node in the graph.
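Answers usually contain quotes, brackets, or `$`. Any of these can break a hand-typed `--answer "ANSWER"` or be expanded by the shell. The sketch below builds the command with `shlex.join` so that every argument is quoted safely. The flags are the ones shown in the save-result command. `python3` is a placeholder for the interpreter recorded in `graphify-out/.graphify_python`, and the question, answer, and node labels are made up.

```python
import shlex


def save_result_cmd(python, question, answer, nodes, kind="query"):
    """Build a shell-safe save-result command; shlex.join quotes every argument."""
    argv = [python, "-m", "graphify", "save-result",
            "--question", question,
            "--answer", answer,
            "--type", kind,
            "--nodes", *nodes]
    return shlex.join(argv)


# Placeholder interpreter and made-up Q&A; the answer carries the expansion trace.
question = 'How does "login" reach the token store?'
answer = ("Expanded from original query via vocab: [auth, token]. "
          "Then traversed Guardian -> TokenStore; $HOME stays literal.")
cmd = save_result_cmd("python3", question, answer, ["Guardian", "TokenStore"])
print(cmd)
```

Passing the same argument list straight to `subprocess.run` skips the shell entirely, so no quoting is needed at all. The same helper covers the path flow with `kind="path_query"`.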

For /graphify path

Find the shortest path between two named concepts in the graph. Prefer the CLI when installed:

```bash
graphify path "NODE_A" "NODE_B"
```

If the CLI is unavailable, run it inline:

```bash
$(cat graphify-out/.graphify_python) -c "
import json, sys
import networkx as nx
from networkx.readwrite import json_graph
from pathlib import Path

data = json.loads(Path('graphify-out/graph.json').read_text())
G = json_graph.node_link_graph(data, edges='links')

a_term = 'NODE_A'
b_term = 'NODE_B'

def find_node(term):
    term = term.lower()
    scored = sorted(
        [(sum(1 for w in term.split() if w in (G.nodes[n].get('label') or '').lower()), n)
         for n in G.nodes()],
        reverse=True
    )
    return scored[0][1] if scored and scored[0][0] > 0 else None

src = find_node(a_term)
tgt = find_node(b_term)

# Compare against None so a falsy node id is not mistaken for a missing match.
if src is None or tgt is None:
    print(f'Could not find nodes matching: {a_term!r} or {b_term!r}')
    sys.exit(0)

try:
    path = nx.shortest_path(G, src, tgt)
    print(f'Shortest path ({len(path)-1} hops):')
    for i, nid in enumerate(path):
        label = G.nodes[nid].get('label', nid)
        if i < len(path) - 1:
            _raw = G[nid][path[i+1]]; edge = next(iter(_raw.values()), {}) if isinstance(G, nx.MultiGraph) else _raw
            rel = edge.get('relation', '')
            conf = edge.get('confidence', '')
            print(f'  {label} --{rel}--> [{conf}]')
        else:
            print(f'  {label}')
except nx.NetworkXNoPath:
    print(f'No path found between {a_term!r} and {b_term!r}')
except nx.NodeNotFound as e:
    print(f'Node not found: {e}')
"
```

Replace NODE_A and NODE_B with the actual concept names from the user. Then explain the path in plain language - what each hop means, why it's significant.
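Without a weight, `nx.shortest_path` is a breadth-first search, and each printed line is one hop: an edge with its relation. The following minimal sketch implements that idea without NetworkX. It runs on hypothetical data in the node-link layout that graph.json uses: a `nodes` list plus a `links` list, as the `edges='links'` argument in the inline script implies. The sketch follows edges only in their stored direction, which matches NetworkX when the graph loads as directed. Whether a given graph.json is directed is an assumption you should check.

```python
from collections import deque

# Hypothetical data in graph.json's node-link layout ("nodes" plus "links").
DATA = {
    "nodes": [
        {"id": "a", "label": "LoginView"},
        {"id": "b", "label": "Guardian"},
        {"id": "c", "label": "TokenStore"},
    ],
    "links": [
        {"source": "a", "target": "b", "relation": "calls"},
        {"source": "b", "target": "c", "relation": "reads"},
    ],
}


def shortest_hops(data, src, tgt):
    """BFS shortest path following edges in their stored direction.

    Returns a list of (from_id, relation, to_id) hops, or None if unreachable.
    """
    adj = {}
    for e in data["links"]:
        adj.setdefault(e["source"], []).append((e["target"], e.get("relation", "")))
    prev = {src: None}  # node -> (parent, relation) from the first (shortest) visit
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == tgt:
            break
        for nb, rel in adj.get(node, []):
            if nb not in prev:
                prev[nb] = (node, rel)
                queue.append(nb)
    if tgt not in prev:
        return None
    hops = []
    node = tgt
    while prev[node] is not None:  # walk back from target to source
        parent, rel = prev[node]
        hops.append((parent, rel, node))
        node = parent
    return hops[::-1]


labels = {n["id"]: n["label"] for n in DATA["nodes"]}
for u, rel, v in shortest_hops(DATA, "a", "c"):
    print(f"  {labels[u]} --{rel}--> {labels[v]}")
print(shortest_hops(DATA, "c", "a"))  # None: no edge points back toward LoginView
```

The second call returns `None` for the same reason the inline script can print "No path found": on a directed graph, a chain from A to B does not imply one from B to A.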

After writing the explanation, save it back:

```bash
$(cat graphify-out/.graphify_python) -m graphify save-result --question "Path from NODE_A to NODE_B" --answer "ANSWER" --type path_query --nodes NODE_A NODE_B
```

For /graphify explain

Give a plain-language explanation of a single node - everything connected to it. Prefer the CLI when installed:

```bash
graphify explain "NODE_NAME"
```
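Explaining a node means reporting both what it points to and what points to it. The following minimal sketch works over hypothetical node-link data. The `explain_node` helper and its tie-break (prefer the shortest label containing the term) are assumptions, not how `graphify explain` picks or reports a node.

```python
# Hypothetical node-link data in the same shape as graph.json.
DATA = {
    "nodes": [
        {"id": "a", "label": "LoginView"},
        {"id": "b", "label": "Guardian"},
        {"id": "c", "label": "TokenStore"},
    ],
    "links": [
        {"source": "a", "target": "b", "relation": "calls"},
        {"source": "b", "target": "c", "relation": "reads"},
    ],
}


def explain_node(data, term):
    """Find a node by label substring and list its outgoing and incoming edges."""
    labels = {n["id"]: n.get("label") or n["id"] for n in data["nodes"]}
    matches = [nid for nid, lab in labels.items() if term.lower() in lab.lower()]
    if not matches:
        return None
    nid = min(matches, key=lambda m: len(labels[m]))  # hypothetical tie-break: tightest label
    outgoing = [(e.get("relation", ""), labels[e["target"]])
                for e in data["links"] if e["source"] == nid]
    incoming = [(e.get("relation", ""), labels[e["source"]])
                for e in data["links"] if e["target"] == nid]
    return labels[nid], outgoing, incoming


print(explain_node(DATA, "guardian"))
```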