claude/skills/graphify/references/transcribe.md
Bastien Chanot ed5b54e87e chore(graphify): update skill to v0.8.45
Bump 0.8.13 -> 0.8.45. Extract the SKILL.md monolith (~530 lines) into
references/ for progressive disclosure: github-and-merge, transcribe,
extraction-spec, exports, update, query, add-watch, hooks. SKILL.md now
points to each reference and loads it only on the path that needs it.

Inline fixes carried by the new version: empty-extraction guard before
any write (#1392), shrink-guard ordering so GRAPH_REPORT/analysis never
describe a graph.json that was refused (#479), root= relativization for
build/manifest parity across clones (#1361/#1417), stale-cache cleanup
and code-only semantic pre-write (#1392), edge-direction preserving
merge (#801). Adds FalkorDB export (--falkordb/--falkordb-push) and
rewrites the frontmatter description (drops the obsolete trigger: field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0169vjUD1sP9Nx4ZiCa8wvAw
2026-06-24 14:22:14 +02:00


graphify reference: transcribe video and audio

Load this only when detect reported one or more video files. A corpus with no video never reads this.

Step 2.5 - Transcribe video / audio files (only if video files were detected)

Skip this step entirely if detect returned zero video files.
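The skip check can be sketched in Python. This sketch assumes the same detect layout that the transcription command below reads: a `files` mapping that holds a `video` list. The helper names `load_detect` and `should_transcribe` are hypothetical and are not part of graphify.

```python
import json
from pathlib import Path


def load_detect(path: str = "graphify-out/.graphify_detect.json") -> dict:
    """Load detect output (hypothetical helper; the path is the one this step uses)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def should_transcribe(detect: dict) -> bool:
    """Return True only if detect listed at least one video file."""
    # Same lookup as the transcription command: files -> video, defaulting to [].
    return len(detect.get("files", {}).get("video", [])) > 0
```

For example, `should_transcribe({"files": {"code": ["app.py"]}})` is False, so Step 2.5 is skipped for that corpus.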

Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.

Strategy: Read the god nodes from graphify-out/.graphify_detect.json (or from the analysis file, if one exists from a previous run). You are already a language model, so write a one-sentence domain hint from those labels yourself. Then pass it to Whisper as the initial prompt. No separate API call is needed.

However, if the corpus has only video files and no other docs/code, use the generic fallback prompt: "Use proper punctuation and paragraph breaks."

Step 1 - Write the Whisper prompt yourself.

Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:

  • Labels: transformer, attention, encoder, decoder -> "Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."
  • Labels: kubernetes, deployment, pod, helm -> "DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."
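You write the domain sentence itself. What is mechanical is appending the punctuation instruction and falling back to the generic prompt when there is no hint. A minimal sketch of that combination follows. `build_whisper_prompt` is a hypothetical helper; graphify does not expose it.

```python
from typing import Optional

# The generic fallback prompt quoted in this step.
PUNCTUATION_HINT = "Use proper punctuation and paragraph breaks."


def build_whisper_prompt(domain_hint: Optional[str]) -> str:
    """Combine a one-sentence domain hint with the punctuation instruction."""
    hint = (domain_hint or "").strip()
    if not hint:
        # Video-only corpus (no labels to draw on): use the generic fallback.
        return PUNCTUATION_HINT
    if not hint.endswith("."):
        hint += "."
    return f"{hint} {PUNCTUATION_HINT}"
```

For example, `build_whisper_prompt("DevOps discussion about Kubernetes deployments and Helm charts")` yields the second example prompt above.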

Export it as GRAPHIFY_WHISPER_PROMPT for the next command. Use exactly that name, because it is the variable the transcription step reads. Export it rather than just assigning it, so the child Python process can see it.
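The reason `export` matters can be shown from Python. A child process only sees variables that are placed in its environment, which is what `export` does in a shell. A plain `VAR=value` assignment stays local to the shell. The helper `prompt_seen_by_child` is hypothetical, and the child prints `<unset>` when the variable is missing.

```python
import os
import subprocess
import sys
from typing import Dict

# Child program: report what it sees for the prompt variable.
CHILD = "import os; print(os.environ.get('GRAPHIFY_WHISPER_PROMPT', '<unset>'))"


def prompt_seen_by_child(env: Dict[str, str]) -> str:
    """Run a child Python process with exactly `env` and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Like an un-exported assignment: the variable is absent from the child's environment.
not_exported = {k: v for k, v in os.environ.items() if k != "GRAPHIFY_WHISPER_PROMPT"}
# Like `export GRAPHIFY_WHISPER_PROMPT=...`: the variable is passed down.
exported = dict(not_exported, GRAPHIFY_WHISPER_PROMPT="DevOps discussion about Kubernetes.")
```

Here `prompt_seen_by_child(not_exported)` returns `<unset>`, and `prompt_seen_by_child(exported)` returns the hint.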

Step 2 - Transcribe:

```bash
export GRAPHIFY_WHISPER_MODEL=base  # or whatever --whisper-model the user passed (must be exported)
export GRAPHIFY_WHISPER_PROMPT="<the one-sentence domain hint you composed in Step 1>"
$(cat graphify-out/.graphify_python) -c "
import json, os, sys
from pathlib import Path
from graphify.transcribe import transcribe_all

detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
video_files = detect.get('files', {}).get('video', [])
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')

transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
# Write the JSON from Python (NOT a shell '>' redirect): transcribe_all/Whisper
# print progress to stdout, which would otherwise corrupt the JSON file (#1392).
Path('graphify-out/.graphify_transcripts.json').write_text(json.dumps(transcript_paths, ensure_ascii=False), encoding=\"utf-8\")
print(f'Transcribed {len(transcript_paths)} file(s)', file=sys.stderr)
"
```
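The stdout hazard noted in the command's comments can be reproduced in a few lines. `noisy_transcribe` is a hypothetical stand-in for a transcriber that prints progress, and the `.txt` transcript naming is an assumption. Capturing stdout, as a shell `>` redirect would, mixes the progress lines into the JSON. Serializing the return value directly keeps the JSON valid.

```python
import io
import json
from contextlib import redirect_stdout
from typing import List


def noisy_transcribe(files: List[str]) -> List[str]:
    """Hypothetical transcriber that logs progress to stdout."""
    for f in files:
        print(f"transcribing {f} ...")
    return [f.rsplit(".", 1)[0] + ".txt" for f in files]


def stdout_capture_is_valid_json(files: List[str]) -> bool:
    """Mimic `python ... > out.json`: everything printed ends up in the file."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        paths = noisy_transcribe(files)
        print(json.dumps(paths))
    try:
        json.loads(buf.getvalue())
        return True
    except json.JSONDecodeError:
        return False


def direct_write_payload(files: List[str]) -> str:
    """Mimic Path.write_text(json.dumps(...)): only the JSON is written."""
    with redirect_stdout(io.StringIO()):  # discard progress here; normally it goes to the terminal
        paths = noisy_transcribe(files)
    return json.dumps(paths)
```

With one input file, the redirect-style capture fails to parse, while the direct payload round-trips to `["talk.txt"]`.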

After transcription:

  • Read the transcript paths from graphify-out/.graphify_transcripts.json
  • Add them to the docs list before dispatching semantic subagents in Step 3B
  • Print how many transcripts were created, in the form "Transcribed N video file(s) -> treating as docs"
  • If transcription fails for a file, print a warning and continue with the rest
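The post-transcription steps above can be sketched in Python. The sketch assumes graphify-out/.graphify_transcripts.json holds a flat list of path strings, and that the docs list is a plain list of paths. `transcribe_each`, `add_transcripts_to_docs`, and the per-file `transcribe_one` callable are hypothetical. `transcribe_all` may already handle per-file failures itself.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Iterable, List


def transcribe_each(files: Iterable[str], transcribe_one: Callable[[str], str]) -> List[str]:
    """Transcribe files one by one; warn on a failure and keep going (hypothetical)."""
    paths = []
    for f in files:
        try:
            paths.append(transcribe_one(f))
        except Exception as exc:  # one bad file must not stop the rest
            print(f"warning: could not transcribe {f}: {exc}", file=sys.stderr)
    return paths


def add_transcripts_to_docs(docs: List[str], transcripts_json: Path) -> List[str]:
    """Append transcript paths to the docs list before semantic dispatch."""
    transcript_paths = json.loads(transcripts_json.read_text(encoding="utf-8"))
    merged = docs + [p for p in transcript_paths if p not in docs]
    print(f"Transcribed {len(transcript_paths)} video file(s) -> treating as docs")
    return merged
```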

Whisper model: Default is base. If the user passed --whisper-model <name>, export GRAPHIFY_WHISPER_MODEL=<name> (it must be exported, not just assigned) before running the command above.
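Choosing the model value can be sketched as follows. The default is `base`, and a user-supplied `--whisper-model` value is placed into the environment handed to the child process. The argparse parsing here is only illustrative. It is an assumption, not graphify's own CLI handling, and `whisper_env` is a hypothetical helper.

```python
import argparse
import os
from typing import Dict, List, Optional

DEFAULT_WHISPER_MODEL = "base"


def whisper_env(argv: List[str], base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a child environment with GRAPHIFY_WHISPER_MODEL set (hypothetical)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--whisper-model", default=DEFAULT_WHISPER_MODEL)
    args, _unknown = parser.parse_known_args(argv)  # ignore unrelated flags
    env = dict(os.environ if base_env is None else base_env)
    env["GRAPHIFY_WHISPER_MODEL"] = args.whisper_model  # the equivalent of exporting it
    return env
```

For example, `whisper_env(["--whisper-model", "small"], {})` returns `{"GRAPHIFY_WHISPER_MODEL": "small"}`, and an empty argument list falls back to `base`.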