feat(skills): add pdf-translate — PDF to translated HTML via Vision

OCR/image-based PDF pipeline: convert pages to PNGs, read them with Claude Vision (bypassing the unreliable OCR text layer), translate with cross-page glossary consistency, and reconstruct faithful HTML via /design-html. 6 steps (STEP 0-5): deps check → page images + assets → style analysis → page-by-page read+translate → HTML reconstruction → visual QA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-26 03:38:16 +02:00 · c44852e665
commit c44852e665
parent 5407d66da9
2 changed files with 164 additions and 1 deletion
--- a/lib/profiles/full.profile
+++ b/lib/profiles/full.profile
@@ -57,9 +57,10 @@ canary
 qa
 qa-only
-# === Docs ============================================================
+# === Docs + translation ==============================================
 doc                               personal
 document-release
+pdf-translate                     personal
 # === Session hygiene + memory ========================================
 close                             personal
--- a/skills/pdf-translate/SKILL.md
+++ b/skills/pdf-translate/SKILL.md
@@ -0,0 +1,162 @@
 ---
 name: pdf-translate
 description: Use when translating a PDF (especially OCR or image-based) to another language and producing faithful HTML output. Handles image extraction, layout preservation, contextual translation, and style-matched reconstruction. Triggers on "translate this PDF", "PDF en francais", "convert PDF to HTML translated", "traduire ce document".
 disable-model-invocation: true
 ---
 # PDF Translate
 Translate a PDF into another language and produce an HTML document that preserves the original layout, images, and visual style. Optimized for OCR/image-based PDFs where the text layer is unreliable.
 ## Pipeline
 ```dot
 digraph pipeline {
  rankdir=LR;
  PDF [shape=folder];
  Images [shape=box, label="Page PNGs\n+ embedded images"];
  Analysis [shape=box, label="Claude Vision\nread + translate\n+ layout map"];
  HTML [shape=box, label="Faithful HTML\n/design-html"];
  QA [shape=diamond, label="Visual QA\nPDF vs HTML"];
  PDF -> Images [label="STEP 1"];
  Images -> Analysis [label="STEP 2-3"];
  Analysis -> HTML [label="STEP 4"];
  HTML -> QA [label="STEP 5"];
  QA -> Analysis [label="fix", style=dashed];
 }
 ```
 ## STEP 0: Dependencies
 Check before starting. Install what's missing.
 ```bash
 # Option A: poppler (lighter)
 command -v pdftoppm >/dev/null && echo "OK" || echo "INSTALL: sudo apt install poppler-utils (macOS: brew install poppler)"
 # Option B: PyMuPDF (more powerful: extracts embedded images and can report their page coordinates)
 python3 -c "import fitz; print('OK')" 2>/dev/null || echo "INSTALL: pip install pymupdf"
 ```
 Prefer PyMuPDF if both are available: it extracts embedded images and reports page dimensions.
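The same check can be sketched in Python, which is convenient if the rest of the pipeline is scripted. This is a minimal sketch using only the standard library; `pick_backend` and `detect_backend` are hypothetical helper names, not part of the skill.

```python
import importlib.util
import shutil
from typing import Optional


def pick_backend(has_pymupdf: bool, has_pdftoppm: bool) -> Optional[str]:
    """Return the preferred rendering backend, or None if nothing is installed."""
    if has_pymupdf:
        return "pymupdf"  # preferred: it can also extract embedded images
    if has_pdftoppm:
        return "pdftoppm"
    return None


def detect_backend() -> Optional[str]:
    """Probe the environment without importing PyMuPDF or running pdftoppm."""
    has_pymupdf = importlib.util.find_spec("fitz") is not None
    has_pdftoppm = shutil.which("pdftoppm") is not None
    return pick_backend(has_pymupdf, has_pdftoppm)
```

Calling `detect_backend()` returns `"pymupdf"`, `"pdftoppm"`, or `None`; a `None` result means one of the install commands above is needed.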
 ## STEP 1: PDF → Page Images + Embedded Assets
 ```bash
 # Create working directory
 mkdir -p pdf-translate-work/{pages,assets}
 # Convert pages to high-res PNGs (pdftoppm appends the page number, e.g. page-1.png or page-01.png)
 pdftoppm -png -r 300 input.pdf pdf-translate-work/pages/page
 # OR with PyMuPDF (also extracts embedded images):
 python3 -c "
 import fitz  # PyMuPDF
 doc = fitz.open('input.pdf')
 for i, page in enumerate(doc):
     # Render the full page at 300 DPI
     pix = page.get_pixmap(dpi=300)
     pix.save(f'pdf-translate-work/pages/page-{i+1:03d}.png')
     # Save each embedded image in its native format
     for img_idx, img in enumerate(page.get_images(full=True)):
         xref = img[0]
         base = doc.extract_image(xref)
         with open(f'pdf-translate-work/assets/img-p{i+1}-{img_idx+1}.{base[\"ext\"]}', 'wb') as f:
             f.write(base['image'])
 "
 ```
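The two commands above do not pad page numbers the same way: the PyMuPDF snippet always writes three digits, while pdftoppm chooses its own padding. A plain lexicographic sort can therefore put `page-10.png` before `page-2.png`. The sketch below orders page images numerically; `page_number` and `sorted_pages` are hypothetical helpers, and the `page-<n>.png` pattern is assumed from the commands above.

```python
import re
from pathlib import Path

# Matches the names produced in STEP 1, e.g. page-7.png or page-007.png.
_PAGE_RE = re.compile(r"page-(\d+)\.png$")


def page_number(path) -> int:
    """Extract the page number from a page image filename."""
    match = _PAGE_RE.search(Path(path).name)
    if match is None:
        raise ValueError(f"not a page image: {path}")
    return int(match.group(1))


def sorted_pages(paths):
    """Order page images numerically rather than lexicographically."""
    return sorted(paths, key=page_number)
```

For example, `sorted_pages(Path("pdf-translate-work/pages").glob("page-*.png"))` yields the pages in reading order for STEP 3.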
 ## STEP 2: First Pass — Style Analysis
 Read page 1 (and optionally 2-3 more) with Claude Vision. Extract:
 - **Typography**: font style (serif/sans), heading sizes, body size, weight
 - **Colors**: background, text, accent, header colors
 - **Layout**: single/multi column, margins, header/footer pattern
 - **Spacing**: line height, paragraph gaps, section gaps
 - **Special elements**: callout boxes, sidebars, tables, captions, footnotes
 Output a style brief — this feeds into STEP 4.
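One possible shape for the style brief is a small record that can be rendered as CSS custom properties and handed to STEP 4. This is only a sketch: the `StyleBrief` class, its field names, and every default value are hypothetical placeholders, not values the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class StyleBrief:
    # All fields and defaults are illustrative placeholders.
    body_font: str = "Georgia, serif"
    heading_font: str = "Helvetica, Arial, sans-serif"
    body_size_pt: float = 11.0
    line_height: float = 1.5
    text_color: str = "#222222"
    background_color: str = "#ffffff"
    accent_color: str = "#1a5fb4"
    columns: int = 1

    def to_css(self) -> str:
        """Render the brief as CSS custom properties on :root."""
        props = {
            "--body-font": self.body_font,
            "--heading-font": self.heading_font,
            "--body-size": f"{self.body_size_pt}pt",
            "--line-height": str(self.line_height),
            "--text-color": self.text_color,
            "--background-color": self.background_color,
            "--accent-color": self.accent_color,
            "--columns": str(self.columns),
        }
        lines = [f"  {name}: {value};" for name, value in props.items()]
        return ":root {\n" + "\n".join(lines) + "\n}"
```

Keeping the brief as structured data rather than free text makes it easy to reuse the same values when STEP 5 sends the work back for another pass.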
 ## STEP 3: Page-by-Page Read + Translate
 For each page image, use Claude Vision (the Read tool on the PNG):
 1. **Read** the text content from the image (ignore the OCR text layer)
 2. **Map layout**: identify text blocks, headings, images, tables, their relative positions
 3. **Translate** to target language preserving:
   - Register and tone (formal/informal/technical)
   - Technical terms (keep original in parentheses on first occurrence if ambiguous)
   - Sentence structure adapted to target language (not word-for-word)
 4. **Note** image references: what each image shows, where it sits relative to text
 ### Cross-page context
 Maintain a running glossary of translated terms across pages. If page 1 translates "stakeholder" as "partie prenante", every subsequent page must use the same term.
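The glossary rule can be sketched as a small class that keeps the first translation seen for each term and reports pages that deviate from it. `Glossary` and its method names are hypothetical; matching terms case-insensitively is an assumption of this sketch.

```python
class Glossary:
    """Running source-term -> translation map shared across pages."""

    def __init__(self):
        self._terms = {}

    def record(self, source: str, translation: str) -> str:
        """Store the first translation seen for a term; return the canonical one."""
        key = source.strip().lower()
        return self._terms.setdefault(key, translation)

    def conflicts(self, page_terms: dict) -> dict:
        """Return {source: (canonical, proposed)} for terms this page translated differently."""
        found = {}
        for source, proposed in page_terms.items():
            canonical = self._terms.get(source.strip().lower())
            if canonical is not None and canonical != proposed:
                found[source] = (canonical, proposed)
        return found
```

After page 1 calls `record("stakeholder", "partie prenante")`, a later page that proposes `"intervenant"` for the same term shows up in `conflicts()` and can be corrected before the page output is written.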
 Output per page:
 ```markdown
 ## Page N
 ### Layout
 [column structure, image positions]
 ### Content (translated)
 [translated text with markdown structure]
 ### Images
 - img-pN-1.png: [description], position: [top-right / inline / full-width]
 ```
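If the per-page results are held as data, the template above can be produced mechanically. The sketch below is one way to do that; `ImageNote` and `PageRecord` are hypothetical names, and the output mirrors the headings of the template.

```python
from dataclasses import dataclass, field


@dataclass
class ImageNote:
    filename: str
    description: str
    position: str  # e.g. "top-right", "inline", "full-width"


@dataclass
class PageRecord:
    number: int
    layout: str
    content: str
    images: list = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record in the per-page output format."""
        lines = [
            f"## Page {self.number}",
            "### Layout",
            self.layout,
            "### Content (translated)",
            self.content,
            "### Images",
        ]
        for img in self.images:
            lines.append(f"- {img.filename}: {img.description}, position: {img.position}")
        return "\n".join(lines)
```

Joining `to_markdown()` for every page gives a single document that can be passed to STEP 4 together with the style brief.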
 ## STEP 4: HTML Reconstruction
 Invoke `/design-html` (or `/frontend-design`) with:
 1. The style brief from STEP 2
 2. All translated page content from STEP 3
 3. Extracted images from `pdf-translate-work/assets/`
 Requirements for the HTML:
 - Single HTML file with inline CSS; embed images as base64 for a fully self-contained file, or reference them by relative path
 - Match original typography feel (use closest web-safe or Google Font)
 - Preserve column layout, spacing, color scheme
 - Images at original positions with proper sizing
 - Print-friendly: `@media print` styles, page breaks where original had them
 - Responsive: readable on screen, faithful on print
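For the base64 option, each extracted asset can be turned into a `data:` URI and placed directly in an `<img>` tag. A minimal sketch, assuming a small hand-written extension-to-MIME map; `to_data_uri` and `img_tag` are hypothetical helpers.

```python
import base64
import html

# Small extension -> MIME map; extend it for other formats as needed.
_MIME = {"png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg", "gif": "image/gif"}


def to_data_uri(data: bytes, ext: str) -> str:
    """Encode raw image bytes as a data: URI for inline embedding."""
    mime = _MIME.get(ext.lower().lstrip("."))
    if mime is None:
        raise ValueError(f"unsupported image extension: {ext}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(data: bytes, ext: str, alt: str) -> str:
    """Build an <img> tag with the image inlined and the alt text escaped."""
    return f'<img src="{to_data_uri(data, ext)}" alt="{html.escape(alt, quote=True)}">'
```

Inlining keeps the output to one file at the cost of size (base64 adds roughly a third); for image-heavy PDFs, relative paths may be the more practical choice.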
 ## STEP 5: Visual QA
 Compare original PDF and translated HTML side by side:
 1. Read a few pages of the original PDF (Read tool, pages parameter)
 2. Take a screenshot of the HTML (if /browse is available)
 3. Check: layout match, no missing content, images present, style fidelity
 4. Fix discrepancies → re-run STEP 4 for layout or style issues; return to STEP 3 if content is missing or mistranslated
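Part of the "images present" check can be automated before the visual comparison. The sketch below lists extracted assets that no `<img src>` in the HTML refers to. It assumes images are referenced by relative path (it cannot match base64-inlined images), and `missing_images` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(html_text: str, expected_files) -> list:
    """Return expected asset filenames that no <img src> in the HTML ends with."""
    collector = _ImgCollector()
    collector.feed(html_text)
    return [
        name for name in expected_files
        if not any(src.endswith(name) for src in collector.sources)
    ]
```

An empty result only shows that every asset is referenced; placement and sizing still need the side-by-side comparison.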
 ## Decision: OCR vs Native PDF
 ```dot
 digraph ocr_check {
  Check [shape=diamond, label="Does PDF have\nreliable text layer?"];
  Native [shape=box, label="Can use marker\n+ Claude translate"];
  OCR [shape=box, label="Use Vision pipeline\n(this skill)"];
  Test [shape=box, label="Copy text from PDF.\nGarbled or missing?"];
  Check -> Test [label="unsure"];
  Test -> OCR [label="yes"];
  Test -> Native [label="no, text is clean"];
  Check -> Native [label="native PDF"];
  Check -> OCR [label="scanned/OCR"];
 }
 ```
 If the PDF has a clean text layer, `marker` (`pip install marker-pdf`) is faster. This skill's Vision pipeline is for PDFs whose text layer is unreliable.
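The "garbled or missing?" test can be roughed out in code once some text has been copied or extracted from the PDF. This is a heuristic sketch, not a definitive detector: the character classes counted as garbage and the 5% threshold are assumptions, and `garbled_ratio` and `needs_vision_pipeline` are hypothetical names. Text that is wrong but made of ordinary letters will still pass, so a quick read of a sample remains necessary.

```python
import unicodedata

# Unicode categories treated as garbage: control, private-use, unassigned.
_BAD_CATEGORIES = {"Cc", "Co", "Cn"}


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction garbage."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # an empty text layer counts as unusable
    bad = sum(
        1 for c in chars
        if c == "\ufffd" or unicodedata.category(c) in _BAD_CATEGORIES
    )
    return bad / len(chars)


def needs_vision_pipeline(text: str, threshold: float = 0.05) -> bool:
    """True if the text layer looks too unreliable to translate directly."""
    return garbled_ratio(text) > threshold
```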
 ## Common Mistakes
 | Mistake | Fix |
 |---|---|
 | Using the OCR text layer of a scanned PDF | Read page images with Vision instead |
 | Translating page-by-page without glossary | Maintain cross-page term consistency |
 | Generic HTML that doesn't match original style | Extract style brief first (STEP 2) |
 | Word-for-word translation | Adapt sentence structure to target language |
 | Forgetting print styles (or `prefers-reduced-motion` when the HTML animates anything) | Include `@media print` rules in the HTML output |
 | Images as decoration only | Preserve original placement and sizing |