feat(skills): add pdf-translate — PDF to translated HTML via Vision

OCR/image-based PDF pipeline: convert pages to PNGs, read with Claude
Vision (bypasses unreliable OCR text layer), translate with cross-page
glossary consistency, reconstruct faithful HTML via /design-html.

6 steps (STEP 0-5): deps check → page images + assets → style analysis →
page-by-page read+translate → HTML reconstruction → visual QA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bastien Chanot 2026-05-26 03:38:16 +02:00
parent 5407d66da9
commit c44852e665
2 changed files with 164 additions and 1 deletions

@@ -57,9 +57,10 @@ canary
 qa
 qa-only
-# === Docs ============================================================
+# === Docs + translation ==============================================
 doc personal
 document-release
+pdf-translate personal
 # === Session hygiene + memory ========================================
 close personal

@@ -0,0 +1,162 @@
---
name: pdf-translate
description: Use when translating a PDF (especially OCR or image-based) to another language and producing faithful HTML output. Handles image extraction, layout preservation, contextual translation, and style-matched reconstruction. Triggers on "translate this PDF", "PDF en francais", "convert PDF to HTML translated", "traduire ce document".
disable-model-invocation: true
---
# PDF Translate
Translate a PDF into another language and produce an HTML document that preserves the original layout, images, and visual style. Optimized for OCR/image-based PDFs where the text layer is unreliable.
## Pipeline
```dot
digraph pipeline {
rankdir=LR;
PDF [shape=folder];
Images [shape=box, label="Page PNGs\n+ embedded images"];
Analysis [shape=box, label="Claude Vision\nread + translate\n+ layout map"];
HTML [shape=box, label="Faithful HTML\n/design-html"];
QA [shape=diamond, label="Visual QA\nPDF vs HTML"];
PDF -> Images [label="STEP 1"];
Images -> Analysis [label="STEP 2-3"];
Analysis -> HTML [label="STEP 4"];
HTML -> QA [label="STEP 5"];
QA -> Analysis [label="fix", style=dashed];
}
```
## STEP 0: Dependencies
Check before starting. Install what's missing.
```bash
# Option A: poppler (lighter)
command -v pdftoppm && echo "OK" || echo "INSTALL: sudo apt install poppler-utils"
# Option B: PyMuPDF (more powerful — extracts embedded images with coordinates)
python3 -c "import fitz; print('OK')" 2>/dev/null || echo "INSTALL: pip install pymupdf"
```
Prefer PyMuPDF when both are available: it extracts embedded images and reports page dimensions.
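The same check can be sketched in Python, which is convenient when later steps are scripted. The function name `detect_backend` and its return values are hypothetical; only `shutil.which` and `importlib.util.find_spec` are standard-library calls.

```python
import importlib.util
import shutil


def detect_backend() -> str:
    """Return 'pymupdf', 'poppler', or 'none' depending on what is installed."""
    # PyMuPDF is imported under the module name 'fitz', as in the check above.
    if importlib.util.find_spec("fitz") is not None:
        return "pymupdf"
    # pdftoppm ships with poppler-utils.
    if shutil.which("pdftoppm") is not None:
        return "poppler"
    return "none"


if __name__ == "__main__":
    print(detect_backend())
```

PyMuPDF is tested first so that the result matches the stated preference when both tools are present.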
## STEP 1: PDF → Page Images + Embedded Assets
```bash
# Create the working directory
mkdir -p pdf-translate-work/{pages,assets}
# Option A: convert pages to 300-dpi PNGs with poppler (page images only)
pdftoppm -png -r 300 input.pdf pdf-translate-work/pages/page
# Option B: PyMuPDF renders pages and also extracts embedded images
python3 -c "
import fitz  # PyMuPDF
doc = fitz.open('input.pdf')
for i, page in enumerate(doc):
    # Render the page at 300 dpi
    pix = page.get_pixmap(dpi=300)
    pix.save(f'pdf-translate-work/pages/page-{i+1:03d}.png')
    # Save each embedded image in its native format
    for img_idx, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        base = doc.extract_image(xref)
        with open(f'pdf-translate-work/assets/img-p{i+1}-{img_idx+1}.{base[\"ext\"]}', 'wb') as f:
            f.write(base['image'])
"
```
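The two options name their output differently: the PyMuPDF snippet writes `page-001.png`, while `pdftoppm` chooses its own zero padding. A small helper that sorts by the numeric suffix keeps STEP 3 independent of which tool ran. This is a sketch; `page_number` and `ordered_pages` are hypothetical names, and the `page-*.png` pattern assumes the output prefix used above.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract the trailing page number from names like page-7.png or page-007.png."""
    match = re.search(r"(\d+)$", path.stem)
    if match is None:
        raise ValueError(f"no page number in {path.name}")
    return int(match.group(1))


def ordered_pages(pages_dir: str) -> list[Path]:
    """Return the page PNGs in page order, regardless of zero padding."""
    return sorted(Path(pages_dir).glob("page-*.png"), key=page_number)
```

Sorting numerically avoids the lexicographic trap where `page-10.png` would come before `page-2.png`.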
## STEP 2: First Pass — Style Analysis
Read page 1 (and optionally 2-3 more) with Claude Vision. Extract:
- **Typography**: font style (serif/sans), heading sizes, body size, weight
- **Colors**: background, text, accent, header colors
- **Layout**: single/multi column, margins, header/footer pattern
- **Spacing**: line height, paragraph gaps, section gaps
- **Special elements**: callout boxes, sidebars, tables, captions, footnotes
Output a style brief — this feeds into STEP 4.
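One possible shape for the style brief is a small dataclass serialized to JSON, so that STEP 4 receives a fixed set of fields. Every field name and default below is illustrative and not defined by the skill.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StyleBrief:
    # All field names and defaults are assumptions made for this sketch.
    font_family: str = "serif"
    body_size_pt: float = 11.0
    heading_sizes_pt: list[float] = field(default_factory=lambda: [24.0, 18.0, 14.0])
    background: str = "#ffffff"
    text_color: str = "#000000"
    accent_color: str = "#000000"
    columns: int = 1
    line_height: float = 1.4
    special_elements: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the brief so it can be pasted into the STEP 4 prompt."""
        return json.dumps(asdict(self), indent=2)
```

For example, `StyleBrief(columns=2, special_elements=["callout box"]).to_json()` yields a JSON object with one key per category listed above.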
## STEP 3: Page-by-Page Read + Translate
For each page image, use Claude Vision (Read tool on PNG):
1. **Read** the text content from the image (ignore OCR text layer)
2. **Map layout**: identify text blocks, headings, images, tables, their relative positions
3. **Translate** to target language preserving:
   - Register and tone (formal/informal/technical)
   - Technical terms (keep original in parentheses on first occurrence if ambiguous)
   - Sentence structure adapted to target language (not word-for-word)
4. **Note** image references: what each image shows, where it sits relative to text
### Cross-page context
Maintain a running glossary of translated terms across pages. If page 1 translates "stakeholder" as "partie prenante", every subsequent page must use the same term.
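A minimal sketch of such a glossary, assuming a first-translation-wins policy and case-insensitive matching (both are assumptions; the class and method names are hypothetical):

```python
class Glossary:
    """Running source-term -> target-term map shared across pages."""

    def __init__(self) -> None:
        self._terms: dict[str, str] = {}

    def translate(self, source: str, proposed: str) -> str:
        """Record the first translation of a term and reuse it afterwards."""
        key = source.casefold()
        # setdefault keeps the earliest translation and ignores later proposals.
        return self._terms.setdefault(key, proposed)

    def as_prompt(self) -> str:
        """Render the glossary as lines to prepend to the next page's prompt."""
        return "\n".join(f"{src} = {dst}" for src, dst in sorted(self._terms.items()))
```

With this policy, `translate("stakeholder", "partie prenante")` on page 1 followed by `translate("Stakeholder", "intervenant")` on a later page still returns "partie prenante".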
Output per page:
```markdown
## Page N
### Layout
[column structure, image positions]
### Content (translated)
[translated text with markdown structure]
### Images
- img-pN-1.png: [description], position: [top-right / inline / full-width]
```
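If the per-page records are assembled by a script, the template above could be filled in as follows. The function name `render_page` and the tuple layout for images are assumptions made for this sketch.

```python
def render_page(number: int, layout: str, content: str,
                images: list[tuple[str, str, str]]) -> str:
    """Build the per-page markdown record; images are (filename, description, position)."""
    lines = [
        f"## Page {number}",
        "### Layout",
        layout,
        "### Content (translated)",
        content,
        "### Images",
    ]
    for filename, description, position in images:
        lines.append(f"- {filename}: {description}, position: {position}")
    return "\n".join(lines)
```

Keeping one record per page in the same shape makes it straightforward to concatenate them as input to STEP 4.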
## STEP 4: HTML Reconstruction
Invoke `/design-html` (or `/frontend-design`) with:
1. The style brief from STEP 2
2. All translated page content from STEP 3
3. Extracted images from `pdf-translate-work/assets/`
Requirements for the HTML:
- Single self-contained HTML file (inline CSS, base64 images or relative paths)
- Match original typography feel (use closest web-safe or Google Font)
- Preserve column layout, spacing, color scheme
- Images at original positions with proper sizing
- Print-friendly: `@media print` styles, page breaks where original had them
- Responsive: readable on screen, faithful on print
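For the base64 option in the first requirement, an image can be inlined as a data URI. The sketch below uses only the standard library; the function name `img_tag` is hypothetical, and the MIME type is guessed from the file extension.

```python
import base64
import html
import mimetypes


def img_tag(filename: str, data: bytes, alt: str) -> str:
    """Return an <img> tag with the image bytes inlined as a base64 data URI."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown extension: fall back to a generic type.
        mime = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="data:{mime};base64,{encoded}" alt="{safe_alt}">'
```

Inlining keeps the HTML self-contained at the cost of file size, so relative paths may be the better choice for image-heavy documents.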
## STEP 5: Visual QA
Compare original PDF and translated HTML side by side:
1. Read a few pages of the original PDF (Read tool, pages parameter)
2. Take a screenshot of the HTML (if `/browse` is available)
3. Check: layout match, no missing content, images present, style fidelity
4. Fix discrepancies → iterate STEP 4
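The "images present" part of the check can be partly automated before the visual comparison. This sketch assumes the HTML references images by relative path (it does not inspect base64-inlined images); `missing_images` is a hypothetical helper.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath


class _ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(html_text: str, asset_names: list[str]) -> list[str]:
    """Return asset filenames that no <img src=...> in the HTML points to."""
    collector = _ImgCollector()
    collector.feed(html_text)
    referenced = {PurePosixPath(src).name for src in collector.sources}
    return [name for name in asset_names if name not in referenced]
```

A non-empty result is a prompt to revisit the affected pages, not proof of an error, since decorative or duplicate assets may be dropped on purpose.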
## Decision: OCR vs Native PDF
```dot
digraph ocr_check {
Check [shape=diamond, label="Does PDF have\nreliable text layer?"];
Native [shape=box, label="Can use marker\n+ Claude translate"];
OCR [shape=box, label="Use Vision pipeline\n(this skill)"];
Test [shape=box, label="Copy text from PDF.\nGarbled or missing?"];
Check -> Test [label="unsure"];
Test -> OCR [label="yes"];
Test -> Native [label="no, text is clean"];
Check -> Native [label="native PDF"];
Check -> OCR [label="scanned/OCR"];
}
```
If the PDF has a clean text layer, `marker` (pip install marker-pdf) is faster. This skill's Vision pipeline is for when the text layer is unreliable.
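The "garbled or missing?" test can be approximated on text copied or extracted from a page. The heuristic below is only a sketch: the function name and both thresholds are assumptions, and scripts with many punctuation-like characters may need different values.

```python
def looks_garbled(text: str, min_chars: int = 50, min_ratio: float = 0.7) -> bool:
    """Heuristic: True when extracted text is missing or mostly non-word characters."""
    # Drop all whitespace so layout differences do not affect the ratio.
    stripped = "".join(text.split())
    if len(stripped) < min_chars:
        return True  # missing or near-empty text layer
    wordlike = sum(ch.isalnum() for ch in stripped)
    return wordlike / len(stripped) < min_ratio
```

A `True` result points toward the Vision pipeline; a `False` result only says the text layer is not obviously broken, so a quick manual read of a sample page is still worthwhile.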
## Common Mistakes
| Mistake | Fix |
|---|---|
| Using OCR text layer from scanned PDF | Read page images with Vision instead |
| Translating page-by-page without glossary | Maintain cross-page term consistency |
| Generic HTML that doesn't match original style | Extract style brief first (STEP 2) |
| Word-for-word translation | Adapt sentence structure to target language |
| Forgetting `prefers-reduced-motion` or print styles | Include in HTML output |
| Images as decoration only | Preserve original placement and sizing |