| name | description | disable-model-invocation |
|---|---|---|
| pdf-translate | Use when translating a PDF (especially OCR or image-based) to another language and producing faithful HTML output. Handles image extraction, layout preservation, contextual translation, and style-matched reconstruction. Triggers on "translate this PDF", "PDF en francais", "convert PDF to HTML translated", "traduire ce document". | true |
PDF Translate
Translate a PDF into another language and produce an HTML document that preserves the original layout, images, and visual style. Optimized for OCR/image-based PDFs where the text layer is unreliable.
Pipeline
```dot
digraph pipeline {
    rankdir=LR;
    PDF [shape=folder];
    Images [shape=box, label="Page PNGs\n+ embedded images"];
    Analysis [shape=box, label="Claude Vision\nread + translate\n+ layout map"];
    HTML [shape=box, label="Faithful HTML\n/design-html"];
    QA [shape=diamond, label="Visual QA\nPDF vs HTML"];
    PDF -> Images [label="STEP 1"];
    Images -> Analysis [label="STEP 2-3"];
    Analysis -> HTML [label="STEP 4"];
    HTML -> QA [label="STEP 5"];
    QA -> Analysis [label="fix", style=dashed];
}
```
STEP 0: Dependencies
Check before starting. Install what's missing.
```bash
# Option A: poppler (lighter)
command -v pdftoppm >/dev/null && echo "OK" || echo "INSTALL: sudo apt install poppler-utils"
# Option B: PyMuPDF (more powerful: extracts embedded images with coordinates)
python3 -c "import fitz; print('OK')" 2>/dev/null || echo "INSTALL: pip install pymupdf"
```
Prefer PyMuPDF if both are available: it also extracts embedded images and reports page dimensions.
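If the skill is driven from a script rather than by hand, the same preference can be expressed in Python. This is a minimal sketch: `pick_backend` is a hypothetical helper, and it only checks that the `fitz` module and the `pdftoppm` binary are present, not that they work. The lookup functions are injectable so the choice can be tested without either tool installed.

```python
import importlib.util
import shutil


def pick_backend(find_spec=importlib.util.find_spec, which=shutil.which):
    """Return "pymupdf", "pdftoppm", or None if neither is installed.

    PyMuPDF is preferred because it can also extract embedded images and
    page dimensions; pdftoppm (poppler-utils) only rasterizes pages.
    """
    if find_spec("fitz") is not None:
        return "pymupdf"
    if which("pdftoppm") is not None:
        return "pdftoppm"
    return None


if __name__ == "__main__":
    print(pick_backend() or "INSTALL: pip install pymupdf (or: sudo apt install poppler-utils)")
```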
STEP 1: PDF → Page Images + Embedded Assets
```bash
# Create working directory
mkdir -p pdf-translate-work/{pages,assets}

# Convert pages to high-res PNGs (300 DPI)
pdftoppm -png -r 300 input.pdf pdf-translate-work/pages/page

# OR with PyMuPDF (also extracts embedded images):
python3 - <<'EOF'
import fitz  # PyMuPDF

doc = fitz.open('input.pdf')
for i, page in enumerate(doc):
    # Render the full page at 300 DPI
    pix = page.get_pixmap(dpi=300)
    pix.save(f'pdf-translate-work/pages/page-{i+1:03d}.png')
    # Save each embedded image in its native format (png, jpeg, ...)
    for img_idx, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        base = doc.extract_image(xref)
        with open(f'pdf-translate-work/assets/img-p{i+1}-{img_idx+1}.{base["ext"]}', 'wb') as f:
            f.write(base['image'])
EOF
```
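STEP 4 needs images "at original positions", and PyMuPDF can report where each embedded image is drawn on the page. The sketch below, assuming PyMuPDF's `page.rect` and `Page.get_image_rects()`, records every placement as percentages of the page size so the HTML can position images in relative units. `bbox_to_percent`, `dump_image_layout`, and the `layout.json` output file are hypothetical names, not part of the skill; `file_prefix` omits the extension because `extract_image` keeps each image's native format.

```python
import json


def bbox_to_percent(bbox, page_width, page_height):
    """Convert an (x0, y0, x1, y1) box in PDF points to page-relative percentages."""
    x0, y0, x1, y1 = bbox
    return {
        "left": round(100 * x0 / page_width, 2),
        "top": round(100 * y0 / page_height, 2),
        "width": round(100 * (x1 - x0) / page_width, 2),
        "height": round(100 * (y1 - y0) / page_height, 2),
    }


def dump_image_layout(pdf_path, out_path="pdf-translate-work/layout.json"):
    """Write image placements per page (needs PyMuPDF; layout.json is a hypothetical name)."""
    import fitz  # imported here so bbox_to_percent works without PyMuPDF

    layout = {}
    with fitz.open(pdf_path) as doc:
        for i, page in enumerate(doc):
            w, h = page.rect.width, page.rect.height
            entries = []
            for img_idx, img in enumerate(page.get_images(full=True)):
                # The same image can be drawn several times; keep every placement.
                for rect in page.get_image_rects(img[0]):
                    entries.append({
                        "file_prefix": f"img-p{i+1}-{img_idx+1}",
                        **bbox_to_percent(tuple(rect), w, h),
                    })
            layout[f"page-{i+1}"] = {"width_pt": w, "height_pt": h, "images": entries}
    with open(out_path, "w") as f:
        json.dump(layout, f, indent=2)
    return layout
```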
STEP 2: First Pass — Style Analysis
Read page 1 (and optionally 2-3 more) with Claude Vision. Extract:
- Typography: font style (serif/sans), heading sizes, body size, weight
- Colors: background, text, accent, header colors
- Layout: single/multi column, margins, header/footer pattern
- Spacing: line height, paragraph gaps, section gaps
- Special elements: callout boxes, sidebars, tables, captions, footnotes
Output a style brief — this feeds into STEP 4.
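One way to make the style brief concrete is a small structured record that renders straight into CSS custom properties for STEP 4. This is a hypothetical sketch: the field names, the default values, and the variable names (`--font-body`, `--size-h1`, ...) are illustrative assumptions, to be filled in from what Vision reports on the sample pages.

```python
from dataclasses import dataclass, field


@dataclass
class StyleBrief:
    """Hypothetical container for the STEP 2 findings; defaults are placeholders."""
    font_family: str = "serif"            # closest web font to the original
    body_size_pt: float = 11.0
    heading_sizes_pt: tuple = (20.0, 16.0, 13.0)
    text_color: str = "#222222"
    background_color: str = "#ffffff"
    accent_color: str = "#1a4d8f"
    columns: int = 1
    line_height: float = 1.4
    special_elements: list = field(default_factory=list)  # e.g. ["callout", "footnotes"]

    def to_css_vars(self) -> str:
        """Render the brief as CSS custom properties on :root."""
        lines = [
            f"--font-body: {self.font_family};",
            f"--size-body: {self.body_size_pt}pt;",
            f"--color-text: {self.text_color};",
            f"--color-bg: {self.background_color};",
            f"--color-accent: {self.accent_color};",
            f"--columns: {self.columns};",
            f"--line-height: {self.line_height};",
        ]
        # One variable per heading level, h1 first
        lines += [f"--size-h{i}: {s}pt;" for i, s in enumerate(self.heading_sizes_pt, start=1)]
        return ":root {\n  " + "\n  ".join(lines) + "\n}"
```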
STEP 3: Page-by-Page Read + Translate
For each page image, use Claude Vision (Read tool on PNG):
- Read the text content from the image (ignore the OCR text layer)
- Map layout: identify text blocks, headings, images, tables, and their relative positions
- Translate into the target language:
  - Preserve register and tone (formal/informal/technical)
  - Preserve technical terms (keep the original in parentheses on first occurrence if ambiguous)
  - Adapt sentence structure to the target language (not word-for-word)
- Note image references: what each image shows, where it sits relative to text
Cross-page context
Maintain a running glossary of translated terms across pages. If page 1 translates "stakeholder" as "partie prenante", every subsequent page must use the same term.
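The running glossary can be as simple as a dictionary where the first translation chosen for a term wins and later disagreements are logged instead of silently applied. A minimal sketch, assuming terms are matched case-insensitively; the `Glossary` class and its methods are hypothetical. `as_prompt_block()` produces a list that can be prepended to the instructions for the next page.

```python
class Glossary:
    """Running source-term -> translation map shared across pages (a minimal sketch)."""

    def __init__(self):
        self._terms = {}      # normalized source term -> chosen translation
        self.conflicts = []   # (page, term, existing, proposed) for human review

    @staticmethod
    def _key(term):
        return term.strip().lower()

    def resolve(self, term, proposed, page):
        """Return the translation to use; the first choice recorded for a term wins."""
        existing = self._terms.setdefault(self._key(term), proposed)
        if existing != proposed:
            self.conflicts.append((page, term, existing, proposed))
        return existing

    def as_prompt_block(self):
        """Format the glossary for pasting into the next page's instructions."""
        return "\n".join(f"- {src} -> {dst}" for src, dst in sorted(self._terms.items()))
```

The conflict log matters more than the first-wins rule: a term that keeps getting a different proposal may genuinely need two translations depending on context, which is a call for a human.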
Output per page:
```markdown
## Page N
### Layout
[column structure, image positions]
### Content (translated)
[translated text with markdown structure]
### Images
- img-pN-1.png: [description], position: [top-right / inline / full-width]
```
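When pages are processed in a loop, emitting the notes from structured data keeps every page in exactly this shape, which makes the STEP 4 input easier to concatenate and check. A sketch with a hypothetical `render_page_notes` helper; the position strings follow the examples in the template.

```python
def render_page_notes(n, layout, content, images):
    """Emit one page's notes in the STEP 3 template.

    images: list of (filename, description, position) tuples, where position
    is e.g. "top-right", "inline", or "full-width".
    """
    parts = [
        f"## Page {n}",
        "### Layout",
        layout,
        "### Content (translated)",
        content,
        "### Images",
    ]
    parts += [f"- {name}: {desc}, position: {pos}" for name, desc, pos in images]
    return "\n".join(parts)
```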
STEP 4: HTML Reconstruction
Invoke /design-html (or /frontend-design) with:
- The style brief from STEP 2
- All translated page content from STEP 3
- Extracted images from `pdf-translate-work/assets/`
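To keep the final file self-contained, each asset in `pdf-translate-work/assets/` can be inlined as a data URI. A minimal sketch using only the standard library; `bytes_to_data_uri` and `to_data_uri` are hypothetical helpers, and the MIME type is guessed from the file extension that `extract_image` chose.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_uri(data, filename):
    """Encode raw image bytes as a data URI, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # unknown extension
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def to_data_uri(path):
    """Read one extracted asset (e.g. an img-pN-M file) and inline it."""
    path = Path(path)
    return bytes_to_data_uri(path.read_bytes(), path.name)
```

Base64 grows each image by about a third, so for scan-heavy documents the relative-path variant may be the more practical choice.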
Requirements for the HTML:
- Single self-contained HTML file: inline CSS, with images embedded as base64 data URIs (or referenced by relative path if `pdf-translate-work/assets/` ships alongside the file)
- Match the original typography feel (use the closest web-safe font or Google Font)
- Preserve column layout, spacing, and color scheme
- Images at original positions with proper sizing
- Print-friendly: `@media print` styles, with page breaks where the original had them
- Responsive: readable on screen, faithful on print
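The print requirement is easiest to meet if each source page becomes its own element with a page break after it. The sketch below assumes the reconstructed body of each page is already an HTML fragment; `wrap_pages`, the `.page` and `.columns` class names, and the `fr` default language are hypothetical. `break-after: page` is the current CSS property, and `page-break-after: always` is kept as the older equivalent; the narrow-screen rule collapses multi-column text for the responsive requirement.

```python
import html

PRINT_CSS = """
.page { max-width: 52rem; margin: 0 auto 2rem; }
@media print {
  .page { break-after: page; page-break-after: always; margin: 0; max-width: none; }
  .page:last-child { break-after: auto; page-break-after: auto; }
}
@media (max-width: 40rem) {
  .columns { column-count: 1 !important; }
}
"""


def wrap_pages(page_bodies, lang="fr", title="Translated document"):
    """Wrap each page's HTML fragment in a <section> so print breaks mirror the PDF."""
    sections = "\n".join(
        f'<section class="page" id="page-{i}">\n{body}\n</section>'
        for i, body in enumerate(page_bodies, start=1)
    )
    return (
        f'<!DOCTYPE html>\n<html lang="{html.escape(lang)}">\n<head>\n'
        f'<meta charset="utf-8">\n<title>{html.escape(title)}</title>\n'
        f"<style>{PRINT_CSS}</style>\n</head>\n<body>\n{sections}\n</body>\n</html>"
    )
```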
STEP 5: Visual QA
Compare original PDF and translated HTML side by side:
- Read a few pages of the original PDF (Read tool, pages parameter)
- Take a screenshot of the HTML (if /browse is available)
- Check: layout match, no missing content, images present, style fidelity
- Fix discrepancies and iterate on STEP 4
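Part of the "images present" check can be automated before comparing screenshots: parse the HTML and list extracted assets that no `<img>` refers to. A sketch using the standard-library `html.parser`; because base64-inlined images carry no filename, it assumes a hypothetical convention of tagging them with `data-asset="img-pN-M.ext"`. It says nothing about placement or style, which still need the visual comparison.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath


class ImageRefCollector(HTMLParser):
    """Collect the filenames that <img> tags point at."""

    def __init__(self):
        super().__init__()
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Hypothetical convention: inlined images carry data-asset="img-p1-1.png".
        ref = a.get("data-asset") or a.get("src") or ""
        if ref and not ref.startswith("data:"):
            self.refs.add(PurePosixPath(ref).name)


def missing_assets(html_text, asset_names):
    """Return the extracted asset names that the HTML never references."""
    collector = ImageRefCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(set(asset_names) - collector.refs)
```

In practice `asset_names` would be `[p.name for p in Path("pdf-translate-work/assets").iterdir()]`.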
Decision: OCR vs Native PDF
```dot
digraph ocr_check {
    Check [shape=diamond, label="Does PDF have\nreliable text layer?"];
    Native [shape=box, label="Can use marker\n+ Claude translate"];
    OCR [shape=box, label="Use Vision pipeline\n(this skill)"];
    Test [shape=box, label="Copy text from PDF.\nGarbled or missing?"];
    Check -> Test [label="unsure"];
    Test -> OCR [label="yes"];
    Test -> Native [label="no, text is clean"];
    Check -> Native [label="native PDF"];
    Check -> OCR [label="scanned/OCR"];
}
```
If the PDF has a clean text layer, marker (`pip install marker-pdf`) is faster. This skill's Vision pipeline is for PDFs whose text layer is unreliable.
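The "copy text and look" test can be approximated by sampling the text layer with PyMuPDF and scoring how much of it is ordinary characters. This is a heuristic sketch: the 0.9 threshold, the 200-character minimum, and the three-page sample are assumptions to tune, and `text_layer_score` / `looks_native` are hypothetical names. It catches missing layers and glyph garbage (replacement or private-use characters) but not an OCR layer made of plausible but wrong words, so the manual check is still worth doing when unsure.

```python
import unicodedata


def text_layer_score(text):
    """Fraction of non-whitespace characters that look like real text.

    Letters, digits, punctuation and symbols count as good; the replacement
    character, private-use glyphs and control characters count as garbage.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0  # no text layer at all: treat as image-only
    good = sum(
        1 for c in chars
        if c != "\ufffd" and unicodedata.category(c)[0] in ("L", "N", "P", "S")
    )
    return good / len(chars)


def looks_native(pdf_path, sample_pages=3, threshold=0.9, min_chars=200):
    """Guess whether marker can be used (True) or the Vision pipeline is needed (False)."""
    import fitz  # PyMuPDF, only needed for this function

    with fitz.open(pdf_path) as doc:
        sample = "".join(doc[i].get_text() for i in range(min(sample_pages, len(doc))))
    return len(sample.strip()) >= min_chars and text_layer_score(sample) >= threshold
```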
Common Mistakes
| Mistake | Fix |
|---|---|
| Using OCR text layer from scanned PDF | Read page images with Vision instead |
| Translating page-by-page without glossary | Maintain cross-page term consistency |
| Generic HTML that doesn't match original style | Extract style brief first (STEP 2) |
| Word-for-word translation | Adapt sentence structure to target language |
| Forgetting `prefers-reduced-motion` or print styles | Include them in the HTML output |
| Images as decoration only | Preserve original placement and sizing |