diff --git a/lib/profiles/full.profile b/lib/profiles/full.profile index 08b7de3..4cf7b1e 100644 --- a/lib/profiles/full.profile +++ b/lib/profiles/full.profile @@ -57,9 +57,10 @@ canary qa qa-only -# === Docs ============================================================ +# === Docs + translation ============================================== doc personal document-release +pdf-translate personal # === Session hygiene + memory ======================================== close personal diff --git a/skills/pdf-translate/SKILL.md b/skills/pdf-translate/SKILL.md new file mode 100644 index 0000000..7553ccf --- /dev/null +++ b/skills/pdf-translate/SKILL.md @@ -0,0 +1,162 @@ +--- +name: pdf-translate +description: Use when translating a PDF (especially OCR or image-based) to another language and producing faithful HTML output. Handles image extraction, layout preservation, contextual translation, and style-matched reconstruction. Triggers on "translate this PDF", "PDF en francais", "convert PDF to HTML translated", "traduire ce document". +disable-model-invocation: true +--- + +# PDF Translate + +Translate a PDF into another language and produce an HTML document that preserves the original layout, images, and visual style. Optimized for OCR/image-based PDFs where the text layer is unreliable. + +## Pipeline + +```dot +digraph pipeline { + rankdir=LR; + PDF [shape=folder]; + Images [shape=box, label="Page PNGs\n+ embedded images"]; + Analysis [shape=box, label="Claude Vision\nread + translate\n+ layout map"]; + HTML [shape=box, label="Faithful HTML\n/design-html"]; + QA [shape=diamond, label="Visual QA\nPDF vs HTML"]; + + PDF -> Images [label="STEP 1"]; + Images -> Analysis [label="STEP 2-3"]; + Analysis -> HTML [label="STEP 4"]; + HTML -> QA [label="STEP 5"]; + QA -> Analysis [label="fix", style=dashed]; +} +``` + +## STEP 0: Dependencies + +Check before starting. Install what's missing. + +```bash +# Option A: poppler (lighter) +command -v pdftoppm && echo "OK" || echo "INSTALL: sudo apt install poppler-utils" + +# Option B: PyMuPDF (more powerful — extracts embedded images with coordinates) +python3 -c "import fitz; print('OK')" 2>/dev/null || echo "INSTALL: pip install pymupdf" +``` + +Prefer PyMuPDF if both available — it extracts embedded images + gives page dimensions. + +## STEP 1: PDF → Page Images + Embedded Assets + +```bash +# Create working directory +mkdir -p pdf-translate-work/{pages,assets} + +# Convert pages to high-res PNGs +pdftoppm -png -r 300 input.pdf pdf-translate-work/pages/page +# OR with PyMuPDF: +python3 -c " +import fitz +doc = fitz.open('input.pdf') +for i, page in enumerate(doc): + pix = page.get_pixmap(dpi=300) + pix.save(f'pdf-translate-work/pages/page-{i+1:03d}.png') + for img_idx, img in enumerate(page.get_images(full=True)): + xref = img[0] + base = doc.extract_image(xref) + with open(f'pdf-translate-work/assets/img-p{i+1}-{img_idx+1}.{base[\"ext\"]}', 'wb') as f: + f.write(base['image']) +" +``` + +## STEP 2: First Pass — Style Analysis + +Read page 1 (and optionally 2-3 more) with Claude Vision. Extract: + +- **Typography**: font style (serif/sans), heading sizes, body size, weight +- **Colors**: background, text, accent, header colors +- **Layout**: single/multi column, margins, header/footer pattern +- **Spacing**: line height, paragraph gaps, section gaps +- **Special elements**: callout boxes, sidebars, tables, captions, footnotes + +Output a style brief — this feeds into STEP 4. + +## STEP 3: Page-by-Page Read + Translate + +For each page image, use Claude Vision (Read tool on PNG): + +1. **Read** the text content from the image (ignore OCR text layer) +2. **Map layout**: identify text blocks, headings, images, tables, their relative positions +3. **Translate** to target language preserving: + - Register and tone (formal/informal/technical) + - Technical terms (keep original in parentheses on first occurrence if ambiguous) + - Sentence structure adapted to target language (not word-for-word) +4. **Note** image references: what each image shows, where it sits relative to text + +### Cross-page context + +Maintain a running glossary of translated terms across pages. If page 1 translates "stakeholder" as "partie prenante", every subsequent page must use the same term. + +Output per page: +```markdown +## Page N + +### Layout +[column structure, image positions] + +### Content (translated) +[translated text with markdown structure] + +### Images +- img-pN-1.png: [description], position: [top-right / inline / full-width] +``` + +## STEP 4: HTML Reconstruction + +Invoke `/design-html` (or `/frontend-design`) with: + +1. The style brief from STEP 2 +2. All translated page content from STEP 3 +3. Extracted images from `pdf-translate-work/assets/` + +Requirements for the HTML: +- Single self-contained HTML file (inline CSS, base64 images or relative paths) +- Match original typography feel (use closest web-safe or Google Font) +- Preserve column layout, spacing, color scheme +- Images at original positions with proper sizing +- Print-friendly: `@media print` styles, page breaks where original had them +- Responsive: readable on screen, faithful on print + +## STEP 5: Visual QA + +Compare original PDF and translated HTML side by side: + +1. Read a few pages of the original PDF (Read tool, pages parameter) +2. Take screenshot of the HTML (if /browse available) +3. Check: layout match, no missing content, images present, style fidelity +4. Fix discrepancies → iterate STEP 4 + +## Decision: OCR vs Native PDF + +```dot +digraph ocr_check { + Check [shape=diamond, label="Does PDF have\nreliable text layer?"]; + Native [shape=box, label="Can use marker\n+ Claude translate"]; + OCR [shape=box, label="Use Vision pipeline\n(this skill)"]; + Test [shape=box, label="Copy text from PDF.\nGarbled or missing?"]; + + Check -> Test [label="unsure"]; + Test -> OCR [label="yes"]; + Test -> Native [label="no, text is clean"]; + Check -> Native [label="native PDF"]; + Check -> OCR [label="scanned/OCR"]; +} +``` + +If the PDF has a clean text layer, `marker` (pip install marker-pdf) is faster. This skill's Vision pipeline is for when the text layer is unreliable. + +## Common Mistakes + +| Mistake | Fix | +|---|---| +| Using OCR text layer from scanned PDF | Read page images with Vision instead | +| Translating page-by-page without glossary | Maintain cross-page term consistency | +| Generic HTML that doesn't match original style | Extract style brief first (STEP 2) | +| Word-for-word translation | Adapt sentence structure to target language | +| Forgetting `prefers-reduced-motion` or print styles | Include in HTML output | +| Images as decoration only | Preserve original placement and sizing |