Preflight checks for document extraction pipelines — validate, render, and screen PDFs before they reach your LLM. Pure-Python wheel, in-memory only.
Updated Apr 13, 2026 - Python
My desktop tool for compressing PDF files
A Python tool for extracting highlighted text from PDF files while preserving formatting attributes (headers, bold, italic) and removing unwanted line breaks and page breaks. Perfect for integrating with content management systems.
20 PDF classifiers, one verdict matrix: should this PDF go through fast text extraction, or do we need OCR?
PDF tables, word boxes, form fields & page render for DuckDB (Python, pdfplumber)