fix(deps): update pdf-extract 0.7 -> 0.8 to fix PDF parsing crashes (#4) #5

Merged
andrehrferreira merged 1 commit into main from fix/issue-4-pdf-extract-0.8
Jun 18, 2026

Conversation

@andrehrferreira

Contributor

Description

Certain PDF files caused the application to crash during parsing with pdf-extract 0.7.x. This bumps pdf-extract to 0.8 (resolves to 0.8.2), which fixes the upstream parsing bugs. The extraction API (extract_text, extract_text_from_mem) is unchanged, so no source changes were needed. Also bumps the crate version to 0.3.3 and updates the CHANGELOG.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring

Related Issue

Fixes #4

Changes Made

  • Updated pdf-extract dependency from 0.7 to 0.8 (resolves to 0.8.2) in Cargo.toml
  • Bumped crate version from 0.3.2 to 0.3.3
  • Added a 0.3.3 bugfix entry to CHANGELOG.md
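
Why a requirement of 0.8 resolves to 0.8.2 rather than to the newer 0.10.0 follows from Cargo's default caret semantics: for a 0.x requirement, only patch-level updates are considered compatible, so 0.8 means >=0.8.0, <0.9.0. The sketch below illustrates that rule only. It is not Cargo's resolver; the `Version`, `parse`, and `caret_matches` names are hypothetical, pre-release tags and 0.0.x requirements are out of scope, and the candidate version strings are illustrative inputs.

```rust
/// A plain `major.minor.patch` version (pre-release tags are not handled).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse "0.8" or "0.8.2"; components that are left out default to 0.
fn parse(s: &str) -> Option<Version> {
    let mut nums = [0u64; 3];
    let mut count = 0;
    for part in s.trim().split('.') {
        if count == 3 {
            return None; // more than three components
        }
        nums[count] = part.parse::<u64>().ok()?;
        count += 1;
    }
    Some(Version { major: nums[0], minor: nums[1], patch: nums[2] })
}

/// Check `candidate` against a caret-style requirement such as "0.8".
/// Returns None for unparsable input and for 0.0.x requirements,
/// which this sketch deliberately does not cover.
fn caret_matches(req: &str, candidate: &str) -> Option<bool> {
    let req = parse(req)?;
    let cand = parse(candidate)?;
    let upper = if req.major > 0 {
        // 1.2.3 allows anything below 2.0.0
        Version { major: req.major + 1, minor: 0, patch: 0 }
    } else if req.minor > 0 {
        // 0.8 allows anything below 0.9.0
        Version { major: 0, minor: req.minor + 1, patch: 0 }
    } else {
        return None;
    };
    Some(cand >= req && cand < upper)
}

fn main() {
    // Illustrative candidates; only 0.8.2 and 0.10.0 are named in this PR.
    for candidate in ["0.7.0", "0.8.0", "0.8.2", "0.10.0"] {
        println!("0.8 accepts {}: {:?}", candidate, caret_matches("0.8", candidate));
    }
}
```

Under this rule 0.8.2 satisfies the requirement while 0.10.0 does not, so moving to the 0.10 line would need an explicit edit to Cargo.toml.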

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • Benchmarks run (if performance-related)

Verified locally:

  • cargo build succeeds with pdf-extract 0.8.2 — no API changes required.
  • Full test suite passes (88 passed; 0 failed).
  • Manually re-ran the test_pdf_extract example against data/1706.03762v7.pdf; the PDF parses and extracts text without crashing.
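
The manual check above could also be captured as an automated smoke test. The following is a minimal sketch, assuming the crash surfaced as a Rust panic (the PR does not say how it manifested) and that the crate is not built with `panic = "abort"`. To keep the block standard-library only, the extractor is passed in as a closure; `Outcome` and `smoke_check` are hypothetical names, and the stand-in closures in `main` are not the real parser.

```rust
use std::fmt::Display;
use std::panic::{self, AssertUnwindSafe};

/// Result of running an extractor over one input.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Extraction succeeded; holds the number of characters returned.
    Extracted(usize),
    /// The extractor returned an error (non-fatal for the caller).
    Failed(String),
    /// The extractor panicked instead of returning.
    Panicked,
}

/// Run `extract` on `bytes`, converting a panic into `Outcome::Panicked`.
/// `catch_unwind` only catches unwinding panics, not aborts.
fn smoke_check<F, E>(bytes: &[u8], extract: F) -> Outcome
where
    F: FnOnce(&[u8]) -> Result<String, E>,
    E: Display,
{
    match panic::catch_unwind(AssertUnwindSafe(|| extract(bytes))) {
        Ok(Ok(text)) => Outcome::Extracted(text.chars().count()),
        Ok(Err(err)) => Outcome::Failed(err.to_string()),
        Err(_) => Outcome::Panicked,
    }
}

fn main() {
    // Stand-in extractor that succeeds.
    let ok = smoke_check(b"hello", |b: &[u8]| -> Result<String, String> {
        Ok(String::from_utf8_lossy(b).into_owned())
    });
    // Stand-in extractor that panics, mimicking a parser bug.
    let boom = smoke_check(b"", |_: &[u8]| -> Result<String, String> {
        panic!("simulated parser bug")
    });
    println!("{:?} / {:?}", ok, boom);
}
```

In the real project the closure would presumably wrap `extract_text_from_mem` over the bytes of the fixture PDF; that function's exact signature and error type are assumed here rather than taken from this PR.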

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Performance Impact

  • Conversion speed: No measurable change (dependency-only update)
  • Memory usage: No measurable change
  • Binary size: Negligible change

Additional Notes

This is a dependency-only fix — no source code changes. pdf-extract 0.10.0 is also available, but this PR stays within the requested 0.8.x line for a minimal, low-risk update. Note: the extraction example still prints upstream Unicode mismatch / missing char warnings from pdf-extract, which are non-fatal and unrelated to the crash.

Certain PDF files caused the application to crash during parsing with
pdf-extract 0.7.x. Bump pdf-extract to 0.8 (resolves to 0.8.2), which
fixes the upstream parsing bugs. The extraction API (extract_text,
extract_text_from_mem) is unchanged, so no source changes were needed.

Bump version to 0.3.3 and update CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@andrehrferreira andrehrferreira merged commit 5241af0 into main Jun 18, 2026
12 of 13 checks passed

Development

Successfully merging this pull request may close these issues.

[BUG] Some PDF will raise application crash
