feat: improve PDF extraction for financial reports and garbled layouts #4

Open
ivanvanderbyl wants to merge 5 commits into main from feat-improve-financial-report

@ivanvanderbyl ivanvanderbyl commented Mar 10, 2026

Owner

Summary

  • improve financial-report extraction with region-aware table rendering, safer subtitle/note handling, and better preservation of leading table columns
  • add recovery of suspicious local blocks containing rotated or fragmented text, including recovered table output for issue-140 and conservative garble cleanup
  • add high-confidence encoding-noise suppression for issue-192, so that structured anchors are preserved while low-confidence gibberish is dropped

Test Plan

  • go test ./... -count=1
  • go run ./cmd/pdfmarkdown --input '/Users/ivanvanderbyl/dev/Alcova-AI/pdf-markdown/testdata/issue-140-example.pdf'
  • go run ./cmd/pdfmarkdown --input '/Users/ivanvanderbyl/dev/Alcova-AI/pdf-markdown/testdata/issue-192-example.pdf'

@ivanvanderbyl

Owner Author

This change is part of the following stack:

Change managed by git-spice.

@ivanvanderbyl ivanvanderbyl changed the title from "feat: improve financial report extraction with rotated table recovery and dash-prefix handling" to "feat: improve PDF extraction for financial reports and garbled layouts" on Mar 11, 2026