dreamyoungs/trex

🦖 TREX


Table Rust EXtractor: extract tables from PDFs with zero native dependencies. Built in Rust, usable from Node.js and Python.

MIT/Apache-2.0

Quick Start

Node.js

Two packages are available; choose the one that fits your use case:

| Package | Install | How it works |
| --- | --- | --- |
| @dreamyoungs/trex | npm i @dreamyoungs/trex | CLI wrapper; auto-downloads the TREX binary |
| @dreamyoungs/trex-node | npm i @dreamyoungs/trex-node | Native NAPI-RS binding; no subprocess |

// Both packages share the same API
const { extract } = require("@dreamyoungs/trex"); // CLI wrapper
// const { extract } = require("@dreamyoungs/trex-node"); // or native binding

// CommonJS modules have no top-level await, so wrap the call in an async function
async function main() {
    const tables = await extract("invoice.pdf", {
        pages: [1, 2],
        mode: "auto"
    });

    console.log(tables[0].headers); // ["Item", "Qty", "Unit Price", "Amount"]
    console.log(tables[0].rows); // [["A4 Paper", "10", "5,000", "50,000"], ...]
}

main().catch(console.error);

CLI

trex extract invoice.pdf --format json
trex extract invoice.pdf --format csv > output.csv
trex extract invoice.pdf --pages 3,5,7 --mode lattice
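
The CLI can also be driven from a script. The following is a minimal Python sketch, not part of TREX itself: `build_trex_command` and `run_trex` are hypothetical helper names, only flags shown in this README are used, and it assumes the `trex` binary is on `PATH` and writes its JSON result to stdout.

```python
import json
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_trex_command(pdf_path: str,
                       pages: Optional[Sequence[int]] = None,
                       mode: str = "auto",
                       fmt: str = "json") -> List[str]:
    """Assemble argv for `trex extract` using only flags shown in this README."""
    cmd = ["trex", "extract", pdf_path, "--format", fmt, "--mode", mode]
    if pages:
        cmd += ["--pages", ",".join(str(p) for p in pages)]
    return cmd


def run_trex(pdf_path: str, **kwargs):
    """Run the CLI and parse stdout as JSON (assumption: JSON goes to stdout)."""
    if shutil.which("trex") is None:
        raise FileNotFoundError("trex binary not found on PATH")
    result = subprocess.run(build_trex_command(pdf_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Keeping the argument assembly separate from the subprocess call makes the command easy to inspect or log before anything is executed.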

Docker

docker build -t trex .
docker run --rm -p 8080:8080 trex

curl -X POST http://localhost:8080/extract \
  -F "file=@invoice.pdf" \
  -F "mode=auto" \
  -F "format=json"

Why TREX?

PDF table extraction has long been dominated by the Python ecosystem. Tools like Camelot, Tabula, and pdfplumber often depend on heavy runtimes (OpenCV, Ghostscript, Java) and can struggle with memory limits in serverless environments.

TREX takes a different approach:

| | Python tools | TREX |
| --- | --- | --- |
| Runtime | Python + OpenCV + Ghostscript | Single Rust binary |
| Memory | 200–500 MB+ | ~30 MB |
| Container size | 500 MB+ | ~15 MB |
| Language support | Python only | Rust, Node.js, Python, Docker |
| Improvement loop | Manual | DL Router + ML training scripts |

Key Advantages

  • 🚀 Lightweight & Fast: Single binary, no native dependencies. Runs instantly in serverless containers (Cloud Run, Lambda) without OOM issues.
  • 🧠 Improvable with DL: An optional DL Router can be retrained on extraction failures to improve table detection accuracy. You run the training pipeline manually or via your own scheduler (e.g. GitHub Actions cron).
  • 🌐 Multi-Runtime: Use TREX from Node.js (npm install), Python (pip install), the Docker REST API, or the CLI. The same Rust core powers all of them.
  • 🔧 Production-Ready Telemetry: Built-in event logging (--event-log) captures extraction metrics for production monitoring. Collected events can be fed into the ML training pipeline to retrain the router model.

Parsing Engine

TREX detects tables using three strategies:

Lattice: For tables with visible gridlines. Detects line segments and computes cell regions from their intersections. No OpenCV required.
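
To make the Lattice idea concrete, here is a simplified Python sketch (TREX itself is written in Rust, and this is not its implementation). It assumes axis-aligned segments that have already been split into horizontals and verticals; the names `lattice_cells`, `Segment`, and `Cell` are illustrative. A cell is emitted only when all four of its corners are line intersections, and merged cells or near-duplicate lines are not handled.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Cell = Tuple[float, float, float, float]     # (left, top, right, bottom)


def lattice_cells(horizontals: List[Segment], verticals: List[Segment],
                  tol: float = 1.0) -> List[Cell]:
    """Return grid cells whose four corners are all line intersections."""
    def crosses(h: Segment, v: Segment) -> bool:
        hx0, hy, hx1, _ = h
        vx, vy0, _, vy1 = v
        return (min(hx0, hx1) - tol <= vx <= max(hx0, hx1) + tol
                and min(vy0, vy1) - tol <= hy <= max(vy0, vy1) + tol)

    # Every (vertical x, horizontal y) pair where the two segments cross.
    corners = {(v[0], h[1]) for h in horizontals for v in verticals if crosses(h, v)}
    xs = sorted({x for x, _ in corners})
    ys = sorted({y for _, y in corners})

    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            box = [(left, top), (right, top), (left, bottom), (right, bottom)]
            if all(corner in corners for corner in box):
                cells.append((left, top, right, bottom))
    return cells
```

For example, three horizontal and three vertical rules forming a closed 2×2 grid yield four cells in row-major order.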

Stream: For tables without gridlines. Clusters text box coordinates to infer columns and rows.
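
The clustering step can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not TREX's algorithm: each text box is reduced to its left edge and a y coordinate that grows downward, clusters are split wherever the gap between sorted values exceeds a threshold, and the names `cluster_1d`, `stream_table`, `col_gap`, and `row_gap` are hypothetical.

```python
from typing import List, Tuple

TextBox = Tuple[float, float, str]  # (left x, y growing downward, text)


def cluster_1d(values: List[float], gap: float) -> List[float]:
    """Split sorted values where consecutive ones differ by more than `gap`.

    Returns the mean of each cluster as its representative coordinate.
    """
    centers, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > gap:
            centers.append(sum(current) / len(current))
            current = []
        current.append(v)
    if current:
        centers.append(sum(current) / len(current))
    return centers


def nearest(centers: List[float], v: float) -> int:
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def stream_table(boxes: List[TextBox], col_gap: float = 15.0,
                 row_gap: float = 5.0) -> List[List[str]]:
    """Place each text box into a (row, column) grid inferred from coordinates."""
    cols = cluster_1d([b[0] for b in boxes], col_gap)
    rows = cluster_1d([b[1] for b in boxes], row_gap)
    grid = [["" for _ in cols] for _ in rows]
    for x, y, text in boxes:
        grid[nearest(rows, y)][nearest(cols, x)] = text
    return grid
```

Real pages need more care (right-aligned numbers, multi-line cells, PDF's bottom-left origin), which is why the thresholds here are only placeholders.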

DL Router (optional): A lightweight ONNX model analyzes page features and routes each page to the best-suited strategy (Lattice / Stream / Blend). When no model is provided, a built-in heuristic router is used instead.

graph LR
    A[PDF] --> B{DL Router}
    B -->|gridlines| C[Lattice]
    B -->|no lines| D[Stream]
    B -->|mixed| G[Blend]
    C --> E[Cell Merge]
    D --> E
    G --> E
    E --> F[JSON / CSV]
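
The routing decision in the diagram can be sketched in Python as follows. This is an illustration only: the feature name `ruling_line_count`, the thresholds inside `heuristic_route`, and the rule that a low-confidence model prediction falls back to the heuristic are all assumptions, not documented TREX behavior. Only the three strategy names and the 0.55 confidence value come from this README.

```python
from typing import Dict, Optional

STRATEGIES = ("lattice", "stream", "blend")


def heuristic_route(features: Dict[str, float]) -> str:
    """Hypothetical stand-in for the built-in heuristic router."""
    lines = features.get("ruling_line_count", 0)
    if lines >= 4:
        return "lattice"  # enough rules to form a grid
    return "blend" if lines > 0 else "stream"


def route_page(features: Dict[str, float],
               model_scores: Optional[Dict[str, float]] = None,
               min_confidence: float = 0.55) -> str:
    """Use the model's top strategy if confident enough, else the heuristic."""
    if model_scores:
        best = max(STRATEGIES, key=lambda s: model_scores.get(s, 0.0))
        if model_scores.get(best, 0.0) >= min_confidence:
            return best
    return heuristic_route(features)
```

Passing `model_scores=None` models the case where no ONNX model is supplied, so every page goes through the heuristic.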

Feedback Loop

Collect extraction events in production and retrain the router model in batch:

# 1. Run TREX with event logging enabled
trex extract report.pdf \
  --event-log logs/extraction_events.ndjson \
  --event-document-key "doc-123" \
  --event-training-opt-in

# 2. Retrain the router model
python3 ml/update_router.py \
  --events logs/extraction_events.ndjson \
  --work-dir ml/artifacts/update

The training pipeline is not an always-on server: run it manually or via a scheduler (e.g. GitHub Actions cron). See ml/README.md for the full pipeline and ml/MODEL_CONTRACT.md for model I/O specs.
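
Before retraining, it can be useful to inspect the event log. The Python sketch below reads NDJSON (one JSON object per line) and keeps only events that opted in to training. The event schema is not described in this README, so the field name `training_opt_in` is a hypothetical placeholder; check ml/README.md for the real one.

```python
import json
from typing import Iterable, Iterator


def training_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed events that opted in to training; skip blank or corrupt lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line in an append-only log
        # "training_opt_in" is an assumed field name, not a documented one.
        if isinstance(event, dict) and event.get("training_opt_in") is True:
            yield event
```

Because the function accepts any iterable of strings, it works directly on an open file, for example `with open("logs/extraction_events.ndjson") as f: events = list(training_events(f))`.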


Usage Details

CLI Options

trex extract <file.pdf> [OPTIONS]

Options:
  --pages <1,3,5 | 1-10>             Pages to process
  --mode <auto|lattice|stream|dl>    Parsing mode (default: auto)
  --format <json|csv>                Output format (default: json)
  --dl-model <path.onnx>             DL router model path (requires a build with --features dl)
  --dl-min-confidence <0.55>         Min confidence for DL routing
  --event-log <path.ndjson>          Write extraction events for feedback loop
  --event-document-key <key>         Document identifier for events
  --event-tenant-id <id>             Tenant identifier
  --event-training-opt-in            Allow this data for model training

Output language follows the system locale (LC_ALL, LANG). Override it with TREX_LANG=ko or TREX_LANG=en.
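
The two `--pages` forms shown above (a comma-separated list and a range) can be expanded as in the following Python sketch. It is a guess at reasonable semantics rather than TREX's actual parser: it assumes 1-based page numbers, inclusive ranges, and that lists and ranges may be mixed; `parse_pages` is a hypothetical name.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a spec such as "3,5,7" or "1-10" into sorted, unique page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Validating the spec up front gives a clear error before any PDF is opened.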

Node.js

@dreamyoungs/trex: CLI wrapper (recommended)

npm install @dreamyoungs/trex

Auto-downloads the TREX binary for your platform on install. If the download fails, set the TREX_BIN environment variable or pass binPath.

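const { extract, extractCsv, extractFromBuffer } = require("@dreamyoungs/trex");

// CommonJS modules have no top-level await, so wrap the call in an async function
async function main() {
    const tables = await extract("invoice.pdf", {
        pages: [1, 2],
        mode: "auto",
        // binPath: "/usr/local/bin/trex",  // optional: override binary path
    });
    return tables;
}

main().catch(console.error);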

@dreamyoungs/trex-node: Native binding (faster)

npm install @dreamyoungs/trex-node

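NAPI-RS native binding: calls Rust directly with no subprocess overhead. It exposes the same extract function as the CLI wrapper, but note that the example below calls it synchronously and passes mode: "Auto".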

const { extract } = require("@dreamyoungs/trex-node");
const tables = extract("invoice.pdf", { mode: "Auto" }); // synchronous

Python

import trex

tables = trex.extract("invoice.pdf", pages=[1, 2])
print(tables[0].rows)

Docker REST API

docker build -t trex .
docker run --rm -p 8080:8080 trex

To set the language per request, send an Accept-Language header (for example, Accept-Language: ko-KR).


Design Principles

TREX does one thing: converts the physical table layout on a page into a 2D array.

Things it intentionally does not do: LLM-based analysis, cross-page table merging, header normalization, or data type inference. These belong in the application layer consuming TREX's output.
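
As an example of what such application-layer code might look like, the Python sketch below turns the `headers` and `rows` shown in Quick Start into records and infers numeric types. It is not part of TREX; `coerce` and `to_records` are hypothetical names, and treating the comma as a thousands separator is a locale assumption that will be wrong for some documents.

```python
import re
from typing import Dict, List, Union

Value = Union[int, float, str]

# Plain integers or decimals only, after thousands separators are removed.
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def coerce(cell: str) -> Value:
    """Convert "5,000" to 5000 and "12.5" to 12.5; return other text unchanged."""
    cleaned = cell.replace(",", "").strip()
    if not _NUMBER.fullmatch(cleaned):
        return cell
    return float(cleaned) if "." in cleaned else int(cleaned)


def to_records(headers: List[str], rows: List[List[str]]) -> List[Dict[str, Value]]:
    """Pair each row with the headers, coercing numeric-looking cells."""
    return [{h: coerce(c) for h, c in zip(headers, row)} for row in rows]
```

Keeping this logic outside the extractor means each application can pick its own number formats and header conventions.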


Tech Stack

| Area | Choice | Note |
| --- | --- | --- |
| Language | Rust | Core engine |
| PDF Parser | lopdf / pdf-extract | Low-level PDF access |
| DL Runtime | tract-onnx (optional) | ONNX model inference |
| HTTP Server | Axum | Docker REST API |
| Node.js | CLI wrapper + NAPI-RS | npm/trex, bindings/node |
| Python Bindings | PyO3 + maturin | pip install support |

Roadmap

  • Lattice mode (gridline-based extraction)
  • Stream mode (coordinate-based inference)
  • DL Router with feedback pipeline
  • CLI interface
  • Docker REST API server
  • Node.js npm wrapper + NAPI-RS bindings
  • PyO3 Python bindings
  • WebAssembly build (in-browser)
  • Benchmark suite with real-world comparisons

License

MIT OR Apache-2.0


