Releases · scribeocr/scribe.js

27 May 07:37

v0.12.0

95c2f4b

Latest

This release makes it possible to open and work with multiple documents at the same time. Supporting this required changes to the public interface, so this is a breaking release that will likely require changes to your code. See Migration below. The release also comes with a new Guide, API Reference, and CLI Reference.

Highlights

ScribeDoc API. scribe.openDocument(files) returns a document object you operate on directly. Multiple documents can be open at once. (Guide)
Settings off globals. Per-operation options moved from scribe.opt onto the method or ScribeDoc that uses them. (Configuration)
Faster bundled OCR. The updated internal OCR model runs up to 30% faster, depending on the corpus.

Migration

// Before
await scribe.importFiles(files);
await scribe.recognize({ langs: ['eng'] });
await scribe.download('pdf', 'out.pdf');

// After
const doc = await scribe.openDocument(files);
await doc.recognize({ langs: ['eng'] });
await doc.download('pdf', 'out.pdf');
await doc.terminate();

The removed module-level functions (importFiles, recognize, download, exportData, addHighlights, clear, compareOCR, convertOCRPage, evalOCRPage, extractInternalPDFText, …) all have ScribeDoc equivalents. scribe.extractText() is unchanged.
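
Because each ScribeDoc is operated on directly, several documents can be handled side by side. The sketch below shows one way this might look, using only the calls from the migration snippet above. The helper name `processAll`, the shape of `jobs`, and passing `scribe` in as a parameter are illustrative assumptions, not part of the library. Whether the `recognize` calls actually run in parallel depends on how the library schedules its workers.

```javascript
// Hypothetical helper (not part of scribe.js): OCR several inputs,
// using one ScribeDoc per input.
// `scribe` is passed in so the sketch does not assume an import path;
// in an application it would be the imported scribe.js module.
async function processAll(scribe, jobs) {
  return Promise.all(jobs.map(async ({ files, out }) => {
    const doc = await scribe.openDocument(files);
    try {
      await doc.recognize({ langs: ['eng'] });
      await doc.download('pdf', out);
      return out;
    } finally {
      // Release the document even if recognition or export fails.
      await doc.terminate();
    }
  }));
}
```

A call might look like `await processAll(scribe, [{ files: ['a.png'], out: 'a.pdf' }, { files: ['b.png'], out: 'b.pdf' }])`. Wrapping the work in `try`/`finally` keeps a failed job from leaving its document open.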

Full Changelog: v0.11.10...v0.12.0

06 May 03:43

Balearica

v0.11.3

541bf40

What's Changed

Fixed a significant number of bugs in the new PDF codebase released in v0.11.0.

Full Changelog: v0.11.0...v0.11.3

04 May 06:37

Balearica

v0.11.0

bf1abb8

What's Changed

Implemented new JavaScript-native PDF parsing + rendering code.
Switched Node.js canvas library from canvaskit-wasm to new library.
- Significantly improved performance for image rendering on Node.js.
Many other minor changes. This release is a major refactor. See changelog for details.

Full Changelog: v0.10.1...v0.11.0

14 Mar 22:30

Balearica

v0.10.1

6e3a076

What's Changed

Fixed bug with .pdf export where existing invisible text layer was included alongside new invisible text layer.
Highlight annotations are omitted when rendering pages for recognition and re-added upon export.
- This should produce a small improvement to recognition accuracy in highlighted documents.
- Adding new highlight annotations will be supported in a future version.
Improvements to support for various third party OCR formats.
Misc minor changes and bug fixes.

08 Feb 02:31

Balearica

v0.10.0

36db01a

What's Changed

Added import/export support for ALTO XML
Improved recognition speed for internal OCR model
Many small bug fixes and performance improvements.

Full Changelog: v0.9.3...v0.10.0

15 Nov 07:28

Balearica

v0.9.3

b3bfe2c

What's Changed

Fixed bug causing text layer in PDF exports to be broken (#58)
- This issue affects all PDFs created with the two patch releases from the past week or so (0.9.1 and 0.9.2). Anyone using those versions should update as soon as possible.

Full Changelog: v0.9.2...v0.9.3

14 Nov 04:42

Balearica

v0.9.2

73f9d81

What's Changed

Fixed bug causing crash on single-core systems (#56)
Updated scribe.opt.workerN option to cap workers created for PDF rendering

Full Changelog: v0.9.1...v0.9.2

07 Nov 07:28

Balearica

v0.9.1

4cfa415

What's Changed

Various updates to experimental and debugging-related features.
- None of the documented features should change with this release.

Full Changelog: v0.9.0...v0.9.1

08 Sep 08:15

Balearica

v0.9.0

7f10834

What's Changed

Added URW Gothic font
Added Deno support
Updated .html export format
- This format contains a .html file that should closely resemble the original document.
- This should be useful for converting .pdf files to a format that can be displayed natively in the browser.
Added experimental .txt import format
- For obvious reasons, importing .txt files will not work with most operations.
- This mode is currently exclusively useful for development/debugging purposes and making basic .pdf files from .txt files.
Performance improvements to PDF exports
Various refactoring and minor updates.

Full Changelog: v0.8.0...v0.9.0

09 Mar 09:39

Balearica

v0.8.0

6e179e2

What's Changed

Added scribe CLI command
- If scribe.js is installed globally (npm i -g scribe.js-ocr), the scribe command can be used to process documents from the command line.
  - For example, scribe recognize analyst_report.png runs OCR on an image and saves the result as a PDF.
- This feature is still experimental and command/argument names and features may change without warning.
Added new intermediate data format .scribe for storing and loading document data.
- Given OCR is computationally expensive, it is often desirable to save results for later use without losing data.
- By saving results to .scribe files, results can be re-loaded later (e.g. to export with slightly different settings).
  - While several other output formats can be re-loaded later (notably .hocr and .pdf), only .scribe can be re-loaded without any data being lost in the export/import process.
  - .scribe files only contain the text layer; they do not contain embedded images or PDF files.
    - .scribe files can be loaded alongside image/PDF files to restore both image and text data.
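
The save-then-reload workflow described above could be sketched roughly as follows. This uses the ScribeDoc calls from the v0.12.0 notes rather than the API that existed at v0.8.0. The helper names `ocrAndSave` and `reexport` are hypothetical, and the `'scribe'` format string passed to `download` is an assumption that should be checked against the API Reference.

```javascript
// Hypothetical helpers (not part of scribe.js). `scribe` is passed in so the
// sketch does not assume an import path.
// ASSUMPTION: 'scribe' is the format name that download() accepts for
// .scribe output; verify this against the API Reference.

// Run OCR once and store the result as a .scribe file.
async function ocrAndSave(scribe, imagePath, scribePath) {
  const doc = await scribe.openDocument([imagePath]);
  try {
    await doc.recognize({ langs: ['eng'] });
    await doc.download('scribe', scribePath);
  } finally {
    await doc.terminate();
  }
}

// Later: load the image alongside the saved text layer and export a PDF
// without calling recognize() again.
async function reexport(scribe, imagePath, scribePath, pdfPath) {
  const doc = await scribe.openDocument([imagePath, scribePath]);
  try {
    await doc.download('pdf', pdfPath);
  } finally {
    await doc.terminate();
  }
}
```

The image is passed again in `reexport` because .scribe files hold only the text layer, so the image data has to come from the original file.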

Full Changelog: v0.7.4...v0.8.0
