riddleling/docOCR

docOCR

docOCR is a macOS command-line OCR tool that converts document images into Markdown text. It can run as a batch CLI tool or as a local HTTP server for browser uploads and API clients.

Features

  • Converts image files to Markdown text.
  • Writes batch OCR output next to each source image using the same basename and a .md extension.
  • Converts detected paragraphs, lists, and tables into Markdown when Apple's document recognition API identifies them.
  • Provides a local web UI for uploading an image and viewing OCR output.
  • Provides a JSON API for image upload and OCR response.
  • Uses Apple's RecognizeDocumentsRequest API, available on macOS 26+.
  • Performs OCR locally on the Mac. Recognition does not require sending images to an external network service.

The HTTP server is implemented with Vapor.

Requirements

  • macOS 26 or later.
  • An Xcode or Swift toolchain that supports the package's Swift tools version.
  • Network access may be required during the first build so Swift Package Manager can fetch dependencies such as Vapor.
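
The macOS requirement can be checked before building. The following is a minimal sketch, not part of docOCR: the function name macos_supported is hypothetical, and the check only compares the major version reported by sw_vers against 26.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given macOS version string
# has a major version of 26 or higher (the requirement listed above).
macos_supported() {
  major=${1%%.*}               # "26.1" -> "26"
  case $major in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$major" -ge 26 ]
}

# On a Mac, sw_vers reports the installed version; elsewhere this is skipped.
if command -v sw_vers >/dev/null 2>&1; then
  if macos_supported "$(sw_vers -productVersion)"; then
    echo "macOS version is new enough for docOCR"
  else
    echo "docOCR needs macOS 26 or later" >&2
  fi
fi
```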

CLI Usage

Show help:

docOCR -h
docOCR --help

Show version:

docOCR -V
docOCR --version

Convert image files to Markdown:

docOCR ~/Desktop/book_imgs/*.jpg

This prints the OCR Markdown text to the terminal.

Write Markdown files next to the source images:

docOCR -o ~/Desktop/book_imgs/*.jpg

The OCR output for each input image is written to a Markdown file next to that image:

~/Desktop/book_imgs/01.jpg -> ~/Desktop/book_imgs/01.md
~/Desktop/book_imgs/02.jpg -> ~/Desktop/book_imgs/02.md

Existing .md files with the same name are overwritten.
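
Because -o overwrites without asking, it can help to see which files are at risk first. This is a sketch under the naming rule described above (same directory, same basename, .md extension); md_path_for and list_overwrites are hypothetical helper names, not docOCR features, and the sketch assumes every image name has an extension.

```shell
#!/bin/sh
# Print the Markdown path that -o is described as using for an image.
# Assumes the file name contains an extension such as .jpg.
md_path_for() {
  printf '%s\n' "${1%.*}.md"
}

# List the .md files that already exist and would be overwritten.
# Arguments: the same image paths you plan to pass to docOCR -o.
list_overwrites() {
  for img in "$@"; do
    md=$(md_path_for "$img")
    if [ -e "$md" ]; then
      printf '%s\n' "$md"
    fi
  done
}
```

For example, running list_overwrites ~/Desktop/book_imgs/*.jpg before docOCR -o ~/Desktop/book_imgs/*.jpg prints only the Markdown files that are already present.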

Start the HTTP server:

docOCR -s

By default, the server listens on port 8080.

Use a custom port:

docOCR -s -p 8000

The -s and -o modes are mutually exclusive.

HTTP Server

When the server is running, open:

http://0.0.0.0:8080

If you start the server with a custom port, use that port instead:

http://0.0.0.0:8000
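
When scripting against the server, it is useful to wait until it is accepting requests. The sketch below assumes the server is reachable on the loopback address 127.0.0.1, as in the curl examples in the API Usage section; dococr_base_url and wait_for_dococr are hypothetical helper names.

```shell
#!/bin/sh
# Build the base URL for a local docOCR server.
# The port defaults to 8080, the default stated above.
dococr_base_url() {
  printf 'http://127.0.0.1:%s\n' "${1:-8080}"
}

# Poll until the server answers on its root page, or give up.
# Arguments: port (optional), number of one-second attempts (optional).
wait_for_dococr() {
  url=$(dococr_base_url "${1:-}")
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    # Any HTTP response counts as "up"; curl fails only if it cannot connect.
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be to start docOCR -s -p 8000 in the background, call wait_for_dococr 8000, and then open the URL printed by dococr_base_url 8000 in a browser.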

The web page uses:

POST /upload

This route is intended for browser form uploads and returns an HTML result page.

API Usage

Use the JSON API endpoint:

POST /api/ocr

Example:

curl -X POST http://127.0.0.1:8000/api/ocr \
  -F "<field-name>=@<image-path>"

The API also accepts image as the multipart field name:

curl -X POST http://127.0.0.1:8000/api/ocr \
  -F "image=@<image-path>"

Successful response:

{
  "success": true,
  "message": "OK",
  "text": "OCR text..."
}

Error response:

{
  "success": false,
  "message": "Error message",
  "text": ""
}
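
A script can branch on the success field of these responses. The sketch below is illustrative only: ocr_request and ocr_succeeded are hypothetical names, the upload uses the image field name mentioned above, and the success check is a plain text match rather than a real JSON parser. A JSON tool such as jq would be a more robust way to extract the text field.

```shell
#!/bin/sh
# Upload one image and print the raw JSON response.
# Arguments: image path, port (optional, defaults to 8080).
ocr_request() {
  curl -s -X POST "http://127.0.0.1:${2:-8080}/api/ocr" -F "image=@$1"
}

# Succeeds when a JSON response on stdin contains "success": true.
# This is a crude text match, not a JSON parser.
ocr_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Example use (requires a running server, so it is left commented out):
#   response=$(ocr_request page.jpg 8000)
#   if printf '%s' "$response" | ocr_succeeded; then
#     printf '%s\n' "$response"
#   else
#     echo "OCR failed: $response" >&2
#   fi
```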

Build

Build a debug executable:

swift build

Build a release executable:

swift build -c release

The release binary is generated at:

.build/release/docOCR

Install

Build the release binary:

swift build -c release

Install it somewhere on your PATH, for example:

install -m 755 .build/release/docOCR /usr/local/bin/docOCR

Then run:

docOCR -h

If /usr/local/bin is not writable or not on your PATH, choose another directory such as ~/bin and make sure that directory is included in your shell PATH.
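
The fallback described above can be scripted. This is a sketch, assuming it is run from the package root after swift build -c release; dir_on_path and install_dococr are hypothetical helper names, and ~/bin is only the example directory from the text.

```shell
#!/bin/sh
# Succeeds when directory $1 appears in the colon-separated list $2
# (defaults to the current PATH).
dir_on_path() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Install the release binary into a directory (default: ~/bin),
# creating it if needed, and warn when it is not on PATH.
install_dococr() {
  dest=${1:-"$HOME/bin"}
  mkdir -p "$dest" &&
    install -m 755 .build/release/docOCR "$dest/docOCR" || return 1
  dir_on_path "$dest" ||
    echo "Add $dest to your PATH, e.g. in ~/.zshrc" >&2
}
```

Calling install_dococr with no argument installs to ~/bin; passing a directory installs there instead.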

Development

Run directly with SwiftPM:

swift run docOCR -o ~/Desktop/book_imgs/*.jpg
swift run docOCR -s -p 8000

macOS Shortcuts: Screenshot to Markdown

docOCR can also be used with the Shortcuts app on macOS to turn a screenshot into Markdown text.

In this workflow, the shortcut captures a screen selection, passes the screenshot image path to docOCR, reads the Markdown text from stdout, copies it to the clipboard, and then lets you paste the result into any text editor.

Run the macOS shortcut:

screenshot_to_md

The shortcut flow is:

  1. Capture a screenshot.
  2. Save the screenshot as a temporary image file.
  3. Run docOCR <screenshot-image-path>.
  4. Read the OCR Markdown text from stdout and copy it to the clipboard.

Paste the result into your editor.
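
The same flow can be approximated in a shell script, for example inside a "Run Shell Script" action. This is a sketch, not the contents of the actual shortcut, which are not shown here. It assumes docOCR is on the PATH and uses the standard macOS tools screencapture and pbcopy; the function name screenshot_to_md_sketch is hypothetical.

```shell
#!/bin/sh
# Capture a screen selection, OCR it, and copy the Markdown to the clipboard.
screenshot_to_md_sketch() {
  workdir=$(mktemp -d "${TMPDIR:-/tmp}/dococr_shot.XXXXXX") || return 1
  shot="$workdir/shot.png"

  # Steps 1-2: interactive selection saved to a temporary image file.
  screencapture -i "$shot"

  # If the selection was cancelled, no usable file exists; stop here.
  if [ ! -s "$shot" ]; then
    rm -rf "$workdir"
    return 1
  fi

  # Steps 3-4: docOCR prints Markdown to stdout; pbcopy stores it.
  docOCR "$shot" | pbcopy
  rc=$?

  rm -rf "$workdir"
  return "$rc"
}
```

The temporary directory is removed in both the success and the cancelled case, so repeated runs do not leave screenshots behind.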

Alternatively, the shortcut can call the /api/ocr endpoint instead of running docOCR directly. Start the local server first:

docOCR -s

Codex Skill

If you use Codex, you can install the companion skill for docOCR: dococr-skill

The skill gives Codex reusable context for docOCR CLI usage, local HTTP API calls, OCR execution, and troubleshooting.
