docOCR

docOCR is a macOS command-line OCR tool that converts document images into Markdown text. It can run as a batch CLI tool or as a local HTTP server for browser uploads and API clients.

Features

Converts image files to Markdown text.
Writes batch OCR output next to each source image using the same basename and a .md extension.
Converts detected paragraphs, lists, and tables into Markdown when Apple's document recognition API identifies them.
Provides a local web UI for uploading an image and viewing OCR output.
Provides a JSON API that accepts an image upload and returns the OCR result.
Uses Apple's RecognizeDocumentsRequest API, available on macOS 26+.
Performs OCR locally on the Mac. Recognition does not require sending images to an external network service.

The HTTP server is implemented with Vapor.

Requirements

macOS 26 or later.
An Xcode or Swift toolchain that supports the package's Swift tools version.
Network access may be required during the first build so Swift Package Manager can fetch dependencies such as Vapor.

CLI Usage

Show help:

docOCR -h
docOCR --help

Show version:

docOCR -V
docOCR --version

Convert image files to Markdown:

docOCR ~/Desktop/book_imgs/*.jpg

This prints the OCR Markdown text to the terminal.

Write Markdown files next to the source images:

docOCR -o ~/Desktop/book_imgs/*.jpg

The OCR output for each input image is written to a Markdown file next to that image:

~/Desktop/book_imgs/01.jpg -> ~/Desktop/book_imgs/01.md
~/Desktop/book_imgs/02.jpg -> ~/Desktop/book_imgs/02.md

Existing .md files with the same name are overwritten.
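
Because -o overwrites existing files, it can help to check which outputs already exist before running a batch. The following is a minimal sketch, not part of docOCR: md_path_for and existing_outputs are hypothetical helper names, and the mapping simply follows the "same basename, .md extension" rule described above.

```shell
# Hypothetical helpers, not shipped with docOCR.

# Map an image path to the Markdown path described above:
# same directory, same basename, .md extension.
md_path_for() {
  base=${1##*/}          # file name without the directory part
  dir=${1%"$base"}       # directory part, including the trailing slash (may be empty)
  printf '%s%s.md\n' "$dir" "${base%.*}"
}

# Print the Markdown files that already exist for the given images,
# i.e. the files a batch run with -o would overwrite.
existing_outputs() {
  for img in "$@"; do
    md=$(md_path_for "$img")
    if [ -e "$md" ]; then
      printf '%s\n' "$md"
    fi
  done
}
```

For example, existing_outputs ~/Desktop/book_imgs/*.jpg lists the .md files that would be replaced. An empty result means nothing would be overwritten.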

Start the HTTP server:

docOCR -s

By default, the server listens on port 8080.

Use a custom port:

docOCR -s -p 8000

The -s and -o modes are mutually exclusive.

HTTP Server

When the server is running, open:

http://0.0.0.0:8080

If you start the server with a custom port, use that port instead:

http://0.0.0.0:8000
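
When a script starts the server in the background, it may need to wait until the server accepts connections before sending requests. The sketch below is one way to do that with curl. The helper names base_url and wait_for_server are hypothetical, and the use of 127.0.0.1 assumes the client runs on the same Mac as the server.

```shell
# Hypothetical helpers, not shipped with docOCR.

# Build the local base URL for a given port (default 8080, as documented above).
base_url() {
  printf 'http://127.0.0.1:%s\n' "${1:-8080}"
}

# Poll the server root once per second until it answers or the tries run out.
# Usage: wait_for_server [port] [tries]
wait_for_server() {
  url=$(base_url "$1")
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    # curl exits 0 as soon as any HTTP response arrives,
    # and non-zero while the connection is refused.
    if curl -s -o /dev/null "$url/"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A script could run docOCR -s -p 8000 in the background and then call wait_for_server 8000 before making its first request.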

The web page uses:

POST /upload

This route is intended for browser form uploads and returns an HTML result page.

API Usage

Use the JSON API endpoint:

POST /api/ocr

Example, assuming the server was started with -p 8000:

curl -X POST http://127.0.0.1:8000/api/ocr \
  -F "[email protected]"

The API also accepts image as the multipart field name:

curl -X POST http://127.0.0.1:8000/api/ocr \
  -F "image=@/path/to/image.png"
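
For repeated uploads, the curl call can be wrapped in a small shell function. This is a sketch under stated assumptions: ocr_upload is a hypothetical name, the port defaults to 8080 as documented for docOCR -s, and the multipart field is the image field name mentioned above.

```shell
# Hypothetical helper, not shipped with docOCR.
# Upload one image to a local docOCR server and print the raw JSON response.
# Usage: ocr_upload <image-path> [port]
ocr_upload() {
  image=$1
  port=${2:-8080}
  if [ ! -f "$image" ]; then
    printf 'no such file: %s\n' "$image" >&2
    return 1
  fi
  # -sS hides the progress meter but still reports transfer errors.
  # -F sends the file as a multipart form field named "image".
  curl -sS -X POST "http://127.0.0.1:${port}/api/ocr" -F "image=@${image}"
}
```

For example, ocr_upload ~/Desktop/book_imgs/01.jpg 8000 sends one page to a server started with -p 8000. Paths containing commas or semicolons may need extra quoting, because curl treats those characters specially in -F values.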

Successful response:

{
  "success": true,
  "message": "OK",
  "text": "OCR text..."
}

Error response:

{
  "success": false,
  "message": "Error message",
  "text": ""
}
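
A script that calls the API usually needs to tell the two response shapes apart. The sketch below checks the success field with grep rather than a JSON parser, so it only suits the simple response shape shown above. The function names ocr_succeeded and check_ocr_response are hypothetical.

```shell
# Hypothetical helpers, not shipped with docOCR.

# Exit status 0 when the JSON on stdin contains "success": true.
ocr_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Read a response on stdin. Print it unchanged on success;
# otherwise report it on stderr and return a non-zero status.
check_ocr_response() {
  body=$(cat)
  if printf '%s' "$body" | ocr_succeeded; then
    printf '%s\n' "$body"
  else
    printf 'docOCR reported an error: %s\n' "$body" >&2
    return 1
  fi
}
```

The output of a curl call to /api/ocr can be piped into check_ocr_response so that a failed OCR request stops the script. Extracting the text field itself is better left to a real JSON parser, since the value may contain escaped quotes and newlines.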

Build

Build a debug executable:

swift build

Build a release executable:

swift build -c release

The release binary is generated at:

.build/release/docOCR

Install

Build the release binary:

swift build -c release

Install it somewhere on your PATH, for example:

install -m 755 .build/release/docOCR /usr/local/bin/docOCR

Then run:

docOCR -h

If /usr/local/bin is not writable or not on your PATH, choose another directory such as ~/bin and make sure that directory is included in your shell PATH.
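
The fallback described above can be scripted. This is a minimal sketch: choose_bin_dir and on_path are hypothetical helper names, and ~/bin is only the example fallback directory mentioned above.

```shell
# Hypothetical helpers, not shipped with docOCR.

# Print the preferred directory if it exists and is writable, else the fallback.
# Usage: choose_bin_dir [preferred] [fallback]
choose_bin_dir() {
  preferred=${1:-/usr/local/bin}
  fallback=${2:-$HOME/bin}
  if [ -d "$preferred" ] && [ -w "$preferred" ]; then
    printf '%s\n' "$preferred"
  else
    printf '%s\n' "$fallback"
  fi
}

# Exit status 0 when the directory appears in a colon-separated list
# (defaults to the current PATH).
# Usage: on_path <dir> [list]
on_path() {
  list=${2:-$PATH}
  case ":$list:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use is to store the result of choose_bin_dir in a variable, create that directory with mkdir -p, run install -m 755 .build/release/docOCR into it, and then call on_path with the same directory to see whether the shell PATH still needs updating.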

Development

Run directly with SwiftPM:

swift run docOCR -o ~/Desktop/book_imgs/*.jpg
swift run docOCR -s -p 8000

macOS Shortcuts: Screenshot to Markdown

docOCR can also be used with the Shortcuts app on macOS to turn a screenshot into Markdown text.

In this workflow, the shortcut captures a screen selection, passes the screenshot image path to docOCR, reads the Markdown text from stdout, copies it to the clipboard, and then lets you paste the result into any text editor.

The shortcut flow is:

Capture a screenshot.
Save the screenshot as a temporary image file.
Run docOCR <screenshot-image-path>.
Read the OCR Markdown text from stdout and copy it to the clipboard.

Paste the result into your editor.
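
The flow above could be implemented in a "Run Shell Script" action roughly as follows. This is a sketch, not the shortcut shipped with the project: screenshot_to_markdown is a hypothetical function name, screencapture and pbcopy are the standard macOS tools, and docOCR is assumed to be on the PATH seen by the Shortcuts app.

```shell
# Hypothetical helper, not shipped with docOCR.
# Capture a screen selection, OCR it with docOCR, and copy the Markdown to the clipboard.
screenshot_to_markdown() {
  workdir=$(mktemp -d) || return 1
  shot="$workdir/screenshot.png"

  # -i lets the user select a region interactively.
  # If the selection is cancelled, no usable file is written.
  screencapture -i "$shot"
  if [ ! -s "$shot" ]; then
    rm -rf "$workdir"
    printf 'no screenshot captured\n' >&2
    return 1
  fi

  # docOCR prints the Markdown text to stdout; stop here if it fails.
  if ! markdown=$(docOCR "$shot"); then
    rm -rf "$workdir"
    return 1
  fi
  rm -rf "$workdir"

  # pbcopy places whatever it reads on stdin onto the clipboard.
  printf '%s\n' "$markdown" | pbcopy
}
```

Calling screenshot_to_markdown from the shortcut leaves the OCR result on the clipboard, ready for the paste step. If the Shortcuts app does not see the same PATH as a terminal session, the docOCR call may need the full path to the installed binary.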

Alternatively, the shortcut can call the /api/ocr API instead of running docOCR directly. Start the local server first:

docOCR -s

Codex Skill

If you use Codex, you can install the companion skill for docOCR: dococr-skill

The skill gives Codex reusable context for docOCR CLI usage, local HTTP API calls, OCR execution, and troubleshooting.
