Keyphrase

keyphrase is a command-line tool that automatically detects key phrases and important sentences in PDF or Markdown files using an LLM (Large Language Model) and annotates them with color highlights. It is designed for academic papers, technical documents, and any text where understanding the main points at a glance is helpful.

Example Outputs

docs/icpc-2022-zhu-annotated.pdf
docs/kbse-202405-kamiya-annotated.pdf (Japanese)

Features

Supports both PDF and Markdown (.md) files
AI-based detection and color-coding of key concepts:
- Approach/methodology (blue): The main novelty or core contribution of the paper
- Experimental results (green): Key observations and experimental outcomes
- Threats to validity (pink): Weaknesses or potential problems with the approach
Generates a new, annotated file with color-coded highlights
Flexible output filename options, with overwrite protection
All LLM inference is done locally via Ollama
Customizable highlight colors for each category via command-line options

Installation

1. Install via pipx (recommended)

pipx install git+https://github.com/tos-kamiya/keyphrase.git

If you don't have pipx:

python -m pip install --user pipx
python -m pipx ensurepath

2. Install and set up Ollama

Keyphrase uses Ollama for local LLM inference. Follow the instructions for your platform on the official Ollama site.

3. Download the gpt-oss model for Ollama

Install the required model in your local Ollama server:

ollama pull gpt-oss:20b

Usage

Basic usage

For PDF:

keyphrase input.pdf

Annotates input.pdf and writes the result to out.pdf. If out.pdf already exists, the command stops with an error unless --overwrite is given.

For Markdown:

keyphrase input.md

Annotates input.md and writes the result to out.md, marking highlights with HTML <span> tags.
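To make the Markdown highlight format concrete, here is a minimal Python sketch of wrapping one sentence in a <span> with a background color. The function name highlight_span is hypothetical, and rendering the color as a CSS rgba() value is an assumption; the exact color syntax keyphrase writes into the style attribute may differ.

```python
def highlight_span(sentence, rgba):
    """Wrap a sentence in a <span> with a background color.

    rgba is a tuple of four 0-255 integers. Emitting CSS rgba() notation
    is an assumption made for this sketch, not keyphrase's confirmed output.
    """
    r, g, b, a = rgba
    alpha = round(a / 255, 2)  # CSS alpha is a fraction between 0 and 1
    style = f"background-color:rgba({r},{g},{b},{alpha})"
    return f'<span style="{style}">{sentence}</span>'


print(highlight_span("We propose a new method.", (142, 222, 251, 176)))
```

Because the highlight is inline HTML, it only shows up in Markdown renderers that allow raw HTML.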

Output options

-o OUTPUT, --output OUTPUT: Specify output file name. Use -o - to write output to standard output (Markdown only).
-O, --output-auto: Output to INPUT-annotated.pdf or INPUT-annotated.md.
By default, output will be out.pdf or out.md. If the file exists, an error is raised unless --overwrite is specified.
--overwrite: Overwrite output file if it already exists
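The naming and overwrite rules above can be sketched in Python as follows. This is an illustration, not keyphrase's actual code: resolve_output_path and its parameters are hypothetical names, placing the -O output next to the input file is an assumption, and the special case -o - (standard output) is left out.

```python
from pathlib import Path


def resolve_output_path(input_path, output=None, output_auto=False, overwrite=False):
    """Pick the output file following the documented option rules (sketch)."""
    src = Path(input_path)
    if output is not None:
        # -o OUTPUT: use the name exactly as given
        dest = Path(output)
    elif output_auto:
        # -O: INPUT-annotated.pdf or INPUT-annotated.md
        # (assumption: written to the same directory as the input)
        dest = src.with_name(f"{src.stem}-annotated{src.suffix}")
    else:
        # default: out.pdf or out.md
        dest = Path("out" + src.suffix)
    if dest.exists() and not overwrite:
        raise FileExistsError(f"{dest} already exists; pass --overwrite to replace it")
    return dest
```

Checking for an existing file before any LLM work starts means a long annotation run cannot fail at the very end on a name clash.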

Color options

You can fully customize and preview highlight colors for each category using the options below.

Customizing highlight colors

Use --color-map to specify colors for each category.
Format: name:#rgba or name:#rrggbbaa (e.g., approach:#8edefbb0)
Available category names: approach, experiment, threat
To disable a specific marker, specify name:0 (e.g., threat:0)
This option can be used multiple times.
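As an illustration of the format, the following Python sketch parses a single --color-map value. It is not taken from keyphrase's source: parse_color_spec is a hypothetical helper, and expanding the short #rgba form by doubling each digit (as CSS does) is an assumption.

```python
import re

# Category names listed in this README
CATEGORIES = ("approach", "experiment", "threat")
_HEX = re.compile(r"#([0-9a-fA-F]{4}|[0-9a-fA-F]{8})")


def parse_color_spec(spec):
    """Parse 'name:#rgba', 'name:#rrggbbaa', or 'name:0'.

    Returns (name, (r, g, b, a)) with 0-255 channels,
    or (name, None) when the marker is disabled with 'name:0'.
    """
    name, sep, value = spec.partition(":")
    if not sep or name not in CATEGORIES:
        raise ValueError(f"unknown category in {spec!r}")
    if value == "0":
        return name, None
    match = _HEX.fullmatch(value)
    if match is None:
        raise ValueError(f"bad color in {spec!r}")
    digits = match.group(1)
    if len(digits) == 4:
        # Assumption: '#rgba' expands like CSS, so '#4c9f' becomes '#44cc99ff'
        digits = "".join(ch * 2 for ch in digits)
    rgba = tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))
    return name, rgba


print(parse_color_spec("approach:#8edefbb0"))
print(parse_color_spec("threat:0"))
```

Since the option can be repeated, a caller would apply this to each value in turn and let later entries override earlier ones for the same category; that precedence is also an assumption.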

Example:

# Change 'approach' to yellow, 'experiment' to teal, and disable 'threat'
keyphrase input.pdf --color-map approach:#ffcc00ff --color-map experiment:#44cc99ff --color-map threat:0

Checking your current color settings (legend output)

You can check the currently active highlight colors as a legend in your terminal. This is especially useful when adjusting colors with --color-map.

keyphrase --color-legend text   # Show legend as plain text
keyphrase --color-legend ansi   # Show legend with 24-bit color blocks (background + black text)
keyphrase --color-legend html   # Show legend as a compact HTML table snippet

You can combine this with --color-map to preview your custom color settings:

keyphrase --color-legend ansi --color-map approach:#ffcc00ff --color-map experiment:#44cc99ff

ANSI output uses a background color block and black text for visibility (works best in 24-bit color terminals).
HTML output can be copy-pasted into documentation.
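For readers curious how a 24-bit color block is drawn, here is a small Python sketch using standard ANSI SGR escape sequences (48;2;r;g;b sets a truecolor background, 30 selects black text, 0 resets). The function name ansi_block and the exact layout are hypothetical; keyphrase's own legend format may differ.

```python
def ansi_block(label, rgb):
    """Return label as black text on a 24-bit background color block."""
    r, g, b = rgb
    background = f"\x1b[48;2;{r};{g};{b}m"  # truecolor background
    black_text = "\x1b[30m"                 # black foreground
    reset = "\x1b[0m"                       # restore terminal defaults
    return f"{background}{black_text} {label} {reset}"


print(ansi_block("approach", (142, 222, 251)))
```

Terminals without 24-bit color support may approximate or ignore the background code, which is why the plain text legend is available as a fallback.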

Skim mode (experimental)

--skim: Enable skim mode, a simplified highlighting mode intended for survey papers (i.e., papers not following the typical problem → approach → experiment structure). Instead of categorizing sentences by type, this mode highlights only important sentences using a single highlight color.

Logging and verbosity options

-q, --quiet: Suppress all progress output and messages.
--debug: Enable debug output (show prompts/responses) and progress bar.
--verbose: Show progress bar (the default unless --quiet is given).

Other options

-m MODEL, --model MODEL: Specify the Ollama model to use (default: gpt-oss:20b)
--max-sentence-length N: Maximum sentence length for analysis (default: 80)
--buffer-size N: Buffer size for batch LLM queries (in characters, default: 2000). Sentences are processed in batches for efficiency.
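The batching controlled by --buffer-size can be pictured with the Python sketch below. It assumes sentences are grouped greedily until adding the next one would exceed the character budget, and that a sentence longer than the budget gets a batch of its own; batch_sentences is a hypothetical name, and keyphrase's actual batching logic may differ.

```python
def batch_sentences(sentences, buffer_size=2000):
    """Group sentences into batches of at most buffer_size characters."""
    batches, current, used = [], [], 0
    for sentence in sentences:
        # Start a new batch when this sentence would overflow the buffer
        if current and used + len(sentence) > buffer_size:
            batches.append(current)
            current, used = [], 0
        current.append(sentence)
        used += len(sentence)
    if current:
        batches.append(current)
    return batches


print(batch_sentences(["aaaa", "bbbb", "cc"], buffer_size=8))
```

Under this scheme a larger buffer means fewer LLM calls with more text per prompt, while a smaller buffer means more calls with shorter prompts.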

More usage examples

keyphrase paper.pdf -O
# -> Annotates 'paper.pdf', outputs as 'paper-annotated.pdf'

keyphrase notes.md -o highlights.md --buffer-size 5000 --max-sentence-length 100 --verbose
# -> Annotates 'notes.md', outputs to 'highlights.md', using a larger buffer, longer sentences, and showing progress.

Requirements

Python 3.10 or newer
Ollama running locally
gpt-oss:20b model installed in Ollama (ollama pull gpt-oss:20b)

License

MIT

Notes

No data is sent to any third-party APIs: all processing is local via Ollama.
For best results on scientific papers, use high-quality, clean PDF or Markdown sources.
Markdown output uses HTML <span style="background-color:...">...</span> for color highlights.
