keyphrase is a command-line tool that automatically detects key phrases and important sentences in PDF or Markdown files using an LLM (Large Language Model) and annotates them with color highlights. It is designed for academic papers, technical documents, and any text where understanding the main points at a glance is helpful.
- Supports both PDF and Markdown (.md) files
- AI-based detection and color-coding of key concepts:
  - Approach/methodology (blue): The main novelty or core contribution of the paper
  - Experimental results (green): Key observations and experimental outcomes
  - Threats to validity (pink): Weaknesses or potential problems with the approach
- Generates a new, annotated file with color-coded highlights
- Flexible output filename options, with overwrite protection
- All LLM inference is done locally via Ollama
- Customizable highlight colors for each category via command-line options
pipx install git+https://github.com/tos-kamiya/keyphrase.git
If you don't have pipx:
python -m pip install --user pipx
python -m pipx ensurepath
Keyphrase uses Ollama for local LLM inference. Follow the instructions for your platform on the official Ollama site.
Install the required model in your local Ollama server:
ollama pull gpt-oss:20b
For PDF:
keyphrase input.pdf
- Annotates input.pdf, outputs as out.pdf (if not present).
For Markdown:
keyphrase input.md
- Annotates input.md, outputs as out.md using HTML <span> tags for highlights.
- -o OUTPUT, --output OUTPUT: Specify output file name. Use -o - to write output to standard output (Markdown only).
- -O, --output-auto: Output to INPUT-annotated.pdf or INPUT-annotated.md.
- By default, output will be out.pdf or out.md. If the file exists, an error is raised unless --overwrite is specified.
- --overwrite: Overwrite the output file if it already exists.
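The output naming rules above can be sketched in Python. This is only an illustration of the documented behavior, not keyphrase's actual code: the function name `resolve_output_path` and its parameters are hypothetical, and it assumes that the default file is created in the current directory and that `-O` places the annotated file next to the input.

```python
from pathlib import Path


def resolve_output_path(input_path, output=None, output_auto=False, overwrite=False):
    """Pick the output file following the documented rules (hypothetical helper)."""
    src = Path(input_path)
    if output == "-":
        # "-o -" means standard output, documented as Markdown only.
        if src.suffix.lower() != ".md":
            raise ValueError("standard output is supported for Markdown only")
        return None  # None stands for "write to stdout" in this sketch
    if output is not None:
        dest = Path(output)
    elif output_auto:
        # -O / --output-auto: INPUT-annotated.pdf or INPUT-annotated.md
        # (assumption: placed next to the input file)
        dest = src.with_name(f"{src.stem}-annotated{src.suffix}")
    else:
        # Default: out.pdf or out.md (assumption: in the current directory)
        dest = Path(f"out{src.suffix}")
    if dest.exists() and not overwrite:
        # Overwrite protection: refuse unless --overwrite was given.
        raise FileExistsError(f"{dest} already exists; pass --overwrite to replace it")
    return dest
```

For example, `resolve_output_path("paper.pdf", output_auto=True)` yields `paper-annotated.pdf`, and an existing target raises `FileExistsError` unless `overwrite=True` is passed.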
You can fully customize and preview highlight colors for each category using the options below.
- Use --color-map to specify colors for each category.
- Format: name:#rgba or name:#rrggbbaa (e.g., approach:#8edefbb0)
- Available category names: approach, experiment, threat
- To disable a specific marker, specify name:0 (e.g., threat:0)
- This option can be used multiple times.
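To make the spec format concrete, here is a minimal Python sketch of how such `name:color` values could be parsed. It is not keyphrase's own parser: `parse_color_map` is a hypothetical name, and the sketch assumes that the short four-digit form expands by doubling each digit (as in CSS) and that a later spec for the same category replaces an earlier one.

```python
import re

CATEGORIES = ("approach", "experiment", "threat")
# '#' followed by exactly 4 or exactly 8 hex digits
_HEX = re.compile(r"#(?:[0-9a-fA-F]{4}|[0-9a-fA-F]{8})")


def parse_color_map(specs):
    """Map each category to an (r, g, b, a) tuple, or to None when disabled."""
    result = {}
    for spec in specs:
        name, sep, value = spec.partition(":")
        if not sep or name not in CATEGORIES:
            raise ValueError(f"unknown category in {spec!r}")
        if value == "0":
            result[name] = None  # name:0 disables this marker
            continue
        if not _HEX.fullmatch(value):
            raise ValueError(f"expected a 4- or 8-digit hex color in {spec!r}")
        digits = value[1:]
        if len(digits) == 4:
            # Assumption: the short form expands like CSS, e.g. fc0f -> ffcc00ff
            digits = "".join(ch * 2 for ch in digits)
        result[name] = tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4, 6))
    return result
```

With this sketch, `parse_color_map(["approach:#8edefbb0", "threat:0"])` returns `{"approach": (142, 222, 251, 176), "threat": None}`.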
Example:
# Change 'approach' to yellow, 'experiment' to teal, and disable 'threat'
keyphrase input.pdf --color-map approach:#ffcc00ff --color-map experiment:#44cc99ff --color-map threat:0
You can check the currently active highlight colors as a legend in your terminal.
This is especially useful when adjusting colors with --color-map.
keyphrase --color-legend text # Show legend as plain text
keyphrase --color-legend ansi # Show legend with 24-bit color blocks (background + black text)
keyphrase --color-legend html   # Show legend as a compact HTML table snippet
You can combine this with --color-map to preview your custom color settings:
keyphrase --color-legend ansi --color-map approach:#ffcc00ff --color-map experiment:#44cc99ff
- ANSI output uses a background color block and black text for visibility (works best in 24-bit color terminals).
- HTML output can be copy-pasted into documentation.
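For readers curious how a legend with 24-bit background blocks and black text can be produced, the following Python sketch uses standard ANSI SGR escape sequences. It illustrates the general technique only; `ansi_legend` is a hypothetical helper, the exact layout of keyphrase's legend may differ, and the alpha channel is ignored here because terminals have no transparency for background colors.

```python
def ansi_legend(colors):
    """Return one line per category: a truecolor background block with black text."""
    lines = []
    for name, (r, g, b) in colors.items():
        # ESC[48;2;R;G;Bm sets a 24-bit background, ESC[30m selects black text,
        # and ESC[0m resets all attributes at the end of the block.
        lines.append(f"\x1b[48;2;{r};{g};{b}m\x1b[30m {name} \x1b[0m")
    return "\n".join(lines)


if __name__ == "__main__":
    # RGB values corresponding to #ffcc00 and #44cc99
    print(ansi_legend({"approach": (255, 204, 0), "experiment": (68, 204, 153)}))
```

Terminals without 24-bit color support may approximate or ignore the background color.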
- --skim: Enable skim mode, a simplified highlighting mode intended for survey papers (i.e., papers not following the typical problem → approach → experiment structure). Instead of categorizing sentences by type, this mode highlights only important sentences using a single highlight color.
- -q, --quiet: Suppress all progress output and messages.
- --debug: Enable debug output (show prompts/responses) and progress bar.
- --verbose: Show progress bar (default behavior if no --quiet).
- -m MODEL, --model MODEL: Specify the Ollama model to use (default: gpt-oss:20b)
- --max-sentence-length N: Maximum sentence length for analysis (default: 80)
- --buffer-size N: Buffer size for batch LLM queries (in characters, default: 2000). Sentences are processed in batches for efficiency.
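The batching idea behind --buffer-size can be sketched as a simple greedy grouping in Python. This is an assumption about the general approach, not the tool's actual algorithm: `batch_sentences` is a hypothetical helper that counts only sentence characters, and it lets a single sentence longer than the buffer form its own batch.

```python
def batch_sentences(sentences, buffer_size=2000):
    """Group sentences into batches whose total length stays within buffer_size characters."""
    batches, current, used = [], [], 0
    for sentence in sentences:
        # Close the current batch when adding this sentence would exceed the budget.
        if current and used + len(sentence) > buffer_size:
            batches.append(current)
            current, used = [], 0
        current.append(sentence)
        used += len(sentence)
    if current:
        batches.append(current)  # flush the final, possibly partial batch
    return batches
```

A larger buffer means fewer, longer queries to the model; a smaller one means more queries with less text each.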
keyphrase paper.pdf -O
# -> Annotates 'paper.pdf', outputs as 'paper-annotated.pdf'
keyphrase notes.md -o highlights.md --buffer-size 5000 --max-sentence-length 100 --verbose
# -> Annotates 'notes.md', outputs to 'highlights.md', using a larger buffer, longer sentences, and showing progress.
- Python 3.10 or newer
- Ollama running locally
- gpt-oss:20b model installed in Ollama (ollama pull gpt-oss:20b)
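These requirements can be checked up front with a short Python script. The sketch below is not part of keyphrase; the helper names are hypothetical. It assumes Ollama is listening on its default local address (http://localhost:11434) and uses Ollama's /api/tags endpoint, which lists locally installed models.

```python
import json
import sys
import urllib.request


def model_installed(tags_response, model="gpt-oss:20b"):
    """Check a parsed /api/tags response for the given model name."""
    return any(m.get("name") == model for m in tags_response.get("models", []))


def check_prerequisites(model="gpt-oss:20b", base_url="http://localhost:11434"):
    """Return a list of problems found; an empty list means everything looks ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        problems.append(f"cannot reach Ollama at {base_url}: {exc}")
    else:
        if not model_installed(tags, model):
            problems.append(f"model {model} not found; run: ollama pull {model}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

If the script prints nothing, Python is recent enough, the Ollama server answered, and the model is listed.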
MIT
- No data is sent to any third-party APIs: all processing is local via Ollama.
- For best results on scientific papers, use high-quality, clean PDF or Markdown sources.
- Markdown output uses HTML <span style="background-color:...">...</span> for color highlights.
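As an illustration of the Markdown highlight format, a sentence can be wrapped like this in Python. The helper name `highlight` is hypothetical, and escaping of `&`, `<`, and `>` is an assumption of this sketch; the tool's real output may treat special characters differently.

```python
import html


def highlight(sentence, color):
    """Wrap a sentence in a background-colored <span>, escaping &, < and >."""
    return f'<span style="background-color:{color}">{html.escape(sentence, quote=False)}</span>'


if __name__ == "__main__":
    print(highlight("We propose a new method.", "#8edefbb0"))
```

Most Markdown renderers pass inline HTML through, so the span appears as a colored background in the rendered page; renderers that sanitize HTML may strip the style attribute.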