Thai-first context compression for Claude Code. An MCP server that actually shrinks Thai (50–88% fewer tokens) and caches every original so nothing is ever lost.
สรุป (sarup) means "to summarize." Headroom routes Thai through noop (0% savings) because its whitespace tokenizer can't find Thai word boundaries. Sarup uses PyThaiNLP segmentation, so it compresses Thai about as effectively as it compresses English.
- Highlights
- Why it's safe — the two-tier guarantee
- How it works
- Tools
- Compression modes
- Measured results
- Example
- Install
- Register with Claude Code
- Auto-compression hook
- Privacy & data
- Configuration
- Project structure
- Tech stack & techniques
- Testing
- Roadmap
- License
- 🇹🇭 Real Thai compression: PyThaiNLP newmm word segmentation, not whitespace.
- ♻️ Lossless by guarantee: every compress caches the original, and verified: true in the response proves a byte-for-byte round-trip.
- 🎚️ Five modes: from offline 1 ms TF-IDF to an 88%-savings cascade.
- 🧠 Optional local LLM: embeddings + rewrite via Ollama, with automatic offline fallback.
- 📏 Honest metrics: token counts from a real tokenizer (tiktoken), not byte guesses.
- 🔌 Content-aware: JSON compaction, log dedup, and verbatim code-fence preservation built in.
- 🛟 Can't break Claude: it's an MCP tool, not an API proxy; if the server is down the tools just go away and Claude keeps working.
| Tier | What | Guarantee |
|---|---|---|
| Compressed view | the shrunk text the model works on | lossy · small · cheap |
| Retrieval store | the original, keyed by a stable hash | lossless · recoverable |
Aggressive lossy compression is safe because the original is always one sarup_retrieve(hash)
away. This is how "maximum savings" and "100% accuracy" coexist — they live in different tiers.
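The two tiers can be illustrated with a minimal sketch. Everything here is hypothetical: the class name, the SHA-256 key truncated to 24 hex characters, and the toy "keep the first line" compressor are assumptions chosen for illustration, not Sarup's actual implementation.

```python
import hashlib


class TwoTierStore:
    """Hypothetical in-memory stand-in for the retrieval store."""

    def __init__(self):
        self._originals = {}

    def put(self, original: str) -> str:
        # Key the original by a stable content hash (SHA-256 is an assumption).
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:24]
        self._originals[key] = original
        return key

    def get(self, key: str) -> str:
        return self._originals[key]


def lossy_view(text: str, keep_lines: int = 1) -> str:
    # Toy lossy step: keep only the first `keep_lines` lines.
    return "\n".join(text.splitlines()[:keep_lines])


def compress(store: TwoTierStore, original: str) -> dict:
    key = store.put(original)            # lossless tier: cache first
    view = lossy_view(original)          # lossy tier: what the model reads
    verified = store.get(key) == original  # round-trip check
    return {"compressed": view, "hash": key, "verified": verified}
```

However aggressive the lossy step is, the `verified` flag depends only on the store, which is why the two guarantees do not conflict.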
Two entry points feed one engine: a cheap compressed view the model reads, and a lossless retrieval store that can restore the original byte-for-byte.
flowchart TD
M["🧑 Manual<br/>sarup_compress()"]:::entry --> R
A["⚙️ Automatic<br/>PostToolUse hook<br/>(Read · Bash · Grep)"]:::entry --> R
R{"Sarup compress<br/>extractive · semantic · abstractive · pipeline"}:::engine
R -- "compressed view<br/>50–88% fewer tokens" --> V["📄 Model context"]:::lossy
R -. "cache original" .-> S[("🗄️ Retrieval store<br/>hash → original")]:::lossless
V -. "need full detail?" .-> RET["🔑 sarup_retrieve(hash)"]:::lossless
RET --> S
S == "byte-for-byte ✓" ==> V
classDef entry fill:#e0e7ff,stroke:#6366f1,color:#111
classDef engine fill:#fde68a,stroke:#d97706,color:#111
classDef lossy fill:#fef3c7,stroke:#f59e0b,color:#111
classDef lossless fill:#bbf7d0,stroke:#16a34a,color:#111
- Manual: the model calls sarup_compress / sarup_retrieve itself.
- Automatic: the hook intercepts large tool outputs, caches the original to SARUP_DB_PATH, and substitutes the compressed view plus a retrieval hash. Source code is skipped; small outputs pass through untouched.
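The automatic path's gating rules (matching tools only, skip source code, pass small outputs through) might look roughly like the sketch below. The function names, the suffix list, and the whitespace token counter are assumptions for illustration; the real hook counts tokens with a real tokenizer and may use different skip rules.

```python
# Assumed list of suffixes treated as source/config; the real hook's list may differ.
SKIPPED_SUFFIXES = (".py", ".ts", ".js", ".go", ".rs", ".java", ".json", ".toml")
HOOKED_TOOLS = {"Read", "Bash", "Grep"}


def rough_token_count(text: str) -> int:
    # Stand-in only: whitespace splitting undercounts Thai badly,
    # which is exactly why Sarup uses a real tokenizer instead.
    return len(text.split())


def should_compress(tool_name, output, file_path=None, min_tokens=400,
                    count_tokens=rough_token_count):
    if tool_name not in HOOKED_TOOLS:
        return False  # hook only matches Read | Bash | Grep
    if tool_name == "Read" and file_path and file_path.lower().endswith(SKIPPED_SUFFIXES):
        return False  # source/config reads are left untouched
    return count_tokens(output) >= min_tokens  # small outputs pass through
```

Passing `count_tokens` as a parameter keeps the decision logic independent of whichever tokenizer is plugged in.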
| Tool | Purpose |
|---|---|
| sarup_compress(content, target_ratio?, lossless?, query?, mode?) | Compress; returns compressed text, hash, token metrics, verified, token_method. |
| sarup_retrieve(hash) | Recover the original content byte-for-byte. |
| sarup_stats() | Cumulative session savings. |
sarup_compress arguments
| Arg | Type | Default | Meaning |
|---|---|---|---|
| content | string | (none) | Text to compress (required). |
| target_ratio | number | 0.5 | Fraction of prose to keep (0.1–0.9). |
| lossless | boolean | false | Only apply lossless transforms (whitespace / JSON compact). |
| query | string | "" | Relevance hint; sentences matching it are kept. |
| mode | string | extractive | See modes below. |
| Mode | How | Needs Ollama | Savings¹ | Speed¹ | Output |
|---|---|---|---|---|---|
| extractive (default) | TF-IDF scoring + n-gram dedup | no | 50.8% | ~1 ms | verbatim subset |
| semantic | Embedding centrality + cosine dedup | yes | 64.6% | ~1–2 s | verbatim subset |
| abstractive | Local-LLM rewrite | yes | ~51% | ~8–20 s | paraphrased |
| pipeline | Cascade: semantic → abstractive | yes | 88.1% | ~2 s | paraphrased |
| auto | semantic if Ollama is up, else extractive | optional | 64.6% | ~90 ms | subset |
¹ Measured on a 10-sentence Thai paragraph (522 tokens). Every mode stays 100% recoverable via the store; Ollama modes degrade gracefully to extractive when the backend is down.
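As a rough illustration of the extractive idea (score sentences by TF-IDF, drop duplicates, keep a verbatim subset in original order), here is a small sketch. It is not Sarup's code: the function names are hypothetical, the regex tokenizer is a stand-in that does not segment Thai correctly, and the dedup step is simplified to exact token-sequence matching rather than n-gram overlap.

```python
import math
import re
from collections import Counter


def simple_tokenize(sentence):
    # Stand-in tokenizer. For Thai, pass a real word segmenter instead.
    return re.findall(r"\w+", sentence.lower())


def extractive(sentences, target_ratio=0.5, tokenize=simple_tokenize):
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))

    def score(doc):
        if not doc:
            return 0.0
        tf = Counter(doc)
        return sum((tf[t] / len(doc)) * math.log((1 + n) / (1 + df[t])) for t in tf)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    budget = max(1, round(n * target_ratio))
    seen, keep = set(), []
    for i in ranked:
        key = tuple(docs[i])
        if key in seen:
            continue  # simplified dedup: skip exact repeats
        seen.add(key)
        keep.append(i)
        if len(keep) == budget:
            break
    # Return the kept sentences verbatim, in their original order.
    return [sentences[i] for i in sorted(keep)]
```

Because the output is a verbatim subset, nothing is paraphrased; the only loss is the dropped sentences, which remain in the retrieval store.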
$ .\.venv\Scripts\python.exe bench\benchmark.py
sample before after savings verify
Thai prose 522 257 50.8% OK
Thai prose (aggressive) 522 217 58.4% OK
English prose 105 54 48.6% OK
JSON (lossless) 67 44 34.3% OK
Logs 563 300 46.7% OK
TOTAL 1779 872 51.0% ALL OK → 100% recoverable
Mode comparison (Thai prose, 522 tok):
extractive 50.8% (1ms) · auto 64.6% (~90ms) · semantic 64.6% (2.1s)
abstractive 51.1% (8s) · pipeline 88.1% (2.3s) ← all verified recoverable
Token counts via tiktoken cl100k_base — a real tokenizer, not a byte heuristic.
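The savings column follows directly from the before/after token counts as 1 − after/before. A short check using only the numbers from the benchmark output above (the helper name is ours, not part of the benchmark script):

```python
# (before, after) token counts copied from the benchmark output.
rows = {
    "Thai prose": (522, 257),
    "Thai prose (aggressive)": (522, 217),
    "English prose": (105, 54),
    "JSON (lossless)": (67, 44),
    "Logs": (563, 300),
}


def savings_pct(before, after):
    # Percentage of tokens removed, rounded to one decimal place.
    return round(100 * (1 - after / before), 1)


for name, (before, after) in rows.items():
    print(f"{name:<26}{before:>6}{after:>6}{savings_pct(before, after):>7}%")

total_before = sum(b for b, _ in rows.values())
total_after = sum(a for _, a in rows.values())
print(f"{'TOTAL':<26}{total_before:>6}{total_after:>6}"
      f"{savings_pct(total_before, total_after):>7}%")
```

Note that the TOTAL row is computed from summed token counts, so it is a token-weighted figure rather than an average of the per-row percentages.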
A real sarup_compress call on a Thai paragraph (mode="auto", Ollama up → semantic):
The model keeps working on the 154-token view; the full 518-token original is one call away:
// → sarup_retrieve(hash="caa568140bec0ff734937cf5")
{ "content": "…the exact original text, restored byte-for-byte…" }
One command (creates the venv, installs everything, and registers the MCP server for all projects; idempotent):
.\scripts\setup.ps1 -All # Windows (-All also adds the hook, the /sarup-setup skill, pulls Ollama models)
./scripts/setup.sh --all # Linux / WSL / macOS
Tip: -All / --all installs a global /sarup-setup skill, so on any other machine you can just type /sarup-setup in Claude Code and it walks through the install. (Or run scripts/install-skill.ps1 / install-skill.sh on its own.)
Uninstall just as cleanly (only removes what Sarup added; -Purge/--purge also
deletes the venv + cache):
.\scripts\uninstall.ps1 # Windows
./scripts/uninstall.sh # Linux / WSL / macOS
Manual install
py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"
Optional local-LLM modes (semantic / abstractive / pipeline) need Ollama:
ollama pull nomic-embed-text # embeddings → semantic mode
ollama pull gemma3:12b # rewrite → abstractive / pipeline (Thai-validated)
One-command setup (recommended). Detects this machine's paths, probes Ollama (picks the best mode + models), and merges into .mcp.json / .claude/settings.json without clobbering anything already there (a .bak is written first):
.\.venv\Scripts\python.exe scripts\install.py --with-hook --pull
- No Ollama? It configures offline extractive mode, which still fully works.
- Ollama up? It auto-selects nomic-embed-text (semantic) + gemma3:12b (rewrite) and sets the hook to auto. --pull fetches any missing models.
- Idempotent: safe to re-run; --global writes to ~/.claude instead.
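The probe-then-configure behaviour described in the list above could be sketched as follows. This is an assumption-laden illustration, not the contents of install.py: the function names and the returned dictionary shape are hypothetical. The only external call is Ollama's model-listing endpoint (GET /api/tags).

```python
import http.client
import json
import os
import urllib.request

WANTED = ("nomic-embed-text", "gemma3:12b")  # models named in this README


def ollama_models(host=None, timeout=1.0):
    """Return locally available model names, or None if Ollama is unreachable."""
    host = host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        return None
    return [m.get("name", "") for m in data.get("models", [])]


def plan(models):
    """Decide the hook mode and which wanted models still need pulling."""
    if models is None:
        # No Ollama: fall back to the offline mode.
        return {"hook_mode": "extractive", "to_pull": []}
    missing = [
        w for w in WANTED
        if not any(m == w or m.startswith(w + ":") for m in models)
    ]
    return {"hook_mode": "auto", "to_pull": missing}
```

A caller would run `plan(ollama_models())`; keeping the network probe separate from the decision makes the decision easy to test offline.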
Manual — or add it yourself to your MCP config (e.g. .mcp.json or ~/.claude.json).
Replace <SARUP_DIR> with the absolute path where you cloned this repo (the installer
above fills these in for you):
{
"mcpServers": {
"sarup": {
"command": "<SARUP_DIR>/.venv/Scripts/python.exe",
"args": ["-m", "sarup.server"],
"env": { "SARUP_DB_PATH": "<SARUP_DIR>/.sarup-cache.db" }
}
}
}
On Linux/macOS the interpreter is <SARUP_DIR>/.venv/bin/python.
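If you prefer to generate that entry rather than hand-edit it, a small helper along these lines picks the interpreter path per platform. The helper name is hypothetical and this is not how the bundled installer is necessarily written; it only mirrors the JSON shape and paths shown above.

```python
import json
import sys
from pathlib import PurePosixPath


def mcp_entry(sarup_dir, platform=sys.platform):
    # Forward slashes are used throughout, matching the JSON example.
    root = PurePosixPath(sarup_dir)
    if platform.startswith("win"):
        python = root / ".venv" / "Scripts" / "python.exe"
    else:
        python = root / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "sarup": {
                "command": str(python),
                "args": ["-m", "sarup.server"],
                "env": {"SARUP_DB_PATH": str(root / ".sarup-cache.db")},
            }
        }
    }


if __name__ == "__main__":
    # Print an entry for the clone path given on the command line.
    print(json.dumps(mcp_entry(sys.argv[1] if len(sys.argv) > 1 else "."), indent=2))
```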
Or run it directly over stdio:
.\.venv\Scripts\python.exe -m sarup.server
Skip manual tool calls entirely: install the PostToolUse hook and large Read/Bash/Grep outputs are compressed before they enter context, with the original cached for retrieval. Source-code reads are skipped for safety. Full setup in hooks/README.md.
Experimental: verify on your build. The hook fires and emits a valid updatedToolOutput, but whether Claude Code applies it is surface-dependent: as of testing, the VS Code extension (2.1.193) does NOT apply it, so the model still receives the full output and the hook is a no-op there. Use the manual sarup_compress tool instead (it works everywhere); the hook may apply on other/CLI builds. Replace <SARUP_DIR> with your clone path, or run install.py --with-hook.
{
"hooks": {
"PostToolUse": [
{ "matcher": "Read|Bash|Grep",
"hooks": [{ "type": "command",
"command": "<SARUP_DIR>/.venv/Scripts/python.exe <SARUP_DIR>/hooks/sarup_hook.py" }] }
]
},
"env": { "SARUP_DB_PATH": "<SARUP_DIR>/.sarup-cache.db" }
}
To guarantee recovery, Sarup caches the original content in the store. Two things to know:
- With SARUP_DB_PATH set, originals are written to that SQLite file in plaintext (no encryption). Treat it like a cache of whatever you compressed.
- If you compress tool outputs that contain secrets (e.g. a .env dump or credentials in a log), those land in the cache too. The auto-hook skips source-code/config file reads, but Bash output is fair game, so review what you point it at.
*.db is git-ignored, so the cache never gets committed. For zero on-disk
footprint, leave SARUP_DB_PATH unset (memory-only; the MCP server then loses
the cache on restart, and the hook will not substitute — see the hook docs).
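To make the plaintext and memory-only points concrete, here is a minimal sketch of a hash-keyed SQLite store. The table name, schema, and SHA-256 keys are assumptions, not Sarup's actual store layout; the fallback to `:memory:` when SARUP_DB_PATH is unset mirrors the behaviour described above.

```python
import hashlib
import os
import sqlite3


def open_store(db_path=None):
    # Unset path means a private in-memory database that dies with the process.
    db_path = db_path or os.environ.get("SARUP_DB_PATH") or ":memory:"
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS originals ("
        "hash TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    return conn


def put(conn, content):
    key = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Content is stored as-is: no encryption, so secrets are readable on disk.
    conn.execute("INSERT OR IGNORE INTO originals VALUES (?, ?)", (key, content))
    conn.commit()
    return key


def get(conn, key):
    row = conn.execute(
        "SELECT content FROM originals WHERE hash = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Two separate `:memory:` connections do not share data, which is the same reason a hook process cannot hand originals to the MCP server without a file-backed path.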
| Var | Default | Meaning |
|---|---|---|
| SARUP_DB_PATH | (in-memory) | SQLite path for a persistent, cross-process store. Required for hook retrieval. |
| OLLAMA_HOST | http://localhost:11434 | Ollama endpoint. |
| SARUP_ABSTRACTIVE_MODEL | gemma3:12b | Model for abstractive / pipeline rewrite. |
| SARUP_EMBED_MODEL | nomic-embed-text | Model for semantic embeddings. |
| SARUP_HOOK_MODE | auto | Hook compression mode. |
| SARUP_HOOK_MIN_TOKENS | 400 | Hook only compresses outputs with at least this many tokens (token-based, fair across languages). |
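For scripts that need the same settings, the table's variables and defaults can be read as in the sketch below. The dataclass and function names are hypothetical; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SarupConfig:
    db_path: Optional[str]   # None means in-memory store
    ollama_host: str
    abstractive_model: str
    embed_model: str
    hook_mode: str
    hook_min_tokens: int


def load_config(env=None):
    # Accept any mapping so the loader can be exercised without touching os.environ.
    env = os.environ if env is None else env
    return SarupConfig(
        db_path=env.get("SARUP_DB_PATH") or None,
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        abstractive_model=env.get("SARUP_ABSTRACTIVE_MODEL", "gemma3:12b"),
        embed_model=env.get("SARUP_EMBED_MODEL", "nomic-embed-text"),
        hook_mode=env.get("SARUP_HOOK_MODE", "auto"),
        hook_min_tokens=int(env.get("SARUP_HOOK_MIN_TOKENS", "400")),
    )
```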
sarup/
├── src/sarup/
│ ├── server.py # MCP stdio server — 3 tools
│ ├── compressor.py # router + modes (extractive/semantic/abstractive/pipeline/auto)
│ ├── thai.py # PyThaiNLP tokenization, sentence split, TF-IDF
│ ├── semantic.py # embedding centrality + cosine dedup
│ ├── llm.py # optional Ollama backend (generate + embed)
│ ├── tokens.py # real token counting (tiktoken)
│ └── store.py # CCR store: hash → original (memory + SQLite)
├── hooks/
│ ├── sarup_hook.py # PostToolUse auto-compression hook
│ └── README.md # hook install guide
├── bench/benchmark.py # before/after measurement
├── tests/ # test_thai, test_mcp, test_hook, ...
├── README.md
└── STACK.md # full stack + techniques
Python 3.11 · MCP · PyThaiNLP newmm · tiktoken · Ollama (optional) · SQLite · hatchling · pytest.
The technique behind each mode — TF-IDF scoring, embedding centrality, cascade pipeline, content routing, and graceful degradation — is documented in STACK.md.
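As a taste of one of those techniques, embedding centrality with cosine dedup can be sketched in a few lines. This is an illustrative approximation with hypothetical names and a made-up dedup threshold, operating on plain lists of floats; STACK.md remains the reference for what Sarup actually does.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_central(vectors, keep, dedup_threshold=0.95):
    """Pick up to `keep` sentence indices: most central first, near-duplicates skipped."""
    n = len(vectors)
    # Centrality: mean cosine similarity to every other sentence.
    centrality = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / max(1, n - 1)
        for i in range(n)
    ]
    ranked = sorted(range(n), key=lambda i: centrality[i], reverse=True)
    chosen = []
    for i in ranked:
        # Skip anything too similar to a sentence already chosen.
        if any(cosine(vectors[i], vectors[j]) >= dedup_threshold for j in chosen):
            continue
        chosen.append(i)
        if len(chosen) == keep:
            break
    return sorted(chosen)  # restore original sentence order
```

In practice the vectors would come from an embedding model such as the one configured in SARUP_EMBED_MODEL; here they are just inputs.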
.\.venv\Scripts\python.exe -m pytest tests/ -q
The suite covers Thai NLP, the MCP tool contracts, every mode (including Ollama-fallback paths), the roundtrip-verify guarantee, and the auto-compression hook (incl. cross-process retrieval).
- Make auto the default mode for sarup_compress (currently extractive).
- Optional Typhoon 2.1 abstractive (blocked on an Ollama template fix).
- Per-content adaptive target_ratio.
- Published PyPI package.
MIT