gzipt — gzip as a language model

gzipt generates text using gzip as its only model: no neural network, no training, no parameters. You prime it with a corpus, and it continues a prompt by searching for the byte sequences that compress best given that corpus. This works because a compressor is implicitly a predictor: text it can encode in few bytes is text it considers likely. Below is an example output:
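To make the compression-as-prediction idea concrete, here is a small Python sketch (not taken from gzipt.py) that measures how many bytes zlib needs to encode a continuation after being primed with a corpus. It assumes priming is done through zlib's preset-dictionary feature (`zdict`); gzipt may prime the compressor differently, and the `continuation_cost` name is hypothetical.

```python
import zlib


def continuation_cost(corpus: bytes, continuation: bytes) -> int:
    """Bytes zlib needs to encode `continuation` when primed with `corpus`.

    The corpus is passed as a preset dictionary (zdict), so DEFLATE can
    back-reference it without the corpus itself appearing in the output.
    Only the last 32768 bytes are usable: that is DEFLATE's maximum window.
    """
    compressor = zlib.compressobj(9, zdict=corpus[-32768:])
    return len(compressor.compress(continuation) + compressor.flush())


corpus = b"the quick brown fox jumps over the lazy dog. " * 20
likely = b"the quick brown fox jumps"    # 25 bytes that occur in the corpus
unlikely = b"0123456789!@#$%^&*()_+=-~"  # 25 bytes that never occur in it

print(continuation_cost(corpus, likely), continuation_cost(corpus, unlikely))
```

The continuation that repeats corpus text costs far fewer bytes than an equally long run of unseen symbols, and that difference is the signal a generator can search over.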

gzipt --corpus data/tinyshakespeare.txt --prompt $'MENENIUS:\n' --length 200

MENENIUS:
'Though all at once canq

MARCIUS:
Pray now, nocamest thou to a morsel .

LARTIUS:
Hence, and
I' the end admire, where G
again; and after it ag .

LARTIUS:
Hence, and
I' the end ad

LARTIUS:
fame and

This is a somewhat cherry-picked example (output is usually a bit worse than this), but isn't it cool that we can do this at all?

Usage

To download the Shakespeare dataset I used:

# Download the dataset (Shakespeare text) to the path used in the example above
mkdir -p data
wget -O data/tinyshakespeare.txt https://github.com/nathan-barry/tiny-diffusion/releases/download/v2.0.0/data.txt

Below are the default values for the CLI arguments.

gzipt \
  --corpus FILE \         # primes gzip's window with context
  --prompt "text" \       # prompt to continue
  --length 200 \          # bytes to generate
  --horizon 24 \          # beam depth: bytes looked ahead and committed per span
  --beam-width 32 \       # partial continuations kept each step
  --temperature 0.5 \     # sampling temperature
  --tail 80 \             # generated bytes kept in scoring context (anti-copy)
  --window 30000 \        # corpus bytes shown to gzip (<= 32768)
  --workers 8             # threads for scoring (zlib releases the GIL)
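The flags above suggest a beam search over byte spans scored by compressed size. Below is a minimal sketch of how they could fit together; it is not gzipt's implementation. Assumptions: candidate bytes are limited to those that appear in the corpus, the window is the last `window` corpus bytes, the last `tail` bytes of prompt-plus-output are appended to that window and used as a zlib preset dictionary, and temperature is a softmax over the surviving beams' compressed sizes. The `generate` and `cost` names are hypothetical. Scoring runs in a thread pool because, as the `--workers` comment notes, zlib releases the GIL while compressing.

```python
import math
import random
import zlib
from concurrent.futures import ThreadPoolExecutor
from functools import partial

MAX_WINDOW = 32768  # DEFLATE back-references reach at most 32 KiB


def cost(context: bytes, candidate: bytes) -> int:
    """Compressed size of `candidate`, using `context` as a zlib preset dictionary."""
    context = context[-MAX_WINDOW:]
    comp = zlib.compressobj(9, zdict=context) if context else zlib.compressobj(9)
    return len(comp.compress(candidate) + comp.flush())


def generate(corpus: bytes, prompt: bytes, length: int = 200, horizon: int = 24,
             beam_width: int = 32, temperature: float = 0.5, tail: int = 80,
             window: int = 30000, workers: int = 8, seed: int = 0) -> bytes:
    """Hypothetical beam search that continues `prompt` one span at a time."""
    if horizon < 1 or beam_width < 1 or not corpus:
        raise ValueError("need horizon >= 1, beam_width >= 1 and a non-empty corpus")
    rng = random.Random(seed)
    alphabet = sorted(set(corpus))                         # assumption: only corpus bytes
    corpus_window = corpus[max(0, len(corpus) - window):]  # assumption: last `window` bytes
    text = prompt
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(text) < len(prompt) + length:
            # Scoring context: corpus window plus the most recent `tail` bytes of text.
            context = corpus_window + text[max(0, len(text) - tail):]
            beams = [b""]
            ranked = []
            for _ in range(horizon):
                candidates = [beam + bytes([ch]) for beam in beams for ch in alphabet]
                # Threads score candidates in parallel (zlib releases the GIL).
                scores = list(pool.map(partial(cost, context), candidates))
                # Smaller compressed size ranks first; random keys break the many ties.
                ranked = sorted(zip(scores, [rng.random() for _ in candidates], candidates))
                ranked = ranked[:beam_width]
                beams = [seq for _, _, seq in ranked]
            # Commit one whole span: greedy at temperature 0, otherwise sample
            # with weights exp(-(size - best_size) / temperature).
            if temperature <= 0:
                span = beams[0]
            else:
                best = ranked[0][0]
                weights = [math.exp(-(score - best) / temperature) for score, _, _ in ranked]
                span = rng.choices(beams, weights=weights, k=1)[0]
            text += span
    return text[:len(prompt) + length]
```

Because compressed size is counted in whole bytes, many candidates tie, so the sketch breaks ties randomly. With the default flags each span needs tens of thousands of compressions against a roughly 30 KB dictionary; priming one compressor and calling `Compress.copy()` per candidate is one possible way to avoid re-hashing the dictionary each time.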
