Comicbox is a Python library and command line tool that reads, writes, and synthesizes comic book archive metadata. It understands every popular comic metadata standard, merges them into one consistent data model, converts between them, tags comics from online databases, and extracts pages and covers.
It is the metadata engine behind the Codex comic reader, but works just as well as a standalone command line tool for organizing a comic library.
- Reads many archive types — CBZ, CBR, CBT, CB7, and (optionally) PDF.
- Reads and writes every popular metadata standard — ComicInfo.xml, MetronInfo.xml, ComicBookInfo, CoMet, PDF metadata, and its own YAML/JSON.
- Merges every source into one model — combines metadata from each embedded format and the filename into a single normalized view, then writes it back out to whichever formats you choose.
- Tags comics online — looks up and matches comics against Metron and ComicVine, then writes the result.
- Converts archives — repacks CBR/CBT/CB7 (and comic PDFs) to CBZ, and translates metadata between formats.
- Extracts images — pulls cover art or arbitrary page ranges out of any supported archive.
- Is scriptable and embeddable — a rich CLI, a Python API, and published JSON Schemas for every format.
| Format | Read | Write |
|---|---|---|
| CBZ (zip) | ✅ | ✅ |
| CBR (rar) | ✅ | converts to CBZ |
| CBT (tar) | ✅ | converts to CBZ |
| CB7 (7z) | ✅ | converts to CBZ |
| PDF | ✅ | ✅ embedded metadata |
CBR extraction and conversion require the unrar binary on your PATH. PDF
support is an optional extra.
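To make the "converts to CBZ" entries concrete, here is a minimal standard-library sketch of repacking a tar-based archive (CBT) into a zip-based one (CBZ). It is an illustration of the idea only, not comicbox's implementation; the function names `repack_tar_to_zip` and `make_demo_cbt` are hypothetical, and storing entries uncompressed is an assumption made because page images are usually already compressed.

```python
import io
import tarfile
import zipfile


def repack_tar_to_zip(tar_bytes: bytes) -> bytes:
    """Copy every regular file in a tar archive into a new zip archive."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar, zipfile.ZipFile(
        out, "w", compression=zipfile.ZIP_STORED
    ) as cbz:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            cbz.writestr(member.name, extracted.read())
    return out.getvalue()


def make_demo_cbt() -> bytes:
    """Build a tiny in-memory tar archive standing in for a CBT (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in (("001.jpg", b"page-1"), ("002.jpg", b"page-2")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    cbz_bytes = repack_tar_to_zip(make_demo_cbt())
    with zipfile.ZipFile(io.BytesIO(cbz_bytes)) as cbz:
        print(cbz.namelist())
```

For real files, `comicbox --cbz` also carries metadata across, which this sketch does not attempt.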
Comicbox reads and writes all of the following, normalizing each into a common schema:
| Format | Read | Write | Notes |
|---|---|---|---|
| ComicInfo.xml (ComicRack) | ✅ | ✅ | v2.1 (draft) schema |
| MetronInfo.xml | ✅ | ✅ | v1.0 schema |
| ComicBookInfo (Comic Book Lover) | ✅ | ✅ | archive comment JSON |
| CoMet | ✅ | ✅ | |
| PDF metadata | ✅ | ✅ | can embed ComicInfo.xml / MetronInfo.xml |
| Comicbox YAML / JSON | ✅ | ✅ | native, lossless |
| Filename | ✅ | — | parses metadata out of the file name |
A full cross-format tag translation table is available.
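As an illustration of what "archive comment JSON" means for ComicBookInfo, the sketch below reads the zip archive comment with the standard library and decodes it. It assumes the conventional `ComicBookInfo/1.0` top-level key; the helper names `read_comicbookinfo` and `make_demo_cbz` are hypothetical and are not part of comicbox.

```python
import io
import json
import zipfile

CBI_KEY = "ComicBookInfo/1.0"  # assumed conventional top-level key


def read_comicbookinfo(archive) -> dict:
    """Return the ComicBookInfo payload from a zip archive comment, or {}."""
    with zipfile.ZipFile(archive) as zf:
        comment = zf.comment
    if not comment:
        return {}
    try:
        document = json.loads(comment.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}  # the comment holds something other than JSON
    if not isinstance(document, dict):
        return {}
    return document.get(CBI_KEY, {})


def make_demo_cbz() -> bytes:
    """Build an in-memory CBZ whose archive comment carries demo metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("001.jpg", b"page-1")
        payload = {CBI_KEY: {"series": "GI Joe", "issue": "7"}}
        zf.comment = json.dumps(payload).encode("utf-8")
    return buf.getvalue()


if __name__ == "__main__":
    print(read_comicbookinfo(io.BytesIO(make_demo_cbz())))
```

In practice comicbox does this for you and normalizes the result alongside the other formats.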
Different formats spell the same idea in different ways. Comicbox reconciles them so you never have to:
- Identifiers — IDs, GTINs, and URLs from every format are aggregated into a single `identifiers` structure, and written back out as URNs in the Notes field.
- Reprints — Alternate Names, Aliases, and "is version of" relationships collapse into one `reprints` list.
- Notes mining — the heavily-abused Notes field is parsed for embedded data (tagger, timestamps, and identifiers) that formats don't otherwise carry.
- Liberal value parsing — fuzzy, caseless values for enum-like fields (Age Rating, Format, credit roles) are accepted, tidied to Title Case, and converted to each output format's own enum on write.
- Filename parsing — series, issue, year, and more are extracted from a wide variety of naming conventions via comicfn2dict.
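Two of the ideas in the list above, merging several sources into one view and accepting caseless enum values, can be sketched in a few lines. This is a simplified illustration, not comicbox's actual merge logic: the source precedence (later sources win), the dictionary shapes, and the names `merge_sources`, `normalize_enum`, and `AGE_RATINGS` are all assumptions made for the example.

```python
from typing import Any, Iterable, Optional

AGE_RATINGS = ("Everyone", "Teen", "Mature 17+")  # illustrative subset only


def merge_sources(*sources: dict) -> dict:
    """Merge dicts left to right; later sources win, nested dicts merge."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_sources(merged[key], value)
            else:
                merged[key] = value
    return merged


def normalize_enum(raw: str, choices: Iterable[str]) -> Optional[str]:
    """Match a value against known choices, ignoring case and extra spaces."""
    wanted = " ".join(raw.split()).casefold()
    for choice in choices:
        if choice.casefold() == wanted:
            return choice
    return None


if __name__ == "__main__":
    from_filename: dict[str, Any] = {"series": {"name": "GI Joe"}, "issue": "7"}
    from_embedded: dict[str, Any] = {
        "series": {"name": "G.I. Joe", "volume": 1},
        "age_rating": normalize_enum("  teen ", AGE_RATINGS),
    }
    print(merge_sources(from_filename, from_embedded))
```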
Comicbox can identify a comic and tag it from an online database. Metron and ComicVine are supported. It searches by the series, issue, and year it knows about, ranks candidates, breaks close calls with cover-image matching, and writes the best result.
# Interactive: prompts only when the match isn't clear.
comicbox --online metron "GI Joe #007 (1952).cbz"
# Tag by an exact database id (skips searching).
comicbox --id metron:42 "comic.cbz"
# Unattended batch run: never prompts, 4 files at a time.
comicbox --online all --recurse --prompts never -j 4 ./comics/

`--match` controls how confidently comicbox writes without asking (ask ·
careful · auto · eager), and credentials come from `--auth`, `COMICBOX_*`
environment variables, the config file, or your system keyring. See
`comicbox -h` for the full set of online, caching, and tuning options.
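To give a feel for what "ranks candidates" can mean, here is a toy scoring sketch based on series-name similarity plus issue and year agreement. It is explicitly not comicbox's matching algorithm: the `Candidate` type, the weights, and the one-year tolerance are hypothetical choices for illustration, and the cover-image tie-breaking step is omitted.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Candidate:
    series: str
    issue: str
    year: int


def score(query: Candidate, candidate: Candidate) -> float:
    """Return a 0..1 score from name similarity, issue, and year agreement."""
    name = SequenceMatcher(
        None, query.series.casefold(), candidate.series.casefold()
    ).ratio()
    # Compare issue numbers without leading zeros, so "007" matches "7".
    issue = 1.0 if query.issue.lstrip("0") == candidate.issue.lstrip("0") else 0.0
    year = 1.0 if abs(query.year - candidate.year) <= 1 else 0.0
    return 0.6 * name + 0.3 * issue + 0.1 * year  # hypothetical weights


def rank(query: Candidate, candidates: list) -> list:
    """Sort candidates from best to worst match."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)


if __name__ == "__main__":
    query = Candidate("GI Joe", "007", 1952)
    results = [
        Candidate("Batman", "7", 1952),
        Candidate("GI Joe", "8", 1952),
        Candidate("G.I. Joe", "7", 1952),
    ]
    for candidate in rank(query, results):
        print(round(score(query, candidate), 3), candidate)
```

A real matcher also has to decide when the top score is clear enough to write without asking, which is the role `--match` plays in comicbox.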
# Extract the cover image.
comicbox --extract-covers --dest-path ./out "comic.cbz"
# Extract a range of pages (zero-based) by index.
comicbox --extract-pages 0:5 --dest-path ./out "comic.cbz"
# Convert a CBR to a CBZ, carrying metadata across.
comicbox --cbz "comic.cbr"
# Convert a single-image-per-page comic PDF to CBZ without re-encoding.
comicbox --cbz --pdf-pages image "comic.pdf"
# Rename a file to comicbox's canonical filename format.
comicbox --rename "comic.cbz"

pip install comicbox

For PDF support, install the pdf extra:

pip install "comicbox[pdf]"

Comicbox needs no binary dependencies for CBZ, CBT, and CB7. Reading or
converting CBR archives requires the unrar binary on your PATH.
The optional PDF extra pulls in pymupdf, which ships wheels with a bundled
libmupdf for most platforms. Some platforms (e.g. Linux on ARM) may need
libstdc++ plus C/C++ build tools to compile it.
pymupdf has no pre-built AARCH64 wheels, so pip must build it. On some Python versions the build fails unless this environment variable is set:
PYMUPDF_SETUP_PY_LIMITED_API=0 pip install "comicbox[pdf]"

You will also need the build-essential and python3-dev (or equivalent)
packages.
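Before relying on CBR or PDF support, it can help to check whether the prerequisites described above are present. The sketch below uses only the standard library; the function name `optional_support` is hypothetical, and it checks only for the `unrar` binary and an importable pymupdf (under either its `pymupdf` or legacy `fitz` module name), not for comicbox's own PDF extra.

```python
import importlib.util
import shutil


def optional_support() -> dict:
    """Report which optional prerequisites appear to be available."""
    has_pymupdf = any(
        importlib.util.find_spec(name) is not None for name in ("pymupdf", "fitz")
    )
    return {
        "cbr": shutil.which("unrar") is not None,  # unrar binary on PATH
        "pdf": has_pymupdf,  # pymupdf importable
    }


if __name__ == "__main__":
    for feature, available in optional_support().items():
        print(f"{feature}: {'ok' if available else 'missing'}")
```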
Comicbox ships a thorough, self-documenting CLI. Run `comicbox -h` for the
complete reference, including every metadata format key, the `--print`
phases, and the online tagging tables. A few representative commands:
# Print the merged metadata comicbox reads from a comic.
comicbox -p "comic.cbz"
# Set a field and write it as ComicInfo.xml inside the archive.
comicbox -m "{publisher: SmallComics}" -w cix "comic.cbz"
# Recursively set a field across an entire library.
comicbox --recurse -m "{publisher: 'SC Comics'}" -w cix ./comics/
# Export and re-import metadata as a file.
comicbox --export cix "comic.cbz"
comicbox --import ComicInfo.xml -w cix "comic.cbz"

`-m`/`--metadata` accepts a compact "linear YAML" using tag names from any of
the supported formats. Put a space after each colon so it parses as YAML,
and quote values containing YAML special characters (`:[]{},`). See
`comicbox -h` for many more `-m` examples, and "escaping YAML" for the
escaping details.
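When driving the CLI from a script, the quoting rules above are easy to get wrong by hand. The following sketch builds a one-line YAML flow mapping and an argument list using only flags shown in this document. The helper names `linear_yaml` and `tag_command` are hypothetical, only flat string fields are handled, and quoting every value is a conservative assumption: it keeps special characters safe but also turns numbers into strings.

```python
import shlex


def linear_yaml(fields: dict) -> str:
    """Render flat fields as a one-line YAML flow mapping."""
    parts = []
    for key, value in fields.items():
        # Single-quoted YAML scalars escape a quote by doubling it.
        quoted = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"{key}: {quoted}")  # note the space after the colon
    return "{" + ", ".join(parts) + "}"


def tag_command(path: str, fields: dict, write_format: str = "cix") -> list:
    """Build an argv list for comicbox's -m and -w flags."""
    return ["comicbox", "-m", linear_yaml(fields), "-w", write_format, path]


if __name__ == "__main__":
    argv = tag_command("comic.cbz", {"publisher": "SC Comics"})
    # shlex.join produces a copy-pasteable, shell-quoted command line.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting entirely, which is usually the safer choice in scripts.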
💡 Preview before writing. Add `-p` to print exactly what would be written, or `-n`/`--dry-run` to perform an action without touching the filesystem.
The cleanest way to edit or remove existing tags is to round-trip through a file:
# 1. Export the current metadata to an editable file.
comicbox --export cix "My Overtagged Comic.cbz"
# 2. Edit it.
nvim ComicInfo.xml
# 3. Preview the re-import.
comicbox --import ComicInfo.xml -p "My Overtagged Comic.cbz"
# 4. Wipe the old tags, then write the edited file back (careful!).
comicbox --delete-all-tags "My Overtagged Comic.cbz"
comicbox --import ComicInfo.xml -w cix "My Overtagged Comic.cbz"

You can also drop individual keys with `-D`/`--delete-keys` using dotted glom
paths, e.g. `-D series,reprints.0.series`.
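To show what a dotted path such as `reprints.0.series` addresses, here is a small pure-Python sketch that deletes a value from nested dicts and lists. It only imitates the path semantics for illustration: it does not use glom, the function name `delete_path` is hypothetical, and the sample metadata shape is made up.

```python
def delete_path(data, dotted: str) -> bool:
    """Delete the value at a dotted path; return False if the path is absent."""
    *parents, last = dotted.split(".")
    node = data
    try:
        for part in parents:
            # Numeric segments index into lists, other segments are dict keys.
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            del node[int(last)]
        else:
            del node[last]
    except (KeyError, IndexError, ValueError, TypeError):
        return False
    return True


if __name__ == "__main__":
    metadata = {
        "series": {"name": "GI Joe"},
        "reprints": [{"series": {"name": "G.I. Joe"}, "issue": "7"}],
    }
    for path in "series,reprints.0.series".split(","):
        delete_path(metadata, path)
    print(metadata)
```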
Comicbox is primarily a library. The `Comicbox` class in `comicbox.box` is the
main read interface, and `comicbox.write` exposes a documented write API.
Auto-generated API docs are published with the HTML docs.
from comicbox.box import Comicbox

with Comicbox("comic.cbz") as cb:
    metadata = cb.to_dict()  # merged, normalized metadata
    file_type = cb.get_file_type()  # "CBZ", "PDF", ...
    mtime = cb.get_metadata_mtime()  # last metadata modification time
    cover = cb.get_cover_page()  # cover image bytes

Writing is done through the public `write_metadata` (single file) and
`bulk_write` (batched) helpers:
from comicbox.write import write_metadata

result = write_metadata(
    "comic.cbz",
    # The patch is the contents under the "comicbox" root tag. The
    # root-wrapped dict Comicbox.to_dict() returns is also accepted.
    {"publisher": {"name": "SmallComics"}, "genres": ["Science Fiction"]},
    formats=["COMIC_INFO"],  # MetadataFormats names; e.g. COMIC_INFO, METRON_INFO
)
print(result.written)

Every operational error these APIs raise derives from
`comicbox.exceptions.ComicboxError` (`ArchiveError`, `ArchiveWriteError`,
`MetadataError`, `ExportError`, `WriteValidationError`,
`OnlineConfigurationError`, `OnlineLookupAbortedError`, and
`UnsupportedArchiveTypeError`), so consumers can catch `ComicboxError` without
swallowing unrelated programming errors.
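The write example notes that a patch may be either the bare contents or the dict wrapped under the "comicbox" root tag. If you want to start from previously read metadata and change a few fields, a helper along these lines can normalize both shapes first. This is a sketch under assumptions: `as_patch` is a hypothetical name, it works on plain dicts without importing comicbox, and it says nothing about whether the write API merges or replaces existing fields.

```python
ROOT_TAG = "comicbox"  # root tag named in the write example


def as_patch(metadata: dict, **changes) -> dict:
    """Return the bare (unwrapped) metadata with top-level changes applied."""
    inner = metadata.get(ROOT_TAG, metadata)  # accept wrapped or bare input
    return {**inner, **changes}  # shallow copy; the input is left untouched


if __name__ == "__main__":
    read_back = {"comicbox": {"publisher": {"name": "Old"}, "genres": ["Horror"]}}
    patch = as_patch(read_back, publisher={"name": "SmallComics"})
    print(patch)
```

The resulting dict has the same shape as the patch passed to `write_metadata` in the example.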
Comicbox is configured by command line arguments, an optional config file, and environment variables (in that order of precedence).
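The precedence order just described can be pictured as a chain of lookups in which earlier sources shadow later ones. The sketch below models that with `collections.ChainMap`. It is an illustration only: the setting names are made up, the flat mapping from `COMICBOX_*` variable names to keys is an assumption (the real config is nested), and environment values stay as strings here.

```python
from collections import ChainMap

ENV_PREFIX = "COMICBOX_"


def env_settings(environ) -> dict:
    """Collect COMICBOX_* variables as lower-cased keys (flat naming assumed)."""
    return {
        name[len(ENV_PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(ENV_PREFIX)
    }


def resolve(cli: dict, config_file: dict, environ, defaults: dict) -> ChainMap:
    """Earlier maps shadow later ones: CLI, config file, environment, defaults."""
    return ChainMap(cli, config_file, env_settings(environ), defaults)


if __name__ == "__main__":
    settings = resolve(
        cli={"recurse": True},
        config_file={"recurse": False, "dest_path": "./out"},
        environ={"COMICBOX_DEST_PATH": "/tmp", "COMICBOX_THEME": "dark"},
        defaults={"recurse": False, "dest_path": ".", "theme": "light"},
    )
    print(dict(settings))
```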
- Defaults live in `config_default.yaml`, which also documents the nested config groups (`general`, `read`, `write`, `convert`, `compute`, and `online`).
- Config file — point at one with `-c PATH`, or place it at `~/.config/comicbox/config.yaml`.
- Environment variables are prefixed with `COMICBOX_`.
- Log level is set with the `LOGLEVEL` environment variable:

LOGLEVEL=ERROR comicbox -p "comic.cbz"

Installing comicbox also installs two small sibling libraries, each usable on its own:
- comicfn2dict — parses metadata out of comic filenames into Python dicts (also used by ComicTagger).
- pdffile — presents a `ZipFile`-like interface for PDF files (installed with the `[pdf]` extra).
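To illustrate the kind of work filename parsing involves, here is a deliberately narrow regular-expression sketch that handles one naming convention, `Series #NNN (YYYY).ext`. It is not comicfn2dict and covers far fewer cases; the pattern, the function name `parse_filename`, and the returned keys are assumptions made for the example.

```python
import re

FILENAME_RE = re.compile(
    r"""
    ^(?P<series>.+?)              # series: everything before the issue marker
    \s+\#(?P<issue>\d+)           # issue: digits following a hash sign
    (?:\s+\((?P<year>\d{4})\))?   # optional four-digit year in parentheses
    \.(?P<ext>cb[z7rt]|pdf)$      # archive extension
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_filename(name: str) -> dict:
    """Extract series, issue, year, and extension, or {} if nothing matches."""
    match = FILENAME_RE.match(name)
    if match is None:
        return {}
    fields = match.groupdict()
    fields["issue"] = fields["issue"].lstrip("0") or "0"  # "007" becomes "7"
    if fields["year"] is not None:
        fields["year"] = int(fields["year"])
    return {key: value for key, value in fields.items() if value is not None}


if __name__ == "__main__":
    print(parse_filename("GI Joe #007 (1952).cbz"))
```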
Comicbox is hosted on GitHub. Most
development tasks are driven by the Makefile — run make to see what's
available.
The DEBUG_TRANSFORM environment variable prints verbose schema-transform
information, useful when debugging format conversions.
Comicbox is licensed under the LGPL-3.0-only license.