mtase-motif is a local-first Python package for finding bacterial DNA
methyltransferase candidates and transferring or inferring methylation motifs
from a single genome.
The package keeps the database management code in mtase_motif/, but it does
not bundle downloaded Pfam, TIGRFAMs, or REBASE payloads in the published
artifact.
Install the Python package in a local virtual environment:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e '.[dev]'
For a PyPI release, install the distribution as:
python -m pip install mtasemotif
The installed CLI command remains:
mtase-motif --help
Install native executables separately. The repo includes an optional
environment.yml for a conda env that provides the non-Python tools:
conda env create -f environment.yml
conda activate mtase
The commands below cover the full sequence-first workflow from a clean checkout
to a completed run on the example E. coli genome stored on this machine at
/Users/li/data/hammerhead_test/ecoli.fa. That path is machine-local, not part
of the checkout.
Activate the native-tool conda environment first, then the local Python venv:
conda activate mtase
source .venv/bin/activate
This keeps prodigal, hmmscan, hmmsearch, hmmpress, mmseqs or
blastp, and fimo on PATH while the Python package stays in .venv.
The conda environment provides the HMMER tools used for TIGRFAMs, but it does
not download the TIGRFAMs model library itself.
The walkthrough below uses a project-local DB directory so the full run is self-contained:
DB_DIR=$PWD/db
mtase-motif db init --db-dir "$DB_DIR"
mtase-motif db fetch pfam --db-dir "$DB_DIR"
mtase-motif db fetch rebase --db-dir "$DB_DIR"
mtase-motif db index --db-dir "$DB_DIR"
mtase-motif db status --db-dir "$DB_DIR"
Downloaded databases live outside the package by default under
~/.cache/mtase-motif/db, but using --db-dir "$DB_DIR" keeps this walkthrough
local to the repo.
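For scripted setups, the same sequence can be driven from Python. This is a minimal sketch, assuming mtase-motif is on PATH. The helper names db_setup_commands and run_db_setup are hypothetical, and only the subcommands and flags shown in the walkthrough are used:

```python
import subprocess
from pathlib import Path


def db_setup_commands(db_dir):
    """Build the walkthrough's DB setup commands as argv lists, in order."""
    db = str(Path(db_dir))
    steps = [
        ["db", "init"],
        ["db", "fetch", "pfam"],
        ["db", "fetch", "rebase"],
        ["db", "index"],
        ["db", "status"],
    ]
    return [["mtase-motif", *step, "--db-dir", db] for step in steps]


def run_db_setup(db_dir):
    """Run each step in order and stop at the first failing command."""
    for argv in db_setup_commands(db_dir):
        subprocess.run(argv, check=True)
```

Calling run_db_setup("db") would mirror the shell commands above. Passing argv lists instead of a shell string avoids quoting problems when the DB path contains spaces.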
The curated Pfam subset includes MTase catalytic domains plus Type I context
models (PF02384, PF12161, PF01420, and PF04313). Pfam-only runs can
therefore recover Type I-like candidates when the MTase core/support domains or
nearby S/R context are present. TIGRFAMs remains optional, but it adds direct
HsdM/S/R and Type III Mod/Res context and can make type hints more specific.
To enable TIGRFAMs, download or otherwise obtain the TIGRFAMs HMM library
separately, then point --source at the file or at a directory containing one
of these names:
- TIGRFAMs.hmm
- TIGRFAMs.hmm.gz
- TIGRFAMs_15.0_HMM.LIB
- TIGRFAMs_15.0_HMM.LIB.gz
- another *.HMM.LIB or *.HMM.LIB.gz
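To check beforehand which file a directory would offer, a lookup along these lines can help. It is a sketch only, not the package's own resolution logic: find_tigrfams_library is a hypothetical helper, and the order in which names are tried is an assumption.

```python
from pathlib import Path

# File names from the list above; the order they are tried in is an assumption.
KNOWN_NAMES = [
    "TIGRFAMs.hmm",
    "TIGRFAMs.hmm.gz",
    "TIGRFAMs_15.0_HMM.LIB",
    "TIGRFAMs_15.0_HMM.LIB.gz",
]


def find_tigrfams_library(source):
    """Return the HMM library path for a file or directory, or None."""
    path = Path(source)
    if path.is_file():
        return path
    if not path.is_dir():
        return None
    for name in KNOWN_NAMES:
        candidate = path / name
        if candidate.is_file():
            return candidate
    # Fall back to any other *.HMM.LIB or *.HMM.LIB.gz file, sorted for determinism.
    for pattern in ("*.HMM.LIB", "*.HMM.LIB.gz"):
        matches = sorted(p for p in path.glob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None
```

A None result means the directory holds none of the recognized names, so --source would need to point at the file directly.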
Then import and rebuild indexes:
mtase-motif db fetch tigrfams --db-dir "$DB_DIR" --source /path/to/TIGRFAMs_15.0_HMM.LIB.gz
mtase-motif db index --db-dir "$DB_DIR"
mtase-motif db status --db-dir "$DB_DIR"
db index needs hmmpress; mtase-motif run uses hmmsearch when the
TIGRFAMs subset is present.
Offline and local-mirror examples:
mtase-motif db fetch pfam --db-dir "$DB_DIR" --source /path/to/Pfam-A.hmm.gz
mtase-motif db fetch rebase --db-dir "$DB_DIR" --source /path/to/rebase_emboss_dir
mtase-motif db index --db-dir "$DB_DIR"
For offline REBASE, --source alone is not enough. The source directory must
also contain rebase_proteins.faa or one or more REBASE *_Protein.txt
protein dumps before mtase-motif db index can build the REBASE protein
search index.
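A quick pre-flight check for an offline REBASE directory could look like the following. It is a sketch under the naming rules stated above; rebase_protein_inputs is a hypothetical helper and is not part of the mtase-motif CLI.

```python
from pathlib import Path


def rebase_protein_inputs(source_dir):
    """List the files in source_dir that could feed the REBASE protein index."""
    root = Path(source_dir)
    found = []
    fasta = root / "rebase_proteins.faa"
    if fasta.is_file():
        found.append(fasta)
    # REBASE protein dumps are expected to be named like *_Protein.txt.
    found.extend(sorted(p for p in root.glob("*_Protein.txt") if p.is_file()))
    return found
```

An empty list suggests that mtase-motif db index would have no protein input for the REBASE search index.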
More database download and import notes are in docs/database_setup.md.
Use the example genome from /Users/li/data/hammerhead_test:
GENOME=/Users/li/data/hammerhead_test/ecoli.fa
OUT_DIR=$PWD/results/ecoli
mtase-motif run --genome "$GENOME" --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4
Replace "$GENOME" with your own .fa or .fna file when running on a new
genome.
Optional structure-assisted runs require local candidate structures via
--structures-dir. Foldseek-backed steps also require foldseek on PATH and
either --foldseek-db or the default DB at
<db-dir>/structures/pdb/foldseek_db; --foldseek-labels only applies when
that Foldseek DB-backed search path is available.
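The Foldseek prerequisites can be checked before a run. The sketch below only restates the conditions listed above; foldseek_problems and default_foldseek_db are hypothetical helpers, and treating the DB as a single path that must exist is an assumption.

```python
import shutil
from pathlib import Path


def default_foldseek_db(db_dir):
    """Default Foldseek DB location under the DB directory."""
    return Path(db_dir) / "structures" / "pdb" / "foldseek_db"


def foldseek_problems(db_dir, structures_dir=None, foldseek_db=None, which=shutil.which):
    """Return reasons a Foldseek-backed step could not run (empty list if none)."""
    problems = []
    if structures_dir is None or not Path(structures_dir).is_dir():
        problems.append("no local structures directory (--structures-dir)")
    if which("foldseek") is None:
        problems.append("foldseek is not on PATH")
    # An explicit --foldseek-db value takes the place of the default location.
    db_path = Path(foldseek_db) if foldseek_db else default_foldseek_db(db_dir)
    if not db_path.exists():
        problems.append(f"Foldseek DB not found at {db_path}")
    return problems
```

The which parameter exists only so the PATH lookup can be stubbed in tests.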
If you already have predicted proteins, you can skip gene calling:
mtase-motif run --genome "$GENOME" --proteins /path/to/proteins.faa --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4
Core outputs in "$OUT_DIR":
- "$OUT_DIR"/mtase_candidates.tsv
- "$OUT_DIR"/motif_calls.tsv
- "$OUT_DIR"/motif_assignment.tsv
- "$OUT_DIR"/summary.tsv
- "$OUT_DIR"/<candidate_id>/motif/pwm.meme
- "$OUT_DIR"/<candidate_id>/fimo/fimo.tsv
- "$OUT_DIR"/<candidate_id>/qc/qc.json
motif_calls.tsv now carries derived motif semantics such as mod_position,
motif_class, canonical/reverse-complement forms, and an overall
assignment_state. motif_assignment.tsv separates the primary call from
alternate related-candidate or hint routes when a candidate is unresolved or
ambiguous.
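As an illustration of working with these columns, the sketch below tallies rows by assignment_state. It assumes motif_calls.tsv is tab-separated with a header row that includes an assignment_state column; count_assignment_states is a hypothetical helper, and the set of possible state values is not specified here.

```python
import csv
from collections import Counter


def count_assignment_states(path):
    """Count rows per assignment_state in a tab-separated motif_calls.tsv."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Rows with a missing or empty value are grouped under "unknown".
            counts[row.get("assignment_state") or "unknown"] += 1
    return counts
```

For example, count_assignment_states("results/ecoli/motif_calls.tsv") would summarize the run from the walkthrough.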
Quick checks:
ls "$OUT_DIR"
head "$OUT_DIR/mtase_candidates.tsv"
head "$OUT_DIR/summary.tsv"
These executables must be on PATH for the sequence-first workflow:
- prodigal
- hmmscan, hmmsearch, hmmpress
- mmseqs or blastp plus makeblastdb
- fimo
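A small check for these tools can be sketched with shutil.which. The helper missing_tools is hypothetical, and it assumes the list means "mmseqs, or else blastp together with makeblastdb":

```python
import shutil

# Tools that are always required, taken from the list above.
REQUIRED = ["prodigal", "hmmscan", "hmmsearch", "hmmpress", "fimo"]


def missing_tools(which=shutil.which):
    """Return the names of required executables that are not on PATH."""
    missing = [tool for tool in REQUIRED if which(tool) is None]
    # Assumption: protein search needs mmseqs, or blastp together with makeblastdb.
    has_mmseqs = which("mmseqs") is not None
    has_blast = which("blastp") is not None and which("makeblastdb") is not None
    if not (has_mmseqs or has_blast):
        missing.append("mmseqs (or blastp + makeblastdb)")
    return missing
```

Run it inside the activated conda environment; an empty list means every tool was found.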
make lint
make test
make sdist-check
make package-check
Release notes live in CHANGELOG.md. Before tagging a release, bump the version in
mtase_motif/__init__.py, run make package-check, then push a matching tag
such as v0.1.1. Tagged releases are set up for GitHub Releases and PyPI
publishing through the GitHub Actions workflows in .github/workflows/.
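A pre-tag sanity check could compare the tag with the declared version. This sketch assumes mtase_motif/__init__.py declares the version as a __version__ string literal, which is not confirmed here; read_version and tag_matches are hypothetical helpers.

```python
import re


def read_version(init_text):
    """Extract a __version__ string from module source text, or return None."""
    match = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""", init_text, re.MULTILINE)
    return match.group(1) if match else None


def tag_matches(tag, init_text):
    """True when a tag such as 'v0.1.1' equals 'v' + the declared version."""
    version = read_version(init_text)
    return version is not None and tag == f"v{version}"
```

To use it, pass the text of mtase_motif/__init__.py together with the tag you are about to push.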