Public runtime library for microorganism risk-reference data derived from NITE/MRINDA.
ra_bio is the canonical non-MCP API for direct consumers.
It ships a bundled SQLite database generated by ra_bio_scraper.
- resolves microorganism names against canonical names and synonyms
- supports fuzzy search for misspellings and historical names
- returns aggregated organism profiles with structured law/risk annotations
- keeps raw evidence under risk_annotations while also exposing grouped sections like regulations and biosafety
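As a rough illustration of how a misspelled or historical name can be resolved against canonical names and synonyms, the sketch below scores a query with the standard-library difflib module. This is not ra_bio's actual matching algorithm, which this README does not describe. The NAMES table and the fuzzy_resolve helper are hypothetical, and only the organism names and the min_score and limit defaults are borrowed from the examples in this README.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical name table: canonical name -> known synonyms.
# The names are taken from the examples in this README.
NAMES = {
    "Granulicatella adiacens": ["Abiotrophia adiacens", "Streptococcus adjacens"],
    "Actinomortierella wolfii": ["Mortierella wolfii"],
}


def fuzzy_resolve(query: str, min_score: float = 0.6, limit: int = 5) -> list[tuple[str, float]]:
    """Return (canonical_name, score) pairs whose best similarity is >= min_score."""
    needle = query.casefold()
    hits = []
    for canonical, synonyms in NAMES.items():
        # Score the query against the canonical name and every synonym,
        # then keep the best score for this organism.
        best = max(
            SequenceMatcher(None, needle, candidate.casefold()).ratio()
            for candidate in [canonical, *synonyms]
        )
        if best >= min_score:
            hits.append((canonical, best))
    # Highest-scoring organisms first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


print(fuzzy_resolve("Granulicatella adiacen"))  # misspelling of a canonical name
print(fuzzy_resolve("Mortierella wolfii"))      # historical synonym
```

Raising min_score trades recall for precision: a stricter threshold drops loosely similar names but may also miss heavier misspellings.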
uv add git+https://github.com/Ameyanagi/ra_bio.git
pip install "ra-bio @ git+https://github.com/Ameyanagi/ra_bio.git"
from ra_bio import get_bio_database, search_organisms, lookup_bio_profile
db = get_bio_database()
lookup = db.lookup(query="Mortierella wolfii", language="ja")
search = db.search(query="Granulicatella adiacen", limit=5, min_score=0.6)
sources = db.get_source_snapshots()
search_via_helper = search_organisms(query="Granulicatella adiacen", limit=5, min_score=0.6)
lookup_via_helper = lookup_bio_profile(query="Mortierella wolfii", language="ja")
Search a synonym or misspelling and get the current accepted name:
from ra_bio import get_bio_database
db = get_bio_database()
result = db.search(query="Mortierella wolfii", limit=3, min_score=0.6)
print(result["hits"][0]["canonical_name"])
# Actinomortierella wolfii
Look up a canonical profile with host and disease information:
from ra_bio import get_bio_database
db = get_bio_database()
profile = db.lookup(query="Vibrio salmonicida", language="ja")
print(profile["profile"]["canonical_name"])
# Aliivibrio salmonicida
print(profile["profile"]["hosts"])
# ['サケ科魚類']
print(profile["profile"]["diseases"])
# ['冷水性ビブリオ病*']
Look up law and biosafety annotations in a stable structure:
from ra_bio import get_bio_database
db = get_bio_database()
profile = db.lookup(query="Anaplasma bovis", language="ja")
print(profile["profile"]["regulations"]["cartagena"]["values"])
# ['クラス2']
print(profile["profile"]["biosafety"]["bsl_bsj"]["values"])
# ['BSL2']
Inspect the bundled source-update metadata:
from ra_bio import get_bio_database
db = get_bio_database()
for row in db.get_source_snapshots():
    print(row["dataset_id"], row["source_filename"], row["source_version"], row["fetched_at"])
Example search result:
{
  "cluster_id": "ORG-000093",
  "canonical_name": "Actinomortierella wolfii",
  "preferred_scientific_name": "Actinomortierella wolfii",
  "datasets": ["fungi"],
  "score": 1.0,
  "match_type": "exact_raw",
  "matched_value": "Mortierella wolfii",
  "match_sources": ["canonical_name", "scientific_name", "synonym:m"],
  "regulation_keys": [],
  "biosafety_keys": ["trba"]
}
Example lookup summary:
{
  "matched": true,
  "profile": {
    "canonical_name": "Granulicatella adiacens",
    "scientific_names": [
      "Abiotrophia adiacens",
      "Granulicatella adiacens",
      "Streptococcus adjacens"
    ],
    "datasets": ["bacteria"],
    "regulations": {
      "cartagena": {
        "values": ["クラス2"]
      }
    },
    "biosafety": {
      "bsl_bsj": {
        "values": ["BSL1*"]
      },
      "trba": {
        "values": ["2"]
      }
    }
  }
}
Example source snapshot metadata:
[
  {
    "dataset_id": "bacteria",
    "source_filename": "risk_bacteria_20260120.csv",
    "source_version": "20260120"
  },
  {
    "dataset_id": "bacteria_fish",
    "source_filename": "risk_bacteria_fish_20240924.csv",
    "source_version": "20240924"
  },
  {
    "dataset_id": "fungi",
    "source_filename": "risk_fungi.xlsx",
    "source_version": "20260120"
  }
]
Public runtime API:
- get_bio_database(db_path: str | None = None)
- search_organisms(query, mode="auto", dataset=None, limit=20, min_score=0.6, db_path=None)
- lookup_bio_profile(query=None, scientific_name=None, language="ja", db_path=None)
- get_bio_source_snapshots(db_path=None)
- get_bio_runtime_status(db_path=None)
- BioDatabase.get_source_snapshots()
- BioDatabase.get_runtime_status()
- BioDatabase.lookup(query=None, scientific_name=None, language="ja")
- BioDatabase.search(query, mode="auto", dataset=None, limit=20, min_score=0.6)
Lookup payload highlights:
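- profile["regulations"]: law and regulatory-scheme annotations, addressed by stable keys
- profile["biosafety"]: annotations such as BSL and TRBA
- profile["designations"]: designation categories such as fish pathogens, plant pathogens, and indoor-environment microorganisms
- profile["pathogen_profiles"]: host and disease profiles derived from the fish-pathogen dataset
- profile["risk_annotations"]: evidence kept close to the original source data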
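Because the regulations and biosafety sections share the same key-to-values shape, a consumer can flatten them into one table. The sketch below is a minimal example that runs on a sample payload copied from the lookup example in this README, so it does not need ra_bio installed. The summarize_annotations helper is hypothetical, and returning an empty result for an unmatched lookup is an assumption rather than documented behavior.

```python
from __future__ import annotations

from typing import Any

# Sample payload shaped like the lookup example in this README.
sample_lookup: dict[str, Any] = {
    "matched": True,
    "profile": {
        "canonical_name": "Granulicatella adiacens",
        "regulations": {"cartagena": {"values": ["クラス2"]}},
        "biosafety": {"bsl_bsj": {"values": ["BSL1*"]}, "trba": {"values": ["2"]}},
    },
}


def summarize_annotations(lookup: dict[str, Any]) -> dict[str, list[str]]:
    """Flatten regulations and biosafety into {"section.key": values}."""
    # Assumption: an unmatched lookup carries no usable profile.
    if not lookup.get("matched"):
        return {}
    profile = lookup.get("profile") or {}
    summary: dict[str, list[str]] = {}
    for section in ("regulations", "biosafety"):
        # A missing section is treated as empty rather than as an error.
        for key, entry in (profile.get(section) or {}).items():
            summary[f"{section}.{key}"] = list(entry.get("values", []))
    return summary


for name, values in summarize_annotations(sample_lookup).items():
    print(name, values)
```

The values are passed through unchanged, including markers such as the trailing asterisk, since this README does not define their meaning.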
Default behavior:
- if db_path is omitted, ra_bio uses the packaged bundled SQLite database
- db_path may point to:
  - a direct SQLite file
  - a checked-out ra_bio directory containing bio.sqlite3
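The db_path rules above can be sketched as a small resolver. This is not ra_bio's internal code: resolve_db_path and its bundled parameter are hypothetical, and raising FileNotFoundError for a path that matches neither case is an assumption.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

DB_FILENAME = "bio.sqlite3"


def resolve_db_path(db_path: str | None, bundled: Path) -> Path:
    """Hypothetical helper mirroring the documented db_path rules."""
    if db_path is None:
        # No override given: fall back to the packaged database.
        return bundled
    candidate = Path(db_path)
    if candidate.is_dir():
        # A checked-out directory is expected to contain bio.sqlite3.
        candidate = candidate / DB_FILENAME
    if not candidate.is_file():
        raise FileNotFoundError(f"no SQLite database at {candidate}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp)
    (checkout / DB_FILENAME).touch()
    bundled = checkout / DB_FILENAME  # stand-in for the packaged database
    print(resolve_db_path(None, bundled).name)
    print(resolve_db_path(str(checkout), bundled).name)
    print(resolve_db_path(str(checkout / DB_FILENAME), bundled).name)
```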
The canonical runtime artifact is the bundled SQLite database:
- packaged path: src/ra_bio/data/bio.sqlite3
- published repo artifact: bio.sqlite3
Normal installed consumers should rely on the packaged bundled database.
The public repo intentionally stays small: detailed raw CSV/HTML retention is handled by ra_bio_scraper, not by ra_bio.
The current dataset is derived from these NITE/MRINDA downloads:
- https://www.nite.go.jp/mrinda/list/risk/download/bacteria
- https://www.nite.go.jp/mrinda/list/risk/download/bacteria_fish
- https://www.nite.go.jp/mrinda/list/risk/download/fungi
Source update information is part of the public runtime data.
- the SQLite bundle stores source_filename, source_version, fetched_at, and content_hash
- the public repo keeps parsed/source_snapshots.jsonl
- consumers can inspect the same information via BioDatabase.get_source_snapshots()
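One way to use this metadata is to check which dataset is oldest. The sketch below parses JSON Lines text and compares source_version values. It assumes that parsed/source_snapshots.jsonl holds one JSON object per line and that source_version is a YYYYMMDD date string, as the example values in this README suggest; neither is confirmed here. The read_snapshots and oldest_snapshot helpers are hypothetical, and the sample rows reuse the example metadata above, without fetched_at or content_hash.

```python
from __future__ import annotations

import io
import json
from datetime import datetime
from typing import Any, TextIO

# Sample rows copied from the example source snapshot metadata in this README.
SAMPLE_JSONL = """\
{"dataset_id": "bacteria", "source_filename": "risk_bacteria_20260120.csv", "source_version": "20260120"}
{"dataset_id": "bacteria_fish", "source_filename": "risk_bacteria_fish_20240924.csv", "source_version": "20240924"}
{"dataset_id": "fungi", "source_filename": "risk_fungi.xlsx", "source_version": "20260120"}
"""


def read_snapshots(stream: TextIO) -> list[dict[str, Any]]:
    """Parse one JSON object per non-blank line."""
    return [json.loads(line) for line in stream if line.strip()]


def oldest_snapshot(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the row with the earliest source_version (assumed YYYYMMDD)."""
    return min(rows, key=lambda row: datetime.strptime(row["source_version"], "%Y%m%d"))


rows = read_snapshots(io.StringIO(SAMPLE_JSONL))
oldest = oldest_snapshot(rows)
print(oldest["dataset_id"], oldest["source_version"])
```

The same helpers should work on the list returned by BioDatabase.get_source_snapshots() if its rows carry the same source_version field, since they only depend on that key.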
Detailed raw CSV files, extracted CSVs, and HTML snapshots are retained in ra_bio_scraper.