Ameyanagi/ra-bio

ra_bio

Public runtime library for microorganism risk-reference data derived from NITE/MRINDA.

ra_bio is the canonical non-MCP API for consumers that call the library directly. It ships with a bundled SQLite database generated by ra_bio_scraper.

What it does

  • resolves microorganism names against canonical names and synonyms
  • supports fuzzy search for misspellings and historical names
  • returns aggregated organism profiles with structured law/risk annotations
  • keeps raw evidence under risk_annotations while also exposing grouped sections like regulations and biosafety
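
To make the name-resolution behavior concrete, here is a minimal, self-contained sketch of synonym resolution with a fuzzy fallback. It is not ra_bio's implementation: the `SYNONYMS` table, the `resolve_name` helper, the match-type labels, and the use of `difflib` similarity as the score are all assumptions for illustration. Only the organism names come from the examples in this README.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table: historical name -> current accepted name.
# The name pairs appear in this README; the data structure is an assumption.
SYNONYMS = {
    "Mortierella wolfii": "Actinomortierella wolfii",
    "Vibrio salmonicida": "Aliivibrio salmonicida",
    "Abiotrophia adiacens": "Granulicatella adiacens",
}
CANONICAL = sorted(set(SYNONYMS.values()))


def _normalize(name: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(name.lower().split())


def resolve_name(query: str, min_score: float = 0.6):
    """Return (canonical_name, score, match_type), or None below min_score."""
    q = _normalize(query)

    # 1. Exact match against canonical names, then against synonyms.
    for name in CANONICAL:
        if _normalize(name) == q:
            return name, 1.0, "exact"
    for synonym, canonical in SYNONYMS.items():
        if _normalize(synonym) == q:
            return canonical, 1.0, "synonym"

    # 2. Fuzzy fallback: score the query against every known name and
    #    keep the best-scoring candidate.
    candidates = {name: name for name in CANONICAL}
    candidates.update(SYNONYMS)
    score, canonical = max(
        (SequenceMatcher(None, q, _normalize(known)).ratio(), target)
        for known, target in candidates.items()
    )
    if score < min_score:
        return None
    return canonical, score, "fuzzy"


synonym_result = resolve_name("Mortierella wolfii")
fuzzy_result = resolve_name("Granulicatella adiacen")  # misspelled query
no_result = resolve_name("qqqq")
```

The `min_score` threshold plays the same role as the `min_score` argument in the Usage section: candidates that score below it are dropped rather than returned as weak matches.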

Install

uv add git+https://github.com/Ameyanagi/ra_bio.git
pip install "ra-bio @ git+https://github.com/Ameyanagi/ra_bio.git"

Usage

from ra_bio import get_bio_database, search_organisms, lookup_bio_profile

# Open the bundled database (pass db_path to use a different file).
db = get_bio_database()

# Method-style access on the database object.
lookup = db.lookup(query="Mortierella wolfii", language="ja")
search = db.search(query="Granulicatella adiacen", limit=5, min_score=0.6)
sources = db.get_source_snapshots()

# The same queries through the module-level helpers.
search_via_helper = search_organisms(query="Granulicatella adiacen", limit=5, min_score=0.6)
lookup_via_helper = lookup_bio_profile(query="Mortierella wolfii", language="ja")

Examples

Search a synonym or misspelling and get the current accepted name:

from ra_bio import get_bio_database

db = get_bio_database()
result = db.search(query="Mortierella wolfii", limit=3, min_score=0.6)

print(result["hits"][0]["canonical_name"])
# Actinomortierella wolfii

Look up a canonical profile with host and disease information:

from ra_bio import get_bio_database

db = get_bio_database()
profile = db.lookup(query="Vibrio salmonicida", language="ja")

print(profile["profile"]["canonical_name"])
# Aliivibrio salmonicida

print(profile["profile"]["hosts"])
# ['サケ科魚類']  (salmonid fish)

print(profile["profile"]["diseases"])
# ['冷水性ビブリオ病*']  (cold-water vibriosis)

Look up law and biosafety annotations in a stable structure:

from ra_bio import get_bio_database

db = get_bio_database()
profile = db.lookup(query="Anaplasma bovis", language="ja")

print(profile["profile"]["regulations"]["cartagena"]["values"])
# ['クラス2']  (Class 2)

print(profile["profile"]["biosafety"]["bsl_bsj"]["values"])
# ['BSL2']

Inspect the bundled source-update metadata:

from ra_bio import get_bio_database

db = get_bio_database()
for row in db.get_source_snapshots():
    print(row["dataset_id"], row["source_filename"], row["source_version"], row["fetched_at"])

Example search hit (one entry of result["hits"]):

{
  "cluster_id": "ORG-000093",
  "canonical_name": "Actinomortierella wolfii",
  "preferred_scientific_name": "Actinomortierella wolfii",
  "datasets": ["fungi"],
  "score": 1.0,
  "match_type": "exact_raw",
  "matched_value": "Mortierella wolfii",
  "match_sources": ["canonical_name", "scientific_name", "synonym:m"],
  "regulation_keys": [],
  "biosafety_keys": ["trba"]
}
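
As a rough illustration of consuming a hit like the one above, the following sketch reports whether the query matched under a name other than the canonical one. The field names are taken from the example hit; `matched_via_other_name` and `describe_hit` are hypothetical helpers, not part of ra_bio.

```python
# Sample hit copied from the example above.
hit = {
    "cluster_id": "ORG-000093",
    "canonical_name": "Actinomortierella wolfii",
    "preferred_scientific_name": "Actinomortierella wolfii",
    "datasets": ["fungi"],
    "score": 1.0,
    "match_type": "exact_raw",
    "matched_value": "Mortierella wolfii",
    "match_sources": ["canonical_name", "scientific_name", "synonym:m"],
    "regulation_keys": [],
    "biosafety_keys": ["trba"],
}


def matched_via_other_name(hit: dict) -> bool:
    """True when the matched text differs from the canonical name."""
    return hit["matched_value"] != hit["canonical_name"]


def describe_hit(hit: dict) -> str:
    """Build a one-line, human-readable summary of a search hit."""
    line = f'{hit["canonical_name"]} (score {hit["score"]:.2f}, {hit["match_type"]})'
    if matched_via_other_name(hit):
        line += f', matched as "{hit["matched_value"]}"'
    return line


summary = describe_hit(hit)
print(summary)
```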

Example lookup summary:

{
  "matched": true,
  "profile": {
    "canonical_name": "Granulicatella adiacens",
    "scientific_names": [
      "Abiotrophia adiacens",
      "Granulicatella adiacens",
      "Streptococcus adjacens"
    ],
    "datasets": ["bacteria"],
    "regulations": {
      "cartagena": {
        "values": ["クラス2"]
      }
    },
    "biosafety": {
      "bsl_bsj": {
        "values": ["BSL1*"]
      },
      "trba": {
        "values": ["2"]
      }
    }
  }
}
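
Not every organism carries every annotation (the search hit shown earlier has an empty regulation_keys list), so callers may want a tolerant accessor. The sketch below works on the sample payload above. `annotation_values` is a hypothetical helper, and the assumption that an unmatched lookup returns "matched": false with no usable profile is not confirmed by this README.

```python
from typing import Any

# Sample lookup payload copied from the example above.
payload = {
    "matched": True,
    "profile": {
        "canonical_name": "Granulicatella adiacens",
        "datasets": ["bacteria"],
        "regulations": {"cartagena": {"values": ["クラス2"]}},
        "biosafety": {
            "bsl_bsj": {"values": ["BSL1*"]},
            "trba": {"values": ["2"]},
        },
    },
}


def annotation_values(payload: dict[str, Any], section: str, key: str) -> list[str]:
    """Return profile[section][key]["values"], or [] if any level is missing."""
    # Assumption: an unmatched lookup is signalled by a falsy "matched" flag.
    if not payload.get("matched"):
        return []
    profile = payload.get("profile") or {}
    entry = (profile.get(section) or {}).get(key) or {}
    return list(entry.get("values", []))


trba = annotation_values(payload, "biosafety", "trba")
cartagena = annotation_values(payload, "regulations", "cartagena")
absent = annotation_values(payload, "biosafety", "no_such_key")
```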

Example source snapshot metadata:

[
  {
    "dataset_id": "bacteria",
    "source_filename": "risk_bacteria_20260120.csv",
    "source_version": "20260120"
  },
  {
    "dataset_id": "bacteria_fish",
    "source_filename": "risk_bacteria_fish_20240924.csv",
    "source_version": "20240924"
  },
  {
    "dataset_id": "fungi",
    "source_filename": "risk_fungi.xlsx",
    "source_version": "20260120"
  }
]
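
The source_version values above look like YYYYMMDD dates. Assuming that holds for every dataset (the README does not state it), a consumer could find the least recently updated source as sketched below. `parse_version` and `oldest_snapshot` are hypothetical helpers.

```python
from datetime import date, datetime

# Sample rows copied from the example above.
snapshots = [
    {"dataset_id": "bacteria", "source_filename": "risk_bacteria_20260120.csv", "source_version": "20260120"},
    {"dataset_id": "bacteria_fish", "source_filename": "risk_bacteria_fish_20240924.csv", "source_version": "20240924"},
    {"dataset_id": "fungi", "source_filename": "risk_fungi.xlsx", "source_version": "20260120"},
]


def parse_version(version: str) -> date:
    """Interpret a source_version string as a date (assumed YYYYMMDD format)."""
    return datetime.strptime(version, "%Y%m%d").date()


def oldest_snapshot(rows: list[dict]) -> dict:
    """Return the row whose source_version is the earliest date."""
    return min(rows, key=lambda row: parse_version(row["source_version"]))


oldest = oldest_snapshot(snapshots)
print(oldest["dataset_id"], parse_version(oldest["source_version"]).isoformat())
```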

Public runtime API:

  • get_bio_database(db_path: str | None = None)
  • search_organisms(query, mode="auto", dataset=None, limit=20, min_score=0.6, db_path=None)
  • lookup_bio_profile(query=None, scientific_name=None, language="ja", db_path=None)
  • get_bio_source_snapshots(db_path=None)
  • get_bio_runtime_status(db_path=None)
  • BioDatabase.get_source_snapshots()
  • BioDatabase.get_runtime_status()
  • BioDatabase.lookup(query=None, scientific_name=None, language="ja")
  • BioDatabase.search(query, mode="auto", dataset=None, limit=20, min_score=0.6)

Lookup payload highlights:

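  • profile["regulations"]: law and regulatory annotations, accessible under stable keys
  • profile["biosafety"]: biosafety annotations such as BSL and TRBA
  • profile["designations"]: designation categories such as fish pathogens, plant pathogens, and residential-environment microbes
  • profile["pathogen_profiles"]: host and disease profiles derived from the fish-pathogen dataset
  • profile["risk_annotations"]: evidence kept close to the original source data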

Default behavior:

  • if db_path is omitted, ra_bio uses the SQLite database bundled with the package
  • db_path may point to:
    • a direct SQLite file
    • a checked-out ra_bio directory containing bio.sqlite3
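
The resolution rules above can be sketched as follows. This is an illustration under stated assumptions, not ra_bio's code: `resolve_db_path` and its `packaged_default` parameter are hypothetical, and raising `FileNotFoundError` for a missing file is a choice made for this example.

```python
import tempfile
from pathlib import Path
from typing import Optional

DB_FILENAME = "bio.sqlite3"  # file name mentioned in this README


def resolve_db_path(db_path: Optional[str], packaged_default: Path) -> Path:
    """Pick the SQLite file to open, following the rules listed above."""
    # Rule 1: no db_path means "use the bundled database".
    if db_path is None:
        return packaged_default
    candidate = Path(db_path)
    # Rule 2: a directory is expected to contain bio.sqlite3.
    if candidate.is_dir():
        candidate = candidate / DB_FILENAME
    # Rule 3: otherwise the path must be the SQLite file itself.
    if not candidate.is_file():
        raise FileNotFoundError(f"no SQLite database at {candidate}")
    return candidate


# Small demonstration against a temporary directory.
packaged = Path("src/ra_bio/data") / DB_FILENAME
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / DB_FILENAME
    db_file.touch()
    from_default = resolve_db_path(None, packaged)
    from_dir = resolve_db_path(tmp, packaged)
    from_file = resolve_db_path(str(db_file), packaged)
```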

Runtime artifact

The canonical runtime artifact is the bundled SQLite database:

  • packaged path: src/ra_bio/data/bio.sqlite3
  • published repo artifact: bio.sqlite3

Consumers who install the package in the usual way should rely on the bundled database. The public repo intentionally stays small: detailed raw CSV/HTML retention is handled by ra_bio_scraper, not by ra_bio.

Data sources

The current dataset is derived from these NITE/MRINDA downloads:

  • https://www.nite.go.jp/mrinda/list/risk/download/bacteria
  • https://www.nite.go.jp/mrinda/list/risk/download/bacteria_fish
  • https://www.nite.go.jp/mrinda/list/risk/download/fungi

Source update metadata

Source update information is part of the public runtime data.

  • the SQLite bundle stores source_filename, source_version, fetched_at, and content_hash
  • the public repo keeps parsed/source_snapshots.jsonl
  • consumers can inspect the same information via BioDatabase.get_source_snapshots()

Detailed raw CSV files, extracted CSVs, and HTML snapshots are retained in ra_bio_scraper.
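
As a rough sketch of how such metadata could be stored and checked, the example below builds an in-memory SQLite table with the four fields named above plus dataset_id. The table name `source_snapshots`, the use of SHA-256 for content_hash, the timestamp format, and the `is_unchanged` helper are all assumptions. The fetched_at value and file contents are placeholders, not real data.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts

# Hypothetical schema: only the column names come from this README.
conn.execute(
    """
    CREATE TABLE source_snapshots (
        dataset_id      TEXT PRIMARY KEY,
        source_filename TEXT NOT NULL,
        source_version  TEXT NOT NULL,
        fetched_at      TEXT NOT NULL,
        content_hash    TEXT NOT NULL
    )
    """
)

placeholder_bytes = b"placeholder file contents"  # stands in for a download
conn.execute(
    "INSERT INTO source_snapshots VALUES (?, ?, ?, ?, ?)",
    (
        "fungi",
        "risk_fungi.xlsx",
        "20260120",
        "1970-01-01T00:00:00Z",  # placeholder timestamp
        hashlib.sha256(placeholder_bytes).hexdigest(),  # assumed hash algorithm
    ),
)

rows = [dict(row) for row in conn.execute(
    "SELECT * FROM source_snapshots ORDER BY dataset_id"
)]


def is_unchanged(row: dict, data: bytes) -> bool:
    """True when freshly fetched bytes hash to the stored content_hash."""
    return hashlib.sha256(data).hexdigest() == row["content_hash"]


unchanged = is_unchanged(rows[0], placeholder_bytes)
changed = not is_unchanged(rows[0], b"different contents")
```

Comparing a stored hash with the hash of a fresh download is one common way to detect that an upstream file has changed without diffing its contents.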
