This repository contains tools compatible with the European Variation Archive (EVA) infrastructure. Please check out the ETL pipeline repository to load data that these tools can consume, and the REST web services API repository to retrieve information conveniently.
Web service and CLI that support queries from the EVA REST API and write the output in VCF format, for later consumption by other applications. The web service module produces a WAR deployable on Apache Tomcat; the CLI module dumps entire studies. Supports filtering by variant properties such as annotations, MAF, SIFT, and PolyPhen scores.
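To illustrate the kind of filtering and VCF output described above, here is a minimal sketch rather than the module's actual code. The dictionary keys (`chrom`, `pos`, `id`, `ref`, `alt`, `maf`), the `MAF` INFO key, and the function name `variants_to_vcf` are assumptions made for this example only.

```python
from typing import Dict, Iterable, List

# The eight fixed columns of a VCF data line.
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def variants_to_vcf(variants: Iterable[Dict], max_maf: float) -> str:
    """Keep variants whose MAF is at most max_maf and render them as VCF text.

    The input record layout is hypothetical, not the EVA data model.
    """
    lines: List[str] = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=MAF,Number=1,Type=Float,Description="Minor allele frequency">',
        "\t".join(VCF_COLUMNS),
    ]
    for variant in variants:
        if variant["maf"] > max_maf:
            continue  # filtered out by the MAF threshold
        lines.append("\t".join([
            variant["chrom"],
            str(variant["pos"]),
            variant["id"],
            variant["ref"],
            variant["alt"],
            ".",  # QUAL not available in this sketch
            ".",  # FILTER not available in this sketch
            f"MAF={variant['maf']}",
        ]))
    return "\n".join(lines) + "\n"
```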
MongoDB schema migration tool built on MongoBee. Keeps track of which migrations were applied to each database, allowing incremental updates of the variation database schema.
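The idea of tracking applied migrations per database can be sketched as follows. This is an in-memory Python illustration of the general technique, not MongoBee's API (MongoBee is a Java library); the changelog structure, the migration identifiers, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a persisted changelog: database name -> applied IDs.
Changelog = Dict[str, Set[str]]
Migration = Tuple[str, Callable[[dict], None]]


def apply_pending(db_name: str, db: dict, migrations: List[Migration],
                  changelog: Changelog) -> List[str]:
    """Run, in order, only the migrations not yet recorded for db_name."""
    applied = changelog.setdefault(db_name, set())
    newly_applied: List[str] = []
    for migration_id, change in migrations:
        if migration_id in applied:
            continue  # already applied to this database
        change(db)
        applied.add(migration_id)
        newly_applied.append(migration_id)
    return newly_applied


def add_schema_version(db: dict) -> None:
    """Example migration: record a schema version."""
    db["schema_version"] = 1


def rename_field(db: dict) -> None:
    """Example migration: rename the 'identifiers' field to 'ids'."""
    db["ids"] = db.pop("identifiers", [])
```

Calling `apply_pending` a second time with the same changelog applies nothing, which is what makes incremental updates safe to rerun.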
Imports dbSNP variant data into the EVA MongoDB databases.
Each Python module has its own pyproject.toml and can be installed independently with pip install <module-directory>/.
Collects variant and submission statistics from the accessioning warehouse (MongoDB) and stores them in PostgreSQL. Provides two scripts: stats-accessioning for RS/SS ID counts per assembly, and stats-variant-warehouse for per-project counts.
Analyzes EVA web service and FTP usage patterns from Kibana logs. Enriches query records with geolocation data and produces summaries of API endpoint usage and query parameters. Includes the following scripts: ftp-usage, ws-query-analysis, ws-query-fill-in-location, summarise-endpoints, and summarise-query-params.
Automates the refresh of EVA staging/testing databases from production snapshots via the Delphix data virtualization API. Manages snapshot creation and replication profiles, and integrates with GitLab to trigger downstream pipelines.
Monitors MongoDB accession collections for duplicate RS or SS IDs. Exports accessions, detects duplicates, and sends email notifications to configured recipients. Can be run standalone or via the included Nextflow workflow.
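The detection step can be sketched as a simple count over exported accession IDs. This is an assumption-laden illustration, not the module's code: the function names and the message wording are hypothetical, and no email is actually sent here.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_accessions(accessions: Iterable[str]) -> Dict[str, int]:
    """Return each accession ID seen more than once, with its count."""
    counts = Counter(accessions)
    return {accession: n for accession, n in counts.items() if n > 1}


def format_notification(duplicates: Dict[str, int]) -> str:
    """Build a plain-text message body summarising the duplicates found."""
    if not duplicates:
        return "No duplicate accessions found."
    lines = [f"{accession}: {n} occurrences"
             for accession, n in sorted(duplicates.items())]
    return "Duplicate accessions detected:\n" + "\n".join(lines)
```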
Queries PostgreSQL for EVA CLI call-home events and produces weekly usage reports as CSV files and matplotlib charts. Tracks runs per week, task types, exception rates, and validation error rates.
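As a rough sketch of the weekly aggregation, the snippet below groups events by ISO week and computes runs and exception rate. The event shape (a date plus a failure flag) and the function name are assumptions; the real module reads from PostgreSQL and also produces CSV files and charts, which are omitted here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (run date, whether the run raised an exception).
Event = Tuple[date, bool]


def weekly_summary(events: Iterable[Event]) -> Dict[str, Dict[str, float]]:
    """Group events by ISO week and report run count and exception rate."""
    runs: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for run_date, failed in events:
        iso = run_date.isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"  # ISO year and zero-padded week number
        runs[week] += 1
        if failed:
            failures[week] += 1
    return {
        week: {"runs": runs[week], "exception_rate": failures[week] / runs[week]}
        for week in sorted(runs)
    }
```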
Updates taxonomy scientific names in the EVA PostgreSQL metadata database (evapro.taxonomy) and reorganizes the corresponding reference sequence directories on disk, creating symlinks for backward compatibility.
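The on-disk part of this step (rename a directory, then leave a symlink under the old name) might look like the sketch below. The function name and directory names are hypothetical, and the database update is not shown.

```python
from pathlib import Path


def rename_with_symlink(base_dir: Path, old_name: str, new_name: str) -> Path:
    """Rename base_dir/old_name to base_dir/new_name and link the old name to it."""
    old_path = base_dir / old_name
    new_path = base_dir / new_name
    if new_path.exists():
        raise FileExistsError(f"{new_path} already exists")
    old_path.rename(new_path)
    # A relative target keeps the link valid if base_dir itself is moved.
    old_path.symlink_to(new_name, target_is_directory=True)
    return new_path
```

Paths that still refer to the old directory name keep resolving through the symlink, which is the backward-compatibility behaviour described above.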
Benchmarks VCF release query performance against MongoDB using different lookup strategies. Runs perf stat over a configurable number of iterations and reports timing for each strategy.
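The overall shape of such a benchmark, timing several strategies over a configurable number of iterations, can be sketched as below. Note that this sketch swaps `perf stat` for Python's `time.perf_counter` to stay self-contained, and the strategy callables and function name are hypothetical placeholders rather than the module's real lookup strategies.

```python
import time
from statistics import mean
from typing import Callable, Dict


def benchmark(strategies: Dict[str, Callable[[], object]],
              iterations: int = 5) -> Dict[str, float]:
    """Return the mean wall-clock time in seconds per strategy."""
    results: Dict[str, float] = {}
    for name, run in strategies.items():
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        results[name] = mean(timings)
    return results
```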
Wrapper scripts (eva-sbatch, eva-srun) for submitting jobs to SLURM HPC clusters with EVA-specific defaults.
Template shell scripts for deploying Python modules from a GitHub tag or branch, either into an existing directory (deploy_python_project_as_dir.sh) or inside a dedicated virtual environment (deploy_python_project_in_venv.sh).