📖 malFuse — Supply Chain Firewall


malFuse is a local HTTP proxy firewall that prevents software supply chain poisoning by intercepting package install requests and blocking malicious packages before they reach your disk. Built in Go with zero runtime dependencies.

How it works: pip, npm, yarn, uv, and other package managers are configured to route through malFuse. Every install request is checked against a local database of 252,637 confirmed malicious packages sourced from the OpenSSF project, plus real-time OSV vulnerability lookups and streaming script analysis.
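The interception step can be pictured with a small sketch. It maps a request path to an ecosystem and package name, then answers 403 for anything on a block list. This is not malFuse's code: `packageFromPath`, `gate`, the in-memory map, and the path layouts (`/pypi/simple/<name>/`, `/npm/<name>`) are assumptions inferred from the index URLs this README configures.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// packageFromPath maps a proxied request path to (ecosystem, package name).
// The layouts handled here are assumptions based on the index URLs in this
// README: /pypi/simple/<name>/ and /npm/<name>[/-/<tarball>].
func packageFromPath(path string) (ecosystem, name string, ok bool) {
	switch {
	case strings.HasPrefix(path, "/pypi/simple/"):
		ecosystem = "pypi"
		name = strings.ToLower(strings.Trim(strings.TrimPrefix(path, "/pypi/simple/"), "/"))
	case strings.HasPrefix(path, "/npm/"):
		ecosystem = "npm"
		name = strings.Trim(strings.TrimPrefix(path, "/npm/"), "/")
		// Tarball requests look like <name>/-/<file>.tgz; keep only the name.
		if i := strings.Index(name, "/-/"); i >= 0 {
			name = name[:i]
		}
	}
	return ecosystem, name, name != ""
}

// gate rejects requests for block-listed packages and forwards the rest.
// The map key format "ecosystem/name" is hypothetical.
func gate(blocked map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if eco, name, ok := packageFromPath(r.URL.Path); ok && blocked[eco+"/"+name] {
			http.Error(w, "blocked: "+name, http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	blocked := map[string]bool{"npm/bad-lib": true}
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "forwarded to upstream")
	})
	handler := gate(blocked, upstream)
	_ = handler // a real proxy would pass this to http.ListenAndServe
	fmt.Println(packageFromPath("/npm/bad-lib/-/bad-lib-1.0.0.tgz"))
}
```

In a real proxy the `upstream` handler would be an `httputil.ReverseProxy`, which the Tech Stack table lists as the forwarding component.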

Multi-ecosystem database — 252,637 confirmed malicious packages across 11 ecosystems. Currently supports PyPI (pip/uv) and npm (npm/pnpm/yarn) routing. Other ecosystems (RubyGems, NuGet, Crates.io, Go, Maven, Packagist, VSCode) have data in the database but proxy routing is not yet implemented.
6-layer detection — whitelist, malicious DB, cooldown, typo-squatting, OSV API, script scan
Zero friction — no CA certificates, no HTTPS interception, plain HTTP on localhost
Offline capable — SQLite-backed, works without network after initial DB setup

Status: P0 and P1 are complete. P2 is in progress (whitelist management and CI/CD are done).

Usage

Quick Start

# 1. Generate malicious package database
./malfuse-db --mode direct --db malfuse.db --repo ossf-malicious-packages

# 2. Configure package manager
./malfuse link

# 3. Start proxy
./malfuse start            # daemon mode (background)
# or
./malfuse -c config.json   # foreground

# 4. Install dependencies normally (traffic routed through proxy)
pip install requests
npm install lodash

CLI Commands

`malfuse` — Proxy + Management

# Running
malfuse                         # foreground (default config.json)
malfuse -c /path/to/config.json # custom config
malfuse start                   # daemon (background)
malfuse stop                    # stop daemon
malfuse status                  # check daemon status

# Package manager configuration
malfuse link                    # configure all installed managers
malfuse link --target pip       # pip only
malfuse link --target npm       # npm only
malfuse link --target pnpm      # pnpm only
malfuse link --target yarn      # yarn v1 only
malfuse unlink                  # restore all
malfuse unlink --target pip     # restore pip only

# Whitelist management
malfuse allow add requests --ecosystem pypi               # allow all versions
malfuse allow add lodash --ecosystem npm --version 4.17.21 # allow specific version
malfuse allow remove requests --ecosystem pypi              # remove from whitelist
malfuse allow list                                          # list all
malfuse allow list --ecosystem npm                          # filter by ecosystem

`malfuse-db` — Database Management

# Direct SQLite write (online use)
malfuse-db --mode direct --db malfuse.db --repo ossf-malicious-packages

# Generate SQL incremental file (offline/air-gapped use)
malfuse-db --mode sql --output updates-20260527.sql --repo ossf-malicious-packages

# Specify config (reads repo_proxy for GitHub acceleration)
malfuse-db --config config.json --mode direct

Flag	Default	Description
`--mode`	`direct`	`direct` write to SQLite / `sql` generate incremental SQL file
`--db`	`malfuse.db`	SQLite database path
`--repo`	`ossf-malicious-packages`	git repo cache directory
`--output`	`updates-YYYYMMDDHHmm.sql`	SQL output path (sql mode only)
`--config`	`config.json`	config file path (reads `repo_proxy` for proxy)

Update mechanism: First run does a full scan (clone + parse all OSV JSON). Subsequent runs use git fetch + git diff --name-status for incremental updates, completing in seconds.

Output: direct mode writes to malfuse.db (WAL mode, readable by proxy concurrently). sql mode generates INSERT OR REPLACE + DELETE statements for importing into air-gapped databases.
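The incremental update described above hinges on reading `git diff --name-status` output. The sketch below classifies each changed file as an upsert (which would become an INSERT OR REPLACE) or a delete (which would become a DELETE). The `change` type, `parseNameStatus`, and the sample paths are hypothetical; the actual ingest code may treat renames and other statuses differently.

```go
package main

import (
	"fmt"
	"strings"
)

// change is one file-level action derived from git's name-status output.
type change struct {
	Action string // "upsert" or "delete"
	Path   string
}

// parseNameStatus reads lines such as "A\tpath", "M\tpath", "D\tpath" and
// "R100\told\tnew". Renames are split into a delete plus an upsert.
func parseNameStatus(out string) []change {
	var changes []change
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(strings.TrimSpace(line), "\t")
		if len(fields) < 2 {
			continue // blank or malformed line
		}
		switch fields[0][0] {
		case 'A', 'M':
			changes = append(changes, change{"upsert", fields[1]})
		case 'D':
			changes = append(changes, change{"delete", fields[1]})
		case 'R':
			if len(fields) == 3 {
				changes = append(changes,
					change{"delete", fields[1]},
					change{"upsert", fields[2]})
			}
		}
	}
	return changes
}

func main() {
	// Illustrative paths only; they are not taken from a real diff.
	out := "A\tosv/malicious/npm/bad-lib/MAL-0000-0001.json\n" +
		"D\tosv/malicious/pypi/old-pkg/MAL-0000-0002.json\n"
	for _, c := range parseNameStatus(out) {
		fmt.Println(c.Action, c.Path)
	}
}
```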

Package Manager Configuration

One-click (`malfuse link`)

Tool	Command	What it does
pip	`malfuse link --target pip`	`pip config set global.index-url http://127.0.0.1:8080/pypi/simple/`
npm	`malfuse link --target npm`	`npm config set registry http://127.0.0.1:8080/npm/`
pnpm	`malfuse link --target pnpm`	`pnpm config set registry http://127.0.0.1:8080/npm/`
yarn v1	`malfuse link --target yarn`	`yarn config set registry http://127.0.0.1:8080/npm/`

malfuse link backs up original values to ~/.malfuse_backup.json. malfuse unlink restores from backup.

Manual Configuration

malfuse link does not configure these tools. Set them up manually using the guides below:

Tool	Documentation
yarn v2+ (Berry)	docs/yarn-v2-config.md
uv	docs/uv-config.md
poetry	docs/poetry-config.md
conda	docs/conda-config.md

Blocking Granularity

Note: Currently only PyPI (pip/uv) and npm (npm/pnpm/yarn) package managers are supported for proxy routing. Other ecosystems exist in the malicious database but do not have proxy routing yet.

Different package managers have different proxy behaviors, resulting in different version-matching precision:

Ecosystem	Granularity	Mechanism
PyPI (pip)	Version-precise	Simple API only blocks `version=NULL` entries. Proxy rewrites HTML download links so pip's tarball requests go back through the proxy, enabling exact version matching
npm	Package-level (all-or-nothing)	npm's `pacote` download library bypasses proxy caching, preventing precise version matching on download. Only `version=NULL` entries block at the package metadata stage with 403
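To make the PyPI row concrete, here is a minimal sketch of the two steps it describes: rewriting download links in an index page so the file request returns to the proxy, then recovering the version from the requested filename. `rewriteLinks`, `versionFromFilename`, the `files.example.org` host, and the `/pypi/files/` prefix are all hypothetical. The filename parsing assumes modern sdist and wheel naming and is not the project's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteLinks points every href that starts with upstreamBase at proxyBase,
// so the client's follow-up file download goes through the proxy again.
func rewriteLinks(html, upstreamBase, proxyBase string) string {
	return strings.ReplaceAll(html, `href="`+upstreamBase, `href="`+proxyBase)
}

// versionFromFilename recovers (name, version) from an sdist or wheel filename.
func versionFromFilename(file string) (name, version string, ok bool) {
	switch {
	case strings.HasSuffix(file, ".whl"):
		// Wheel layout: name-version-pythontag-abitag-platform.whl
		parts := strings.Split(strings.TrimSuffix(file, ".whl"), "-")
		if len(parts) < 5 {
			return "", "", false
		}
		return parts[0], parts[1], true
	case strings.HasSuffix(file, ".tar.gz"), strings.HasSuffix(file, ".zip"):
		// Sdist layout: name-version.tar.gz; the version follows the last hyphen.
		base := strings.TrimSuffix(strings.TrimSuffix(file, ".tar.gz"), ".zip")
		i := strings.LastIndex(base, "-")
		if i <= 0 || i == len(base)-1 {
			return "", "", false
		}
		return base[:i], base[i+1:], true
	}
	return "", "", false
}

func main() {
	page := `<a href="https://files.example.org/packages/requests-2.31.0.tar.gz">requests-2.31.0.tar.gz</a>`
	fmt.Println(rewriteLinks(page, "https://files.example.org/", "http://127.0.0.1:8080/pypi/files/"))
	fmt.Println(versionFromFilename("requests-2.31.0.tar.gz"))
}
```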

npm Version-Precise Workaround

For version-level control on npm, block the package for all versions, then whitelist the version you trust:

# 1. Mark package as all-versions blocked (version=NULL)
sqlite3 malfuse.db "INSERT INTO malicious_packages VALUES ('bad-lib', NULL, 'npm', '', '');"

# 2. Whitelist the safe version
./malfuse allow add bad-lib --ecosystem npm --version 2.0.5

Now npm install bad-lib@2.0.5 passes; all other versions are blocked.

Blocking Behavior Matrix

Scenario	pip	npm
DB `version=NULL`	All versions blocked	All versions blocked
DB `version="1.0"`, install 1.0	Blocked (exact match on download)	Blocked (unless whitelisted)
DB `version="1.0"`, install 2.0	Pass	Blocked (unless whitelisted)
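The matrix above can be restated as a small decision function. This sketch mirrors the table as written, with the whitelist taking precedence as described in the Detection Pipeline section. The `entry` type and `blocked` function are hypothetical names, not the project's API.

```go
package main

import "fmt"

// entry is one malicious-package record; AllVersions models version=NULL.
type entry struct {
	Version     string
	AllVersions bool
}

// blocked applies the matrix: a whitelist hit always passes, npm blocks at
// package level whenever any record exists, and PyPI blocks only on
// version=NULL or an exact version match.
func blocked(ecosystem, requested string, entries []entry, whitelisted bool) bool {
	if whitelisted || len(entries) == 0 {
		return false
	}
	if ecosystem == "npm" {
		return true
	}
	for _, e := range entries {
		if e.AllVersions || e.Version == requested {
			return true
		}
	}
	return false
}

func main() {
	db := []entry{{Version: "1.0"}}
	fmt.Println("pip 1.0:", blocked("pypi", "1.0", db, false))
	fmt.Println("pip 2.0:", blocked("pypi", "2.0", db, false))
	fmt.Println("npm 2.0:", blocked("npm", "2.0", db, false))
	fmt.Println("npm 2.0 whitelisted:", blocked("npm", "2.0", db, true))
}
```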

Detection Pipeline

Each install request passes through 6 checks (including whitelist). The first match stops the pipeline:

#	Check	Data Source	Result	Default
0	Whitelist	SQLite `whitelist` table	Match → immediate PASS, skip all remaining checks	On
1	Malicious DB	SQLite `malicious_packages` (252,637 records)	Match → 403 Forbidden	On
2	Cooldown	`malicious_packages.published` (OSV report timestamp)	Report age < 48h → 403	Off
3	Typo-Squatting	Embedded 2,790 popular packages + Levenshtein distance	Name similarity → 403	Off
4	OSV API	`api.osv.dev/v1/query` + in-memory TTL cache	Vuln found + `block_on_vuln=true` → 403	On (log only)
5	Stream Script Scan	TeeReader streaming (tar/gzip extraction)	Malicious script → connection reset	Off
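The "first match stops the pipeline" rule can be sketched as an ordered list of checks, each returning pass, block, or no opinion. Every name here (`verdict`, `check`, `runPipeline`, `setCheck`) is hypothetical, and the two map-backed checks stand in for the SQLite lookups; the real engine package may be organized differently.

```go
package main

import "fmt"

type verdict int

const (
	noMatch verdict = iota // check has no opinion; continue with the next one
	pass                   // stop and allow the request
	block                  // stop and reject (a 403 in the proxy)
)

type request struct{ Ecosystem, Name, Version string }

type check struct {
	Name    string
	Enabled bool
	Run     func(request) verdict
}

// runPipeline runs enabled checks in order. The first check that returns
// something other than noMatch decides the outcome; if none match, the
// request passes.
func runPipeline(checks []check, req request) (verdict, string) {
	for _, c := range checks {
		if !c.Enabled {
			continue
		}
		if v := c.Run(req); v != noMatch {
			return v, c.Name
		}
	}
	return pass, "default"
}

// setCheck builds a check that returns onMatch when the package is in set.
func setCheck(name string, set map[string]bool, onMatch verdict) check {
	return check{Name: name, Enabled: true, Run: func(r request) verdict {
		if set[r.Ecosystem+"/"+r.Name] {
			return onMatch
		}
		return noMatch
	}}
}

func main() {
	checks := []check{
		setCheck("whitelist", map[string]bool{"pypi/requests": true}, pass),
		setCheck("malicious-db", map[string]bool{"pypi/requests": true, "npm/bad-lib": true}, block),
	}
	for _, r := range []request{
		{"pypi", "requests", "2.31.0"},
		{"npm", "bad-lib", "1.0.0"},
		{"npm", "lodash", "4.17.21"},
	} {
		v, decidedBy := runPipeline(checks, r)
		fmt.Println(r.Name, "blocked:", v == block, "decided by:", decidedBy)
	}
}
```

Placing the whitelist first is what lets it override a malicious-DB record for the same package.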

Failure policies:

Whitelist / typo — no failure possible (pure memory / SQLite)
Malicious DB — skip on DB missing or corrupt (log WARN)
Cooldown — skip on missing DB or published field; DB-only query, no extra network calls
OSV API — network unreachable → pass (fail-open); block_on_vuln=false → log only
Script scan — parse/archive error → pass (fail-open)
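The fail-open policy for the OSV check can be sketched as follows. `osvDecision`, `lookupFunc`, and `errUnreachable` are hypothetical names. `queryOSV` posts the request shape documented for the public OSV `/v1/query` endpoint, but whether malFuse builds its request this way is an assumption, and the in-memory TTL cache is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

var errUnreachable = errors.New("osv: network unreachable")

// lookupFunc returns the number of known vulnerabilities for a package version.
type lookupFunc func(name, version string) (int, error)

// osvDecision applies the policy from the list above: lookup errors pass
// (fail-open), and findings block only when blockOnVuln is true.
func osvDecision(lookup lookupFunc, name, version string, blockOnVuln bool) (bool, string) {
	n, err := lookup(name, version)
	switch {
	case err != nil:
		return false, "lookup failed, passing (fail-open)"
	case n == 0:
		return false, "no known vulnerabilities"
	case !blockOnVuln:
		return false, "vulnerabilities found, log only"
	default:
		return true, "vulnerabilities found, blocked"
	}
}

// queryOSV asks an OSV-compatible server how many advisories match.
// The ecosystem uses OSV's spelling, for example "PyPI" or "npm".
func queryOSV(baseURL, ecosystem, name, version string) (int, error) {
	body, err := json.Marshal(map[string]any{
		"version": version,
		"package": map[string]string{"name": name, "ecosystem": ecosystem},
	})
	if err != nil {
		return 0, err
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(baseURL+"/v1/query", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("osv: unexpected status %s", resp.Status)
	}
	var out struct {
		Vulns []json.RawMessage `json:"vulns"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return len(out.Vulns), nil
}

func main() {
	// Stub lookups stand in for queryOSV so the example runs offline.
	down := func(name, version string) (int, error) { return 0, errUnreachable }
	vulnerable := func(name, version string) (int, error) { return 2, nil }
	fmt.Println(osvDecision(down, "jinja2", "2.4.1", true))
	fmt.Println(osvDecision(vulnerable, "jinja2", "2.4.1", false))
	fmt.Println(osvDecision(vulnerable, "jinja2", "2.4.1", true))
}
```

For a live lookup, wrap `queryOSV` in a closure that fixes the base URL and ecosystem and pass it as the `lookupFunc`.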

Note: Cooldown and typo-squatting are off by default due to false-positive risk and stability concerns. Enable only when you understand the trade-offs.
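The false-positive risk is easier to judge with the distance check written out. The sketch below is a textbook two-row Levenshtein implementation plus a lookup against a popular-package list. `levenshtein`, `nearestPopular`, and the two-entry list are hypothetical stand-ins for the project's own implementation and its embedded list.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance (insert, delete, substitute) between
// a and b, keeping only two rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// nearestPopular reports the first popular package within threshold edits of
// name. An exact match is the popular package itself and is never flagged.
func nearestPopular(name string, popular []string, threshold int) (string, bool) {
	for _, p := range popular {
		if p == name {
			return "", false
		}
	}
	for _, p := range popular {
		if levenshtein(name, p) <= threshold {
			return p, true
		}
	}
	return "", false
}

func main() {
	popular := []string{"flask", "requests"}
	fmt.Println(nearestPopular("reqeusts", popular, 2))
	fmt.Println(nearestPopular("requests", popular, 2))
}
```

With a threshold of 2, any legitimate package whose name happens to sit within two edits of a popular one is also flagged, which is one way the false positives mentioned above can arise.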

Script scan (#5) attack vectors covered:

JS Ecosystem	Python Ecosystem
`package.json` `scripts.preinstall/postinstall/install`	`setup.py` full content
`package.json` scripts-referenced `.js` files	`__init__.py` full content
Standalone `.js` files	`.pth` file `import` lines
—	`pyproject.toml` `build-system.build-backend`
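As a rough sketch of how the files in this table could be picked out of a package tarball, the code below walks a gzip-compressed tar stream and hands matching entries to a callback. `vectorFor`, `scanTarGz`, `scanArchive`, `buildTarGz`, and the vector labels are hypothetical. The README says the scanner reads a TeeReader copy of the download; here an in-memory archive stands in for that stream.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"path"
	"strings"
)

// vectorFor names the scan vector for a file, or "" if it is not scanned.
func vectorFor(name string) string {
	base := path.Base(name)
	switch {
	case base == "package.json":
		return "npm install scripts"
	case base == "setup.py", base == "__init__.py":
		return "python source"
	case base == "pyproject.toml":
		return "build backend"
	case strings.HasSuffix(base, ".pth"):
		return "pth import lines"
	case strings.HasSuffix(base, ".js"):
		return "javascript"
	}
	return ""
}

// scanTarGz streams a .tar.gz and calls visit for each regular file that
// maps to a vector and is no larger than maxFileSize bytes.
func scanTarGz(r io.Reader, maxFileSize int64, visit func(name, vector string, data []byte)) error {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		vector := vectorFor(hdr.Name)
		if hdr.Typeflag != tar.TypeReg || vector == "" || hdr.Size > maxFileSize {
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return err
		}
		visit(hdr.Name, vector, data)
	}
}

// scanArchive is a convenience wrapper that returns file name -> vector.
func scanArchive(archive []byte, maxFileSize int64) (map[string]string, error) {
	found := map[string]string{}
	err := scanTarGz(bytes.NewReader(archive), maxFileSize, func(name, vector string, _ []byte) {
		found[name] = vector
	})
	return found, err
}

// buildTarGz creates a small in-memory archive for the demo. Write errors are
// ignored because the destination is a bytes.Buffer.
func buildTarGz(files map[string]string) []byte {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, body := range files {
		tw.WriteHeader(&tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))})
		tw.Write([]byte(body))
	}
	tw.Close()
	gz.Close()
	return buf.Bytes()
}

func main() {
	archive := buildTarGz(map[string]string{
		"pkg-1.0/setup.py":  "import os",
		"pkg-1.0/README.md": "docs",
	})
	fmt.Println(scanArchive(archive, 5<<20))
}
```

Returning the error to the caller leaves room for the fail-open policy: a proxy following the documented behavior would log the error and let the download continue.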

Each vector is analyzed by three detectors: Shannon entropy (threshold 4.5), code obfuscation (base64/hex/eval chains), and network indicators (URLs/IPs).
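For reference, byte-level Shannon entropy can be computed as below. This is a generic implementation of the standard formula, H = -Σ p·log2(p), measured in bits per byte. The function names are hypothetical, and the project's detector may tokenize or window its input differently.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of data in bits per byte (0 to 8).
func shannonEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	n := float64(len(data))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// looksPacked applies a threshold such as the 4.5 default from config.json.
func looksPacked(data []byte, threshold float64) bool {
	return shannonEntropy(data) > threshold
}

func main() {
	plain := []byte("import os\nprint('hello world')\n")
	fmt.Printf("entropy=%.2f flagged=%v\n", shannonEntropy(plain), looksPacked(plain, 4.5))
}
```

Repetitive text scores low, while compressed or encoded payloads approach the 8-bit maximum, which is why a threshold between the two can separate them.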

Health Check

$ curl http://127.0.0.1:8080/health
{"db":true,"status":"ok","uptime":"2h34m5s"}

Field	Description
`status`	`ok` (healthy) or `degraded` (DB unavailable)
`db`	SQLite connection status
`uptime`	Proxy process uptime
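A monitoring script might decode the response like this. The `health` struct and `parseHealth` are hypothetical; the field names come from the sample above, and treating `uptime` as a Go duration string is an assumption based on how that sample is formatted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// health mirrors the three fields shown in the sample /health response.
type health struct {
	DB     bool   `json:"db"`
	Status string `json:"status"`
	Uptime string `json:"uptime"`
}

// parseHealth decodes the body and converts uptime to a time.Duration.
func parseHealth(body []byte) (health, time.Duration, error) {
	var h health
	if err := json.Unmarshal(body, &h); err != nil {
		return h, 0, err
	}
	up, err := time.ParseDuration(h.Uptime)
	return h, up, err
}

func main() {
	body := []byte(`{"db":true,"status":"ok","uptime":"2h34m5s"}`)
	h, up, err := parseHealth(body)
	if err != nil {
		fmt.Println("bad health response:", err)
		return
	}
	fmt.Println("healthy:", h.Status == "ok" && h.DB, "uptime:", up)
}
```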

Configuration (config.json)

{
  "port": "8080",
  "host": "127.0.0.1",
  "db_path": "malfuse.db",
  "pid_file": "malfuse.pid",
  "repo_proxy": "ghfast.top",
  "logging": {
    "level": "info",
    "format": "text",
    "output": "stdout"
  },
  "routing": [
    {"prefix": "/pypi/", "upstream": "https://pypi.tuna.tsinghua.edu.cn", "ecosystem": "pypi"},
    {"prefix": "/npm/", "upstream": "https://registry.npmmirror.com", "ecosystem": "npm"}
  ],
  "cooldown": {
    "enabled": false,
    "duration": "48h"
  },
  "typo": {
    "enabled": true,
    "threshold": 2
  },
  "osv": {
    "enabled": true,
    "block_on_vuln": false,
    "ttl": "1h",
    "base_url": "https://api.osv.dev"
  },
  "script_scan": {
    "enabled": false,
    "max_file_size": 5242880,
    "max_total_size": 52428800,
    "entropy": { "enabled": true, "threshold": 4.5 },
    "obfuscation": { "enabled": true, "base64_min_length": 100, "hex_min_length": 20 },
    "network": { "enabled": true, "allow_private_ips": false }
  }
}

Configuration reference:

Section	Field	Description
Base	`port` / `host`	Proxy listen address
Base	`db_path`	SQLite path; auto-skip DB check if file absent
Base	`pid_file`	Daemon mode PID file path (default `malfuse.pid`)
Base	`repo_proxy`	GitHub acceleration proxy domain (e.g. `ghfast.top`); omit for no proxy
`logging`	`level`	`debug` / `info` / `warn` / `error`
`logging`	`format`	`text` or `json` (JSON for log collection systems)
`logging`	`output`	`stdout` or file path (file mode also writes to stdout)
`routing`	`prefix`	URL prefix matching request path
`routing`	`upstream`	Real registry URL (proxy internally uses HTTPS)
`routing`	`ecosystem`	Ecosystem identifier (`pypi` / `npm`, used for DB + OSV queries)
`cooldown`	`enabled`	Default off, must explicitly enable
`cooldown`	`duration`	Block if OSV report published less than this duration ago
`typo`	`threshold`	Block if Levenshtein distance ≤ this value
`osv`	`block_on_vuln`	Whether to block on vulnerability found (default `false`, log only)
`osv`	`ttl`	Query result cache duration
`osv`	`base_url`	OSV API endpoint
`script_scan`	`enabled`	Default off, must explicitly enable
`script_scan`	`max_file_size`	Max single file to analyze (bytes), skip if larger
`script_scan`	`max_total_size`	Max total stream size, stop scanning if exceeded
`script_scan.entropy`	`threshold`	Shannon entropy threshold (~4.5 = upper bound of English text)
`script_scan.obfuscation`	`base64_min_length`	Min base64 string length to trigger detection
`script_scan.obfuscation`	`hex_min_length`	Min consecutive `\xNN` count to trigger detection
`script_scan.network`	`allow_private_ips`	Whether to allow private IPs (`10.x`, `192.168.x`, etc.)
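A minimal sketch of loading part of this file shows how the string-typed fields could be interpreted. The struct layout, `loadConfig`, `listenAddr`, and `routeFor` are hypothetical and cover only a subset of the fields; the project's own config package also performs validation that is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
	"time"
)

type route struct {
	Prefix    string `json:"prefix"`
	Upstream  string `json:"upstream"`
	Ecosystem string `json:"ecosystem"`
}

type config struct {
	Host     string  `json:"host"`
	Port     string  `json:"port"` // a string in config.json, not a number
	Routing  []route `json:"routing"`
	Cooldown struct {
		Enabled  bool   `json:"enabled"`
		Duration string `json:"duration"`
	} `json:"cooldown"`
}

// loadConfig decodes the JSON and parses cooldown.duration ("48h") into a
// time.Duration.
func loadConfig(raw []byte) (config, time.Duration, error) {
	var c config
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, 0, err
	}
	window, err := time.ParseDuration(c.Cooldown.Duration)
	if err != nil {
		return c, 0, fmt.Errorf("cooldown.duration: %w", err)
	}
	return c, window, nil
}

// listenAddr joins host and port into the address the proxy would bind to.
func (c config) listenAddr() string { return net.JoinHostPort(c.Host, c.Port) }

// routeFor returns the first routing entry whose prefix matches the path.
func (c config) routeFor(path string) (route, bool) {
	for _, r := range c.Routing {
		if strings.HasPrefix(path, r.Prefix) {
			return r, true
		}
	}
	return route{}, false
}

func main() {
	raw := []byte(`{"host":"127.0.0.1","port":"8080",
	  "routing":[{"prefix":"/npm/","upstream":"https://registry.npmmirror.com","ecosystem":"npm"}],
	  "cooldown":{"enabled":false,"duration":"48h"}}`)
	c, window, err := loadConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	r, ok := c.routeFor("/npm/lodash")
	fmt.Println(c.listenAddr(), window, r.Ecosystem, ok)
}
```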

Build

CGO_ENABLED=0 go build -o malfuse .
CGO_ENABLED=0 go build -o malfuse-db ./cmd/malfuse-db/

Pure Go, zero CGo dependencies. The same source cross-compiles for Linux, macOS (Intel and Apple Silicon), and Windows by setting GOOS and GOARCH.

Development

Directory Structure

malFuse/
├── main.go                    # malfuse proxy entry (cobra CLI)
├── config.json                # Configuration
├── cmd/
│   └── malfuse-db/            # Database management CLI
├── internal/
│   ├── config/                # JSON config loading + validation
│   ├── proxy/                 # HTTP proxy (routing, forwarding, health)
│   ├── engine/                # Detection pipeline (whitelist, mal-db, cooldown, typo, OSV) + StreamChecker
│   ├── scanner/               # Streaming script scan (entropy/obfuscation/network + JS/Python analysis)
│   ├── osv/                   # OSV API client + in-memory TTL cache
│   ├── logger/                # logrus structured logging wrapper
│   ├── daemon/                # Background process management (PID, signals)
│   ├── linker/                # Package manager config (pip/npm/pnpm/yarn)
│   └── db/
│       ├── schema/            # SQLite DDL + CRUD (WAL mode, DBExec interface)
│       ├── ingest/            # OSV JSON 1.5.0 parsing + Git operations
│       └── output/            # Direct DB write / SQL incremental file generation
├── .github/workflows/         # CI/CD pipelines
├── docs/                      # Manual configuration guides
├── malfuse.db                 # SQLite database (generated by malfuse-db)
└── ossf-malicious-packages/   # Git repo cache (gitignored)

Running Tests

# All unit + integration tests
go test ./internal/...

# Specific package
go test -v ./internal/scanner/
go test -v ./internal/engine/

130+ tests covering all packages.

Roadmap

✅ P0 — Core Skeleton (Complete)

HTTP reverse proxy + routing
Malicious package SQLite database (252,637 records, 11 ecosystems)
malfuse-db CLI (git incremental fetch + SQL offline mode)
Detection pipeline (malicious-db / cooldown / typo / OSV)
Streaming script scanner (entropy / obfuscation / network + JS/Python analysis)

✅ P1 — Automation & Operations (Complete)

malfuse link / malfuse unlink (pip / npm / pnpm / yarn)
logrus structured logging (level control, JSON format, file output)
/health health check endpoint
Daemon mode (malfuse start/stop/status)
End-to-end integration test suite

🟢 P2 — Deep Scanning & Ecosystem Expansion

malfuse allow whitelist management
CI/CD Pipeline (test + DB auto-update + release on tag)
More ecosystem routes (RubyGems, NuGet, Crates.io, Go modules)
Docker image distribution
Install script AST analysis

Tech Stack

Component	Library
CLI framework	`github.com/spf13/cobra`
Structured logging	`github.com/sirupsen/logrus`
SQLite	`modernc.org/sqlite` (pure Go, zero CGo)
HTTP proxy	`net/http/httputil.ReverseProxy` (stdlib)
Malicious package format	OSV Schema 1.5.0
Typo detection	Custom Levenshtein distance implementation
Entropy detection	Custom Shannon Entropy implementation
Obfuscation detection	regexp (stdlib)
