Research-grade Python scaffold for deterministic and auditable market forecasting workflows.
See CONTRIBUTING.md for verification and merge policy.
- Repository scaffold and quality guardrails.
- CI checks (lint, typecheck, tests).
- Local PostgreSQL for development.
- Run context primitives for reproducibility (run_id, config_hash, code_version).
No real ingestion/model/backtest implementation is included in this milestone.
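The run context primitives listed above could look roughly like the sketch below. It is an illustration only, not the repository's implementation: the RunContext dataclass, compute_config_hash, and new_run_context are hypothetical names, and hashing canonical JSON with SHA-256 is an assumed way to obtain a stable config_hash.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass


def compute_config_hash(config: dict) -> str:
    """Hash a config dict via canonical JSON so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunContext:
    """Hypothetical container for the three reproducibility fields."""

    run_id: str
    config_hash: str
    code_version: str


def new_run_context(config: dict, code_version: str) -> RunContext:
    # run_id is random per run (assumption); the other two fields are
    # fully determined by the config and the code revision.
    return RunContext(
        run_id=uuid.uuid4().hex,
        config_hash=compute_config_hash(config),
        code_version=code_version,
    )
```

Two runs with the same configuration and code revision then share config_hash and code_version and differ only in run_id.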
- Copy environment variables:
    Copy-Item .env.example .env
- Start PostgreSQL:
    docker compose up -d
    docker compose ps
- Install dependencies (default):
    python -m pip install -e ".[dev]"
  Faster alternative with uv:
    uv pip install --system -e ".[dev]"
- Run quality gates:
    ruff format --check .
    ruff check .
    mypy src
    pytest -q
  Single entrypoint:
    python scripts/verify.py
- make install-pip or just install-pip
- make install or just install
- make lint or just lint
- make typecheck or just typecheck
- make test or just test
- make verify or just verify
- make check or just check
- Alembic is initialized under migrations/.
- For DB-backed tests, set DATABASE_URL or APP_DATABASE_URL (APP_DATABASE_URL recommended).
- DB tests resolve the DSN in this order: DATABASE_URL, then APP_DATABASE_URL.
- Example DSN:
    postgresql+psycopg://pmx:pmx_dev_password@localhost:5432/pmx
- Apply schema:
    python -m alembic upgrade head
- See docs/db.md for partition strategy and future monthly partition creation.
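The DSN lookup order described above can be sketched as follows. The function name resolve_dsn is hypothetical, and treating an empty or whitespace-only value as unset is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_dsn(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty DSN, checking DATABASE_URL before APP_DATABASE_URL."""
    env = os.environ if env is None else env
    for name in ("DATABASE_URL", "APP_DATABASE_URL"):
        # Assumption: blank values are treated the same as missing ones.
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a plain dict instead of os.environ keeps the lookup easy to exercise in tests without mutating the process environment.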
- Required env:
  - APP_DATABASE_URL or DATABASE_URL (lookup order in code: DATABASE_URL, then APP_DATABASE_URL)
  - GAMMA_BASE_URL (default: https://gamma-api.polymarket.com)
  - GAMMA_TIMEOUT_SECONDS (default: 20)
  - GAMMA_PAGE_SIZE (default: 200)
- Run full refresh:
    python -m pmx.jobs.gamma_catalog_refresh
- Run incremental refresh:
    python -m pmx.jobs.gamma_catalog_refresh --since-updated-at 2026-01-01T00:00:00Z
- Dev limit:
    python -m pmx.jobs.gamma_catalog_refresh --max-markets 100
- Timestamp semantics:
  - Catalog ingestion writes a run-level ingestion timestamp (audit.ingested_at) for reproducibility and audit.
  - As-of gating (event/published <= decision_ts and ingested_at <= decision_ts + epsilon) is enforced later by backtest/forecast selection logic, not by this catalog ingestion job.
- Temporary audit namespace:
  - markets.rule_parse_json currently stores both parser stub output and audit payload under audit.* (for example audit.gamma_raw).
- Required env:
  - APP_DATABASE_URL or DATABASE_URL
  - GDELT_BASE_URL (default: https://api.gdeltproject.org/api/v2/doc/doc)
  - GDELT_TIMEOUT_SECONDS (default: 20)
  - GDELT_MAX_RETRIES (default: 4)
  - GDELT_BACKOFF_SECONDS (default: 0.5)
  - GDELT_MAX_RECORDS (default: 250)
  - NEWS_PRIMARY_SOURCES_CONFIG (default: config/primary_sources.yaml)
  - NEWS_CRAWLER_CONNECT_TIMEOUT_SECONDS (default: 5)
  - NEWS_CRAWLER_READ_TIMEOUT_SECONDS (default: 15)
  - NEWS_CRAWLER_MAX_RETRIES (default: 3)
  - NEWS_CRAWLER_BACKOFF_SECONDS (default: 0.5)
- Run:
    python -m pmx.jobs.news_ingest --since-published 2026-01-01T00:00:00Z --max-articles 200 --max-per-domain 20 --crawl-primary
- Disable crawler:
    python -m pmx.jobs.news_ingest --since-published 2026-01-01T00:00:00Z --no-crawl-primary
- Notes:
  - Primary-source whitelist is loaded from config/primary_sources.yaml and upserted into sources.
  - URL canonicalization is conservative and strips tracking params (utm_*, fbclid, gclid, etc.).
  - Dedupe policy:
    - hard dedupe on canonical_url;
    - soft dedupe on content_hash updates the existing row (no insert);
    - fallback soft dedupe on title_hash + same domain + published_at within 24h updates the existing row (no insert);
    - every dedupe update logs dedupe_reason in structured logs: canonical_url, content_hash, title_window.
  - Raw audit payloads are persisted in articles.raw:
    - raw.gdelt for source payload;
    - raw.crawler for crawler response/extraction metadata;
    - raw.ingest for timestamp policy metadata (published_at_source, unknown_published_at).
  - published_at policy is explicit:
    - prefer crawler published_at;
    - fallback to GDELT published_at;
    - if both are missing, fallback to run ingested_at and mark raw.ingest.unknown_published_at=true.
  - As-of safety is preserved by always storing published_at and run-level ingested_at.
  - Market linking is deterministic (score = 2*title_hits + body_hits + 0.5*slug_hits, topK=5, stable tie-break) with a fixed stopword list in tokenization.
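The market-linking score from the notes above can be illustrated with the sketch below. Only the formula and topK=5 come from this README. Everything else is an assumption made for illustration: the stopword list, the tokenizer, the meaning of a "hit" (a distinct market token found in the article field), the market dict keys, and the tie-break on market_id.

```python
import re

# Hypothetical stopword list; the real fixed list is not given in this README.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to", "will"})
TOP_K = 5


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed (assumed tokenizer)."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def link_score(title_hits: int, body_hits: int, slug_hits: int) -> float:
    # Formula as stated in the notes above.
    return 2 * title_hits + body_hits + 0.5 * slug_hits


def link_markets(article_title: str, article_body: str, markets: list[dict]) -> list[tuple[str, float]]:
    """Return up to TOP_K (market_id, score) pairs, best first."""
    title_tokens = tokenize(article_title)
    body_tokens = tokenize(article_body)
    scored = []
    for market in markets:
        market_tokens = tokenize(market["title"])
        slug_tokens = tokenize(market["slug"])
        score = link_score(
            len(market_tokens & title_tokens),
            len(market_tokens & body_tokens),
            len(slug_tokens & (title_tokens | body_tokens)),
        )
        if score > 0:
            scored.append((market["market_id"], score))
    # Stable tie-break (assumption): higher score first, then market_id ascending.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:TOP_K]
```

Sorting on an explicit (score, id) key rather than relying on input order makes the result independent of how the candidate markets were fetched.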
- Required env:
  - APP_DATABASE_URL or DATABASE_URL (lookup order in code: DATABASE_URL, then APP_DATABASE_URL)
  - CLOB_BASE_URL (default: https://clob.polymarket.com)
  - CLOB_TIMEOUT_SECONDS (default: 20)
  - CLOB_RATE_LIMIT_RPS (default: 5)
  - CLOB_ORDERBOOK_DEPTH (optional deterministic top-of-book depth)
  - CLOB_API_KEY (optional, only if endpoint requires auth)
- Run:
    python -m pmx.jobs.clob_ingest_rest --max-tokens 100 --since-ts 2026-01-01T00:00:00Z --interval 1m
- Notes:
  - Token ingestion order is deterministic (token_id sorted).
  - --since-ts is inclusive:
    - trades: include rows with event_ts >= since_ts
    - candles: include rows with start_ts >= since_ts
  - ingested_at assignment mode is explicit and deterministic:
    - default --ingested-at-mode now: single run-level timestamp applied to orderbook_snapshots, trades, and candles
    - optional --ingested-at-mode event_ts: per-row ingested_at = event_ts + --ingest-latency-seconds (candles anchor on start_ts)
  - Orderbook snapshot event_ts policy:
    - use API timestamp when available and parseable
    - fallback to run ingestion timestamp (run_ingested_at) when missing.
  - Orderbook normalization is deterministic:
    - bids sorted by price desc, asks sorted by price asc
    - price/size quantized to 8 decimals
    - optional depth cap via CLOB_ORDERBOOK_DEPTH.
  - Trade identity fallback:
    - when both seq and upstream hash/id are missing, a deterministic SHA-256 hash is derived from canonical trade fields to stabilize idempotent upsert keys.
  - As-of gating is applied downstream by decision_ts + epsilon filters.
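The orderbook normalization and trade identity fallback described in the notes above might be sketched as below. The function names, the (price, size) tuple layout, the set of "canonical trade fields", and the rounding mode (Python's default ROUND_HALF_EVEN) are all assumptions; the README only specifies the sort order, 8-decimal quantization, the optional depth cap, and the use of SHA-256.

```python
import hashlib
import json
from decimal import Decimal
from typing import Optional

QUANT = Decimal("0.00000001")  # 8 decimal places


def _q(value) -> Decimal:
    """Quantize a price or size to 8 decimals (default rounding is an assumption)."""
    return Decimal(str(value)).quantize(QUANT)


def normalize_book(bids, asks, depth: Optional[int] = None):
    """Sort bids by price descending and asks ascending, then apply an optional depth cap."""
    norm_bids = sorted(((_q(p), _q(s)) for p, s in bids), key=lambda level: level[0], reverse=True)
    norm_asks = sorted(((_q(p), _q(s)) for p, s in asks), key=lambda level: level[0])
    if depth is not None:
        norm_bids, norm_asks = norm_bids[:depth], norm_asks[:depth]
    return norm_bids, norm_asks


def fallback_trade_key(token_id: str, event_ts: str, price, size, side: str) -> str:
    """Deterministic SHA-256 key over a hypothetical set of canonical trade fields."""
    canonical = json.dumps(
        {
            "token_id": token_id,
            "event_ts": event_ts,
            "price": str(_q(price)),
            "size": str(_q(size)),
            "side": side,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Quantizing before hashing means that two payloads differing only in numeric formatting (for example 0.5 versus "0.50") map to the same upsert key.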
- Required env:
  - APP_DATABASE_URL or DATABASE_URL
  - CLOB_WSS_URL (default: wss://clob.polymarket.com/ws)
  - CLOB_WSS_TIMEOUT_SECONDS (default: 20)
  - CLOB_WSS_MAX_RECONNECTS (default: 8)
  - CLOB_WSS_BACKOFF_SECONDS (default: 0.5)
  - CLOB_WSS_MAX_BACKOFF_SECONDS (default: 30)
  - CLOB_WSS_SEQ_FIELDS (default: seq,sequence,offset)
  - CLOB_WSS_SEQ_FIELD (legacy override; if set, forces a single field; empty disables seq-based detection)
  - CLOB_RECONCILE_GAP_SECONDS (default: 60)
  - CLOB_RECONCILE_MISMATCH_BPS (default: 10)
  - REST envs from the previous section (used by reconciler).
- Run:
    python -m pmx.jobs.clob_wss_listener `
      --max-tokens 100 `
      --reconcile-every-seconds 60 `
      --since-ts 2026-01-01T00:00:00Z `
      --run-seconds 300
- Explicit token list:
    python -m pmx.jobs.clob_wss_listener --token-ids tokenA,tokenB --reconcile-every-seconds 60
- Notes:
  - WSS ingestion is best-effort and can be out-of-order or incomplete.
  - REST is treated as source of truth during reconcile windows.
  - Gap detection strategy:
    - seq-based mode (active when configured seq fields include monotonic fields): extraction order is CLOB_WSS_SEQ_FIELD (legacy, if set), otherwise CLOB_WSS_SEQ_FIELDS (default seq,sequence,offset); a gap is flagged when seq != last_seq + 1. If CLOB_WSS_SEQ_FIELD is explicitly set to empty/whitespace, it disables seq-based detection (it is never interpreted as a field name).
    - heuristic mode (no configured seq fields, or event has no seq-like value): a gap is flagged when the stream is stale versus CLOB_RECONCILE_GAP_SECONDS; if no stream trade timestamp is available, reconcile still runs every tick.
    - msg_id/event_id are diagnostic-only and are not used for gap detection.
  - Mismatch strategy:
    - top-of-book mismatch when REST mid diverges from stream mid by more than CLOB_RECONCILE_MISMATCH_BPS basis points.
  - Per-token state is monotonic and logged periodically:
    - last_seq, last_trade_ts, last_book_ts, last_reconcile_ts
  - Reconcile emits structured audit events:
    - reconcile_gap
    - reconcile_mismatch
    - wss_reconnect
  - Reconcile gap/mismatch logs include:
    - action_taken, window_start, window_end, rest_calls, rows_upserted
  - Repairs are idempotent because writes use DB upsert constraints.
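The gap and mismatch checks in the notes above can be sketched as small pure functions. The function names and signatures are hypothetical. Using the REST mid as the denominator for the basis-point distance is an assumption (chosen because REST is the source of truth), and timestamps are taken as epoch seconds for simplicity.

```python
from typing import Any, Mapping, Optional, Sequence

DEFAULT_SEQ_FIELDS = ("seq", "sequence", "offset")


def extract_seq(event: Mapping[str, Any], seq_fields: Sequence[str] = DEFAULT_SEQ_FIELDS) -> Optional[int]:
    """Return the first configured seq-like field that parses as an int, else None."""
    for name in seq_fields:
        value = event.get(name)
        if value is None:
            continue
        try:
            return int(value)
        except (TypeError, ValueError):
            continue
    return None


def is_seq_gap(last_seq: Optional[int], seq: int) -> bool:
    """Seq-based mode: gap when seq != last_seq + 1 (no gap on the first event)."""
    return last_seq is not None and seq != last_seq + 1


def is_stale(last_trade_ts: Optional[float], now_ts: float, gap_seconds: float = 60.0) -> bool:
    """Heuristic mode: stale when no trade arrived within gap_seconds."""
    if last_trade_ts is None:
        # Without a stream trade timestamp, reconcile runs on every tick.
        return True
    return now_ts - last_trade_ts > gap_seconds


def mid_mismatch_bps(rest_mid: float, stream_mid: float) -> float:
    """Distance between mids in basis points, relative to the REST mid (assumption)."""
    if rest_mid <= 0:
        raise ValueError("rest_mid must be positive")
    return abs(rest_mid - stream_mid) / rest_mid * 10_000


def is_mismatch(rest_mid: float, stream_mid: float, threshold_bps: float = 10.0) -> bool:
    return mid_mismatch_bps(rest_mid, stream_mid) > threshold_bps
```

Keeping these checks free of I/O makes them straightforward to test against recorded fixture messages.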
- Required env:
  - APP_DATABASE_URL or DATABASE_URL
  - INGEST_EPSILON_SECONDS (default: 300)
- Run:
    python -m pmx.jobs.compute_micro_features --decision-ts 2026-02-11T12:00:00Z --max-tokens 200 --epsilon-seconds 300
- Explicit token list:
    python -m pmx.jobs.compute_micro_features --token-ids tokenA,tokenB --decision-ts 2026-02-11T12:00:00Z
- Notes:
  - As-of input gating is enforced on every query:
    - event_ts <= decision_ts
    - ingested_at <= decision_ts + epsilon
  - Persisted feature set is micro_v1, token-scoped via deterministic feature_set_version (micro_v1:token:<token_id>) to keep idempotent snapshots per token without schema changes.
  - Current microstructure features:
    - mid_price, spread_bps
    - top_depth_bid, top_depth_ask, book_imbalance_1
    - last_trade_price, last_trade_size
    - trade_count_5m, volume_5m, return_5m
    - realized_vol_1h, stale_seconds_last_trade, stale_seconds_last_book
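The two as-of gating conditions above can be expressed as a small predicate. This is a sketch in plain Python, not the job's query code: the function names and the row layout (dicts with event_ts and ingested_at keys) are hypothetical, and the real job presumably applies the same conditions in SQL.

```python
from datetime import datetime, timedelta, timezone


def is_visible_as_of(
    event_ts: datetime,
    ingested_at: datetime,
    decision_ts: datetime,
    epsilon_seconds: int = 300,
) -> bool:
    """True when a row may be used for a decision made at decision_ts."""
    return (
        event_ts <= decision_ts
        and ingested_at <= decision_ts + timedelta(seconds=epsilon_seconds)
    )


def gate_rows(rows: list[dict], decision_ts: datetime, epsilon_seconds: int = 300) -> list[dict]:
    """Keep only rows that pass both as-of conditions, preserving input order."""
    return [
        row
        for row in rows
        if is_visible_as_of(row["event_ts"], row["ingested_at"], decision_ts, epsilon_seconds)
    ]


# Example decision timestamp matching the commands above (timezone-aware UTC).
DECISION_TS = datetime(2026, 2, 11, 12, 0, tzinfo=timezone.utc)
```

A row whose event happened before the decision but which was ingested more than epsilon seconds after it is excluded, which is what prevents late-arriving data from leaking into a backtest.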
- Baseline A:
  - p = price_prob, where price_prob is selected as-of from market data (mid from orderbook, fallback to last trade price).
- Baseline B:
  - deterministic logistic scorer over micro_v1 features (no training step).
- Run:
    python -m pmx.jobs.backtest_baselines `
      --token-ids tokenA,tokenB `
      --from 2026-01-01T00:00:00Z `
      --to 2026-01-03T00:00:00Z `
      --step-hours 4 `
      --epsilon-seconds 300 `
      --feature-set micro_v1
- Artifact:
  - artifacts/backtests/<run_id>.json
  - includes aggregate/per-token metrics (Brier, ECE, sharpness), plus dataset_hash, config_hash, code_version, and skip counters.
- See docs/backtest.md for details.
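For readers unfamiliar with the metrics named in the artifact, the sketch below computes the Brier score and the expected calibration error (ECE) in their textbook forms. The function names and the choice of 10 equal-width bins are assumptions; the repository's exact binning is described in docs/backtest.md, not here, and sharpness is left out because its definition is not given in this README.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average forecast and observed frequency per bin."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(avg_p - freq)
    return ece
```

Lower is better for both metrics: a constant forecast of 0.5 on balanced outcomes has a Brier score of 0.25, and a forecaster whose 80% predictions come true 80% of the time has an ECE near zero.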
- Deterministic components:
  - Baseline A (p = price_prob)
  - Baseline B (deterministic logistic on micro_v1)
  - Ensemble stacker (p_raw)
  - Calibration (isotonic, Platt fallback)
  - Split-conformal uncertainty intervals (50/90)
- Run:
    python -m pmx.jobs.forecast_baseline_ensemble `
      --token-ids tokenA,tokenB `
      --from 2026-01-01T00:00:00Z `
      --to 2026-01-03T00:00:00Z `
      --step-hours 4 `
      --epsilon-seconds 300 `
      --feature-set micro_v1 `
      --max-tokens 200
- Artifact:
  - artifacts/forecasts/<run_id>.json
  - includes p_raw, p_cal, interval_50, interval_90, deterministic drivers, no_trade_flags, and reproducibility hashes: dataset_hash, model_hash, calibration_hash, uncertainty_hash, forecast_payload_hash.
- See docs/forecast.md for details.
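To illustrate how split-conformal intervals at 50% and 90% coverage can be produced, the sketch below uses the standard absolute-residual construction. It is a generic textbook version and may differ from what docs/forecast.md specifies: the function names, the residual definition (|outcome - p_cal| on a held-out calibration set), and clipping the interval to [0, 1] are all assumptions.

```python
import math
from typing import Sequence


def conformal_quantile(residuals: Sequence[float], coverage_pct: int) -> float:
    """Finite-sample conformal quantile: the ceil((n + 1) * coverage)-th smallest residual."""
    n = len(residuals)
    # Integer arithmetic avoids float rounding in the ceiling.
    rank = ((n + 1) * coverage_pct + 99) // 100
    if rank > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return sorted(residuals)[rank - 1]


def conformal_interval(p: float, residuals: Sequence[float], coverage_pct: int) -> tuple[float, float]:
    """Symmetric interval around p, clipped to the probability range [0, 1]."""
    q = conformal_quantile(residuals, coverage_pct)
    return max(0.0, p - q), min(1.0, p + q)
```

With nine calibration residuals from 0.05 to 0.45 in steps of 0.05, the 50% quantile is the fifth smallest residual (0.25) and the 90% quantile is the ninth (0.45), so the 90% interval is always at least as wide as the 50% one.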
- Run selector:
    python -m pmx.jobs.select_markets `
      --decision-ts 2026-02-11T12:00:00Z `
      --epsilon-seconds 300 `
      --max-candidates 1500 `
      --k-deep 200
- Selector outputs:
  - deterministic selector_v1 deep-dive set (final rank by deep_score after diversification constraints)
  - baseline comparator sets:
    - baseline_top_volume
    - baseline_random_stratified
- Persistence:
  - writes selection_runs + selection_items
  - writes selector audit artifact in artifacts/selector/<run_id>.json
  - candidate score fallback artifact used if dedicated candidate table is absent.
- Evaluate selector vs baselines:
    python -m pmx.jobs.eval_selector `
      --decision-ts 2026-02-11T12:00:00Z `
      --epsilon-seconds 300 `
      --window-hours 72
  - writes artifacts/selector_eval/<run_id>.json with comparative Brier/ECE report for selector_v1 and baselines.
- See docs/selector.md for scoring components, bucket mix, and constraints.
- Goal:
  - capture real WSS message shapes and confirm seq-like fields.
- Current expectation from shipped fixtures:
  - supports_seq = True based on tests/fixtures/wss/*.json.
- Probe with explicit token ids:
    python scripts/wss_probe.py --token-ids tokenA,tokenB --max-messages 200 --out wss_probe.jsonl
- Probe with TOKEN_IDS env:
    # POSIX (bash/zsh)
    TOKEN_IDS="tokenA,tokenB" python scripts/wss_probe.py --max-messages 200 --out wss_probe.jsonl
    # PowerShell
    $env:TOKEN_IDS="tokenA,tokenB"; python scripts/wss_probe.py --max-messages 200 --out wss_probe.jsonl
- Probe with custom subscribe payload file:
    python scripts/wss_probe.py --subscribe-payload-json path/to/subscribe_payload.json --max-messages 200 --out wss_probe.jsonl
- Fixture refresh workflow:
  - rerun probe when upstream protocol changes;
  - update tests/fixtures/wss/*.json with redacted representative messages;
  - rerun tests/test_wss_protocol_fixtures.py;
  - if protocol support changes, update expected supports_seq.
src/pmx/
config/
db/
ingest/
jobs/
news/
features/
forecast/
models/
backtest/
audit/
tests/
migrations/
docs/
See docs/guardrails.md for as-of policy, anti-leak rules, idempotency, and audit expectations.
- If tools are unavailable locally, checks are marked as not run/skipped.
- CI in .github/workflows/ci.yml is the source of truth for merge readiness.
- CI must pass before merge (lint, typecheck, tests).
- docker-compose.yml is provided for local/dev environments where Docker is available.
- If Docker is unavailable locally, use an external Postgres and set DATABASE_URL or APP_DATABASE_URL.