feat: persistent baseline cache (CACHEON_BASELINE_CACHE_DIR) by ai-hpc · Pull Request #82 · latent-to/cacheon

ai-hpc · 2026-06-08T15:16:03Z

Summary

Adds CACHEON_BASELINE_CACHE_DIR env var to validator/config.py
validator/gpu_eval.py: loads BaselineCache JSON before calling run_baseline(); saves after a fresh run
tests/baseline_cache/README.md: format docs, usage, and key derivation explanation
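
The load-before / save-after flow described for validator/gpu_eval.py could look roughly like the sketch below. It is not the code from this PR: `load_or_run_baseline` is a hypothetical helper, the `run_baseline` parameter is a stand-in for the real `run_baseline()` (whose signature is not shown here), and the on-disk layout is assumed to be one `<key>.json` file per cache key.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_or_run_baseline(
    cache_dir: Optional[Path],
    key: str,
    run_baseline: Callable[[], dict],
) -> tuple[dict, bool]:
    """Return (baseline, from_cache). Hypothetical helper, not the PR's code."""
    # Caching is opt-in: with no cache directory configured we always run.
    cache_file = cache_dir / f"{key}.json" if cache_dir else None

    if cache_file is not None and cache_file.is_file():
        return json.loads(cache_file.read_text()), True

    baseline = run_baseline()

    if cache_file is not None:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file first so a crash never leaves a half-written entry.
        tmp = cache_file.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(baseline))
        tmp.replace(cache_file)

    return baseline, False
```

Passing `cache_dir=None` keeps the default behaviour unchanged, which matches the opt-in intent of the env var.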

Motivation

The vLLM v0.22.0 baseline container takes ~12 minutes per eval round (10 min model load + 2 min inference). With this cache, repeated runs against the same (block_hash, baseline_digest, PROMPT_ENGINE_VERSION) skip the container entirely — critical for fast iteration during miner development.
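
For illustration, a cache key over that triple might be derived as below. The actual derivation lives in tests/baseline_cache/README.md and is not reproduced here, so the hash function, the separator, and the 16-hex-character truncation (chosen only to match the shape of the key in the log line under Usage) are all assumptions, and `baseline_cache_key` is a hypothetical name.

```python
import hashlib


def baseline_cache_key(
    block_hash: str,
    baseline_digest: str,
    prompt_engine_version: str,
) -> str:
    """Derive a short, deterministic key from the three inputs (assumed scheme)."""
    # A separator that cannot appear in the fields avoids ambiguous joins,
    # e.g. ("ab", "c") and ("a", "bc") must not produce the same key.
    material = "\x1f".join([block_hash, baseline_digest, prompt_engine_version])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Changing any one of the three inputs yields a different key, so a new block, a new baseline image, or a prompt engine bump each forces a fresh baseline run.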

Usage

python3 scripts/run_validator_eval.py \
  --block-hash 0x<hash> \
  --miner-image <image> \
  --model-volume /models \
  --baseline-cache-dir ./tests/baseline_cache

First run saves tests/baseline_cache/<key>.json. Subsequent runs load it, printing:

INFO Loaded baseline from cache: key=5ea34e493462daee file=... (10 prompts)

Test plan

Run eval once with --baseline-cache-dir → verify JSON written + log line
Run again → verify baseline container NOT started + same results

Adds opt-in BaselineCache persistence that saves/loads the vLLM baseline results to disk, eliminating the 12-min baseline container on repeated runs with the same block_hash + baseline_digest + prompt version.

- validator/config.py: add CACHEON_BASELINE_CACHE_DIR env var
- validator/gpu_eval.py: load cache before run_baseline(), save after
- tests/baseline_cache/README.md: format docs + usage examples

xavierlyu · 2026-06-09T11:47:14Z

+Run a full validator eval with `--baseline-cache-dir` pointed here:
+
+```bash
+python3 scripts/run_validator_eval.py \


That script doesn't exist in the repo. gpu_eval is an env-var-driven container entrypoint (python -m validator.gpu_eval).

xavierlyu · 2026-06-09T11:48:28Z

-        _upload_progress(state_dir)
-        return 4
+
+    # ------------------------------------------------------------------


This never hits in production because each block is 12s. The comment is a bit misleading.

This is fine if the intent is local dev/CI only (the README does say so), but the inline comment in gpu_eval.py reads like a prod optimization:

    # Baseline cache: skip the 12-min vLLM container if a cached result
    # exists for this (block_hash, baseline_digest, prompt_version) key

xavierlyu · 2026-06-09T11:49:35Z

The baseline is the teacher-forcing reference that every challenger is scored against. Loading it from an unsigned JSON file on disk (gated only by a single env var) means anyone who can write that file controls scoring. If CACHEON_BASELINE_CACHE_DIR is accidentally set on a real validator (shared .env, misconfigured deploy), it silently loads whatever is in that directory with no validation that the prompts match the current block. For a subnet where this feeds directly into weight-setting, that's a meaningful risk.
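
One possible mitigation, sketched under assumptions: wrap each cache entry in an envelope that records a fingerprint of the prompts and an HMAC over the body, and refuse to load anything that fails either check. None of this is in the PR. `sign_entry`, `verify_entry`, `prompts_fingerprint`, and the envelope fields are hypothetical, and the secret would have to come from somewhere the cache writer cannot read.

```python
import hashlib
import hmac
import json
from typing import Optional


def prompts_fingerprint(prompts: list[str]) -> str:
    """SHA-256 over the JSON-encoded prompt list (order-sensitive)."""
    encoded = json.dumps(prompts, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def _mac(body: dict, secret: bytes) -> str:
    # sort_keys gives a canonical serialization, so the MAC survives a
    # JSON round trip through disk.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_entry(results: dict, prompts: list[str], secret: bytes) -> dict:
    """Build the envelope that would be written to <key>.json."""
    body = {"prompts_sha256": prompts_fingerprint(prompts), "results": results}
    return {"body": body, "mac": _mac(body, secret)}


def verify_entry(entry: dict, prompts: list[str], secret: bytes) -> Optional[dict]:
    """Return the cached results, or None if the entry must not be trusted."""
    body, mac = entry.get("body"), entry.get("mac")
    if not isinstance(body, dict) or not isinstance(mac, str):
        return None
    # Constant-time comparison on bytes; rejects any edited or forged body.
    if not hmac.compare_digest(mac.encode("utf-8"), _mac(body, secret).encode("utf-8")):
        return None
    # The entry is authentic, but it must also belong to the current prompts.
    if body.get("prompts_sha256") != prompts_fingerprint(prompts):
        return None
    return body.get("results")
```

The prompt fingerprint alone only catches stale or mismatched files. It is the keyed MAC that addresses the "anyone who can write that file" case, and only as long as the secret stays out of the writer's reach.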

xavierlyu · 2026-06-09T11:50:50Z

I think tests/baseline_cache/ should be gitignored.

ai-hpc requested review from camfairchild, clementblaise and xavierlyu as code owners June 8, 2026 15:16

ai-hpc force-pushed the feat/baseline-cache branch from ab985f6 to 40215cb Compare June 8, 2026 15:20

xavierlyu reviewed Jun 9, 2026

ai-hpc wants to merge 1 commit into latent-to:main from ai-hpc:feat/baseline-cache
