feat: persistent baseline cache (CACHEON_BASELINE_CACHE_DIR)#82
Conversation
Adds opt-in BaselineCache persistence that saves/loads the vLLM baseline results to disk, eliminating the 12-min baseline container on repeated runs with the same block_hash + baseline_digest + prompt version. - validator/config.py: add CACHEON_BASELINE_CACHE_DIR env var - validator/gpu_eval.py: load cache before run_baseline(), save after - tests/baseline_cache/README.md: format docs + usage examples
ab985f6 to
40215cb
Compare
| Run a full validator eval with `--baseline-cache-dir` pointed here: | ||
|
|
||
| ```bash | ||
| python3 scripts/run_validator_eval.py \ |
There was a problem hiding this comment.
That script doesn't exist in the repo. gpu_eval is an env-var-driven container entrypoint (python -m validator.gpu_eval)
| _upload_progress(state_dir) | ||
| return 4 | ||
|
|
||
| # ------------------------------------------------------------------ |
There was a problem hiding this comment.
This never hits in production because each block is 12s. the comment is a bit misleading.
This is fine if the intent is local dev/CI only (the README does say so), but the inline comment in gpu_eval.py reads like a prod optimization: # Baseline cache: skip the 12-min vLLM container if a cached result # exists for this (block_hash, baseline_digest, prompt_version) key
|
The baseline is the teacher-forcing reference that every challenger is scored against. Loading it from an unsigned JSON file on disk (gated only by a single env var) means anyone who can write that file controls scoring. If |
|
i think |
Summary
CACHEON_BASELINE_CACHE_DIRenv var tovalidator/config.pyvalidator/gpu_eval.py: loadsBaselineCacheJSON before callingrun_baseline(); saves after a fresh runtests/baseline_cache/README.md: format docs, usage, and key derivation explanationMotivation
The vLLM v0.22.0 baseline container takes ~12 minutes per eval round (10 min model load + 2 min inference). With this cache, repeated runs against the same
(block_hash, baseline_digest, PROMPT_ENGINE_VERSION)skip the container entirely — critical for fast iteration during miner development.Usage
First run saves
tests/baseline_cache/<key>.json. Subsequent runs load it, printing:Test plan
--baseline-cache-dir→ verify JSON written + log line