Add weekly cache research report for 2026-06-07 by thinkingfish · Pull Request #22 · pelikan-io/cache-rs

thinkingfish · 2026-06-07T16:22:24Z

Summary

Weekly cache research report covering the 2026-05-18 → 2026-06-07 window (delayed cadence, ~3 weeks since the prior 2026-05-17 report). The target was ~5 primary entries; this report expands to 11 because MLSys 2026 fell inside the window, vLLM v0.22 / TRT-LLM 1.3 / Dynamo Snapshot all shipped, three storage vendors made first-class KV-cache disclosures, four frontier labs landed attention/cache mechanisms, and a large batch of arXiv papers appeared.

Section A — production measurements and empirical studies

MLSys 2026 in window (BLASST Best Paper, LMCache invited talk, GhostServe, SkipKV, Kitty)
NVIDIA Dynamo Snapshot — KV cache unmap + CRIU; 21× startup-time reduction on gpt-oss-120b
vLLM v0.22.0/.1 + EAGLE 3.1 spec-decode correctness fix
Databricks prompt caching for OSS models — 2.5× throughput, 3× lower P50 in production
MinIO MemKV + WEKA NeuralMesh + VAST — the "G3.5" / CMX KV-cache tier crystallizes
Apple KV Prediction — small auxiliary model produces KV cache; 15–50% relative accuracy improvement at fixed TTFT FLOPs
Tensormesh / LMCache Series A — $20M from AMD/NVIDIA/CoreWeave; KV-as-durable-data framing
Snowflake ZoRRo + Forest Cascade Attention — multi-prefix RL with shared-KV SMEM groups
Frontier model release wave — Anthropic Opus 4.8, Cohere Command A+, Gemma 4, MAI-Thinking-1, Gemini 3.5
SGLang v0.5.12.post1 — HiCache/HiSparse GSM8K 0.825 → 0.960 accuracy regression fix
WEKA × Dynamo × NIXL on OCI — 252 GB/s per node KV serving

Section B — academic / idea-forward work

VeriCache (arXiv:2605.17613) — speculative-style lossless wrapper over arbitrary lossy KV compressors
ObjectCache (arXiv:2605.22850) — S3-backed KV cache with layerwise overlap; +5.6% TTFT at 64K
Lodestar (arXiv:2606.00946) — online-learning vLLM cluster router; 1.41×/1.47× P99 TTFT, 4.4× on heterogeneous
AsymCache (arXiv:2606.02964) — kernel-aware eviction; 1.90–2.03× TTFT, 1.62–1.71× TPOT
KVarN (arXiv:2606.03458) — calibration-free 2-bit KV beats FP16 throughput in vLLM
Vortex (arXiv:2606.06453) — programmable sparse-attention serving for AI agents; 4.7× over full attention
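
For readers unfamiliar with the "speculative-style lossless wrapper" framing in the VeriCache entry, the following is a minimal sketch of the generic draft-then-verify pattern from speculative decoding, under greedy decoding. It is not VeriCache's actual algorithm: `exact_model`, `lossy_model`, and `draft_and_verify` are hypothetical toy stand-ins, and a real system would verify a whole draft in one batched forward pass rather than token by token.

```python
from typing import Callable, List

Token = int
# A "model" here is just a function from a token prefix to the next greedy token.
NextToken = Callable[[List[Token]], Token]


def draft_and_verify(exact: NextToken, lossy: NextToken,
                     prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Draft k tokens on the lossy path, then keep only what the exact path confirms."""
    out = list(prompt)
    target = len(prompt) + n_new
    while len(out) < target:
        # Draft: run the cheap path (standing in for a lossy KV cache) ahead.
        draft: List[Token] = []
        for _ in range(min(k, target - len(out))):
            draft.append(lossy(out + draft))
        # Verify: always emit the exact token; stop at the first disagreement
        # so the rest of the draft is discarded.
        for tok in draft:
            expected = exact(out)
            out.append(expected)
            if expected != tok:
                break
    return out


# Toy models (assumptions for illustration only).
def exact_model(seq: List[Token]) -> Token:
    return (sum(seq) * 31 + len(seq)) % 97


def lossy_model(seq: List[Token]) -> Token:
    tok = exact_model(seq)
    # Simulate compression error: wrong whenever the prefix length is a multiple of 5.
    return (tok + 1) % 97 if len(seq) % 5 == 0 else tok


def greedy(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out
```

In this toy, the wrapped output matches plain greedy decoding on the exact path by construction, while decoding on the lossy path alone diverges. Any speedup in a real system depends on how often drafts are accepted, which the sketch does not model.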

Additional context

Heavy: 15+ more arXiv KV/cache items, FAST '26 Spring (SolidAttention, MOST, Seneca), NSDI '26 (CacheCatalyst, PD3, DistVS), SIGMOD '26 (LakeMem, DepCache, LINE), OSDI '26 OpenTela, TRT-LLM 1.3.0rc15–17, Together AI / Modal / Anyscale / Red Hat AI activity, CacheLib reawakening + Memcached 1.6.42 security + Redis Iris + Garnet, plus HN threads and explicit-negative coverage of frontier labs / venues with nothing in window.

Cross-cutting observations

MLSys 2026's center of gravity is KV-cache systems; conference confirms a year of vendor claims.
The "G3.5" / CMX KV-cache tier is now a vendor product category (MinIO, WEKA, VAST).
Cold-start as the new TTFT: Dynamo 21× and Modal 40× both via CRIU + KV-unmap.
"Lossy KV cache, but lossless inference" pattern (VeriCache) removes a major production objection.
Frontier labs converged on sliding-window + global attention with KV-sharing + MoE-aware spec decoding.
Cache evaluation methodology continues to be a contribution (HiSparse 0.825→0.960; Structural Protection negative result).

Test plan

Spot-check 3-5 primary references for correct URL / date / numbers
Verify novelty assessments are calibrated against prior weekly reports
Confirm no duplicates with the 2026-05-17 report's entries
Re-check borderline arXiv submission dates once arxiv.org is reachable
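
The duplicate and submission-date checks above could be partly scripted. The sketch below is one possible approach, not part of this PR: it relies on new-style arXiv identifiers having the form `YYMM.NNNNN`, so the prefix gives only a month-level signal and borderline dates still need a manual look on arxiv.org. The helper names are hypothetical, and the prior report's text would have to be supplied by the reviewer.

```python
import re
from datetime import date

# New-style arXiv identifier: two-digit year, two-digit month, sequence number.
ARXIV_ID = re.compile(r"arXiv:(\d{2})(\d{2})\.(\d{4,5})")


def arxiv_ids(text):
    """Return {id: (year, month)} for every new-style arXiv ID found in text."""
    return {f"{yy}{mm}.{num}": (2000 + int(yy), int(mm))
            for yy, mm, num in ARXIV_ID.findall(text)}


def outside_window(ids, start, end):
    """IDs whose YYMM prefix falls outside the months spanned by [start, end]."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    return sorted(i for i, ym in ids.items() if not lo <= ym <= hi)


def duplicates(current_ids, prior_ids):
    """IDs that appear in both the current and the prior report."""
    return sorted(set(current_ids) & set(prior_ids))


# The six Section B identifiers from this report.
section_b = (
    "arXiv:2605.17613 arXiv:2605.22850 arXiv:2606.00946 "
    "arXiv:2606.02964 arXiv:2606.03458 arXiv:2606.06453"
)
ids = arxiv_ids(section_b)
# An empty list means every ID was assigned in a month the window touches.
print(outside_window(ids, date(2026, 5, 18), date(2026, 6, 7)))
```

To check for duplicates, pass the text of the 2026-05-17 report through `arxiv_ids` and call `duplicates(ids, prior_ids)`. Entries without arXiv IDs (vendor posts, releases) would need a separate title or URL comparison.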

https://claude.ai/code/session_011iFwqJQzbbCHjXZz2oEnHA

Generated by Claude Code

Covers the 2026-05-18 → 2026-06-07 window (delayed cadence, ~3 weeks): MLSys 2026 in-window (BLASST best paper, LMCache invited talk, GhostServe, SkipKV), NVIDIA Dynamo Snapshot KV cache unmap, vLLM v0.22 + EAGLE 3.1 correctness fix, Databricks prompt caching production numbers, MinIO MemKV + WEKA + VAST "G3.5" KV-cache tier, Apple KV Prediction, Tensormesh/LMCache $20M round, Snowflake Forest Cascade Attention, frontier-lab release wave (Anthropic Opus 4.8, Cohere Command A+, Gemma 4, MAI-Thinking-1), SGLang HiCache/HiSparse accuracy fix, plus arXiv KV-cache batch (VeriCache, ObjectCache, Lodestar, AsymCache, KVarN, Vortex) and an additional-context sweep. https://claude.ai/code/session_011iFwqJQzbbCHjXZz2oEnHA

thinkingfish · 2026-06-13T06:57:28Z

Moved to iopsystems/inference-systems#3, where the weekly cache research reports now live under research/. Closing here.

Generated by Claude Code

thinkingfish closed this Jun 13, 2026

thinkingfish wants to merge 1 commit into main from claude/peaceful-planck-w0Iap


2 participants
