
Add weekly cache research report for 2026-06-07#22

Closed
thinkingfish wants to merge 1 commit into main from claude/peaceful-planck-w0Iap

Conversation

@thinkingfish

Member

Summary

Weekly cache research report covering the 2026-05-18 → 2026-06-07 window (delayed cadence, ~3 weeks since the prior 2026-05-17 report). The target was ~5 primary entries; this report expands to 11 because MLSys 2026 fell inside the window; vLLM v0.22, TRT-LLM 1.3, and Dynamo Snapshot all shipped; three storage vendors made first-class KV-cache disclosures; four frontier labs landed attention/cache mechanisms; and a thick arXiv batch dropped.

Section A — production measurements and empirical studies

  1. MLSys 2026 in window (BLASST Best Paper, LMCache invited talk, GhostServe, SkipKV, Kitty)
  2. NVIDIA Dynamo Snapshot — KV cache unmap + CRIU; 21× startup-time reduction on gpt-oss-120b
  3. vLLM v0.22.0/.1 + EAGLE 3.1 spec-decode correctness fix
  4. Databricks prompt caching for OSS models — 2.5× throughput, 3× lower P50 in production
  5. MinIO MemKV + WEKA NeuralMesh + VAST — the "G3.5" / CMX KV-cache tier crystallizes
  6. Apple KV Prediction — small auxiliary model produces KV cache; 15–50% accuracy at fixed TTFT FLOPs
  7. Tensormesh / LMCache Series A — $20M from AMD/NVIDIA/CoreWeave; KV-as-durable-data framing
  8. Snowflake ZoRRo + Forest Cascade Attention — multi-prefix RL with shared-KV SMEM groups
  9. Frontier model release wave — Anthropic Opus 4.8, Cohere Command A+, Gemma 4, MAI-Thinking-1, Gemini 3.5
  10. SGLang v0.5.12.post1 — HiCache/HiSparse GSM8K 0.825 → 0.960 accuracy regression fix
  11. WEKA × Dynamo × NIXL on OCI — 252 GB/s per node KV serving

Section B — academic / idea-forward work

  1. VeriCache (arXiv:2605.17613) — speculative-style lossless wrapper over arbitrary lossy KV compressors
  2. ObjectCache (arXiv:2605.22850) — S3-backed KV cache with layerwise overlap; +5.6% TTFT at 64K
  3. Lodestar (arXiv:2606.00946) — online-learning vLLM cluster router; 1.41×/1.47× P99 TTFT, 4.4× on heterogeneous
  4. AsymCache (arXiv:2606.02964) — kernel-aware eviction; 1.90–2.03× TTFT, 1.62–1.71× TPOT
  5. KVarN (arXiv:2606.03458) — calibration-free 2-bit KV beats FP16 throughput in vLLM
  6. Vortex (arXiv:2606.06453) — programmable sparse-attention serving for AI agents; 4.7× over full attention

Additional context

Heavy: 15+ more arXiv KV/cache items, FAST '26 Spring (SolidAttention, MOST, Seneca), NSDI '26 (CacheCatalyst, PD3, DistVS), SIGMOD '26 (LakeMem, DepCache, LINE), OSDI '26 OpenTela, TRT-LLM 1.3.0rc15–17, Together AI / Modal / Anyscale / Red Hat AI activity, CacheLib reawakening + Memcached 1.6.42 security + Redis Iris + Garnet, plus HN threads and explicit-negative coverage of frontier labs / venues with nothing in window.

Cross-cutting observations

  • MLSys 2026's center of gravity is KV-cache systems; conference confirms a year of vendor claims.
  • The "G3.5" / CMX KV-cache tier is now a vendor product category (MinIO, WEKA, VAST).
  • Cold-start as the new TTFT: Dynamo 21× and Modal 40× both via CRIU + KV-unmap.
  • "Lossy KV cache, but lossless inference" pattern (VeriCache) removes a major production objection.
  • Frontier labs converged on sliding-window + global attention with KV-sharing + MoE-aware spec decoding.
  • Cache evaluation methodology continues to be a contribution (HiSparse 0.825→0.960; Structural Protection negative result).

Test plan

  • Spot-check 3-5 primary references for correct URL / date / numbers
  • Verify novelty assessments are calibrated against prior weekly reports
  • Confirm no duplicates with the 2026-05-17 report's entries
  • Re-check borderline arXiv submission dates once arxiv.org is reachable
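Part of the test plan above can be pre-screened offline. The sketch below is a minimal, hypothetical helper (the function names and the `prior_report` text are assumptions, not part of this repository): it extracts arXiv identifiers from a report body, flags any whose YYMM prefix falls outside the months the window touches, and lists identifiers that also appear in a prior report. The YYMM prefix only reflects the month in which the identifier was assigned, so this is a coarse check; borderline submission dates still need to be confirmed on arxiv.org.

```python
import re

# arXiv identifiers as cited in the report, e.g. "arXiv:2605.17613".
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})")


def arxiv_ids(text):
    """Return the set of arXiv identifiers cited in a report body."""
    return set(ARXIV_ID.findall(text))


def out_of_window(ids, allowed_yymm):
    """Return IDs whose YYMM prefix is not a month the window touches."""
    return sorted(i for i in ids if i[:4] not in allowed_yymm)


def duplicates(current_text, prior_text):
    """Return IDs cited in both the current and the prior report."""
    return sorted(arxiv_ids(current_text) & arxiv_ids(prior_text))


# Section B identifiers, copied from the PR description above.
section_b = """
VeriCache (arXiv:2605.17613), ObjectCache (arXiv:2605.22850),
Lodestar (arXiv:2606.00946), AsymCache (arXiv:2606.02964),
KVarN (arXiv:2606.03458), Vortex (arXiv:2606.06453)
"""

# Hypothetical prior-report text, used only to exercise the duplicate check.
prior_report = "Prior entry citing arXiv:2605.17613 for illustration."

# The 2026-05-18 -> 2026-06-07 window touches May and June 2026.
window_months = {"2605", "2606"}

flagged = out_of_window(arxiv_ids(section_b), window_months)
repeated = duplicates(section_b, prior_report)
```

Here `flagged` collects identifiers that need a closer look for window membership, and `repeated` collects candidates for the duplicate check against the earlier report.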

https://claude.ai/code/session_011iFwqJQzbbCHjXZz2oEnHA


Generated by Claude Code

Covers the 2026-05-18 → 2026-06-07 window (delayed cadence,
~3 weeks): MLSys 2026 in-window (BLASST best paper, LMCache
invited talk, GhostServe, SkipKV), NVIDIA Dynamo Snapshot KV
cache unmap, vLLM v0.22 + EAGLE 3.1 correctness fix, Databricks
prompt caching production numbers, MinIO MemKV + WEKA + VAST
"G3.5" KV-cache tier, Apple KV Prediction, Tensormesh/LMCache
$20M round, Snowflake Forest Cascade Attention, frontier-lab
release wave (Anthropic Opus 4.8, Cohere Command A+, Gemma 4,
MAI-Thinking-1), SGLang HiCache/HiSparse accuracy fix, plus
arXiv KV-cache batch (VeriCache, ObjectCache, Lodestar,
AsymCache, KVarN, Vortex) and an additional-context sweep.

https://claude.ai/code/session_011iFwqJQzbbCHjXZz2oEnHA

Member Author

Moved to iopsystems/inference-systems#3, where the weekly cache research reports now live under research/. Closing here.


Generated by Claude Code
