Add weekly cache research report for 2026-05-24 #19

Closed
thinkingfish wants to merge 1 commit into main from claude/peaceful-planck-3wpPH

@thinkingfish

Member

Summary

  • Adds cache-research/weekly-cache-report-2026-05-24.md covering the 2026-05-18 → 2026-05-24 window (the seven days following the prior weekly).
  • 10 primary entries, expanded from the target of about 5 because MLSys 2026 ran during the window (May 18–22, Bellevue). The conference brought invited talks, posters, and vendor blog launches timed to it, alongside a Cohere model release and three new in-window arXiv KV-systems papers.

Entries

A. Production measurements and empirical studies

  1. vLLM × Novita AI — PegaFlow external KV cache (Rust sidecar) — vLLM blog, May 18
  2. Databricks — managed prompt caching, 2.5× throughput / 3× P50 on production GPT-OSS — May 22
  3. MLSys 2026 — LMCache invited talk + Kitty 2-bit KV poster — Bellevue, May 18–22
  4. Cohere — Command A+ launch with explicit 3:1 sliding-window/global attention ratio and full-precision KV path — May 20

B. Academic / idea-forward work

  1. KVDrive — multi-tier KV management across HBM / DRAM / NVMe (arXiv:2605.18071, SIGMOD 2026)
  2. KVServe — service-aware adaptive KV compression for disaggregated serving (arXiv:2605.13734)
  3. PEEK — context map as an orientation cache for long-context LLM agents (arXiv:2605.19932)
  4. GEM — GPU-variability-aware expert-to-GPU mapping for MoE (arXiv:2605.19945)
  5. OScaR — rotation-based extreme KV quantization, multimodal (arXiv:2605.19660)
  6. Protection Is (Nearly) All You Need — seven-policy eviction bake-off (arXiv:2605.18053)

Plus an Additional Context section (storage caching: Google Cloud Storage Rapid, MinIO MemKV; Red Hat / Snowflake adjacent reads; arXiv just-past-window items: TIDE, PulseCol, Runtime-Certified Quantized Attention) and cross-cutting observations (MLSys as a community event; external KV cache as a separate-process pattern; KV-cache footprint moving into pretraining; eviction-policy methodology catching up; the cache abstraction expanding beyond KV).

Test plan

  • Source-sweep cross-check: frontier AI labs, inference vendors, storage vendors, systems/ML venue accepted lists, arXiv 2605.13xxx–2605.21xxx sweep, curated trackers, HN.
  • Each entry has a primary reference linked in text and listed in the References section.
  • Runtime-evaluation status noted explicitly for every entry (including "partial" / "no" cases).
  • Explicit-negatives list covers every named source class.
  • Methodology caveat documented (arxiv.org, mlsys.org, and vllm.ai returned 403 on direct fetch; figures flagged as "as-claimed").

https://claude.ai/code/session_01EroHYQeTaDJnvUoDgcLbnG


Generated by Claude Code

Covers the 2026-05-18 -> 2026-05-24 window with MLSys 2026 as the
in-window venue event: vLLM x Novita PegaFlow external KV cache,
Databricks managed prompt caching with production GPT-OSS numbers,
MLSys 2026 LMCache invited talk and Kitty poster, Cohere Command A+
with explicit KV-cache footprint engineering, and the in-window
arXiv KV-systems cluster (KVDrive, KVServe, PEEK, GEM, OScaR,
Protection Is Nearly All You Need), plus storage/inference-stack
additional context.

https://claude.ai/code/session_01EroHYQeTaDJnvUoDgcLbnG

Member Author

Moved to iopsystems/inference-systems#3, where the weekly cache research reports now live under research/. Closing here.


Generated by Claude Code

2 participants