Add weekly cache research report for 2026-05-24 by thinkingfish · Pull Request #19 · pelikan-io/cache-rs

thinkingfish · 2026-05-24T16:16:33Z

Summary

Adds cache-research/weekly-cache-report-2026-05-24.md covering the 2026-05-18 → 2026-05-24 window (the seven days following the prior weekly).
Ten primary entries, expanded from the ~5 target because MLSys 2026 ran during the window (May 18–22, Bellevue). The conference brought invited talks, posters, and vendor blog launches timed to it, alongside a Cohere model release and three new in-window arXiv KV-systems papers.

Entries

A. Production measurements and empirical studies

vLLM × Novita AI — PegaFlow external KV cache (Rust sidecar) — vLLM blog, May 18
Databricks — managed prompt caching, 2.5× throughput / 3× P50 on production GPT-OSS — May 22
MLSys 2026 — LMCache invited talk + Kitty 2-bit KV poster — Bellevue, May 18–22
Cohere — Command A+ launch with explicit 3:1 sliding-window/global attention ratio and full-precision KV path — May 20

B. Academic / idea-forward work

KVDrive — multi-tier KV management across HBM / DRAM / NVMe (arXiv:2605.18071, SIGMOD 2026)
KVServe — service-aware adaptive KV compression for disaggregated serving (arXiv:2605.13734)
PEEK — context map as an orientation cache for long-context LLM agents (arXiv:2605.19932)
GEM — GPU-variability-aware expert-to-GPU mapping for MoE (arXiv:2605.19945)
OScaR — rotation-based extreme KV quantization, multimodal (arXiv:2605.19660)
Protection Is (Nearly) All You Need — seven-policy eviction bake-off (arXiv:2605.18053)

Plus an Additional Context section (storage caching: Google Cloud Storage Rapid, MinIO MemKV; Red Hat / Snowflake adjacent reads; arXiv just-past-window items: TIDE, PulseCol, Runtime-Certified Quantized Attention) and cross-cutting observations (MLSys as a community event; external KV cache as a separate-process pattern; KV-cache footprint moving into pretraining; eviction-policy methodology catching up; the cache abstraction expanding beyond KV).

Test plan

Source-sweep cross-check: frontier AI labs, inference vendors, storage vendors, systems/ML venue accepted lists, arXiv 2605.13xxx–2605.21xxx sweep, curated trackers, HN.
Each entry has a primary reference linked in text and listed in the References section.
Runtime-evaluation status noted explicitly for every entry (including "partial" / "no" cases).
Explicit-negatives list covers every named source class.
Methodology caveat documented (arxiv.org, mlsys.org, and vllm.ai returned HTTP 403 on direct fetch; figures are flagged as "as-claimed").
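
One part of the source-sweep cross-check can be sketched mechanically: confirming that every arXiv ID cited under Entries falls inside the stated 2605.13xxx–2605.21xxx sweep. The snippet below is a minimal illustration, not the tooling used for the report. The function names are hypothetical, and it assumes the sweep notation means sequence numbers 13000 through 21999 within month 2605. It checks ID ranges only; it says nothing about submission dates.

```python
import re

# arXiv IDs copied from the "Entries" list in this PR description.
ENTRY_IDS = {
    "KVDrive": "2605.18071",
    "KVServe": "2605.13734",
    "PEEK": "2605.19932",
    "GEM": "2605.19945",
    "OScaR": "2605.19660",
    "Protection Is (Nearly) All You Need": "2605.18053",
}

# Assumption: "2605.13xxx–2605.21xxx" means sequence numbers
# 13000..21999 (inclusive) within the 2605 month prefix.
SWEEP_MONTH = "2605"
SWEEP_LOW, SWEEP_HIGH = 13000, 21999

# New-style arXiv identifier: four-digit YYMM, a dot, five-digit sequence.
ARXIV_ID = re.compile(r"^(\d{4})\.(\d{5})$")


def in_sweep(arxiv_id: str) -> bool:
    """Return True if the ID parses and lies inside the assumed sweep range."""
    match = ARXIV_ID.match(arxiv_id)
    if match is None:
        return False
    month, seq = match.group(1), int(match.group(2))
    return month == SWEEP_MONTH and SWEEP_LOW <= seq <= SWEEP_HIGH


def outside_sweep(ids: dict) -> list:
    """Return the sorted names of entries whose IDs fall outside the sweep."""
    return sorted(name for name, aid in ids.items() if not in_sweep(aid))


if __name__ == "__main__":
    # An empty list means every cited ID is covered by the sweep range.
    print(outside_sweep(ENTRY_IDS))
```

An empty result only shows that the cited IDs are consistent with the sweep range; whether each paper was actually retrieved still depends on the fetch caveat noted above.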

https://claude.ai/code/session_01EroHYQeTaDJnvUoDgcLbnG

Generated by Claude Code

Covers the 2026-05-18 -> 2026-05-24 window with MLSys 2026 as the in-window venue event: vLLM x Novita PegaFlow external KV cache, Databricks managed prompt caching with production GPT-OSS numbers, MLSys 2026 LMCache invited talk and Kitty poster, Cohere Command A+ with explicit KV-cache footprint engineering, and the in-window arXiv KV-systems cluster (KVDrive, KVServe, PEEK, GEM, OScaR, Protection Is Nearly All You Need), plus storage/inference-stack additional context. https://claude.ai/code/session_01EroHYQeTaDJnvUoDgcLbnG

thinkingfish · 2026-06-13T06:57:26Z

Moved to iopsystems/inference-systems#3, where the weekly cache research reports now live under research/. Closing here.

Generated by Claude Code

thinkingfish closed this Jun 13, 2026

thinkingfish wants to merge 1 commit into main from claude/peaceful-planck-3wpPH
