docs(blog): add post on streaming vLLM weights from Azure Blob Storage by surajssd · Pull Request #5845 · Azure/AKS

surajssd · 2026-06-26T23:08:06Z

Summary

Adds a new AKS Engineering Blog post, "Stream Model Weights to NVIDIA GPU running vLLM from Azure Blob Storage on AKS" (website/blog/2026-06-26-runai-streamer-vllm/index.md), a runnable end-to-end walkthrough for serving microsoft/phi-4 with vLLM on AKS while streaming model weights directly from Azure Blob Storage via the RunAI Model Streamer's native az:// scheme. The post leans on a fully managed A100 GPU node pool and workload identity so no storage keys are needed, and explains why streaming beats the default download-then-load path for autoscaling inference cold starts.

Changes

- New blog post (index.md, ~640 lines) walking through: deploying an AKS cluster with OIDC + workload identity and a managed GPU node pool (--enable-managed-gpu=true), creating a premium block-blob storage account, wiring up workload identity for keyless Blob access, an in-cluster upload Job that pushes microsoft/phi-4 weights to Blob, and a vLLM Deployment that streams them via --load-format runai_streamer. Includes "Why stream", "Trade-offs and downsides", a verification step (spotting the Loading safetensors using Runai Model Streamer log line), and a conclusion.
- Four diagrams: a hero image plus 1-why-stream-vs-download.png, 2-identity.png (workload-identity trust chain), and 3-end-to-end.png (end-to-end flow), each with descriptive alt text.
- New author entry hariharan-sethuraman added to website/blog/authors.yml; post co-authored with suraj-deshmukh.
- Review-feedback hardening incorporated over the branch: wired serviceAccountName: ${SERVICE_ACCOUNT_NAME} into both pod specs (and the Job's envsubst allowlist) with an idempotent kubectl create serviceaccount step, added set -euo pipefail and pinned huggingface-hub>=0.34 in the upload Job, reserved ephemeral-storage to avoid DiskPressure eviction, added an az feature show poll so the node-pool step doesn't race feature registration, added envsubst to Prerequisites, fixed the phi-4 size figure, corrected the workload-identity label explanation, and depersonalized the Configuration variables.
- Formatting: converted blockquote callouts to Docusaurus :::note/:::caution/:::tip admonitions and replaced the Mermaid diagrams with images.
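
The `set -euo pipefail` hardening listed above is easiest to see in isolation. The following is a minimal sketch, not the Job's actual script: `false | cat` is a stand-in for a failed `curl | tar` download, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: `false | cat` stands in for a failed `curl | tar` download.

# Without pipefail, a pipeline reports only the exit status of its last stage,
# so the failure of the first stage is hidden.
without_pipefail() {
  local rc=0
  bash -c 'false | cat' || rc=$?
  echo "$rc"
}

# With pipefail, the pipeline fails if any stage fails.
with_pipefail() {
  local rc=0
  bash -c 'set -o pipefail; false | cat' || rc=$?
  echo "$rc"
}

without_pipefail   # prints 0: the failed first stage goes unnoticed
with_pipefail      # prints 1: the failure surfaces
```

Combined with `-e` (exit on error) and `-u` (error on unset variables), a non-zero pipeline status stops the script, which is what lets a broken download fail the Job instead of continuing silently.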

Test Plan

- npm run build succeeds in website/ (static site compiles; post renders at /2026/06/26/runai-streamer-vllm with both authors).
- markdownlint-cli2 passes against blog/linters/.markdown-lint.yml (0 errors).
- codespell clean with the repo's ignore list.
- Embedded Job and Deployment YAML manifests parse as valid Kubernetes objects.
- Reviewer confirms the embedded az/kubectl walkthrough on a live AKS cluster with A100 quota, and that vllm/vllm-openai:v0.23.0 pulls.

- add `2026-06-26-runai-streamer-vllm` post covering serving `microsoft/phi-4` with vLLM on AKS, streaming weights from Azure Blob via the RunAI Model Streamer (`az://`) with workload identity
- add `hariharan-sethuraman` author entry to `authors.yml` (placeholder details pending)

Signed-off-by: Suraj Deshmukh <[email protected]>

Copilot

Pull request overview

Adds a new AKS blog post that walks through serving microsoft/phi-4 with vLLM while streaming weights directly from Azure Blob Storage via the RunAI Model Streamer (az://) using workload identity, plus a new author profile entry to support the post.

Changes:

- Added a new blog post: streaming vLLM weights from Azure Blob on AKS with workload identity.
- Added a new author key (hariharan-sethuraman) to website/blog/authors.yml.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
website/blog/authors.yml	Adds a new author entry used by the new blog post.
website/blog/2026-06-26-runai-streamer-vllm/index.md	New end-to-end tutorial post for streaming vLLM weights from Azure Blob Storage on AKS.

- reword the S3/GCS-to-Azure-Blob intro for clearer phrasing
- add a closing transition sentence to the cold-start section
- add a note on bumping the upload Job timeout for larger (70B+) models

Signed-off-by: Suraj Deshmukh <[email protected]>

Upload Job reliability:
- add `set -euo pipefail` so a failed `curl | tar` azcopy download fails loudly instead of silently
- pin `huggingface-hub>=0.34` to guarantee the renamed `hf` CLI is present
- reserve disk via an `emptyDir` scratch volume and `ephemeral-storage` requests/limits to avoid `DiskPressure` eviction
- document deleting the immutable `Job` before re-applying to avoid `AlreadyExists`
- rewrite the timeout note to cover disk sizing for larger models, not just time

Walkthrough correctness:
- add `--overwrite` to the `kubectl annotate`/`label serviceaccount` commands so the section is idempotent
- wait for `/health` before the first `curl` so it does not race `kubectl port-forward`
- clarify that the `azure.workload.identity/use` pod-template label (not the SA label) drives token injection

Editorial:
- standardize on `NVIDIA`, `Microsoft Entra ID`, and `labeled`
- add the missing trailing period on the `--load-format runai_streamer` line

Signed-off-by: Suraj Deshmukh <[email protected]>
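
The `/health` wait described in this commit can be sketched as a small polling helper. This is an illustrative sketch, not the post's exact snippet: the function name, the default timeout and interval, and the local port `8000` in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# wait_for_health URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls URL until curl succeeds (HTTP status below 400) or the timeout elapses.
# Hypothetical helper; the defaults below are illustrative.
wait_for_health() {
  local url="$1" timeout="${2:-120}" interval="${3:-2}" waited=0
  until curl -fsS -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (assumes `kubectl port-forward` is already forwarding local port 8000):
#   wait_for_health http://localhost:8000/health && curl http://localhost:8000/v1/models
```

Polling the health endpoint first means the first real request is only sent once the port-forward is established and the server is answering, rather than racing both.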

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

Blog post (`2026-06-26-runai-streamer-vllm`):
- replace "checkpoint" with "model weights" throughout for consistent terminology
- reword list item 3 from "so pods" to "that lets pods" to fix the grammar and match the parallel list structure
- add a "Tuning the streamer" note documenting `--model-loader-extra-config` options (`distributed`, `concurrency`, `memory_limit`)

Author metadata:
- replace the `TODO` placeholders in the `hariharan-sethuraman` `authors.yml` entry with the real LinkedIn URL and GitHub handle

Signed-off-by: Suraj Deshmukh <[email protected]>

…l creation
- poll `az feature show` until `ManagedGPUExperiencePreview` reports `Registered` so the step 1c node-pool command does not fail while the feature is still registering
- refresh the resource provider with `az provider register` once the feature is registered

Signed-off-by: Suraj Deshmukh <[email protected]>
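
A polling loop of the kind this commit describes might look like the sketch below. The helper names, retry counts, and the `Microsoft.ContainerService` namespace are assumptions for illustration; only the feature name and the `az feature show` / `az provider register` commands come from the commit message.

```shell
#!/usr/bin/env bash
# wait_for_output EXPECTED MAX_ATTEMPTS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD repeatedly until its stdout equals EXPECTED or attempts run out.
# Hypothetical helper, not taken from the blog post.
wait_for_output() {
  local expected="$1" attempts="$2" interval="$3" state i=1
  shift 3
  while [ "$i" -le "$attempts" ]; do
    state="$("$@" 2>/dev/null || true)"
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    echo "attempt $i/$attempts: state is '${state:-unknown}', waiting..." >&2
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Prints the feature's registration state.
# The namespace is an assumption; adjust it to match your subscription.
feature_state() {
  az feature show \
    --namespace Microsoft.ContainerService \
    --name ManagedGPUExperiencePreview \
    --query properties.state --output tsv
}

# Example: poll every 10 s for up to 10 minutes, then refresh the provider.
#   wait_for_output Registered 60 10 feature_state \
#     && az provider register --namespace Microsoft.ContainerService
```

Keeping the retry logic separate from the `az` call makes the loop easy to exercise with a stub command before pointing it at a real subscription.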

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

…nsides
- swap the Mermaid "why stream" flowchart for an optimized `1-why-stream-vs-download.png` (1200px, 256-color, ~400 KB) with descriptive alt text
- add a "Trade-offs and downsides" section covering per-cold-start streaming cost, the one-upload-per-model step, Safetensors-only support, and owning a second copy of the weights

Signed-off-by: Suraj Deshmukh <[email protected]>

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.

… with images
- swap the §3 workload-identity trust-chain Mermaid flowchart for `2-identity.png` with descriptive alt text
- swap the "How it fits together" Mermaid flowchart for `3-end-to-end.png` with descriptive alt text

Signed-off-by: Suraj Deshmukh <[email protected]>

…clusion

Content:
- convert the five blockquote callouts (why-upload, role-assignment permissions, propagation wait, streamer tuning, scaling note) to Docusaurus `:::note`/`:::caution`/`:::tip` admonitions
- add a §5 step showing how to confirm the weights loaded via the RunAI streamer by spotting the `Loading safetensors using Runai Model Streamer` log line
- add a Conclusion section summarizing the approach and when to adopt it
- minor intro and workload-identity prose rewording

Assets:
- re-export `1-why-stream-vs-download.png` and `3-end-to-end.png` (slightly smaller files)

Signed-off-by: Suraj Deshmukh <[email protected]>

Copilot reviewed 2 out of 5 changed files in this pull request and generated 6 comments.

…post
- replace personal names in `AZURE_RESOURCE_GROUP` and `STORAGE_ACCOUNT_NAME` with generic values
- turn `AZURE_REGION`, `NODE_POOL_VM_SIZE`, and `STORAGE_ACCOUNT_NAME` into descriptive placeholders
- note that readers should modify the variables to match their environment

Signed-off-by: Suraj Deshmukh <[email protected]>

- wire `serviceAccountName: ${SERVICE_ACCOUNT_NAME}` into the Job and Deployment pod specs, add it to the Job `envsubst` allowlist, and add an idempotent `kubectl create serviceaccount` step so a non-default SERVICE_ACCOUNT_NAME no longer breaks Blob auth
- fix the phi-4 size contradiction: `~14 GB` is now `14.7B-parameter, ~29 GB on disk in bf16`, matching the ephemeral-storage comment
- remove the incorrect claim that pods inherit the `azure.workload.identity/use` label from their ServiceAccount
- add `envsubst` (GNU gettext) to the Prerequisites
- rewrite the front-matter `description` to ~155 chars for SEO

Signed-off-by: Suraj Deshmukh <[email protected]>
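
The corrected phi-4 size figure in this commit follows from simple arithmetic: bf16 stores each parameter in 2 bytes, so 14.7 billion parameters need roughly 29 GB. A quick back-of-the-envelope check, using a hypothetical helper and counting weights only (tokenizer and config files are ignored):

```shell
#!/usr/bin/env bash
# weights_gb PARAMS_IN_BILLIONS BYTES_PER_PARAM
# Prints the approximate weight size in decimal GB (1 GB = 10^9 bytes).
# Hypothetical helper for illustration.
weights_gb() {
  awk -v params_b="$1" -v bytes_per_param="$2" \
    'BEGIN { printf "%.1f\n", params_b * bytes_per_param }'
}

weights_gb 14.7 2   # prints 29.4 (14.7 billion parameters x 2 bytes each)
```

The same estimate helps when sizing the Job's `ephemeral-storage` request for a different model or dtype.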

Copilot reviewed 2 out of 5 changed files in this pull request and generated 4 comments.

Copilot reviewed 2 out of 6 changed files in this pull request and generated 3 comments.

- reword the opening to mention Kubernetes and tidy phrasing ("on Kubernetes", "back-to-back", "However")
- add a sentence on what faster cold starts mean for production inference on AKS (failure recovery, rollouts, autoscaling)
- drop the AWS S3/GCS lead-in to focus the availability note on Azure Blob, and use "As of ... now supported"
- link `HuggingFace Hub` and say "entire model" in the diagram alt text
- expand the closing line to cover how the win compounds with larger models and busier autoscaling

Signed-off-by: Suraj Deshmukh <[email protected]>

Copilot reviewed 2 out of 6 changed files in this pull request and generated 3 comments.

surajssd requested review from a team, AllenWen-at-Azure and Copilot June 26, 2026 23:08

Copilot started reviewing on behalf of surajssd June 26, 2026 23:08

Copilot AI reviewed Jun 26, 2026

surajssd added 2 commits June 26, 2026 16:18

Copilot AI review requested due to automatic review settings June 27, 2026 00:02

Copilot started reviewing on behalf of surajssd June 27, 2026 00:03

Copilot AI reviewed Jun 27, 2026

surajssd added 2 commits June 29, 2026 11:18

Copilot AI review requested due to automatic review settings June 29, 2026 18:23

Copilot started reviewing on behalf of surajssd June 29, 2026 18:23

Copilot AI reviewed Jun 29, 2026

Remove sections to simplify

805a8cd

Copilot AI review requested due to automatic review settings June 29, 2026 18:55

Copilot started reviewing on behalf of surajssd June 29, 2026 18:56

surajssd force-pushed the add-run-ai-blog branch from 82c3a20 to 91c179e Compare June 29, 2026 18:57

Copilot AI reviewed Jun 29, 2026

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md Outdated

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md Outdated

Copilot AI review requested due to automatic review settings June 29, 2026 19:00

Copilot started reviewing on behalf of surajssd June 29, 2026 19:01

Copilot AI reviewed Jun 29, 2026

surajssd added 2 commits June 29, 2026 12:13

Copilot AI review requested due to automatic review settings June 29, 2026 20:17

Copilot started reviewing on behalf of surajssd June 29, 2026 20:18

Copilot AI reviewed Jun 29, 2026

surajssd added 2 commits June 29, 2026 13:21

Copilot AI review requested due to automatic review settings June 29, 2026 20:51

Copilot started reviewing on behalf of surajssd June 29, 2026 20:51

Copilot AI reviewed Jun 29, 2026

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md

Add hero image

8386330

Copilot AI review requested due to automatic review settings June 30, 2026 00:15

Copilot started reviewing on behalf of surajssd June 30, 2026 00:16

Copilot AI reviewed Jun 30, 2026

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md Outdated

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md Outdated

Comment thread website/blog/2026-06-26-runai-streamer-vllm/index.md Outdated

Update title and description

9e03686

surajssd force-pushed the add-run-ai-blog branch from 3992664 to 9e03686 Compare June 30, 2026 00:20