From 023d6dfb6cbb7a5fd6b158380a012a5eba6de581 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:27:16 +0200 Subject: [PATCH 01/10] Improve AMD accelerator example --- mkdocs/docs/examples/accelerators/amd.md | 272 +++++++++++------------ mkdocs/docs/examples/training/axolotl.md | 4 - mkdocs/docs/examples/training/trl.md | 4 - 3 files changed, 135 insertions(+), 145 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 7fbfb8072..6d4f5954f 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -1,17 +1,55 @@ --- title: AMD -description: Deploying and fine-tuning models on AMD MI300X GPUs using SGLang, vLLM, TRL, and Axolotl +description: Running dev environments, tasks, and services on AMD GPUs --- # AMD -`dstack` supports running dev environments, tasks, and services on AMD GPUs. -You can do that by setting up an [SSH fleet](../../concepts/fleets.md#ssh-fleets) -with on-prem AMD GPUs or configuring a backend that offers AMD GPUs such as the `runpod` backend. +`dstack` natively supports AMD GPUs. This page covers the basics of setting up +fleets, running inference, training, and dev environments on AMD GPUs. -## Deployment +## Fleets -Here are examples of a [service](../../concepts/services.md) that deploy +`dstack` supports native cloud provisioning, and can also work with existing +Kubernetes clusters or vanilla bare-metal hosts. + +=== "Clouds" + + `dstack` supports native provisioning of VMs with AMD GPUs across a number + of clouds, including + [AMD Developer Cloud](../../concepts/backends.md#amd-developer-cloud) and + [Hot Aisle](../../concepts/backends.md#hot-aisle). More cloud support is + coming soon. + + To provision compute in these clouds, configure the corresponding + [backend](../../concepts/backends.md) and create a + [backend fleet](../../concepts/fleets.md). + +=== "Kubernetes" + + To use `dstack` with existing Kubernetes cluster(s), configure the + [`kubernetes` backend](../../concepts/backends.md#kubernetes) and point it + to your kubeconfig file. Then create a + [backend fleet](../../concepts/fleets.md). + +=== "SSH fleets" + + If you'd like `dstack` to use a cluster or machine that is already + provisioned and that you have access to, create an + [SSH fleet](../../concepts/fleets.md). + +!!! info "Cluster placement" + For multi-node workloads, the fleet must set `placement` to `cluster`. For + Kubernetes and SSH fleets, the network must be properly configured. + + To test whether the fleet is properly configured, run the + [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). + +Once a fleet is created, you can run dev environments, tasks, and services. + +## Inference + +Here are examples of a [service](../../concepts/services.md) that deploys `Qwen/Qwen3.6-27B` on AMD MI300X GPUs using [SGLang](https://github.com/sgl-project/sglang) and [vLLM](https://docs.vllm.ai/en/latest/). @@ -22,7 +60,7 @@ Here are examples of a [service](../../concepts/services.md) that deploy ```yaml type: service - name: qwen36-service-sglang-amd + name: qwen36-sglang-amd image: lmsysorg/sglang:v0.5.10-rocm720-mi30x @@ -50,18 +88,24 @@ Here are examples of a [service](../../concepts/services.md) that deploy memory: 896GB.. shm_size: 16GB disk: 450GB.. - gpu: MI300X:4 + gpu: MI300X:4.. ``` + !!! info "PD disaggregation" + To run SGLang with prefill and decode workers on an interconnected + cluster of AMD GPU instances, see the + [SGLang PD disaggregation](../inference/sglang.md#pd-disaggregation) + example. + === "vLLM"
```yaml type: service - name: qwen36-service-vllm-amd + name: qwen36-vllm-amd image: vllm/vllm-openai-rocm:v0.19.1 @@ -87,164 +131,118 @@ Here are examples of a [service](../../concepts/services.md) that deploy memory: 896GB.. shm_size: 16GB disk: 450GB.. - gpu: MI300X:4 + gpu: MI300X:4.. ```
-!!! info "Docker image" - AMD workloads require specifying an image with ROCm-compatible userspace and - framework packages. The SGLang and vLLM examples above use pinned ROCm - images. +Use the [`dstack apply`](../../reference/cli/dstack/apply.md) command to apply +any configuration, including services, tasks, dev environments, and fleets. - If you already have a ROCm-compatible image, use it. Otherwise, choose an - image for the framework you use from - [ROCm Docker images](https://hub.docker.com/u/rocm), e.g. `rocm/sgl-dev` - for SGLang, `rocm/vllm` for vLLM, or `rocm/pytorch` for PyTorch. For - generic AMD dev environments or tasks, use `rocm/dev-ubuntu-24.04`. +
-To request multiple GPUs, specify the quantity after the GPU name, separated by a colon, e.g., `MI300X:4`. +```shell +$ dstack apply -f service.dstack.yml +``` -## Fine-tuning +
-> If you're planning multi-node AMD training, validate cluster networking first -with the [NCCL/RCCL tests](../clusters/nccl-rccl-tests.md) -example. +## Training + +Below is a [task](../../concepts/tasks.md) that fine-tunes a small language +model using the official +[Transformers causal language modeling example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling) +on AMD GPUs. + +
+ +```yaml +type: task +name: amd-qwen3-train + +image: rocm/pytorch:latest + +commands: + - git clone --depth 1 https://github.com/huggingface/transformers.git + - pip install -e ./transformers -r transformers/examples/pytorch/language-modeling/requirements.txt + - | + torchrun --standalone --nproc-per-node $DSTACK_GPUS_PER_NODE \ + transformers/examples/pytorch/language-modeling/run_clm.py \ + --model_name_or_path Qwen/Qwen3-0.6B-Base \ + --dataset_name Salesforce/wikitext \ + --dataset_config_name wikitext-2-raw-v1 \ + --do_train \ + --per_device_train_batch_size 1 \ + --gradient_accumulation_steps 8 \ + --max_steps 10 \ + --block_size 512 \ + --learning_rate 2e-5 \ + --bf16 \ + --logging_steps 1 \ + --output_dir /tmp/qwen3-clm + +resources: + gpu: MI300X:4.. + disk: 100GB.. +``` -=== "TRL" +
- Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html) - and the [`mlabonne/guanaco-llama2-1k`](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k) - dataset. +For multi-node training, see [distributed tasks](../../concepts/tasks.md#distributed-tasks). -
+## Dev environments - ```yaml - type: task - name: trl-amd-llama31-train - - # Using Runpod's ROCm Docker image - image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04 - - # Required environment variables - env: - - HF_TOKEN - # Mount files - files: - - train.py - # Commands of the task - commands: - - export PATH=/opt/conda/envs/py_3.10/bin:$PATH - - git clone https://github.com/ROCm/bitsandbytes - - cd bitsandbytes - - git checkout rocm_enabled - - pip install -r requirements-dev.txt - - cmake -DBNB_ROCM_ARCH="gfx942" -DCOMPUTE_BACKEND=hip -S . - - make - - pip install . - - pip install trl - - pip install peft - - pip install transformers datasets huggingface-hub scipy - - cd .. - - python train.py - - # Uncomment to leverage spot instances - #spot_policy: auto +Here's an example of a [dev environment](../../concepts/dev-environments.md) +that can be accessed via your desktop IDE. - resources: - gpu: MI300X - disk: 150GB - ``` +
-
+```yaml +type: dev-environment +name: amd-vscode -=== "Axolotl" - Below is an example of fine-tuning Llama 3.1 8B using [Axolotl](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html) - and the [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) - dataset. +image: rocm/dev-ubuntu-24.04 -
+ide: vscode - ```yaml - type: task - # The name is optional, if not specified, generated randomly - name: axolotl-amd-llama31-train - - # Using Runpod's ROCm Docker image - image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04 - # Required environment variables - env: - - HF_TOKEN - - WANDB_API_KEY - - WANDB_PROJECT - - WANDB_NAME=axolotl-amd-llama31-train - - HUB_MODEL_ID - # Commands of the task - commands: - - export PATH=/opt/conda/envs/py_3.10/bin:$PATH - - pip uninstall torch torchvision torchaudio -y - - python3 -m pip install --pre torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0/ - - git clone https://github.com/OpenAccess-AI-Collective/axolotl - - cd axolotl - - git checkout d4f6c65 - - pip install -e . - # Latest pynvml is not compatible with axolotl commit d4f6c65, so we need to fall back to version 11.5.3 - - pip uninstall pynvml -y - - pip install pynvml==11.5.3 - - cd .. - - wget https://dstack-binaries.s3.amazonaws.com/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl - - pip install flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl - - wget https://dstack-binaries.s3.amazonaws.com/xformers-0.0.26-cp310-cp310-linux_x86_64.whl - - pip install xformers-0.0.26-cp310-cp310-linux_x86_64.whl - - git clone --recurse https://github.com/ROCm/bitsandbytes - - cd bitsandbytes - - git checkout rocm_enabled - - pip install -r requirements-dev.txt - - cmake -DBNB_ROCM_ARCH="gfx942" -DCOMPUTE_BACKEND=hip -S . - - make - - pip install . - - cd .. - - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml - --wandb-project "$WANDB_PROJECT" - --wandb-name "$WANDB_NAME" - --hub-model-id "$HUB_MODEL_ID" +resources: + gpu: MI300X:1 +``` - resources: - gpu: MI300X - disk: 150GB - ``` -
+
- Note, to support ROCm, we need to checkout to commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround). This installation approach is also followed for building Axolotl ROCm docker image. [(See Dockerfile)](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm). +## Docker image - > To speed up installation of `flash-attention` and `xformers`, we use pre-built binaries uploaded to S3. +> If you'd like a run to use AMD GPUs, make sure to specify `image`. The image +> should include a ROCm runtime compatible with the AMD GPUs and the packages +> your workload needs. -## Running a configuration +## Metrics -Once a configuration is ready, save it to a `.dstack.yml` file. If your -configuration references environment variables such as `HF_TOKEN` or -`WANDB_API_KEY`, export them first. Then run -`dstack apply -f `, and `dstack` will automatically -provision the cloud resources and run the configuration. +Run and job [metrics](../../concepts/metrics.md) include CPU, memory, and GPU +usage. They are available in the UI and via the CLI:
```shell -$ dstack apply -f +$ dstack metrics <run name> ```
+> AMD GPU metrics require `amd-smi` to be available in the run image. If it +> isn't present, GPU metrics may be unavailable. + ## What's next? 1. Browse the dedicated [SGLang](../inference/sglang.md) - and [vLLM](../inference/vllm.md) examples, plus - [Axolotl](https://github.com/ROCm/rocm-blogs/tree/release/blogs/artificial-intelligence/axolotl), - [TRL](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.html), - and [ROCm Bitsandbytes](https://github.com/ROCm/bitsandbytes) -2. For multi-node training, run - [NCCL/RCCL tests](../clusters/nccl-rccl-tests.md) - to validate AMD cluster networking. -3. Check [dev environments](../../concepts/dev-environments.md), - [tasks](../../concepts/tasks.md), and - [services](../../concepts/services.md). + and [vLLM](../inference/vllm.md) examples, plus the + [Qwen 3.6](../models/qwen36.md) model page. +2. For multi-node inference, see + [SGLang PD disaggregation](../inference/sglang.md#pd-disaggregation). +3. For cluster validation, run + [NCCL/RCCL tests](../clusters/nccl-rccl-tests.md). +4. Check [dev environments](../../concepts/dev-environments.md), + [tasks](../../concepts/tasks.md), [services](../../concepts/services.md), + [fleets](../../concepts/fleets.md), and + [backends](../../concepts/backends.md). diff --git a/mkdocs/docs/examples/training/axolotl.md b/mkdocs/docs/examples/training/axolotl.md index 5266a8674..5d59e5802 100644 --- a/mkdocs/docs/examples/training/axolotl.md +++ b/mkdocs/docs/examples/training/axolotl.md @@ -54,9 +54,6 @@ resources: The task uses Axolotl's Docker image, where Axolotl is already pre-installed. -!!! info "AMD" - The example above uses NVIDIA accelerators. To use it with AMD, check out [AMD](../accelerators/amd.md#axolotl). - ### Run the configuration Once the configuration is ready, run `dstack apply -f `, and `dstack` will automatically provision the @@ -182,4 +179,3 @@ Provisioning... 1. Check [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), [services](../../concepts/services.md), and [fleets](../../concepts/fleets.md) 2. Read about [cluster placement](../../concepts/fleets.md#cluster-placement) -3. See the [AMD](../accelerators/amd.md#axolotl) example diff --git a/mkdocs/docs/examples/training/trl.md b/mkdocs/docs/examples/training/trl.md index ffeb3766f..e75a4d89b 100644 --- a/mkdocs/docs/examples/training/trl.md +++ b/mkdocs/docs/examples/training/trl.md @@ -75,9 +75,6 @@ resources: Change the `resources` property to specify more GPUs. -!!! info "AMD" - The example above uses NVIDIA accelerators. To use it with AMD, check out [AMD](../accelerators/amd.md#trl). - ??? info "DeepSpeed" For more memory-efficient use of multiple GPUs, consider using DeepSpeed and ZeRO Stage 3. @@ -269,4 +266,3 @@ Provisioning... 1. Check [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), [services](../../concepts/services.md), and [fleets](../../concepts/fleets.md) 2. Read about [cluster placement](../../concepts/fleets.md#cluster-placement) -3. See the [AMD](../accelerators/amd.md#trl) example From de3400f55edb69c20ee30b992cd2152ef09a91d4 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:39:55 +0200 Subject: [PATCH 02/10] Polish AMD cluster placement note --- mkdocs/docs/examples/accelerators/amd.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 6d4f5954f..0dffb27a5 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -42,8 +42,8 @@ Kubernetes clusters or vanilla bare-metal hosts. For multi-node workloads, the fleet must set `placement` to `cluster`. For Kubernetes and SSH fleets, the network must be properly configured. - To test whether the fleet is properly configured, run the - [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). + > To test whether the fleet is properly configured, run the + > [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). Once a fleet is created, you can run dev environments, tasks, and services. From a35ab750b6389151c1bc4b3531a2f523af014648 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:40:19 +0200 Subject: [PATCH 03/10] Polish AMD training note --- mkdocs/docs/examples/accelerators/amd.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 0dffb27a5..440d25a06 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -188,7 +188,8 @@ resources: -For multi-node training, see [distributed tasks](../../concepts/tasks.md#distributed-tasks). +> For multi-node training, see +> [distributed tasks](../../concepts/tasks.md#distributed-tasks). ## Dev environments From 5b1be2067c24723ee3248c24644e7bc3a38eeb7f Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:42:31 +0200 Subject: [PATCH 04/10] Clarify AMD distributed training note --- mkdocs/docs/examples/accelerators/amd.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 440d25a06..8063a26fd 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -188,8 +188,12 @@ resources: -> For multi-node training, see -> [distributed tasks](../../concepts/tasks.md#distributed-tasks). +!!! info "Distributed tasks" + To run training across multiple nodes, use + [distributed tasks](../../concepts/tasks.md#distributed-tasks). Distributed + tasks may run on a cluster; in that case, the fleet must set `placement` to + `cluster` and have a proper interconnect. See the cluster placement note + above. ## Dev environments From 29f316adcd7aeafa0433a20140446934756429ba Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:43:15 +0200 Subject: [PATCH 05/10] Clarify AMD PD disaggregation placement --- mkdocs/docs/examples/accelerators/amd.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 8063a26fd..83e8d7406 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -99,6 +99,10 @@ Here are examples of a [service](../../concepts/services.md) that deploys [SGLang PD disaggregation](../inference/sglang.md#pd-disaggregation) example. + For multi-node PD disaggregation, the fleet must set `placement` to + `cluster` and have a proper interconnect. See the cluster placement note + above. + === "vLLM"
From 433f253d49d4906a14a6eaa2545e5fc2fa34c1a6 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:43:38 +0200 Subject: [PATCH 06/10] Polish AMD cluster validation wording --- mkdocs/docs/examples/accelerators/amd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 83e8d7406..4695e1954 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -42,7 +42,7 @@ Kubernetes clusters or vanilla bare-metal hosts. For multi-node workloads, the fleet must set `placement` to `cluster`. For Kubernetes and SSH fleets, the network must be properly configured. - > To test whether the fleet is properly configured, run the + > To test whether the cluster is properly configured, run the > [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). Once a fleet is created, you can run dev environments, tasks, and services. From 9cc43281412c001c49e5325f311988b5ff88d107 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:45:05 +0200 Subject: [PATCH 07/10] Add AMD cluster placement anchor --- mkdocs/docs/examples/accelerators/amd.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 4695e1954..983f699bc 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -38,12 +38,14 @@ Kubernetes clusters or vanilla bare-metal hosts. provisioned and that you have access to, create an [SSH fleet](../../concepts/fleets.md). + + !!! info "Cluster placement" For multi-node workloads, the fleet must set `placement` to `cluster`. For Kubernetes and SSH fleets, the network must be properly configured. - > To test whether the cluster is properly configured, run the - > [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). + To test whether the cluster is properly configured, run the + [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). Once a fleet is created, you can run dev environments, tasks, and services. @@ -100,8 +102,8 @@ Here are examples of a [service](../../concepts/services.md) that deploys example. For multi-node PD disaggregation, the fleet must set `placement` to - `cluster` and have a proper interconnect. See the cluster placement note - above. + `cluster` and have a proper interconnect. See + [cluster placement](#cluster-placement). === "vLLM" @@ -196,8 +198,8 @@ resources: To run training across multiple nodes, use [distributed tasks](../../concepts/tasks.md#distributed-tasks). Distributed tasks may run on a cluster; in that case, the fleet must set `placement` to - `cluster` and have a proper interconnect. See the cluster placement note - above. + `cluster` and have a proper interconnect. See + [cluster placement](#cluster-placement). ## Dev environments From e26d7ca405c39ca7249ff58aa388330936b224b1 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:51:13 +0200 Subject: [PATCH 08/10] Polish AMD Docker image note --- mkdocs/docs/examples/accelerators/amd.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index 983f699bc..bf5bcff4b 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -224,9 +224,10 @@ resources: ## Docker image -> If you'd like a run to use AMD GPUs, make sure to specify `image`. The image -> should include a ROCm runtime compatible with the AMD GPUs and the packages -> your workload needs. +> If you'd like a run to use AMD GPUs, make sure to specify `image`. + +The image's ROCm runtime must be compatible with the AMD GPUs the run will use. +The image should also include the packages your workload needs. ## Metrics From 1d48b45dd9aa37e1ae3a61583ea74c91af40905e Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:55:04 +0200 Subject: [PATCH 09/10] Show AMD cluster placement in navigation --- mkdocs/docs/examples/accelerators/amd.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index bf5bcff4b..c85cece93 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -38,14 +38,15 @@ Kubernetes clusters or vanilla bare-metal hosts. provisioned and that you have access to, create an [SSH fleet](../../concepts/fleets.md). - - !!! info "Cluster placement" - For multi-node workloads, the fleet must set `placement` to `cluster`. For - Kubernetes and SSH fleets, the network must be properly configured. + ### Cluster placement { #cluster-placement style="display: none;" } + + For multi-node workloads, the fleet must + [set](../../concepts/fleets.md#cluster-placement) `placement` to `cluster`. + For Kubernetes and SSH fleets, the network must be properly configured. - To test whether the cluster is properly configured, run the - [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). + > To test whether the cluster is properly configured, run the + > [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). Once a fleet is created, you can run dev environments, tasks, and services. From 0c1697c0496a35018978c55aaea2aac74ab96cc3 Mon Sep 17 00:00:00 2001 From: Andrey Cheptsov Date: Sat, 23 May 2026 21:58:39 +0200 Subject: [PATCH 10/10] Polish AMD cluster placement links --- mkdocs/docs/examples/accelerators/amd.md | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/mkdocs/docs/examples/accelerators/amd.md b/mkdocs/docs/examples/accelerators/amd.md index c85cece93..f94af3d02 100644 --- a/mkdocs/docs/examples/accelerators/amd.md +++ b/mkdocs/docs/examples/accelerators/amd.md @@ -39,14 +39,12 @@ Kubernetes clusters or vanilla bare-metal hosts. [SSH fleet](../../concepts/fleets.md). !!! info "Cluster placement" - ### Cluster placement { #cluster-placement style="display: none;" } - For multi-node workloads, the fleet must [set](../../concepts/fleets.md#cluster-placement) `placement` to `cluster`. For Kubernetes and SSH fleets, the network must be properly configured. - > To test whether the cluster is properly configured, run the - > [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). + To test whether the cluster is properly configured, run the + [RCCL tests via a distributed task](../clusters/nccl-rccl-tests.md). Once a fleet is created, you can run dev environments, tasks, and services. @@ -102,9 +100,7 @@ Here are examples of a [service](../../concepts/services.md) that deploys [SGLang PD disaggregation](../inference/sglang.md#pd-disaggregation) example. - For multi-node PD disaggregation, the fleet must set `placement` to - `cluster` and have a proper interconnect. See - [cluster placement](#cluster-placement). + For multi-node PD disaggregation, the fleet must use [cluster placement](../../concepts/fleets.md#cluster-placement). === "vLLM" @@ -198,9 +194,8 @@ resources: !!! info "Distributed tasks" To run training across multiple nodes, use [distributed tasks](../../concepts/tasks.md#distributed-tasks). Distributed - tasks may run on a cluster; in that case, the fleet must set `placement` to - `cluster` and have a proper interconnect. See - [cluster placement](#cluster-placement). + tasks may run on a cluster; in that case, the fleet must use + [cluster placement](../../concepts/fleets.md#cluster-placement). ## Dev environments