chore: revert the kvbm workaround since trtllm v1.3.0rc3 is upgraded (#6495)

richardhuo-nv · web-flow · commit 15d217606a9d · 2026-02-23T15:05:04.000-08:00
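
For context, the marker removed from `tests/kvbm_integration/test_determinism_disagg.py` skipped the disaggregated determinism tests whenever `tensorrt_llm` was importable. The sketch below shows one way such an availability check can work. The body of `check_module_available` is an assumption written with the standard library's `importlib.util.find_spec`; the real helper lives elsewhere in the test suite and is not shown in this diff.

```python
import importlib.util


def check_module_available(module_name: str) -> bool:
    """Hypothetical stand-in for the test-suite helper.

    Returns True when the named module can be located, without importing it.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package or a module without a spec is treated as unavailable.
        return False


# The removed decorator passed this result as the skip condition, so the
# tests were skipped on any environment where tensorrt_llm was installed.
trtllm_installed = check_module_available("tensorrt_llm")
```

With the marker gone, the tests run regardless of whether `tensorrt_llm` is present.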
diff --git a/docs/pages/components/kvbm/kvbm-guide.md b/docs/pages/components/kvbm/kvbm-guide.md
@@ -204,30 +204,6 @@ cd $DYNAMO_HOME/examples/backends/vllm
 
 ### Disaggregated Serving with TRT-LLM
 
-> [!NOTE]
-> The latest TensorRT-LLM release (1.3.0rc1) is currently experiencing a request hang when running disaggregated serving with KVBM.
-> Please include the TensorRT-LLM commit id `18e611da773026a55d187870ebcfa95ff00c8482` when building the Dynamo TensorRT-LLM runtime image to test the KVBM + disaggregated serving feature.
-
-```bash
-# Build the Dynamo TensorRT-LLM container using commit ID 18e611da773026a55d187870ebcfa95ff00c8482. Note: This build can take a long time.
-./container/build.sh --framework trtllm --tensorrtllm-commit 18e611da773026a55d187870ebcfa95ff00c8482 --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git
-
-# Launch the container
-./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
-```
-> [!NOTE]
-> Important: After logging into the Dynamo TensorRT-LLM runtime container, copy the Triton kernels into the container's virtual environment as a separate Python module.
-
-```bash
-# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.
-git clone https://github.com/NVIDIA/TensorRT-LLM.git /tmp/TensorRT-LLM && \
-cd /tmp/TensorRT-LLM && \
-git checkout 18e611da773026a55d187870ebcfa95ff00c8482 && \
-cp -r triton_kernels /opt/dynamo/venv/lib/python3.12/site-packages/ && \
-cd /workspace && \
-rm -rf /tmp/TensorRT-LLM
-```
-
 ```bash
 # Launch prefill worker with KVBM
 python3 -m dynamo.trtllm \
diff --git a/tests/kvbm_integration/test_determinism_disagg.py b/tests/kvbm_integration/test_determinism_disagg.py
@@ -551,10 +551,6 @@ def tester(llm_server):
 class TestDeterminismDisagg(BaseTestDeterminism):
     """Test class for determinism validation."""
 
-    @pytest.mark.skipif(
-        check_module_available("tensorrt_llm"),
-        reason="Skipping test until the TRT-LLM disagg hang issue is fixed. (https://github.com/NVIDIA/TensorRT-LLM/pull/11247)",
-    )
     @pytest.mark.parametrize(
         "llm_server",
         [