rtx-pro-6000

Here are 13 public repositories matching this topic...

0xSero / glm-5.2-sm120

GLM-5.2-NVFP4-REAP-469B serving on SM120 (4× RTX PRO 6000 Blackwell) — one-command vLLM launch recipe, 250K context, DeepSeek Sparse Attention + MTP speculative decode

moe glm reap blackwell vllm llm-inference sm120 nvfp4 rtx-pro-6000

Updated Jun 19, 2026
Shell

kekzl / imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a): the best single-GPU backend for agentic AI (tool calling, long-context loops, reasoning, and concurrent sub-agents) on top of the fastest single-stream decode on the 5090 (beats llama.cpp; at or ahead of vLLM on NVFP4). 100% written by Claude Code.

Updated Jul 4, 2026
Cuda

casualcomputer / rtx_pro_6000_vs_dgx_spark

SGLang LLM Inference: RTX Pro 6000 vs DGX Spark

dgx-spark rtx-pro-6000

Updated Oct 18, 2025

theogravity / dual-rtx-6000-blackwell-qwen3.6-27b-fp8

Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell (192 GB GDDR7, no NVLink): config, benchmark sweep results, and a custom chat template with thinking mode off by default.

benchmark blackwell fp8 vllm local-llm llm-inference speculative-decoding qwen3 multi-token-prediction rtx-pro-6000

Updated May 10, 2026
Shell
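
The 192 GB figure is two cards at 96 GB each. As a back-of-envelope check, the sketch below estimates how much of each card the weights of a 27B dense model occupy at FP8 versus BF16 under a two-way tensor-parallel split. It assumes roughly 27e9 parameters, 1 or 2 bytes per parameter, and an even split across ranks; the function names are hypothetical, and real usage is higher once runtime overhead is counted.

```python
# Rough per-GPU weight-memory estimate for a dense model under tensor parallelism.
# Assumptions (illustrative, not taken from the repo): ~27e9 parameters, 1 byte per
# parameter for FP8 and 2 for BF16, weights split evenly across TP ranks, and
# 96 GB per RTX PRO 6000 (the stated 192 GB total divided by two cards).

GPU_GB = 96.0  # per-card memory, decimal GB
GB = 1e9


def weight_gb_per_gpu(n_params: float, bytes_per_param: float, tp: int) -> float:
    """Weight memory per GPU in GB; ignores activations, KV cache and runtime overhead."""
    return n_params * bytes_per_param / tp / GB


def headroom_gb(n_params: float, bytes_per_param: float, tp: int, gpu_gb: float = GPU_GB) -> float:
    """Memory left on each GPU for KV cache, activations and CUDA graphs."""
    return gpu_gb - weight_gb_per_gpu(n_params, bytes_per_param, tp)


if __name__ == "__main__":
    for name, bpp in (("FP8", 1.0), ("BF16", 2.0)):
        w = weight_gb_per_gpu(27e9, bpp, tp=2)
        h = headroom_gb(27e9, bpp, tp=2)
        print(f"{name}: ~{w:.1f} GB of weights per GPU, ~{h:.1f} GB left")
```

Under these assumptions FP8 weights take about 13.5 GB per card and BF16 about 27 GB, leaving most of each card for KV cache.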

jcartu / qwen36-27b-blackwell-inference-study

Systematic 24-hour benchmark study of Qwen3.6-27B inference on dual NVIDIA RTX PRO 6000 Blackwell SM120 (TP=2). 8 experiments comparing repne/vllm fork vs upstream vLLM across FP8/BF16/NVFP4/Q8_0 quants and MTP/DFlash speculative decoding. Peak: 2,083 tok/s at c=32. Quality: KLD vs BF16 = 0.0018 (noise floor).

benchmark inference blackwell bf16 fp8 vllm qwen speculative-decoding qwen3 nvfp4 rtx-pro-6000

Updated Jun 3, 2026
Python
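
A "KLD vs BF16" quality figure is commonly the mean per-token KL divergence between the BF16 reference model's next-token distribution and the quantized model's distribution at the same positions. The study's exact procedure is not described here; the sketch below assumes that common definition, with logits already collected for aligned token positions. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) for a single token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_batch: Sequence[Sequence[float]], test_batch: Sequence[Sequence[float]]) -> float:
    """Average per-token KLD over aligned positions (reference = BF16, test = quantized)."""
    if len(ref_batch) != len(test_batch) or not ref_batch:
        raise ValueError("need the same, non-zero number of positions")
    return sum(kl_divergence(r, t) for r, t in zip(ref_batch, test_batch)) / len(ref_batch)


if __name__ == "__main__":
    ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    quant = [[2.0, 1.0, 0.6], [0.5, 0.5, 3.0]]  # toy "quantized" logits
    print(mean_kld(ref, quant))
```

A value near zero means the quantized model's token distributions are nearly indistinguishable from the reference on the sampled text.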

djeday123 / fa-blackwell-fp8

Production-grade FlashAttention FP8 (e4m3) forward kernel for NVIDIA Blackwell consumer GPUs (sm_120a, e.g. RTX PRO 6000): 647–652 TFLOPS at head dim 128, sequence length 8192. Multi-kernel dispatcher; C library with Go and Python bindings.

cuda transformer attention gpu-kernels tensor-cores blackwell fp8 flash-attention fp8e4m3 sm120 rtx-pro-6000 sm-120a

Updated Jun 12, 2026
Cuda
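
Attention-kernel TFLOPS figures are usually obtained by dividing a nominal FLOP count by measured kernel time. A common convention (used in FlashAttention's own benchmark scripts) counts 4·B·H·S²·D FLOPs for a non-causal forward pass and half that for causal. Whether this repo uses the same convention, and which batch size and head count produced 647–652 TFLOPS, is not stated; the batch and head values below are hypothetical.

```python
# FLOP accounting commonly used for attention-kernel benchmarks:
# forward pass = two matmuls, Q @ K^T and P @ V, each 2*S*S*D FLOPs per (batch, head).


def attention_fwd_flops(batch: int, heads: int, seq_len: int, head_dim: int, causal: bool = False) -> int:
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    # A causal mask skips roughly half of the score matrix.
    return flops // 2 if causal else flops


def achieved_tflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and a measured kernel time into TFLOPS."""
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical shape: batch=1 and heads=32 are NOT from the repo;
    # only head_dim=128 and seq_len=8192 come from the description.
    f = attention_fwd_flops(batch=1, heads=32, seq_len=8192, head_dim=128)
    # Kernel time needed to reach ~650 TFLOPS on this shape.
    print(f"{f / 1e12:.3f} TFLOP per call, {f / 650e12 * 1e3:.2f} ms at 650 TFLOPS")
```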

jcartu / qwen-bench

Hub for ongoing Qwen inference benchmarks on NVIDIA Blackwell. Indexes all studies, hosts the rolling SOTA leaderboard, points to the toolchain.

benchmark leaderboard inference hub blackwell vllm qwen speculative-decoding qwen3 rtx-pro-6000

Updated May 15, 2026
Python

Genesis1231 / fish-audio-s2-vllm-rtx

Fish Audio OpenAudio S2-Pro on vLLM-Omni: low latency (~100 ms TTFA), OpenAI-compatible, runs on NVIDIA Blackwell (RTX 5090 / RTX PRO 6000). Self-hosted streaming TTS and voice cloning.

text-to-speech streaming real-time cuda self-hosted tts speech-synthesis low-latency openaudio voice-cloning fastapi blackwell openai-api vllm fish-speech rtx-5090 rtx-pro-6000 elevenlabs-alternative

Updated Jun 27, 2026
Python

xuconz / vllm-pro6000-nvfp4-hybrid

Drop-in hybrid patch for vLLM on RTX Pro 6000 Blackwell NVFP4: Marlin for decode plus a CUTLASS prefill shadow. Fixes the W4A16 mislabel; 1.73× prefill gain.

inference blackwell vllm local-ai nvfp4 rtx-pro-6000

Updated Jul 5, 2026
Python

chensongpoixs / QuantLoom

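The ambition of QuantLoom (量梭) has never been merely to pop a few trading signals up on your phone. The ultimate product this loom is really weaving for you is the freedom to summon an RTX Pro 6000, the obsidian god-machine. It is a black obelisk lying in your case, its tens of thousands of cores like a sea of stars in the dark night. It is the physical foundation for training and running large models locally, weaving a real-time panorama of trading volume across the entire market, and tracing ten years of capital fingerprints. In the past it descended only on supercomputing centers, top quantitative funds, and mysterious mining farms. Every bolt of profitable brocade QuantLoom weaves adds a golden thread to this black altar. When the golden threads gather into a cable, the obsidian god-machine will tear a rift in the void shelf and descend into your ranks. From then on, you will own a personal temple of compute.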

rtx-pro-6000

Updated May 20, 2026
Python

jcartu / qwen36-27b-blackwell-stress-validation

Stress validation of Qwen3.6-27B inference configurations on dual RTX PRO 6000 Blackwell: 5 configs × 4 phases (gates, throughput matrix, HumanEval, MBPP), 2,105 hard coding problems in total, zero crashes. Headline: FP8+MTP=3 wins HumanEval (79.3%); BF16+DFlash wins MBPP (89.5%). MTP=5 was beaten on correctness despite faster raw tok/s.

benchmark inference blackwell humaneval vllm qwen speculative-decoding qwen3 mbpp rtx-pro-6000

Updated May 7, 2026
Python
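
Why a larger speculative depth raises raw throughput with diminishing returns can be seen from the standard speculative-decoding model, which assumes each drafted token is accepted independently with probability α. With k drafted tokens per step, the expected number of tokens emitted per target-model forward pass is (1 − α^(k+1)) / (1 − α). Reading "MTP=k" as k speculative tokens is an assumption about this repo's notation, the α value below is hypothetical, and the model describes speed only, not correctness.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step with k drafted tokens,
    assuming each draft token is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # hypothetical acceptance rate
    for k in (3, 5):
        print(f"k={k}: {expected_tokens_per_step(alpha, k):.3f} tokens/step")
```

Going from 3 to 5 drafted tokens adds less than one expected token per step at α = 0.8, while each step verifies more tokens.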

D4vidHuang / benchForge

Forges a reproducible code/LLM evaluation leaderboard on idle TU Delft DAIC RTX Pro 6000 (Blackwell, sm_120) GPUs; a sibling of preCal. Strict decoupling of GPU generation from CPU scoring, requeue-safe SLURM backfill, results published to the HF Hub.

benchmarking leaderboard slurm code-generation reproducibility tu-delft huggingface vllm llm-evaluation rtx-pro-6000

Updated Jun 19, 2026
Python
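
The GPU-generate / CPU-score split with requeue safety can be illustrated with a two-phase sketch: a generation phase that persists each completion as soon as it is produced and skips finished work on restart, and a scoring phase that only reads those files. This is not benchForge's code; the one-file-per-problem layout, function names, and the `generate`/`check` callables are hypothetical stand-ins.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def generate_stage(problems: Dict[str, str], generate: Callable[[str], str], out_dir: str) -> int:
    """GPU phase (hypothetical): write one result file per problem, skipping finished ones.

    Each result goes to a temp file and is atomically renamed, so a job that is
    preempted and requeued never leaves a half-written result and resumes where it
    stopped. Assumes problem ids are safe file names.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for pid, prompt in problems.items():
        final_path = os.path.join(out_dir, f"{pid}.json")
        if os.path.exists(final_path):
            continue  # finished before a preemption
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"id": pid, "completion": generate(prompt)}, f)
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
        written += 1
    return written


def score_stage(out_dir: str, check: Callable[[str, str], bool]) -> float:
    """CPU phase: score saved completions without needing a GPU; returns the pass rate."""
    names = sorted(n for n in os.listdir(out_dir) if n.endswith(".json"))
    if not names:
        return 0.0
    passed = 0
    for name in names:
        with open(os.path.join(out_dir, name)) as f:
            record = json.load(f)
        passed += bool(check(record["id"], record["completion"]))
    return passed / len(names)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    problems = {"p1": "a", "p2": "b"}
    fake_generate = str.upper  # stands in for a GPU model call
    print(generate_stage(problems, fake_generate, demo_dir))  # first run writes both
    print(generate_stage(problems, fake_generate, demo_dir))  # rerun after a requeue writes none
    print(score_stage(demo_dir, lambda pid, c: c == "A"))
```

Because scoring only reads files, it can run on CPU nodes after the GPU allocation ends, and a rerun of the generation phase is effectively idempotent.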

S6966277 / glm-5.2-sm120

Deploy the GLM-5.2-469B model on four RTX PRO 6000 Blackwell GPUs using a turnkey vLLM Docker configuration to enable high-speed sparse attention and inference.

android open-source benchmark ocr smartthings smartapp moe glm zwave gles2 reap vllm chatglm-6b long-horizon rtx-pro-6000

Updated Jul 5, 2026
Shell
