docs: add a centralized FlashDreams troubleshooting guide #286

Merged
liruilong940607 merged 1 commit into NVIDIA:main from mvanhorn:fix/272-flashdreams-troubleshooting-guide
Jun 8, 2026

@mvanhorn mvanhorn commented Jun 4, 2026

Contributor

Summary

FlashDreams users hitting a first-run failure now have one page to check instead of hunting across model pages and the install guide. This adds docs/source/troubleshooting.rst, a Sphinx page with one section per failure class (CUDA/PyTorch version mismatch, disk/cache exhaustion, model download/auth failures, OOM, WebRTC networking, Triton autotuning warnings, and --no-instantiate usage), each entry giving symptoms, likely cause, and a concrete fix or next diagnostic step.

Why this matters

Issue #272 notes that FlashDreams has no dedicated troubleshooting page, so common first-run failures are undocumented or scattered. The page is wired into the top-level toctree in docs/source/index.rst and cross-referenced from docs/source/quickstart/installation.rst so it is discoverable from both the site index and the install flow. Content is sourced from already-documented behavior in models/omnidreams.rst, models/lingbot_world.rst, and quickstart/installation.rst; no unsupported knobs are invented.

Testing

Docs-only change covered by the doc.yml Sphinx build: the new page renders, the toctree has no orphan or duplicate warnings, and the cross-references resolve. The required SPDX header is present on the new file for reuse-lint. Full build runs in CI.

Fixes #272

@copy-pr-bot

copy-pr-bot Bot commented Jun 4, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jmccaffrey-nv

Collaborator

Review with Codex GPT 5.5 for PR #286 -> Issue #272
Partially addresses it, but needs changes before closing #272.

Thanks for putting this together. The symptoms / likely cause / next step format is the right shape for #272, and the coverage is broad enough to be useful for first-run failures.

Before we merge, could you add the remaining discoverability links requested in #272 to the README and the relevant model pages? Also, the WebRTC entry should align with the LingBot media-path guidance: SSH -L is enough for the HTTP page/signaling path, but not necessarily for the WebRTC media path. Linking to the LingBot network section, or using the same wording as that section, keeps the troubleshooting guide consistent.

@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

This PR adds a centralized docs/source/troubleshooting.rst page covering seven common first-run failure classes, wires it into the top-level toctree in index.rst, and adds a cross-reference from the installation guide.

  • New troubleshooting page (troubleshooting.rst): seven sections (CUDA mismatch, disk exhaustion, auth failures, OOM, WebRTC, Triton warmup, --no-instantiate) each with symptoms, cause, and a concrete fix sourced from existing model docs.
  • Navigation hooks: index.rst adds the page to the Getting-Started toctree; installation.rst adds an inline :doc:/troubleshooting cross-reference at the end of the Environment variables section.

Confidence Score: 3/5

Hold for the two defects in troubleshooting.rst before merging.

The new troubleshooting page has a too-short RST section underline on the --no-instantiate heading (52 dashes for a 53-character title) that will emit a docutils warning and can break a Sphinx build configured with -W. Separately, the WebRTC section sends users to lingbot_world.rst for browser-specific WebRTC settings that do not exist there — the Chrome/Brave/Firefox ICE flag instructions live only in omnidreams.rst. Both issues are in the new file and are straightforward to fix before merge.

docs/source/troubleshooting.rst — the --no-instantiate section underline and the WebRTC cross-reference both need correction.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/source/troubleshooting.rst | New 217-line troubleshooting guide; contains a too-short RST section underline (line 203) that can break the Sphinx build, and a cross-reference to lingbot_world.rst for browser-specific WebRTC settings that are not present in that file. |
| docs/source/index.rst | Adds Troubleshooting entry to the Getting-Started toctree; path and placement are correct. |
| docs/source/quickstart/installation.rst | Appends a :doc:/troubleshooting cross-reference sentence at the end of the Environment variables section; absolute path resolves correctly. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([User hits a first-run failure]) --> B{Entry point}
    B -->|Site index| C[index.rst toctree\nGetting Started]
    B -->|Install flow| D[quickstart/installation.rst\nEnvironment variables]
    C -->|New toctree entry| E[troubleshooting.rst]
    D -->|New cross-reference| E
    E --> F[CUDA / PyTorch mismatch]
    E --> G[Disk / cache exhaustion]
    E --> H[Model download / auth failure]
    E --> I[GPU out of memory]
    E --> J[WebRTC connection]
    E --> K[Triton autotuning / warmup]
    E --> L[--no-instantiate usage]
    J -->|Browser ICE fix| M[models/omnidreams.rst]
    F -->|Non-perf fallback| M
    I -->|Efficient streaming preset| N[models/lingbot_world.rst]

Reviews (1): Last reviewed commit: "docs: add a centralized FlashDreams trou..."

Comment on lines +202 to +204

``--no-instantiate`` is a diagnostic flag. It resolves and prints the runner
configuration, then returns before creating the runner or calling

Contributor

P1 RST section underline too short

The underline for the --no-instantiate section is 52 dashes, but the title text is 53 characters long. RST/docutils requires the adornment to be at least as long as the title. If the Sphinx build runs with -W (warnings-as-errors), this will fail CI; even without it, docutils emits a warning and the section may not render as a heading.
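A check like the one described above can be approximated locally before pushing. The following is a minimal sketch, not docutils' actual parser: the function name `find_short_underlines` is hypothetical, the scan is a line-pair heuristic, and it measures titles with `len()` rather than display width, so wide characters may be miscounted. It ignores underlines shorter than four characters, which docutils treats as ordinary text rather than as a failed heading.

```python
import string

# Assumption: any single repeated ASCII punctuation character may serve as an
# RST section adornment; this sketch does not restrict the set further.
ADORNMENT_CHARS = set(string.punctuation)


def _is_adornment(line):
    """True if the line consists of one punctuation character repeated."""
    return bool(line) and len(set(line)) == 1 and line[0] in ADORNMENT_CHARS


def find_short_underlines(rst_text):
    """Return (line_number, title, underline) for underlines shorter than their title.

    Heuristic only: inspects each pair of adjacent lines and reports the
    1-based line number of the underline.
    """
    lines = rst_text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        if not title or _is_adornment(title):
            continue  # blank line, or an overline rather than a title
        if not _is_adornment(underline) or len(underline) < 4:
            continue  # not an underline, or too short to count as one
        if len(underline) < len(title):
            problems.append((i + 1, title, underline))
    return problems


if __name__ == "__main__":
    # Illustrative input only; the second heading's underline is two dashes short.
    sample = "Good title\n==========\n\nA longer section title\n" + "-" * 20 + "\n"
    for line_no, title, underline in find_short_underlines(sample):
        print(f"line {line_no}: underline has {len(underline)} chars, "
              f"title has {len(title)}")
```

Running the function over the text of docs/source/troubleshooting.rst should surface a 52-versus-53 mismatch of the kind reported here, though the Sphinx build remains the authoritative check.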

Comment thread docs/source/troubleshooting.rst
@jmccaffrey-nv

Collaborator

/ok to test 2a126d1

@liruilong940607 liruilong940607 left a comment

Collaborator

Thanks! LGTM! We should probably move #287 into the troubleshooting page as well, but that can be addressed in a follow-up.

@liruilong940607 liruilong940607 added this pull request to the merge queue Jun 8, 2026
Merged via the queue into NVIDIA:main with commit 0392d70 Jun 8, 2026
6 checks passed
@mvanhorn

Contributor Author

Appreciate the merge, @liruilong940607. A centralized troubleshooting guide beats scattering fixes across issues.

Development

Successfully merging this pull request may close these issues.

Add FlashDreams Troubleshooting Guide
