feat(osmo): VS Code/Cursor dev workflow on NVIDIA OSMO #352
Open
smash0190 wants to merge 13 commits into
Adds a privileged Docker-in-Docker workspace task that lets a developer
run the full AirStack docker-compose stack on OSMO and attach an IDE
over SSH, with Isaac Sim WebRTC livestream + Foxglove websocket exposed
via osmo port-forward.
Components:
- osmo/workspace/{Dockerfile,entrypoint.sh,sshd_config}: airstack-osmo-workspace
image. Ubuntu 24.04 + sshd (pubkey-only) + Docker CE + Docker Compose +
nvidia-container-toolkit + fuse-overlayfs (DinD-on-overlayfs needs it,
otherwise dockerd falls back to vfs which bloats AirStack images ~10x).
- osmo/workflows/airstack-dev.yaml: single privileged GPU task. Materializes
Nucleus + airlab-docker secrets from OSMO credentials, clones AirStack,
starts inner dockerd, runs `airstack up` with desktop + isaac-sim-livestream
Compose profiles.
- simulation/isaac-sim: isaac-sim-livestream Compose service that runs
Pegasus standalone with --/app/livestream/enabled=true and exposes
WebRTC port ranges 47995-48012 / 49000-49007 / 49100; launch script
gates headless+livestream extension on ISAAC_SIM_LIVESTREAM env var.
- .airstack/modules/osmo.sh: airstack osmo:{up,ide,foxglove,webrtc,logs,down}
CLI wrappers around `osmo workflow submit` / `port-forward` / `cancel`.
Persists the active workflow id and validates it's still running before
each command (prevents the stale-state 410 error).
- airstack.sh: bash 4+ re-exec bootstrap (macOS ships 3.2; the CLI uses
`declare -A`).
- osmo/README.md + docs/tutorials/airstack_on_osmo.md: admin pool setup
(privileged_allowed) + per-user credentials (airlab-docker-login,
airlab-nucleus) + student-facing IDE attach + WebRTC/Foxglove flow.
Pool requirements: privileged_allowed: true, GPU pool with
nvidia-container-toolkit on the host, ample node ephemeral storage
(AirStack images extracted are ~50-100Gi via fuse-overlayfs; vfs needs
~500Gi+).
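The bash 4+ re-exec bootstrap in airstack.sh can be sketched as follows — a minimal sketch, not the script's actual contents; the candidate interpreter paths and the `OSMO_COMMANDS` table name are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch of a bash-4 re-exec bootstrap. macOS ships bash 3.2 (no associative
# arrays), so if the running bash is too old, re-exec under a newer one.
# The candidate paths below are illustrative (e.g. Homebrew locations).
if ((BASH_VERSINFO[0] < 4)); then
  for candidate in /opt/homebrew/bin/bash /usr/local/bin/bash; do
    [[ -x "$candidate" ]] && exec "$candidate" "$0" "$@"
  done
  echo "error: bash >= 4 required, found $BASH_VERSION" >&2
  exit 1
fi
declare -A OSMO_COMMANDS     # safe now: declare -A exists in bash >= 4
OSMO_COMMANDS[up]=cmd_osmo_up
```

The `exec` replaces the current (old) interpreter with the modern one, re-running the same script with the same arguments, so the rest of the file can freely use `declare -A`.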
Co-authored-by: Cursor <[email protected]>
…ward race, and cursor-server install hangs

Four bugs that bit the first end-to-end runs (airstack-dev-10 → -13):

- _osmo_wf_id: validate the saved workflow id against `osmo workflow query` before returning. Without this, the state file at ~/.airstack/osmo-state outlives the workflow it points at, and every subsequent osmo:webrtc / osmo:foxglove / osmo:ide call surfaces the same confusing "Workflow airstack-dev-N is not running! (status 410)" instead of the obvious "run airstack osmo:up to launch a fresh workflow".
- cmd_osmo_up: `osmo workflow submit --set-env` is variadic. Passing two separate `--set-env A=1 --set-env B=2` silently drops the first one — this is what made airstack-dev-11 fail with "ERROR: SSH_PUB_KEY not set" when --branch was passed alongside the pubkey. Collapse the K=V pairs into a single --set-env.
- cmd_osmo_ide: previously launched the IDE before starting the port-forward, so Cursor/VS Code would try to SSH localhost:2200 a few hundred ms before the tunnel listener existed and fail with "connect to host localhost port 2200: Connection refused". Now: detect an existing forward and reuse it (which also avoids "Address already in use" if osmo:foxglove was started in parallel); otherwise spawn the forward in the background, wait up to 30s for it to bind, then launch the IDE. Ctrl+C tears down the spawned forward cleanly via a trap.
- workspace image / entrypoint: Cursor Remote-SSH hung indefinitely on airstack-dev-13 because (a) cursor-server's installer fell back to wget when curl timed out and wget was not in the image, and (b) a /tmp/cursor-remote-lock.* file left behind by the first crashed install blocked every silent retry. Add wget to the apt install list and rm -f the stale Cursor / VS Code remote lock files at the very top of entrypoint.sh so each fresh pod starts from a clean slate.

Co-authored-by: Cursor <[email protected]>
…ons locally on osmo:foxglove
osmo:logs was invoking `osmo workflow logs <id> workspace --follow`, but
the real CLI takes the task via `-t TASK` (not positionally) and has no
`--follow` flag at all — so the command failed immediately with
"unrecognized arguments: workspace --follow". Replace with a polling loop
that uses `-t workspace -n <N>` on a short interval, prints only the
suffix that appeared since the previous fetch (find-the-last-seen-line
trick; degrades to "reprint tail" with a warning if the cursor outruns
-n), and exits cleanly once the workflow reaches a terminal state.
Tunables: OSMO_LOGS_TASK / OSMO_LOGS_TAIL / OSMO_LOGS_INTERVAL.
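The find-the-last-seen-line trick this commit describes can be sketched as a small pure-shell helper — a minimal sketch; the function and file names here are illustrative, not the actual cmd_osmo_logs code:

```shell
# Given the freshly fetched tail and the last line already printed, emit only
# the suffix that appeared since the previous fetch. If the previously seen
# line is no longer inside the -n window (the cursor outran it), degrade to
# reprinting the whole tail.
print_new_lines() {
  local tail_file="$1" last_seen="$2"
  if [[ -n "$last_seen" ]] && grep -qxF -- "$last_seen" "$tail_file"; then
    awk -v needle="$last_seen" '
      $0 == needle { start = NR + 1 }   # remember the LAST occurrence
      { lines[NR] = $0 }
      END { for (i = start; i <= NR; i++) print lines[i] }
    ' "$tail_file"
  else
    cat "$tail_file"                    # degraded: reprint the whole tail
  fi
}

printf 'boot\nready\nsim up\n' > /tmp/osmo-tail.txt
print_new_lines /tmp/osmo-tail.txt 'ready'   # prints only: sim up
```

Matching on the last occurrence (rather than the first) keeps repeated log lines from truncating the output too early.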
osmo:foxglove now installs the AirStack Foxglove extensions
(robot-commands / waypoint-editor / polygon-editor) into the laptop's
local Foxglove user-extensions directory before opening the
port-forward. Without this, custom panels show up as "Unknown panel
type: robot-commands.Robot Tasks" in the laptop's Foxglove Desktop
because it has no way to discover the extension folders that live
inside the GCS container. To avoid duplicating the install logic, the
existing gcs/foxglove_extensions/install.py is refactored to read
FOXGLOVE_EXT_SRC / FOXGLOVE_EXT_DST env vars (the in-container call
already in gcs/docker/gcs-base-docker-compose.yaml keeps working
unchanged via defaults). The wrapper sets those vars to
${PROJECT_ROOT}/gcs/foxglove_extensions and
~/.foxglove-studio/extensions respectively, overridable with
OSMO_FOXGLOVE_EXT_DIR / skippable with OSMO_FOXGLOVE_SKIP_EXTENSIONS=1.
Co-authored-by: Cursor <[email protected]>
…actually shows pixels
Kit 107's WebRTC livestream picks a UDP media port dynamically. The
documented `omni.services.livestream.nvcf` defaults (minHostPort=47998
maxHostPort=48020 fixedHostPort=0) are ignored by the stock standalone
Kit binary — on airstack-dev-13 it bound to UDP 49042, outside both the
Compose-published range AND the default `osmo:webrtc --udp` forward of
`47995-48012,49000-49007`. Result: TCP signaling on 49100 worked, the
WebRTC Streaming Client window opened, but every SRTP media packet was
dropped → black viewport plus the recurring
`NVST_CCE_DISCONNECTED when m_connectionCount 0 != 1` underflow in Kit's log.
Pin the media port via three `app.livestream.*` settings set on
`SimulationApp` before `omni.kit.livestream.webrtc` is enabled, so
whichever code path the carb.livestream-rtc.plugin consults lands on the
same port:
app.livestream.fixedHostPort = 49099
app.livestream.minHostPort = 49099
app.livestream.maxHostPort = 49099
49099 deliberately sits one below the 49100 TCP signaling port — same
neighborhood, easy to remember. Verified live on airstack-dev-13 after
`docker compose up -d --force-recreate isaac-sim-livestream`: Kit binds
UDP 49099 (`/proc/net/udp` hex BFCB on 0.0.0.0) and docker-proxy
publishes it from the pod host network.
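That /proc/net/udp check generalizes: local ports are listed in uppercase hex, so verifying the pinned media port is a one-line conversion plus a grep. A minimal sketch (meant to run inside the isaac-sim container; the procfs path is standard Linux):

```shell
# /proc/net/udp shows local addresses as IP:PORT with the port in hex,
# so the pinned media port 49099 appears as BFCB.
port=49099
port_hex=$(printf '%04X' "$port")    # 49099 -> BFCB
echo "looking for :$port_hex in /proc/net/udp"
grep -i ":$port_hex" /proc/net/udp \
  || echo "port $port not bound (expected only while Kit is streaming)"
```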
Knock-on cleanups:
- `simulation/isaac-sim/docker/docker-compose.yaml` shrinks the
isaac-sim-livestream `ports:` from 27 forwarded ports
(`47995-48012, 49000-49007 TCP+UDP, 49100 TCP`) to just two:
`49100/tcp` + `49099/udp`.
- `.airstack/modules/osmo.sh` shrinks `OSMO_WEBRTC_TCP` to `49100` and
`OSMO_WEBRTC_UDP` to `49099`, so `airstack osmo:webrtc` spawns two
port-forwards instead of thirty.
- `.gitignore` ignores `.DS_Store` so working from a Mac doesn't leak
Finder metadata.
After pulling this commit into a running pod: `docker compose up -d
--force-recreate isaac-sim-livestream` to apply the new port mapping;
then re-run `airstack osmo:webrtc` on the laptop to pick up the new
forward ranges. The standalone WebRTC Streaming Client connects to
`localhost` (same address as before) and now actually receives frames.
Co-authored-by: Cursor <[email protected]>
…d for in-pod git push

Two paper-cuts that bit airstack-dev-13 after the WebRTC media port pin landed (commit 2d9b161):

(1) The WebRTC stream showed only the bare 3D viewport — no menu bar, no toolbar, no panels, no console. Cause: SimulationApp's default when `headless=True` is to also hide the UI (`hide_ui=True`). The NVIDIA reference at `simulation/isaac-sim/standalone_examples/api/isaacsim.simulation_app/livestream.py` explicitly opts back into UI rendering, picks explicit window sizing, and sets `display_options=3286` to keep the default grid/axes visible. Mirror that config in `example_one_px4_pegasus_launch_script.py` when `ISAAC_SIM_LIVESTREAM=true` (local desktop dev keeps the minimal `headless=False` path unchanged).

(2) The pod has no SSH private key, only an `authorized_keys` for inbound connections from the user's laptop. As a result, `git push` from inside the Cursor / VS Code Remote-SSH session in the pod fails with "Permission denied (publickey)". sshd inside the workspace image already has `AllowAgentForwarding yes` baked in via `osmo/workspace/sshd_config`; the missing piece is purely on the Mac side. Update the `~/.ssh/config` block in the tutorial to include `ForwardAgent yes` (so the local agent's keys are exposed in the pod), `AddKeysToAgent yes` (auto-load on first push), and `UseKeychain yes` (macOS-only Keychain unlock without passphrase prompts; ignored on Linux). Adds an `ssh-add -l` smoke-test note.

Co-authored-by: Cursor <[email protected]>
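The resulting ~/.ssh/config entry might look like the following — a sketch, not the tutorial's exact block; the `Host` alias and `User` are illustrative, while port 2200 matches the localhost forward described above:

```
# Illustrative ~/.ssh/config entry for the OSMO pod (alias and user assumed).
Host airstack-osmo
    HostName localhost
    Port 2200
    User root
    ForwardAgent yes      # expose the laptop agent's keys inside the pod
    AddKeysToAgent yes    # load the key into the agent on first use
    UseKeychain yes       # macOS-only Keychain unlock
```

Before connecting, `ssh-add -l` on the laptop should list at least one key; an empty agent forwards nothing, and in-pod `git push` will still fail.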
…auth-debug path

osmo:setup hit two failure modes that wasted a debug session each:

- `osmo credential set` is not an upsert for GENERIC creds — re-running setup (e.g. to rotate a Nucleus API token) failed with `400 duplicate key value violates unique constraint "credential_pkey"` and bailed before reaching the airlab-nucleus credential. Delete-then-set each credential so re-running is idempotent.
- Bracket-paste mode and cross-OS clipboards routinely smuggle invisible bytes around long pastes. Nucleus's auth endpoint silently DENIES a token with one extra trailing byte, with no actionable error from the client side. _osmo_prompt now strips leading/trailing whitespace and CR/NUL bytes via a new _osmo_trim helper, and warns when bytes were stripped. cmd_osmo_setup additionally JWT-shape-checks the Nucleus token (three dot-separated segments starting with eyJ) before submitting it, so a wrong paste fails at setup time instead of silently DENIED at pod boot.

Also documents how to debug the "Login Required: Unable to connect server omniverse://airlab-nucleus..." popup: SSH into the Nucleus host and tail base_stack-nucleus-auth-1 for `InternalCredentials.auth status: DENIED`.

Adds a "Nucleus connectivity from OSMO" section to the admin README clarifying that Nucleus over HTTPS uses a single port 443 (no need to open the native 3009-3180 range from the OSMO cluster), per NVIDIA's TLS docs.

Co-authored-by: Cursor <[email protected]>
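The sanitizing can be sketched in pure bash — the function names mirror the commit's `_osmo_trim` and the JWT shape check, but these bodies are illustrative, not the actual module code:

```shell
# Strip CR bytes plus leading/trailing whitespace that bracket-paste mode and
# cross-OS clipboards smuggle around long pastes.
_osmo_trim() {
  local s="$1"
  s="${s//$'\r'/}"                      # CR from Windows-style clipboards
  s="${s#"${s%%[![:space:]]*}"}"        # leading whitespace
  s="${s%"${s##*[![:space:]]}"}"        # trailing whitespace (incl. newlines)
  printf '%s' "$s"
}

# A JWT is three dot-separated base64url segments; the header always starts
# with eyJ (base64 of '{"'), so a cheap shape check catches wrong pastes.
_osmo_jwt_shaped() {
  [[ "$1" =~ ^eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$ ]]
}
```

Failing the shape check at setup time converts an opaque DENIED at pod boot into an immediate, actionable error.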
…compose parser
The OSMO entrypoint was writing OMNI_USER=<andrew_id> alongside an API
token JWT in OMNI_PASS, which routes the JWT through the password-
verification path. Nucleus silently DENIES — visible only in
base_stack-nucleus-auth-1 as `InternalCredentials.auth … 'username':
'<andrew>' … status: DENIED` (no Tokens.auth_with_api_token call). Kit
then pops "Login Required: Unable to connect server omniverse://...".
omniclient expects the literal sentinel username `$omni-api-token` paired
with the JWT as the password. The entrypoint now detects a JWT-shaped
OMNI_PASS (header starts with `eyJ`) and emits OMNI_USER=$$omni-api-token
into omni_pass.env. The `$$` is intentional: docker-compose v2
interpolates env_file values, and a single `$` would be eaten by the
parser (`OMNI_USER=$omni-api-token` becomes `OMNI_USER=-api-token` after
${omni}- expansion to empty). The container ultimately sees
OMNI_USER=$omni-api-token, which is the correct sentinel.
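The detection-plus-escaping step can be sketched as follows — a minimal sketch of the idea, not the actual entrypoint; the file path and the example token value are illustrative:

```shell
# If OMNI_PASS looks like a JWT (header starts with eyJ), emit the sentinel
# username with a doubled '$': docker-compose v2 interpolates env_file values,
# and '$$' collapses to one literal '$', so the container ultimately sees
# OMNI_USER=$omni-api-token.
OMNI_PASS='eyJexampleHeader.examplePayload.exampleSig'   # illustrative
env_file=/tmp/omni_pass.env
if [[ "$OMNI_PASS" == eyJ* ]]; then
  printf 'OMNI_USER=$$omni-api-token\n' > "$env_file"
else
  printf 'OMNI_USER=%s\n' "${OMNI_USER:-}" > "$env_file"
fi
cat "$env_file"    # the file on disk holds the doubled '$$' form
```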
Also note for the next debugger: `docker compose restart` does NOT
re-read env_file. Use `docker compose up -d <svc>` to recreate the
container after editing omni_pass.env.
Updates omni_pass_TEMPLATE.env header to document the API-token pattern
explicitly (with the $$ caveat), and adds a troubleshooting row that
distinguishes "wrong auth path" (DENIED with no Tokens.auth_with_api_token
call) from "bad/expired token" (Tokens.auth_with_api_token: DENIED).
Co-authored-by: Cursor <[email protected]>
… flow
Reposition the OSMO tutorial as AirStack's recommended day-to-day
development path (not just a fallback for laptops without GPUs) and
collapse it onto a single recipe: clone the repo, then drive everything
through the airstack osmo:* wrappers in .airstack/modules/osmo.sh.
- docs/tutorials/airstack_on_osmo.md
- Retitle + rewrite the intro to lead with five concrete advantages
(pooled GPUs, no local CUDA/Docker/driver maintenance, same image as
CI + field robots, one-command onboarding, hardware bigger than your
laptop). Demote the Linux+GPU-desktop path to an escape hatch.
- Drop the Mac/Windows/no-GPU framing in 'Who is this for?' and the
mermaid laptop subgraph label.
- Add 'a local clone of AirStack' to Prerequisites; remove it from the
'do not need' list.
- Replace Option A/B credential split with a single
./airstack.sh osmo:setup recipe; move the three raw osmo credential
set calls into a collapsible 'Under the hood' footnote.
- Replace each step's raw osmo workflow ... command with the
corresponding airstack osmo:up/logs/ide/webrtc/foxglove/down wrapper;
preserve the raw form in 'Under the hood' footnotes that cross-link
cmd_osmo_* in .airstack/modules/osmo.sh.
- Drop the export WF=... paragraph — the wrappers read the id from
~/.airstack/osmo-state automatically; AIRSTACK_OSMO_WF overrides
per-invocation. \$WF now only appears inside the raw-form footnotes.
- Sweep Troubleshooting + What-survives tables: redirect raw
port-forward fixes to the airstack osmo:* equivalents and rename the
section to 'What survives airstack osmo:down?'.
- Fix WebRTC edge label (49100/tcp + 49099/udp) to match the pinned
ports the workflow actually uses today.
Companion cleanups now that the privileged_allowed flip is automatic on
the OSMO autosync side (synchronize_osmo_team_pools.py forces
privileged_allowed: true on every platform of every pool, so students
never see the 'platform does not have privileged flag enabled' error):
- osmo/README.md: drop the 'Most common blocker' privileged warning, the
privileged_allowed row from the pool-requirements table, and the
'privileged GPU pod' / '(privileged, GPU)' descriptors in the
architecture summary. Simplify the validation-stage SSH-failure hint.
- osmo/workflows/airstack-dev.yaml: trim the long DinD-requires-privileged
comment to a one-liner (the privileged: true directive itself stays).
- .airstack/modules/osmo.sh: remove the special-case 'privileged flag
enabled' error branch in cmd_osmo_up — it should never fire now.
Co-authored-by: Cursor <[email protected]>
osmo:logs was silent because cmd_osmo_logs wrapped `osmo workflow logs` in `$( ... )` on the assumption that `-n LAST_N_LINES` exits after dumping the tail. Empirically the CLI keeps the stream open as new lines arrive (it already behaves like `tail -f`, despite --help advertising only -n), so the command substitution waited forever and printed nothing. Drop the polling loop and just exec the command directly.

Each fresh OSMO pod also ships a new sshd host key, so every osmo:up trips StrictHostKeyChecking against the previous workflow's fingerprint, and SSH/Cursor abort with "Host key for [localhost]:2200 has changed". Switch the recommended ~/.ssh/config block (and osmo/README.md) to the ephemeral-host pattern (StrictHostKeyChecking no + UserKnownHostsFile /dev/null + LogLevel ERROR), and have cmd_osmo_ide `ssh-keygen -R` the stale loopback entry on every run so users on the old config get unblocked automatically.

Co-authored-by: Cursor <[email protected]>
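The stale-key scrub can be sketched as below — the `-f` flag targeting a scratch file is added here purely so the example is self-contained; the real cmd_osmo_ide presumably operates on the default known_hosts:

```shell
# Remove any previous pod's fingerprint for the loopback forward; -R deletes
# every entry matching the host, and "not found" noise is ignored.
scrub_stale_host_key() {
  local known_hosts="$1"
  ssh-keygen -R '[localhost]:2200' -f "$known_hosts" >/dev/null 2>&1 || true
}

kh=$(mktemp)
printf '[localhost]:2200 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFakeFakeFakeFakeFakeFakeFakeFakeFakeFake stale-pod\n' > "$kh"
scrub_stale_host_key "$kh"
```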
…workflow dies
The pod's entrypoint clones AirStack fresh from GitHub on every workflow
start (the pod fs is ephemeral). It defaulted to `main`, so any developer
testing branch-only OSMO changes silently ran their pod against stale
`main` code — most visibly: COMPOSE_PROFILES=desktop,isaac-sim-livestream
resolved to "desktop" alone on `main` because the isaac-sim-livestream
service only exists on the feature branch, so isaac-sim never came up
and `airstack osmo:webrtc` showed a blank stream.
- cmd_osmo_up now defaults --branch to the local repo's current
branch (git rev-parse --abbrev-ref HEAD). Detached HEAD or
non-git checkouts fall back to `main` cleanly. Pass --branch
explicitly to override.
- New _osmo_check_branch_pushed warns up-front when the about-to-
submit branch has no upstream, is ahead of origin, or has an
uncommitted working tree. The pod doesn't see your laptop's edits.
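The branch-default logic can be sketched as a small helper — the function name is illustrative; the fallback behavior matches the commit's description:

```shell
# Prefer the local checkout's branch; `git rev-parse --abbrev-ref HEAD`
# prints "HEAD" on a detached HEAD and fails outside a git repo, and both
# cases fall back to main.
default_branch() {
  local b
  b=$(git rev-parse --abbrev-ref HEAD 2>/dev/null) || b=HEAD
  if [[ -z "$b" || "$b" == HEAD ]]; then
    b=main
  fi
  printf '%s' "$b"
}
```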
Separately, when an OSMO workflow gets canceled mid-flight (osmo:down
in another shell, or OSMO timing it out), the in-flight port-forward
and logs streams raise OSMOUserError("Workflow X is not running!")
from inside an asyncio Task. The CLI prints "Task exception was never
retrieved" + a multi-line Traceback that buries the actual one-line
cause. New _osmo_pf_filter awk script collapses that into a single
[ERROR] line pointing at `airstack osmo:up`. Wired into webrtc,
foxglove, and logs. webrtc also gains a cleanup trap that kills the
backgrounded UDP port-forward on EXIT/INT/TERM so we don't leak it
against a dead workflow.
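The collapse-the-traceback idea can be sketched as an awk filter — the patterns and the error wording here are illustrative, not the actual _osmo_pf_filter:

```shell
# Turn the multi-line "Task exception was never retrieved" + Traceback noise
# from a dead workflow into a single actionable [ERROR] line.
osmo_pf_filter() {
  awk '
    /OSMOUserError.*is not running/ {
      print "[ERROR] workflow is gone - run: airstack osmo:up"; skip = 1; next
    }
    /Task exception was never retrieved|^Traceback/ { skip = 1; next }
    skip && /^[[:space:]]/ { next }   # swallow indented traceback frames
    { skip = 0; print }
  '
}
```

Piping a port-forward's stderr through the filter keeps normal lines untouched and replaces the buried one-line cause with a pointer at the fix.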
Tutorial Step 2 documents the new --branch default and the
"pod-clones-from-GitHub-not-your-laptop" gotcha.
Co-authored-by: Cursor <[email protected]>
dockerd's defaults of --max-concurrent-downloads=3 / --max-concurrent-uploads=5 cap a fresh airstack-dev pod's image pull at ~300 MiB/s against the airlab-backup-10g registry — single-stream TLS tops out around 300-500 MiB/s per core, and three parallel streams of unevenly sized blobs serialize down to that ceiling. Ceph (1014 TiB, 92 OSDs, SSD pools) and 10 GbE both have far more headroom than that. Bump to 10/10 to overlap enough blob downloads to saturate the pipe. Threaded through the DOCKERD_MAX_DOWNLOADS / DOCKERD_MAX_UPLOADS env vars so a pool can be tuned at submit time without rebuilding the workspace image.

The workspace image needs a rebuild + push for this to take effect:

    cd osmo/workspace
    docker build -t airlab-docker.andrew.cmu.edu/airstack/airstack-osmo-workspace:latest .
    docker push airlab-docker.andrew.cmu.edu/airstack/airstack-osmo-workspace:latest

Co-authored-by: Cursor <[email protected]>
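The env-var plumbing might look like this — a minimal sketch under the assumption that the entrypoint renders a daemon.json for the inner dockerd (the file path is illustrative; the two config keys are real dockerd options):

```shell
# Render the dockerd concurrency knobs from env vars, defaulting to the new
# 10/10, so a pool can tune them at submit time without an image rebuild.
: "${DOCKERD_MAX_DOWNLOADS:=10}"
: "${DOCKERD_MAX_UPLOADS:=10}"
cat > /tmp/daemon.json <<EOF
{
  "max-concurrent-downloads": ${DOCKERD_MAX_DOWNLOADS},
  "max-concurrent-uploads": ${DOCKERD_MAX_UPLOADS}
}
EOF
# The inner dockerd would then be started with --config-file /tmp/daemon.json
```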
A plain `docker build && docker push` on an Apple Silicon Mac silently produces a linux/arm64-only `latest` manifest. OSMO workers are amd64, so every subsequent workflow fails at the outer pod-image pull with "no match for platform in manifest" before the entrypoint even runs — a confusing failure mode whose root cause lives entirely in the push, not in the workflow yaml or the entrypoint.

Switch the README and the Dockerfile docstring to the buildx form, explain the why, and document the post-push manifest check.

Co-authored-by: Cursor <[email protected]>
The OSMO pod's `/` is itself a containerd overlay snapshot, and Linux refuses to stack a second overlayfs on top of an overlay rootfs — which is why the inner dockerd was falling through to fuse-overlayfs. That costs a kernel↔userspace FUSE round-trip on every `creat()` during layer extraction, which murders throughput on apt/pip/ROS layers (measured: 32-50 MB/s for small-file-heavy layers vs 480 MB/s for big-file layers in the same pull).

Pointing dockerd at /osmo/run/docker (the kubelet emptyDir backed by ext4 on /dev/vda3) lets the existing overlay2-first fallback chain actually succeed on its first try, restoring kernel-overlay extraction performance. emptyDir lifetime matches the workflow lifetime, so the docker layer cache gets the right scope automatically. Falls back to /var/lib/docker if /osmo/run isn't present so the image still works in non-OSMO test contexts.

Co-authored-by: Cursor <[email protected]>
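The data-root selection reduces to a tiny helper — a sketch of the described behavior, with an illustrative function name:

```shell
# Prefer the ext4-backed emptyDir so the inner dockerd's overlay2-first
# fallback chain succeeds; otherwise use the stock location so the image
# still works outside OSMO.
pick_docker_data_root() {
  if [[ -d /osmo/run ]]; then
    printf '%s' /osmo/run/docker    # kubelet emptyDir on ext4
  else
    printf '%s' /var/lib/docker     # non-OSMO test contexts
  fi
}
# Entrypoint sketch: dockerd --data-root "$(pick_docker_data_root)"
```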
Summary
Adds a cloud-based development workflow for AirStack on NVIDIA OSMO, positioned as the recommended path for new contributors (no local GPU required, no local install of Isaac Sim / ROS / Docker).
What this adds
- `airstack osmo:*` CLI (`.airstack/modules/osmo.sh`, dispatched by `airstack.sh`) — `setup`, `up`, `ide`, `webrtc`, `foxglove`, `logs`, `exec`, etc. Auto-pins `--branch` to the local checkout, scrubs stale SSH host keys on each connect, gracefully handles port-forward / workflow-died errors.
- Workspace image (`osmo/workspace/`) — runs inside the OSMO pod with an inner dockerd, sshd for Remote-SSH, Foxglove/WebRTC port plumbing, and the AirStack repo cloned on demand. The inner dockerd writes to `/osmo/run/docker` (ext4 `emptyDir`) so the native `overlay2` driver is used instead of `fuse-overlayfs` — measured ~560 MiB/s peak Harbor pull / 1.4 GB/s sequential write inside the pod.
- Workflow definition (`osmo/workflows/airstack-dev.yaml`) with privileged-capable platform defaults.
- Docs (`docs/tutorials/airstack_on_osmo.md`, `osmo/README.md`) — single `git clone` + `airstack setup` + `airstack osmo:up` flow; documents Nucleus API-token auth, SSH agent forwarding for in-pod `git push`, and the `docker buildx build --platform linux/amd64 --push` requirement for the workspace image.
- Linked from `docs/getting_started/index.md`.
- Isaac Sim WebRTC media port pinned to `49099` so the WebRTC stream actually renders pixels; example Pegasus launch script tweaks for headless/livestream consistency.

Why
Lets anyone with an OSMO account spin up a full AirStack dev environment in one command, attach VS Code/Cursor via Remote-SSH, and run Isaac Sim + the autonomy stack on shared cluster GPUs. Same image is usable on a developer's laptop and on the cluster.
Commits (oldest first)
Pre-merge checklist for the author
Test plan
Made with Cursor