feat: dedicated realtime-API protocol types for /v1/realtime #9205

Draft
GuanLuo wants to merge 7 commits into gluo/dis-1858-multimodal-streaming-input-1n-websocket-base-frontend from gluo/dis-1928-multimodal-streaming-input-5n-openai-realtime-api-protocol

GuanLuo commented May 6, 2026

Overview:

Slice 5/N of the bidirectional streaming-input feature — replaces the chat-completion placeholder shape that #9079 (1/N) shipped on /v1/realtime with the OpenAI Realtime API event surface. Clients now exchange real RealtimeClientEvent / RealtimeServerEvent JSON frames over the WebSocket, and the handler synthesizes a spec-compliant session.created server event on connect so a stock OpenAI Realtime client can complete its handshake.

The change is type-swap-shaped on purpose: engine-side semantics (VAD, audio buffering, transcription, response generation) belong to a downstream slice and are intentionally not in this PR. The mock EchoBidirectionalEngine is rewritten to demonstrate the round-trip without taking on those responsibilities.

async-openai 0.34's realtime-types feature already covers the full event surface (verified against RealtimeClientEvent / RealtimeServerEvent and confirmed against OpenAI's published event list). Per lib/protocols/CLAUDE.md's ownership rubric, no Dynamo-side overrides are needed today — the new dynamo_protocols::types::realtime module is a pure namespaced re-export.

Details:

Protocols crate:

  • lib/protocols/Cargo.toml: adds "realtime-types" to the async-openai features list.
  • lib/protocols/src/types/realtime.rs (new) — glob re-export of async_openai::types::realtime::* behind a banner comment that names the ownership rubric.
  • lib/protocols/src/types/mod.rs: adds `pub mod realtime;`. Namespaced (not flattened into types::*) because realtime exposes Session, ContentPart, Response, etc. that overlap with the chat / responses surface area.
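
A minimal sketch of why the module is namespaced rather than flattened. The `upstream` module below is a hypothetical stand-in for async-openai, and the struct bodies are invented for illustration; only the re-export pattern is the point.

```rust
// Stand-in for the upstream crate. The real types live in async-openai;
// these fields are illustrative assumptions, not the actual definitions.
mod upstream {
    pub mod chat {
        #[derive(Debug, Default, PartialEq)]
        pub struct Response {
            pub text: String,
        }
    }
    pub mod realtime {
        #[derive(Debug, Default, PartialEq)]
        pub struct Response {
            pub event_id: String,
        }
        #[derive(Debug, Default, PartialEq)]
        pub struct Session {
            pub model: Option<String>,
        }
    }
}

// Hypothetical mirror of the protocols crate's `types` module.
pub mod types {
    // Flattened: chat types are reachable directly as `types::Response`.
    pub use super::upstream::chat::*;

    // Namespaced: the glob re-export stays under its own module, so
    // `types::realtime::Response` never collides with `types::Response`.
    pub mod realtime {
        pub use super::super::upstream::realtime::*;
    }
}

/// Builds one value of each same-named type to show both paths resolve.
pub fn describe() -> (String, String, String) {
    let chat = types::Response { text: "hi".to_string() };
    let rt = types::realtime::Response { event_id: "evt_1".to_string() };
    let session = types::realtime::Session::default();
    (format!("{chat:?}"), format!("{rt:?}"), format!("{session:?}"))
}
```

Had both globs been flattened into `types`, the two `Response` names would be ambiguous at every use site, which is the collision the namespacing avoids.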

LLM crate — typedef:

  • lib/llm/src/types.rs::generic::realtime::RealtimeBidirectionalEngine — repointed from BidirectionalStreamingEngine<NvCreateChatCompletionRequest, Annotated<NvCreateChatCompletionStreamResponse>> to BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>. Drops the TODO (#9175) doc.
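
The repointed alias can be pictured roughly as follows. This is a synchronous, std-only sketch: the trait, the event enums, `Annotated`, and `AckEngine` are all simplified stand-ins, and the real engine trait is presumably async and stream-based, which is not modelled here.

```rust
// Simplified stand-ins for the realtime event types (assumed shapes).
#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeClientEvent {
    SessionUpdate { instructions: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeServerEvent {
    SessionUpdated { instructions: String },
}

// Stand-in for the Dynamo-internal carrier: payload or error.
#[derive(Debug, Clone, PartialEq)]
pub struct Annotated<T> {
    pub data: Option<T>,
    pub error: Option<String>,
}

// Hypothetical synchronous version of the engine trait.
pub trait BidirectionalStreamingEngine<In, Out> {
    fn generate(&self, input: Vec<In>) -> Vec<Out>;
}

// The alias pins the endpoint's request/response pair in one place, so
// swapping protocol types is a one-line change for every consumer.
pub type RealtimeBidirectionalEngine =
    dyn BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>;

// Toy engine used only to exercise the alias.
pub struct AckEngine;

impl BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>
    for AckEngine
{
    fn generate(&self, input: Vec<RealtimeClientEvent>) -> Vec<Annotated<RealtimeServerEvent>> {
        input
            .into_iter()
            .map(|event| match event {
                RealtimeClientEvent::SessionUpdate { instructions } => Annotated {
                    data: Some(RealtimeServerEvent::SessionUpdated { instructions }),
                    error: None,
                },
            })
            .collect()
    }
}

/// Callers depend on the alias, not on the concrete engine type.
pub fn run(
    engine: &RealtimeBidirectionalEngine,
    input: Vec<RealtimeClientEvent>,
) -> Vec<Annotated<RealtimeServerEvent>> {
    engine.generate(input)
}
```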

LLM crate — handler (/v1/realtime):

  • Inbound JSON deserialization swapped to RealtimeClientEvent; mpsc channel typed accordingly.
  • Merge-blocker resolution: synthesizes RealtimeServerEvent::SessionCreated { event_id: <uuid>, session: Session::RealtimeSession(Box::default()) } as the first wire frame on connect, before any engine event flows. The OpenAI spec mandates this; doing it handler-side (not engine-side) lets this slice ship without waiting on engine implementation.
  • Outbound peels the Annotated carrier off — clients receive bare RealtimeServerEvent JSON, not the Dynamo-internal {"data": {...}, "id": "1"} envelope (this was a wire-shape bug that the new integration tests caught against the type-swap commit).
  • Annotated::error is mapped to a synthesized RealtimeServerEvent::Error so engine errors stay visible on the wire as a spec-shaped error event.
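
The outbound behaviour described in this list can be sketched as a pure function. Everything here is an assumption made for illustration: the `ServerEvent` and `Annotated` stand-ins, the `wire_frames` name, the event-ID format (the PR uses a UUID), and the choice to check `error` before `data`.

```rust
// Stand-in for the realtime server event enum (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub enum ServerEvent {
    SessionCreated { event_id: String },
    SessionUpdated { event_id: String },
    Error { event_id: String, message: String },
}

// Stand-in for the Dynamo-internal envelope.
#[derive(Debug, Clone)]
pub struct Annotated<T> {
    pub data: Option<T>,
    pub error: Option<String>,
    pub id: Option<String>,
}

/// Produces the frames a client would see, in order.
/// `connection_id` is a hypothetical seed for event IDs.
pub fn wire_frames(
    connection_id: &str,
    engine_out: Vec<Annotated<ServerEvent>>,
) -> Vec<ServerEvent> {
    // 1. `session.created` is always the first frame, before any engine output.
    let mut frames = vec![ServerEvent::SessionCreated {
        event_id: format!("evt_{connection_id}"),
    }];

    for (index, item) in engine_out.into_iter().enumerate() {
        if let Some(message) = item.error {
            // 2. Engine errors become a spec-shaped `error` event.
            frames.push(ServerEvent::Error {
                event_id: format!("evt_{connection_id}_{index}"),
                message,
            });
        } else if let Some(event) = item.data {
            // 3. The envelope is peeled off; only the inner event is sent.
            frames.push(event);
        }
        // Items with neither data nor error produce no frame.
    }
    frames
}
```

Keeping this step in the handler means an engine never has to know about the handshake frame, which matches the stated reason for doing it handler-side.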

LLM crate — mock engine:

  • EchoBidirectionalEngine rewritten over the new types:
    • RealtimeClientEvent::SessionUpdate(req) → RealtimeServerEvent::SessionUpdated echoing req.session back.
    • All other variants → RealtimeServerEvent::Error with code = "echo_engine_unsupported" and a message naming the offending variant.
  • client_event_variant_name helper keeps the rejection-message wire-tag stable across upstream additions to RealtimeClientEvent.
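
A rough sketch of the echo rule and the variant-name helper, using cut-down stand-in enums (the real `RealtimeClientEvent` has many more variants, and the error message wording here is invented). The wire tags are the OpenAI Realtime event type names.

```rust
// Cut-down stand-ins; field shapes are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientEvent {
    SessionUpdate { session: String },
    InputAudioBufferAppend { audio: String },
    ResponseCreate,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ServerEvent {
    SessionUpdated { session: String },
    Error { code: String, message: String },
}

/// Maps a variant to its wire tag so error messages do not depend on
/// Rust-side naming or `Debug` output.
pub fn client_event_variant_name(event: &ClientEvent) -> &'static str {
    match event {
        ClientEvent::SessionUpdate { .. } => "session.update",
        ClientEvent::InputAudioBufferAppend { .. } => "input_audio_buffer.append",
        ClientEvent::ResponseCreate => "response.create",
    }
}

/// Echoes `session.update`; rejects everything else with a stable code.
pub fn echo(event: ClientEvent) -> ServerEvent {
    match event {
        ClientEvent::SessionUpdate { session } => ServerEvent::SessionUpdated { session },
        other => ServerEvent::Error {
            code: "echo_engine_unsupported".to_string(),
            message: format!(
                "echo engine does not support `{}`",
                client_event_variant_name(&other)
            ),
        },
    }
}
```

Against an upstream enum that can grow, a helper like this would likely need a fallback arm; that detail is not shown in the PR text and is left out of the sketch.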

Tests:

  • Two unit tests in engines::tests (session.update round-trip; unknown-event → error).
  • Four integration tests in lib/llm/tests/http_websocket.rs:
    • realtime_websocket_emits_session_created_on_connect — merge-blocker assertion.
    • realtime_websocket_session_update_echoes_session_updated — end-to-end round-trip.
    • realtime_websocket_emits_close_after_client_close — fixture body swapped from chat JSON to session.update.
    • realtime_websocket_rejects_binary_frame — drains the on-connect session.created so the binary-reject is what triggers the close.
  • The DYN_TOKEN_ECHO_DELAY_MS test setup is gone; the new echo engine doesn't sleep.

Server-frame coverage scope (intentional):

The OpenAI Realtime server contract emits ~43 frames across session, audio buffer, conversation, transcription, response, function-calling, MCP, and rate-limit categories. Only the connection-lifecycle group (session.created + session.updated + error) is in scope here. Everything else is engine-side and correctly deferred to the engine-implementation slice that comes after.

Where should the reviewer start?

Reading order, smallest mental model first:

  1. lib/llm/tests/http_websocket.rs — what does this PR enable? The four tests show the on-connect handshake, the round-trip, the close path, and the binary-reject path.
  2. lib/protocols/src/types/realtime.rs + lib/protocols/src/types/mod.rs — the new module. 10 lines; the question to satisfy yourself is should we own anything? Per lib/protocols/CLAUDE.md, the answer is no.
  3. lib/llm/src/types.rs — typedef rename. Zero behavior, but the name shift signals the rest of the diff's intent.
  4. lib/llm/src/engines.rs: EchoBidirectionalEngine. Pins the engine contract the handler has to satisfy.
  5. lib/llm/src/http/service/realtime.rs — the bridge. Both the inbound deserialization swap and the on-connect session.created synthesis live here.

To verify locally:

cargo test -p dynamo-llm --test http_websocket
cargo test -p dynamo-llm --lib engines::tests
cargo check --workspace --all-targets
cargo clippy --no-deps --all-targets -- -D warnings

Related Issues:

Internal Linear tickets DIS-1928 (this) and DIS-1858 (base) mirror the above GitHub issues (cross-linked per `.ai/linear-ticket-refs.md`).

🤖 Generated with Claude Code

GuanLuo and others added 7 commits May 6, 2026 00:56
Pulls in OpenAI Realtime client/server event types so dynamo-protocols
can re-export them in a follow-up commit. No code changes yet — this is
purely the Cargo feature toggle.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds `dynamo_protocols::types::realtime` namespacing the upstream
`async-openai::types::realtime` surface (~50 events). Pure re-export per
`lib/protocols/CLAUDE.md`'s ownership rubric — no Dynamo overrides today,
since upstream covers `RealtimeClientEvent` / `RealtimeServerEvent` in
full and no client has been observed sending a shape upstream rejects.

Namespaced (`pub mod realtime`) rather than flattened into `types::*` to
avoid collisions with `types::chat::*` and `types::responses::*` that
share names like `Session`, `Response`, `ContentPart`.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…types

Swaps the placeholder `BidirectionalStreamingEngine<NvCreateChatCompletionRequest,
Annotated<NvCreateChatCompletionStreamResponse>>` introduced in DIS-1858 for
the dedicated `BidirectionalStreamingEngine<RealtimeClientEvent,
Annotated<RealtimeServerEvent>>` shape that the `/v1/realtime` endpoint is
supposed to speak. Drops the `TODO (#9175)` doc comment.

The typedef alias is currently only referenced by name in this file —
`EchoBidirectionalEngine` and the handler use the underlying types directly
— so this swap compiles cleanly in isolation. Subsequent commits update those
sites to consume the realtime types, at which point the alias will start
carrying its weight.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Atomically swaps both ends of the /v1/realtime pipe to speak OpenAI
Realtime events instead of the chat-completion placeholder shape. Engine
+ handler must move together because the engine signature
(ManyIn<RealtimeClientEvent> -> ManyOut<Annotated<RealtimeServerEvent>>)
is what the handler builds the input stream against.

Engine (`EchoBidirectionalEngine`):
- Echoes `session.update` -> `session.updated` carrying the same
  `Session` payload, so the merge-blocker handshake roundtrip is
  observable end-to-end.
- Returns a single `error` server event with
  `code = "echo_engine_unsupported"` for every other client variant.
- `client_event_variant_name` helper keeps the rejection message stable
  across upstream additions to `RealtimeClientEvent`.

Handler (`/v1/realtime`):
- Synthesizes a spec-compliant `session.created` server frame as the
  first wire event on connect, before any engine event flows. Required
  by OpenAI Realtime so a stock client can complete its handshake; done
  handler-side (not engine-side) so this slice can land without engine
  work.
- Switches the inbound channel + JSON deserialization from
  `NvCreateChatCompletionRequest` to `RealtimeClientEvent`.
- Drops the `TODO (#9175)` placeholder note now that the realtime
  protocol type is in place.

Unit tests (under `engines::tests`):
- `echo_bidirectional_session_update_roundtrip`
- `echo_bidirectional_unknown_event_emits_error`

The chat-completion-shaped per-char-echo test is removed; the new
realtime semantics are simpler and don't need the
`DYN_TOKEN_ECHO_DELAY_MS` dance.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The outbound task was serializing the entire `Annotated<RealtimeServerEvent>`
into the WebSocket text frame, so clients saw `{"data": {"type": ...}, "id": "1"}`
instead of the bare event the OpenAI Realtime spec describes (`{"type": ...}`).
The integration tests added in the next commit surfaced this on real wire
shape; fix is to write the inner event directly.

Engine errors carried via `Annotated::error` are now mapped to a synthesized
`RealtimeServerEvent::Error` so they remain visible on the wire as a spec-shaped
`error` event rather than disappearing along with the envelope.

Frames with neither `data` nor `error` (heartbeat/comment/event-only Annotated)
are skipped — they have no realtime-spec representation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…tEvent

Replaces the chat-completion-shaped fixtures with realtime-event ones and
adds the merge-blocker assertion that `session.created` is emitted as the
first wire frame on connect.

Coverage:
- realtime_websocket_emits_session_created_on_connect — handshake (new test
  covering the merge-blocker promise from context.md).
- realtime_websocket_session_update_echoes_session_updated — round-trips a
  client `session.update` through the engine and asserts the server replies
  with `session.updated` carrying the same `Session` payload.
- realtime_websocket_emits_close_after_client_close — unchanged behavior;
  fixture body swapped from chat-completion JSON to a `session.update`.
- realtime_websocket_rejects_binary_frame — unchanged behavior; just drains
  the on-connect `session.created` first so the binary-reject path is what
  triggers the close, not the on-connect frame.

The shared `expect_text_event` helper drives all four tests off a stable
`{"type": "<event-type>"}` shape now that the handler peels Annotated off
on outbound (preceding commit). `DYN_TOKEN_ECHO_DELAY_MS` setup is gone
since the new echo engine doesn't sleep.
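
The real `expect_text_event` helper is not shown here, so the following is only a guess at the idea, under a different name. It checks that a text frame is a bare event whose first key is `"type"`; that ordering is an assumption, and the string scanning is deliberately naive (a real test would more likely parse the JSON).

```rust
/// Returns the event type if the frame is an object whose first key is
/// `"type"` with a string value. Naive scanner, not a JSON parser.
pub fn leading_event_type(frame: &str) -> Option<&str> {
    let rest = frame.trim_start().strip_prefix('{')?.trim_start();
    let rest = rest.strip_prefix("\"type\"")?.trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let value = rest.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Hypothetical assertion helper: succeeds only for a bare event of the
/// expected type, so an enveloped `{"data": {...}}` frame is rejected.
pub fn expect_event_type(frame: &str, expected: &str) -> Result<(), String> {
    match leading_event_type(frame) {
        Some(found) if found == expected => Ok(()),
        Some(found) => Err(format!("expected `{expected}`, got `{found}`")),
        None => Err(format!("frame is not a bare event: {frame}")),
    }
}
```

A check of this shape would fail on the enveloped frames the earlier commit fixed, which is how the wire-shape bug could surface in integration tests.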

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- Use `Box::default()` per clippy::box_default in the on-connect
  `session.created` synthesis.
- Drop the now-unused `RealtimeSession` import (the type is inferred from
  the `Session::RealtimeSession` variant signature).
- `cargo fmt` collapsed the `EchoBidirectionalEngine` impl header onto a
  single line and tightened the test helper signature.

No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
github-actions bot added the feat and frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`) labels and removed the feat label on May 6, 2026
Labels

frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XL
