feat: dedicated realtime-API protocol types for /v1/realtime #9205
Draft
Pulls in the OpenAI Realtime client/server event types (the `async-openai` `realtime-types` feature) so dynamo-protocols can re-export them in a follow-up commit. No code changes yet; this commit is purely the Cargo feature toggle.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds `dynamo_protocols::types::realtime`, which namespaces the upstream `async_openai::types::realtime` surface (~50 events). It is a pure re-export per the ownership rubric in `lib/protocols/CLAUDE.md`: there are no Dynamo overrides today, since upstream covers `RealtimeClientEvent` / `RealtimeServerEvent` in full and no client has been observed sending a shape that upstream rejects.
The module is namespaced (`pub mod realtime`) rather than flattened into `types::*` to avoid collisions with `types::chat::*` and `types::responses::*`, which share names such as `Session`, `Response`, and `ContentPart`.
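A minimal, self-contained sketch of why the namespacing matters. The `upstream` module is a hypothetical stand-in for `async_openai::types`, and the struct fields are invented for illustration; only the shape (a glob re-export inside `pub mod realtime`) follows the commit.

```rust
// Hypothetical stand-in for the upstream crate's realtime types.
mod upstream {
    pub mod realtime {
        #[derive(Debug, Default)]
        pub struct Session {
            pub model: Option<String>,
        }

        #[derive(Debug)]
        pub enum RealtimeClientEvent {
            SessionUpdate { session: Session },
        }
    }
}

mod types {
    // A chat-side type that happens to share the name `Session`.
    pub mod chat {
        #[derive(Debug, Default)]
        pub struct Session {
            pub user: Option<String>,
        }
    }

    // Namespaced glob re-export: upstream names live under `types::realtime::*`
    // and never compete with `types::chat::*`.
    pub mod realtime {
        pub use crate::upstream::realtime::*;
    }

    // Flattening both surfaces instead would make `types::Session` ambiguous,
    // and any use site naming it would fail to resolve:
    // pub use self::chat::*;
    // pub use self::realtime::*;
}
```

Both `Session` types stay addressable by their full path, which is the property the commit relies on.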
…types
Swaps the placeholder `BidirectionalStreamingEngine<NvCreateChatCompletionRequest, Annotated<NvCreateChatCompletionStreamResponse>>` introduced in DIS-1858 for the dedicated `BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>` shape that the `/v1/realtime` endpoint is meant to speak, and drops the `TODO (#9175)` doc comment.
The alias is currently referenced only by name in this file (`EchoBidirectionalEngine` and the handler use the underlying types directly), so this swap compiles cleanly in isolation. Subsequent commits update those sites to consume the realtime types, at which point the alias starts carrying its weight.
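To illustrate why the swap compiles in isolation: a type alias is only a name, so a site that spells out the underlying types directly is unaffected when the alias is repointed. The definitions of `BidirectionalStreamingEngine`, `Annotated`, and the request/event types below are hypothetical placeholders, not the real Dynamo or `async-openai` definitions, and `legacy_site` is an invented example of such a site.

```rust
use std::marker::PhantomData;

// Hypothetical placeholders standing in for the real engine and payload types.
pub struct Annotated<T> {
    pub data: Option<T>,
}
pub struct BidirectionalStreamingEngine<Req, Resp>(PhantomData<(Req, Resp)>);

pub struct NvCreateChatCompletionRequest;
pub struct NvCreateChatCompletionStreamResponse;
pub enum RealtimeClientEvent {}
pub enum RealtimeServerEvent {}

// Before (placeholder shape):
// pub type RealtimeBidirectionalEngine = BidirectionalStreamingEngine<
//     NvCreateChatCompletionRequest,
//     Annotated<NvCreateChatCompletionStreamResponse>,
// >;

// After (dedicated realtime shape):
pub type RealtimeBidirectionalEngine =
    BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>;

// A site that still names the chat-completion types directly instead of
// going through the alias; repointing the alias cannot break it.
fn legacy_site(
    _engine: BidirectionalStreamingEngine<
        NvCreateChatCompletionRequest,
        Annotated<NvCreateChatCompletionStreamResponse>,
    >,
) -> bool {
    true
}
```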
Atomically swaps both ends of the /v1/realtime pipe to speak OpenAI Realtime events instead of the chat-completion placeholder shape. Engine and handler must move together because the engine signature (`ManyIn<RealtimeClientEvent>` -> `ManyOut<Annotated<RealtimeServerEvent>>`) is what the handler builds its input stream against.
Engine (`EchoBidirectionalEngine`):
- Echoes `session.update` -> `session.updated` carrying the same `Session` payload, so the merge-blocker handshake round-trip is observable end-to-end.
- Returns a single `error` server event with `code = "echo_engine_unsupported"` for every other client variant.
- A `client_event_variant_name` helper keeps the rejection message stable across upstream additions to `RealtimeClientEvent`.
Handler (`/v1/realtime`):
- Synthesizes a spec-compliant `session.created` server frame as the first wire event on connect, before any engine event flows. OpenAI Realtime requires this so a stock client can complete its handshake; it is done handler-side (not engine-side) so this slice can land without engine work.
- Switches the inbound channel and JSON deserialization from `NvCreateChatCompletionRequest` to `RealtimeClientEvent`.
- Drops the `TODO (#9175)` placeholder note now that the realtime protocol type is in place.
Unit tests (under `engines::tests`):
- `echo_bidirectional_session_update_roundtrip`
- `echo_bidirectional_unknown_event_emits_error`
The chat-completion-shaped per-char-echo test is removed; the new realtime semantics are simpler and don't need the `DYN_TOKEN_ECHO_DELAY_MS` dance.
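The engine's echo semantics can be sketched as a pure function from one client event to one server event. This is a minimal sketch, not the real `EchoBidirectionalEngine`: the enums below are hypothetical, trimmed-down stand-ins for the upstream types (three client variants, two server variants), `ErrorDetail` and the rejection message text are invented, and the streaming plumbing is omitted. One plausible reading of "stable across upstream additions" is that the message names the wire `type` tag rather than a `Debug` dump of the variant; that reading is an assumption here.

```rust
// Hypothetical, trimmed-down stand-ins for the upstream realtime types.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub instructions: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeClientEvent {
    SessionUpdate { event_id: Option<String>, session: Session },
    InputAudioBufferAppend { event_id: Option<String>, audio: String },
    ResponseCreate { event_id: Option<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ErrorDetail {
    pub code: Option<String>,
    pub message: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeServerEvent {
    SessionUpdated { session: Session },
    Error { error: ErrorDetail },
}

/// Maps each client variant to its wire `type` tag, so the rejection
/// message never depends on `Debug` formatting of the payload.
fn client_event_variant_name(ev: &RealtimeClientEvent) -> &'static str {
    match ev {
        RealtimeClientEvent::SessionUpdate { .. } => "session.update",
        RealtimeClientEvent::InputAudioBufferAppend { .. } => "input_audio_buffer.append",
        RealtimeClientEvent::ResponseCreate { .. } => "response.create",
    }
}

/// Echo semantics: `session.update` -> `session.updated` with the same
/// session; every other client event -> a single `error` event.
fn echo(ev: RealtimeClientEvent) -> RealtimeServerEvent {
    match ev {
        RealtimeClientEvent::SessionUpdate { session, .. } => {
            RealtimeServerEvent::SessionUpdated { session }
        }
        other => RealtimeServerEvent::Error {
            error: ErrorDetail {
                code: Some("echo_engine_unsupported".to_string()),
                message: format!(
                    "echo engine does not handle `{}`",
                    client_event_variant_name(&other)
                ),
            },
        },
    }
}
```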
The outbound task was serializing the entire `Annotated<RealtimeServerEvent>`
into the WebSocket text frame, so clients saw `{"data": {"type": ...}, "id": "1"}`
instead of the bare event the OpenAI Realtime spec describes (`{"type": ...}`).
The integration tests added in the next commit surfaced this against the real wire
shape; the fix is to write the inner event directly.
Engine errors carried via `Annotated::error` are now mapped to a synthesized
`RealtimeServerEvent::Error` so they remain visible on the wire as a spec-shaped
`error` event rather than disappearing along with the envelope.
Frames with neither `data` nor `error` (heartbeat/comment/event-only Annotated)
are skipped — they have no realtime-spec representation.
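The three outbound cases above can be sketched as a single mapping from an envelope to an optional bare event. `Annotated` here is a hypothetical simplification (the real Dynamo type carries more fields, and its error representation is not reproduced), and the event enum is a trimmed stand-in. When an envelope carries both `data` and `error`, this sketch lets `data` win; that precedence is an assumption, not something the commit specifies.

```rust
// Hypothetical simplification of Dynamo's `Annotated<T>` envelope.
#[derive(Debug)]
pub struct Annotated<T> {
    pub data: Option<T>,
    pub id: Option<String>,
    pub error: Option<String>,
}

// Trimmed stand-in for the upstream server event enum.
#[derive(Debug, PartialEq)]
pub enum RealtimeServerEvent {
    SessionUpdated { instructions: String },
    Error { message: String },
}

/// Returns the bare event to serialize into a WebSocket text frame,
/// or `None` when the frame should be skipped.
fn to_wire_event(frame: Annotated<RealtimeServerEvent>) -> Option<RealtimeServerEvent> {
    match (frame.data, frame.error) {
        // Normal case: drop the envelope (including `id`), keep the inner event.
        (Some(event), _) => Some(event),
        // Engine error on the envelope: surface it as a spec-shaped `error` event.
        (None, Some(message)) => Some(RealtimeServerEvent::Error { message }),
        // Heartbeat / comment / event-only frames: no realtime representation.
        (None, None) => None,
    }
}
```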
…tEvent
Replaces the chat-completion-shaped fixtures with realtime-event ones and
adds the merge-blocker assertion that `session.created` is emitted as the
first wire frame on connect.
Coverage:
- realtime_websocket_emits_session_created_on_connect — handshake (new test
covering the merge-blocker promise from context.md).
- realtime_websocket_session_update_echoes_session_updated — round-trips a
client `session.update` through the engine and asserts the server replies
with `session.updated` carrying the same `Session` payload.
- realtime_websocket_emits_close_after_client_close — unchanged behavior;
fixture body swapped from chat-completion JSON to a `session.update`.
- realtime_websocket_rejects_binary_frame — unchanged behavior; just drains
the on-connect `session.created` first so the binary-reject path is what
triggers the close, not the on-connect frame.
The shared `expect_text_event` helper drives all four tests off a stable
`{"type": "<event-type>"}` shape now that the handler peels the `Annotated`
envelope off outbound frames (preceding commit). The `DYN_TOKEN_ECHO_DELAY_MS`
setup is gone since the new echo engine doesn't sleep.
- Use `Box::default()` per `clippy::box_default` in the on-connect `session.created` synthesis.
- Drop the now-unused `RealtimeSession` import (the type is inferred from the `Session::RealtimeSession` variant signature).
- `cargo fmt` collapsed the `EchoBidirectionalEngine` impl header onto a single line and tightened the test helper signature.
No behavioral change.
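The on-connect synthesis and its ordering guarantee can be sketched with std channels. The types below are hypothetical stand-ins for the upstream ones, and `outbound_frames` is an invented helper: the real handler writes to a WebSocket and generates a UUID `event_id`, whereas this sketch collects frames into a `Vec` and takes the ID as a parameter so no extra crates are needed. It shows `Box::default()` inferring `Box<RealtimeSession>` from the variant signature, which is why the explicit import became unused.

```rust
use std::sync::mpsc;

// Hypothetical stand-ins for the upstream realtime types.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RealtimeSession {
    pub model: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Session {
    RealtimeSession(Box<RealtimeSession>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeServerEvent {
    SessionCreated { event_id: String, session: Session },
    SessionUpdated { event_id: String, session: Session },
}

/// Builds the on-connect frame. `Box::default()` (the clippy::box_default
/// suggestion) infers `Box<RealtimeSession>` from the variant's field type.
fn session_created(event_id: String) -> RealtimeServerEvent {
    RealtimeServerEvent::SessionCreated {
        event_id,
        session: Session::RealtimeSession(Box::default()),
    }
}

/// Emits `session.created` first, then forwards engine events in order
/// until every engine-side sender has been dropped.
fn outbound_frames(
    engine_events: mpsc::Receiver<RealtimeServerEvent>,
    event_id: String,
) -> Vec<RealtimeServerEvent> {
    let mut wire = vec![session_created(event_id)];
    wire.extend(engine_events.iter());
    wire
}
```

Because the handler writes `session.created` before reading anything from the engine, a client can rely on it being the first frame regardless of what the engine emits.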
Overview:
Slice 5/N of the bidirectional streaming-input feature. It replaces the chat-completion placeholder shape that #9079 (1/N) shipped on `/v1/realtime` with the OpenAI Realtime API event surface. Clients now exchange real `RealtimeClientEvent` / `RealtimeServerEvent` JSON frames over the WebSocket, and the handler synthesizes a spec-compliant `session.created` server event on connect so a stock OpenAI Realtime client can complete its handshake.
The change is type-swap-shaped on purpose: engine-side semantics (VAD, audio buffering, transcription, response generation) belong to a downstream slice and are intentionally not in this PR. The mock `EchoBidirectionalEngine` is rewritten to demonstrate the round-trip without taking on those responsibilities.
The `realtime-types` feature of `async-openai` 0.34 already covers the full event surface (verified against `RealtimeClientEvent` / `RealtimeServerEvent` and confirmed against OpenAI's published event list). Per the ownership rubric in `lib/protocols/CLAUDE.md`, no Dynamo-side overrides are needed today; the new `dynamo_protocols::types::realtime` module is a pure namespaced re-export.
Details:
Protocols crate:
- `lib/protocols/Cargo.toml`: adds `"realtime-types"` to the `async-openai` features list.
- `lib/protocols/src/types/realtime.rs` (new): glob re-export of `async_openai::types::realtime::*` behind a banner comment that names the ownership rubric.
- `lib/protocols/src/types/mod.rs`: `pub mod realtime;`. Namespaced (not flattened into `types::*`) because realtime exposes `Session`, `ContentPart`, `Response`, etc., which overlap with the chat / responses surface area.
LLM crate (typedef):
- `lib/llm/src/types.rs::generic::realtime::RealtimeBidirectionalEngine`: repointed from `BidirectionalStreamingEngine<NvCreateChatCompletionRequest, Annotated<NvCreateChatCompletionStreamResponse>>` to `BidirectionalStreamingEngine<RealtimeClientEvent, Annotated<RealtimeServerEvent>>`. Drops the `TODO (#9175)` doc.
LLM crate (handler, `/v1/realtime`):
- Inbound: frames deserialize to `RealtimeClientEvent`; the mpsc channel is typed accordingly.
- Synthesizes `RealtimeServerEvent::SessionCreated { event_id: <uuid>, session: Session::RealtimeSession(Box::default()) }` as the first wire frame on connect, before any engine event flows. The OpenAI spec mandates this; doing it handler-side (not engine-side) lets this slice ship without waiting on the engine implementation.
- Outbound: peels the `Annotated` carrier off, so clients receive bare `RealtimeServerEvent` JSON rather than the Dynamo-internal `{"data": {...}, "id": "1"}` envelope. This was a wire-shape bug that the new integration tests caught against the type-swap commit.
- `Annotated::error` is mapped to a synthesized `RealtimeServerEvent::Error`, so engine errors stay visible on the wire as a spec-shaped `error` event.
LLM crate (mock engine):
`EchoBidirectionalEngine` rewritten over the new types:
- `RealtimeClientEvent::SessionUpdate(req)` → `RealtimeServerEvent::SessionUpdated` echoing `req.session` back.
- Every other variant → `RealtimeServerEvent::Error` with `code = "echo_engine_unsupported"` and a message naming the offending variant.
- A `client_event_variant_name` helper keeps the rejection-message wire tag stable across upstream additions to `RealtimeClientEvent`.
Tests:
- `engines::tests` (`session.update` round-trip; unknown event → error).
- `lib/llm/tests/http_websocket.rs`:
  - `realtime_websocket_emits_session_created_on_connect`: merge-blocker assertion.
  - `realtime_websocket_session_update_echoes_session_updated`: end-to-end round-trip.
  - `realtime_websocket_emits_close_after_client_close`: fixture body swapped from chat JSON to `session.update`.
  - `realtime_websocket_rejects_binary_frame`: drains the on-connect `session.created` so the binary reject is what triggers the close.
- `DYN_TOKEN_ECHO_DELAY_MS` test setup is gone; the new echo engine doesn't sleep.
Server-frame coverage scope (intentional):
The OpenAI Realtime server contract emits ~43 frames across session, audio buffer, conversation, transcription, response, function-calling, MCP, and rate-limit categories. Only the connection-lifecycle group (`session.created`, `session.updated`, `error`) is in scope here. Everything else is engine-side and correctly deferred to the engine-implementation slice that comes after.
Where should the reviewer start?
Reading order, smallest mental model first:
1. `lib/llm/tests/http_websocket.rs`: what does this PR enable? The four tests show the on-connect handshake, the round-trip, the close path, and the binary-reject path.
2. `lib/protocols/src/types/realtime.rs` + `lib/protocols/src/types/mod.rs`: the new module (10 lines). The question to satisfy yourself is: should we own anything? Per `lib/protocols/CLAUDE.md`, the answer is no.
3. `lib/llm/src/types.rs`: typedef rename. Zero behavior change, but the name shift signals the intent of the rest of the diff.
4. `lib/llm/src/engines.rs`: `EchoBidirectionalEngine`. Pins the engine contract the handler has to satisfy.
5. `lib/llm/src/http/service/realtime.rs`: the bridge. Both the inbound deserialization swap and the on-connect `session.created` synthesis live here.
To verify locally:
Related Issues:
- `main` once feat: bidirectional streaming-input WebSocket frontend #9079 squash-merges.
- `ModelManager` accessor for bidirectional engines; orthogonal to this PR.
Internal Linear tickets DIS-1928 (this) and DIS-1858 (base) mirror the above GitHub issues (cross-linked per `.ai/linear-ticket-refs.md`).
🤖 Generated with Claude Code