Add source role for audio input clients #105
Co-authored-by: Rudy <[email protected]>
Drop the controller sources list (deferred to a future media/inputs role), remove the VAD hint from server/command (local configuration, out of protocol scope), and drop the 'unknown' signal value.
A source can power-manage its own upstream path off the start/stop ingest signal, so server-driven activate/deactivate is redundant.
Controlling upstream playback (and reporting its state back) is out of scope for a capture-only source@v1. Defer to a later role or version.
A source may start streaming on local signal detection without waiting for the server's start command, removing the signal-to-start round trip for cases like a turntable starting playback.
Fold the standalone Sources concept section into the Source messages intro, matching every other role. Drop the controls field left in the client/hello example and split two stacked Note lines.
Spell out what 'started' and 'stopped' mean and how the server uses them, replacing the implementation-defined note.
Capitalize normative MUST/SHOULD/MAY/REQUIRED in the added source role text, preserving each statement's strength.
Drop the level feature, input_stream/request-format, the client/command source object, and the per-message JSON examples to match the other role sections.
Reject chunks by input-stream framing (input_stream/start..input_stream/end) rather than by streaming state. Routing a validly-framed stream stays a server policy decision.
The input-stream framing (input_stream/start..input_stream/end) is already the authoritative lifecycle the server tracks, so a reported idle/streaming state only duplicated it. Signal presence stays, since it is not derivable from the framing.
Drop source-initiated start: the server is the sole initiator via command start/stop, which removes the start/resume ambiguity and keeps captured audio off the wire until requested. Give signal an explicit advisory role in that decision.
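The framing rule from the commits above can be sketched as a small server-side tracker. This is only an illustration: the class and method names (`InputStreamFraming`, `on_message`, `on_chunk`) are hypothetical and not part of the spec; only the message names `input_stream/start` and `input_stream/end` come from this PR.

```python
from enum import Enum, auto


class ChunkVerdict(Enum):
    ACCEPT = auto()
    REJECT = auto()


class InputStreamFraming:
    """Tracks whether an input stream is open for one source client (hypothetical helper)."""

    def __init__(self) -> None:
        self._open = False

    def on_message(self, message_type: str) -> None:
        # Only the two framing messages change the tracked lifecycle.
        if message_type == "input_stream/start":
            self._open = True
        elif message_type == "input_stream/end":
            self._open = False

    def on_chunk(self) -> ChunkVerdict:
        # A chunk is valid only between start and end. What the server does
        # with an accepted chunk (routing) remains a separate policy decision.
        return ChunkVerdict.ACCEPT if self._open else ChunkVerdict.REJECT
```

Keeping the accept/reject check separate from routing mirrors the split described above: framing decides validity, server policy decides where a valid stream goes.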
| A device that implements both `source` and `player` MUST NOT play its captured input locally. Like every player, it outputs only the stream the server distributes, so its output stays in sync with the rest of the group. | ||
| **Note:** Source timestamps are derived from the client's clock offset, which the time filter keeps re-estimating, so they may show discontinuities or drift (e.g., ADC clock variance). Server implementations SHOULD NOT assume perfectly continuous timestamps; the audio sample stream itself SHOULD remain continuous. |
We should describe how the server should behave too, for example by prescribing the use of ASRC (asynchronous sample rate conversion) and maybe how to handle network jitter. The protocol and client implementation stay the same though, so we could solve this in a later follow-up PR.
| A device MAY implement both the `source` and `player` roles (e.g., a speaker with a local AUX input forwarded into Sendspin). | ||
| **Note:** The `source` role (capturing input *into* Sendspin) is distinct from the client-level [`state: 'external_source'`](#external-source-handling), which marks a client whose *output* has been taken over by a non-Sendspin system. | ||
|
|
We could technically rename either external_source or source if this is a problem.
IMO it's fine calling both source.
| The default after the handshake is `stop`: a source MUST NOT stream until the server sends `command: "start"`. The server is the only party that initiates streaming. | ||
| A source that supports line sensing reports `signal` in [`client/state`](#client--server-clientstate). The server MAY use it as a hint for when to send `command: "start"` or `command: "stop"`, but the decision is server policy. |
Not sure, I think it's good to require the server to switch to the input, but I can see a couple cases where this might become annoying. Especially if it's a combined source+player that is already grouped with other players. Or if the connected input is very noisy.
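To make the "hint, not requirement" reading concrete, here is a minimal sketch of one possible server policy. Everything in it is an assumption for illustration: `SourceView`, `next_command`, the boolean `signal_present` view of the reported `signal`, and the `auto_switch_enabled` setting are hypothetical and not defined by the spec. A server could just as well ignore the hint entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceView:
    signal_present: bool       # hypothetical boolean view of the reported `signal`
    streaming: bool            # True between input_stream/start and input_stream/end
    auto_switch_enabled: bool  # hypothetical per-source server setting


def next_command(source: SourceView) -> Optional[str]:
    """Return "start", "stop", or None. The hint never forces a decision."""
    if not source.auto_switch_enabled:
        # Policy opt-out, e.g. for a noisy input or a source+player that is
        # already grouped with other players.
        return None
    if source.signal_present and not source.streaming:
        return "start"
    if not source.signal_present and source.streaming:
        return "stop"
    return None
```

With a per-source opt-out like this, the annoying cases mentioned above could be handled in server configuration without changing the protocol.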
| A device that implements both `source` and `player` MUST NOT play its captured input locally. Like every player, it outputs only the stream the server distributes, so its output stays in sync with the rest of the group. | ||
| **Note:** Source timestamps are derived from the client's clock offset, which the time filter keeps re-estimating, so they may show discontinuities or drift (e.g., ADC clock variance). Server implementations SHOULD NOT assume perfectly continuous timestamps; the audio sample stream itself SHOULD remain continuous. |
With this PR merged:
We should also gate this so that a client can only send audio chunks after its clock is synchronized.
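A client-side gate along those lines might look like the sketch below. It combines the rule already in this PR (default `stop`, no streaming until the server sends `command: "start"`) with the clock-sync condition, which is only the suggestion made in this comment and not part of the spec. The class `SourceGate` and its methods are hypothetical names.

```python
class SourceGate:
    """Decides whether a source client may send audio chunks (hypothetical helper)."""

    def __init__(self) -> None:
        self.commanded = "stop"    # default after the handshake
        self.clock_synced = False  # assumption: set by the client's time sync logic

    def on_command(self, command: str) -> None:
        # Only the server's start/stop commands change the commanded state.
        if command in ("start", "stop"):
            self.commanded = command

    def on_clock_sync(self, synced: bool) -> None:
        self.clock_synced = synced

    def may_send_chunk(self) -> bool:
        # Both conditions must hold: the server asked for audio and the
        # timestamps the client would attach are based on a synced clock.
        return self.commanded == "start" and self.clock_synced
```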
LGTM! Your comments can all be addressed in follow-up clarifications.
kahrendt
left a comment
Thanks for the message rename!
Adds the `source@v1` role that allows a client to capture local audio (line-in, turntable, Bluetooth receiver, microphone) and stream it to the server, while the server stays the single place that resamples, transcodes, mixes, buffers, and distributes to players (including back to the source itself when it is also a player, so it stays in sync).

Heavily based on #52 by @rudyberends but simplified to the core. This PR specifically removes:

- `level` reporting (kept `signal` line-sensing)
- `client/command` source notifications
- `select_source` controller integration and source listing in `server/state`

Also rebases it onto the encryption work (#84), and now requires source clients to be paired since an unpaired source could otherwise inject malformed audio into the server.