Add source role for audio input clients #105

Open
maximmaxim345 wants to merge 17 commits into main from feat/add-source-role

@maximmaxim345 maximmaxim345 commented Jun 29, 2026

Member

Adds the source@v1 role that allows a client to capture local audio (line-in, turntable, Bluetooth receiver, microphone) and stream it to the server, while the server stays the single place that resamples, transcodes, mixes, buffers, and distributes to players (including back to the source itself when it is also a player, so it stays in sync).

Heavily based on #52 by @rudyberends but simplified to the core. This PR specifically removes:

  • the activation model (input type, activation kind)
  • level reporting (kept signal line-sensing)
  • the client/command source notifications
  • select_source controller integration and source listing in server/state
  • server-side virtual source clients

Also rebases it onto the encryption work (#84), and now requires source clients to be paired since an unpaired source could otherwise inject malformed audio into the server.

maximmaxim345 and others added 15 commits June 29, 2026 12:53
Drop the controller sources list (deferred to a future media/inputs
role), remove the VAD hint from server/command (local configuration,
out of protocol scope), and drop the 'unknown' signal value.

A source can power-manage its own upstream path off the start/stop
ingest signal, so server-driven activate/deactivate is redundant.

Controlling upstream playback (and reporting its state back) is out of
scope for a capture-only source@v1. Defer to a later role or version.

A source may start streaming on local signal detection without waiting
for the server's start command, removing the signal-to-start round trip
for cases like a turntable starting playback.

Fold the standalone Sources concept section into the Source messages
intro, matching every other role. Drop the controls field left in the
client/hello example and split two stacked Note lines.

Spell out what 'started' and 'stopped' mean and how the server uses
them, replacing the implementation-defined note.

Capitalize normative MUST/SHOULD/MAY/REQUIRED in the added source role
text, preserving each statement's strength.

Drop the level feature, input_stream/request-format, the client/command source object, and the per-message JSON examples to match the other role sections.

Reject chunks by input-stream framing (input_stream/start..input_stream/end) rather than by streaming state. Routing a validly-framed stream stays a server policy decision.

The input-stream framing (input_stream/start..input_stream/end) is already
the authoritative lifecycle the server tracks, so a reported idle/streaming
state only duplicated it. Signal presence stays, since it is not derivable
from the framing.

Drop source-initiated start: the server is the sole initiator via
command start/stop, which removes the start/resume ambiguity and keeps
captured audio off the wire until requested. Give signal an explicit
advisory role in that decision.
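
To illustrate the framing rule from the commits above (chunks are rejected unless they fall between `input_stream/start` and `input_stream/end`), here is a minimal sketch of how a server might track it per source. The `InputStreamGate` class and its method names are hypothetical and not part of this PR; only the two message names come from the spec text.

```python
from dataclasses import dataclass, field


@dataclass
class InputStreamGate:
    """Tracks input-stream framing for one source client (illustrative only)."""

    is_open: bool = False
    accepted: list = field(default_factory=list)

    def on_message(self, msg_type: str) -> None:
        # Only the two framing messages change the gate; anything else is ignored here.
        if msg_type == "input_stream/start":
            self.is_open = True
        elif msg_type == "input_stream/end":
            self.is_open = False

    def on_chunk(self, chunk: bytes) -> bool:
        # Reject any chunk that arrives outside start..end framing.
        if not self.is_open:
            return False
        self.accepted.append(chunk)
        return True


gate = InputStreamGate()
results = [gate.on_chunk(b"early")]          # before input_stream/start
gate.on_message("input_stream/start")
results.append(gate.on_chunk(b"framed"))     # inside the framing
gate.on_message("input_stream/end")
results.append(gate.on_chunk(b"late"))       # after input_stream/end
```

Whether an accepted stream is actually routed to players stays a separate server policy decision, so the sketch deliberately stops at accept/reject.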
Comment thread README.md

A device that implements both `source` and `player` MUST NOT play its captured input locally. Like every player, it outputs only the stream the server distributes, so its output stays in sync with the rest of the group.

**Note:** Source timestamps are derived from the client's clock offset, which the time filter keeps re-estimating, so they may show discontinuities or drift (e.g., ADC clock variance). Server implementations SHOULD NOT assume perfectly continuous timestamps; the audio sample stream itself SHOULD remain continuous.

Member Author

We should also describe how the server should behave, for example by prescribing the use of ASRC (asynchronous sample rate conversion) and perhaps how to handle network jitter. The protocol and client implementation stay the same, though, so we could solve this in a follow-up PR.
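
As a rough illustration of what "do not assume perfectly continuous timestamps" could mean on the server side, the sketch below predicts each chunk's timestamp from the number of samples received so far and flags chunks that deviate by more than a tolerance. It is not ASRC and not a proposed server algorithm; the `(timestamp_us, frame_count)` tuple layout, the microsecond unit, and the 2 ms tolerance are all assumptions made for the example.

```python
def find_timestamp_jumps(chunks, sample_rate, tolerance_us=2_000):
    """Return indices of chunks whose timestamp deviates from the sample-count prediction.

    chunks: list of (timestamp_us, frame_count) tuples in arrival order (assumed layout).
    """
    jumps = []
    if not chunks:
        return jumps
    expected_us = chunks[0][0]
    for index, (timestamp_us, frame_count) in enumerate(chunks):
        if abs(timestamp_us - expected_us) > tolerance_us:
            jumps.append(index)
            # Re-anchor on the reported timestamp; the samples themselves stay contiguous.
            expected_us = timestamp_us
        # Advance the prediction by the duration of this chunk's audio.
        expected_us += frame_count * 1_000_000 / sample_rate
    return jumps


# 480 frames at 48 kHz is 10 ms per chunk; the third chunk arrives stamped 5 ms late.
chunks = [(0, 480), (10_000, 480), (25_000, 480), (35_000, 480)]
jumps = find_timestamp_jumps(chunks, sample_rate=48_000)
```

A real implementation would more likely feed the deviation into a drift estimate for the resampler rather than re-anchoring abruptly; the point here is only that the sample stream, not the timestamp sequence, is treated as the continuous quantity.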

Comment thread README.md
A device MAY implement both the `source` and `player` roles (e.g., a speaker with a local AUX input forwarded into Sendspin).

**Note:** The `source` role (capturing input *into* Sendspin) is distinct from the client-level [`state: 'external_source'`](#external-source-handling), which marks a client whose *output* has been taken over by a non-Sendspin system.

Member Author

We could technically rename either external_source or source if this is a problem.
IMO it's fine calling both source.

Comment thread README.md

The default after the handshake is `stop`: a source MUST NOT stream until the server sends `command: "start"`. The server is the only party that initiates streaming.

A source that supports line sensing reports `signal` in [`client/state`](#client--server-clientstate). The server MAY use it as a hint for when to send `command: "start"` or `command: "stop"`, but the decision is server policy.
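
The two paragraphs above can be sketched as a small model: the source keeps captured audio off the wire until it receives `command: "start"`, and the server treats `signal` as one input to its own policy. `SourceClient`, `server_policy`, and the `input_selected` flag are hypothetical names for illustration; the spec text only defines the commands and the `signal` hint.

```python
class SourceClient:
    """Minimal model of a source's streaming gate (hypothetical class)."""

    def __init__(self):
        # The default after the handshake is "stop".
        self.streaming = False
        self.sent = []

    def on_command(self, command: str) -> None:
        if command == "start":
            self.streaming = True
        elif command == "stop":
            self.streaming = False

    def capture(self, chunk: bytes) -> None:
        # Captured audio is dropped unless the server has asked for it.
        if self.streaming:
            self.sent.append(chunk)


def server_policy(signal_present: bool, input_selected: bool) -> str:
    """Hypothetical policy: signal is only a hint; the server's own state decides."""
    return "start" if (signal_present and input_selected) else "stop"


source = SourceClient()
source.capture(b"before-start")                  # dropped: no start command yet
source.on_command(server_policy(True, False))    # signal alone does not start streaming
source.capture(b"still-stopped")
source.on_command(server_policy(True, True))     # server decides to start
source.capture(b"streamed")
```

Other policies (for example, ignoring `signal` for a noisy input, or never auto-switching a grouped source+player) fit the same shape because the decision lives entirely in the server.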

Member Author

Not sure. I think it's good to require the server to switch to the input, but I can see a couple of cases where this might become annoying, especially if it's a combined source+player that is already grouped with other players, or if the connected input is very noisy.

Comment thread README.md

A device that implements both `source` and `player` MUST NOT play its captured input locally. Like every player, it outputs only the stream the server distributes, so its output stays in sync with the rest of the group.

**Note:** Source timestamps are derived from the client's clock offset, which the time filter keeps re-estimating, so they may show discontinuities or drift (e.g., ADC clock variance). Server implementations SHOULD NOT assume perfectly continuous timestamps; the audio sample stream itself SHOULD remain continuous.

Member Author

With this PR merged:

We should also gate this so that a client can only send audio chunks after its clock is synchronized.
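
A minimal sketch of what such a gate could look like on the client, assuming the time filter exposes a sample count and an error estimate. The `ClockSync` class, its fields, and the thresholds are assumptions for illustration; this PR does not define how "synchronized" is measured.

```python
from dataclasses import dataclass


@dataclass
class ClockSync:
    """Hypothetical view of the time filter's state."""

    samples: int = 0
    error_us: float = float("inf")

    def is_synchronized(self, min_samples: int = 5, max_error_us: float = 1_000.0) -> bool:
        # Require enough measurements and a small enough estimated error.
        return self.samples >= min_samples and self.error_us <= max_error_us


def may_send_chunk(started: bool, clock: ClockSync) -> bool:
    # Both must hold: the server asked for audio and the timestamps are trustworthy.
    return started and clock.is_synchronized()


unsynced = ClockSync(samples=2, error_us=4_000.0)
synced = ClockSync(samples=12, error_us=300.0)
decisions = [
    may_send_chunk(True, unsynced),   # started, but clock not yet settled
    may_send_chunk(True, synced),     # started and synchronized
    may_send_chunk(False, synced),    # synchronized, but no start command
]
```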

@maximmaxim345 maximmaxim345 changed the base branch from ap/encryption to main June 29, 2026 17:05
@maximmaxim345 maximmaxim345 marked this pull request as ready for review June 29, 2026 17:09
Comment thread README.md Outdated
@kahrendt

Contributor

LGTM! Your comments can all be addressed in follow-up clarifications.

Comment thread README.md Outdated

@kahrendt kahrendt left a comment

Contributor

Thanks for the message rename!
