DOA Policy by sarapapi · Pull Request #37 · hlt-mt/simulstream

sarapapi · 2026-05-28T12:43:57Z

Integration of the StreamAtt-style policy for SpeechLLMs (specifically, Phi4-Multimodal and Qwen3-Omni).

This required changing the beginning-of-word (BOW) prefix in the standard StreamAtt implementation into a parametrized token (BOW_PREFIX, which is set to " " for recent tokenizers).
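To illustrate why the prefix needs to be a parameter, here is a minimal sketch of grouping subword tokens into words with a configurable beginning-of-word prefix. It is not the code in this PR: the function name group_tokens_into_words and its signature are hypothetical, and it assumes tokens arrive as plain strings in which a word-initial token starts with the prefix.

```python
from typing import List


def group_tokens_into_words(tokens: List[str], bow_prefix: str = " ") -> List[str]:
    """Merge subword tokens into words, opening a new word at each prefixed token.

    Hypothetical helper for illustration only; not part of simulstream.
    """
    words: List[str] = []
    for token in tokens:
        if token.startswith(bow_prefix):
            # A prefixed token opens a new word; the prefix itself is dropped.
            words.append(token[len(bow_prefix):])
        elif words:
            # An unprefixed token continues the current word.
            words[-1] += token
        else:
            # The very first token may lack the prefix (start of the text).
            words.append(token)
    return words


# SentencePiece-style pieces mark word starts with "▁" (U+2581).
print(group_tokens_into_words(["▁Hello", "▁wor", "ld"], bow_prefix="▁"))
# Tokenizers whose pieces start words with a plain space.
print(group_tokens_into_words(["Hello", " wor", "ld"], bow_prefix=" "))
```

With a hard-coded "▁" prefix, the second call would merge everything into a single word, which is the kind of mismatch a parametrized prefix avoids.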

Co-authored-by: Marco Gaido <[email protected]>

sarapapi · 2026-06-18T13:42:06Z

@mgaido91 before merging, I'd like to do a manual run, so please let me know whether anything else in the PR still needs to be finalized or whether I can proceed with testing. Thanks!

mgaido91

LGTM, apart from two minor comments.

mgaido91 · 2026-07-03T10:16:41Z

+type: "simulstream.server.speech_processors.phi4multimodal_doa.Phi4MultimodalDOA"
+text_history:
+  type: "simulstream.server.speech_processors.base_streamatt.FixedWordsTextHistory"
+  history_words: __HISTORY__


Let's avoid placeholders and provide basic example configs that people can run as-is.

sarapapi · 2026-07-03

But this means having many configs if we want to include all the ones used in the paper. I can include just one of them, if you prefer.

mgaido91 · 2026-07-03

Yes. In the longer term, we might think about adding READMEs, as we had in the fbk-fairseq repo, and putting configs and descriptions there.

sarapapi · 2026-07-03

What about creating a folder named doa and putting all the configs there? And where would you ideally like the README: in a separate folder such as /example in main, or alongside the configs, under the new config/doa folder?

mgaido91 · 2026-07-03

Yes, I agree on both.
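The placeholder concern in this thread could also be checked mechanically. Below is a minimal sketch, assuming placeholders follow the __NAME__ convention seen in the quoted config (uppercase letters, digits, underscores); the helper find_placeholders is hypothetical and not part of simulstream.

```python
import re
from typing import List, Tuple

# Assumed convention: placeholders look like __NAME__ (uppercase, digits, underscores).
PLACEHOLDER = re.compile(r"__[A-Z0-9]+(?:_[A-Z0-9]+)*__")


def find_placeholders(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every unfilled placeholder."""
    found = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((line_number, match.group(0)))
    return found


# Config fragment taken from the diff quoted in the review comment.
example = """\
text_history:
  type: "simulstream.server.speech_processors.base_streamatt.FixedWordsTextHistory"
  history_words: __HISTORY__
"""
print(find_placeholders(example))
```

A check like this could run over the example configs in a unit test, so a config that cannot be run as-is is flagged before it is merged.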

sarapapi added 30 commits March 24, 2026 13:05

DOA Policy first implementation

cafb609

Improve Phi4-Multimodal DOA description

4f9dbd1

Rename DOA Phi config

99282e2

Rename DOA Phi config

9dd7668

Correct audio subsampling factor

f841ce2

Fix load_model to class method

73fed6b

Fix lang init

c456013

Fix newer transformers compatibility

54752a9

Debug

48ab026

Remove end of sentence

03091ac

Disable eos

aa990ee

Revert

6dc2e4d

Try different EOS

042e682

Debug

2dcfdcd

Debug

46ad7f8

Debug

bc37b8d

Partial fix

b8a5e33

Increase stability

aa41e41

Add cross attention normalization

5d08e7d

revert speech chunk

d9def94

debug

87ee050

debug

69eb48f

debug

e6f936f

reduce hallucinations

616fcaa

reduce hallucinations

c629a76

Try fix

af083b7

Debug

54d137f

Remove unnecessary parameters

4d889e3

Fix stripping with alternative tokenizer and add debug

8fc0ee4

Choose a simple prompt for Phi4Multimodal

6a40aed

sarapapi added 8 commits May 24, 2026 14:36

Fix voxtral tokenizer

5386881

Fix voxtral prompt

278786f

Clean for release

d124275

Revert voxtral detokenizer

c6ab757

Fix Phi4-Multimodal linting

35e5809

Fix Qwen3-Omni linting and remove useless comments

d908dd8

Clean from useless comments

5fa2b29

Fix uts

e40fa38

sarapapi self-assigned this May 28, 2026

sarapapi added the enhancement New feature or request label May 28, 2026

sarapapi requested a review from mgaido91 May 28, 2026 13:04

mgaido91 reviewed Jun 9, 2026


sarapapi and others added 7 commits June 9, 2026 13:06

Update simulstream/server/speech_processors/qwenomni_doa.py

25e024b

Co-authored-by: Marco Gaido <[email protected]>

Update simulstream/server/speech_processors/phi4multimodal_doa.py

d2a556a

Co-authored-by: Marco Gaido <[email protected]>

Merge branch 'main' into doa_policy

babf61d

Partially address comments

426af02

Address comment about pycountry

cd5ecc4

Fix linting

26e06ae

refactor code to avoid duplicated code

76ba013

mgaido91 reviewed Jun 12, 2026


Comment thread simulstream/server/speech_processors/base_streamatt.py Outdated

Comment thread simulstream/server/speech_processors/qwenomni_doa.py Outdated

sarapapi added 6 commits June 18, 2026 12:22

Fix logging

dae38d8

Partially revert the change

a7ac4d4

Fix the description to go new line after 100 chars

1fad0d7

Address comments

909d36d

Fix Lint

987637b

Remove unused import

39d6d7b

mgaido91 reviewed Jul 3, 2026


Address comment

fcafb30




2 participants
