DOA Policy #37
Co-authored-by: Marco Gaido <[email protected]>
@mgaido91 before merging it, I'd like to have a manual run first, so please let me know if there is something else about the PR still to be finalized or if I can proceed with the testing. Thanks!
mgaido91
left a comment
LGTM, apart from two minor comments.
type: "simulstream.server.speech_processors.phi4multimodal_doa.Phi4MultimodalDOA"
text_history:
  type: "simulstream.server.speech_processors.base_streamatt.FixedWordsTextHistory"
  history_words: __HISTORY__
Let's avoid placeholders and have basic example configs that people can run as they are.
But this means having many configs if we want to include the ones from the paper. I can include just one of them, if you prefer.
Yes. In the longer term, we might think about adding READMEs, as we had in the fbk-fairseq repo, and putting configs and descriptions there.
What about creating a folder named doa and putting all the configs there? And where do you want to have the README ideally? In a separate folder like /example in the main, or do we want to have it in the configs, under the new folder config/doa?
Integration of the StreamAtt-style policy for SpeechLLMs (specifically, Phi4-Multimodal and Qwen3-Omni).
This required modifying the BOW (beginning-of-word) prefix in the standard StreamAtt implementation so that it is read from a parametrized token (BOW_PREFIX, which is set to " " for recent tokenizers) instead of being hard-coded.
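To illustrate why a configurable prefix matters, here is a minimal sketch of word-boundary detection driven by such a parameter. The function `group_tokens_into_words` and the sample tokens are hypothetical and are not taken from this PR or from the simulstream code base; the only assumption carried over is that a token starting with the prefix opens a new word.

```python
from typing import List


def group_tokens_into_words(tokens: List[str], bow_prefix: str) -> List[List[str]]:
    """Group subword tokens into words.

    Hypothetical helper for illustration only: a token that starts with
    `bow_prefix` is treated as the first token of a new word, and every
    other token is appended to the word currently being built.
    """
    words: List[List[str]] = []
    for token in tokens:
        # The very first token always opens a word, even without the prefix.
        if not words or token.startswith(bow_prefix):
            words.append([token])
        else:
            words[-1].append(token)
    return words


# SentencePiece-style tokens mark word starts with U+2581.
sp_words = group_tokens_into_words(["\u2581Hel", "lo", "\u2581world"], "\u2581")

# Byte-level BPE tokens commonly mark word starts with U+0120 instead.
bpe_words = group_tokens_into_words(["\u0120Hel", "lo", "\u0120wor", "ld"], "\u0120")
```

With the prefix passed in as an argument, the same grouping logic covers both tokenizer families. If the marker were hard-coded to one of them, tokens from the other would never match it and the whole sequence would collapse into a single word.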