Jenkins and plugins versions report
Not applicable: this is a backend Python bug that does not depend on the Jenkins version.
What Operating System are you using (both controller, and any agents involved in the problem)?
Ubuntu 22.04 (also reproducible on any OS running the chatbot backend)
Reproduction steps
1. Start the chatbot backend server.
2. Open two WebSocket connections to the /sessions/{session_id}/stream endpoint.
3. Send a user message on WebSocket connection 1; this triggers get_chatbot_reply_stream() in chat_service.py.
4. Immediately send a message on WebSocket connection 2.
5. Observe that connection 2 is blocked (no response) until connection 1's retrieve_context() call finishes.
Expected Results
retrieve_context() should be offloaded from the event loop using asyncio.to_thread(), so that concurrent WebSocket connections are not blocked during RAG retrieval.
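A minimal sketch of that change, assuming get_chatbot_reply_stream() is an async generator that calls retrieve_context() before streaming tokens. The retrieve_context() body below is a hypothetical stand-in (time.sleep() simulates the blocking embedding + FAISS work) and the token loop is a placeholder for the real LLM streaming; only the asyncio.to_thread() line reflects the proposed fix.

```python
import asyncio
import time
from typing import AsyncIterator


def retrieve_context(user_input: str) -> str:
    """Hypothetical stand-in for the real synchronous RAG retrieval."""
    time.sleep(0.3)  # simulate ~300 ms of blocking embedding + FAISS work
    return f"context for {user_input!r}"


async def get_chatbot_reply_stream(user_input: str) -> AsyncIterator[str]:
    # Proposed fix: run the blocking retrieval in a worker thread so the
    # event loop keeps serving other WebSocket connections meanwhile.
    context = await asyncio.to_thread(retrieve_context, user_input)
    # Placeholder for the real LLM token streaming.
    for token in (context, " [end]"):
        yield token


async def consume(user_input: str) -> str:
    # Collect a full streamed reply, as a WebSocket handler might.
    return "".join([token async for token in get_chatbot_reply_stream(user_input)])


async def main() -> tuple[list[str], float]:
    # Two "connections" stream concurrently; with offloading, their
    # retrievals overlap, so total time is ~0.3 s rather than ~0.6 s.
    start = time.perf_counter()
    replies = await asyncio.gather(consume("hello"), consume("world"))
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    replies, elapsed = asyncio.run(main())
    print(replies, f"{elapsed:.2f}s")
```

Without the asyncio.to_thread() call, the two retrievals would run back to back on the loop thread and the same run would take roughly twice as long.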
Actual Results
retrieve_context(user_input) runs synchronously on the event loop, starving all other async coroutines (WebSocket handlers, API endpoints) until the FAISS search and embedding inference complete. With typical retrieval times of 100-500 ms or more, this causes noticeable freezes for all concurrent users.
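The starvation can be measured without a WebSocket client. The sketch below is a self-contained simulation, not the project's code: a hypothetical retrieve_context() blocks for 0.3 s via time.sleep(), and a heartbeat coroutine (standing in for any other connection's handler) records how late its 10 ms ticks fire. Calling the function directly on the loop delays the ticks by roughly the full retrieval time; offloading it with asyncio.to_thread() keeps the lateness close to zero.

```python
import asyncio
import time


def retrieve_context(user_input: str) -> str:
    """Hypothetical blocking stand-in for the real retrieval."""
    time.sleep(0.3)  # models ~300 ms of embedding inference + FAISS search
    return "context"


async def heartbeat(interval: float, stop: asyncio.Event) -> float:
    """Tick every `interval` seconds and return the worst observed lateness.

    A large value means the event loop was starved, i.e. every other
    coroutine (other connections, API endpoints) was frozen meanwhile.
    """
    worst = 0.0
    while not stop.is_set():
        expected = time.perf_counter() + interval
        await asyncio.sleep(interval)
        worst = max(worst, time.perf_counter() - expected)
    return worst


async def measure(offload: bool) -> float:
    stop = asyncio.Event()
    monitor = asyncio.create_task(heartbeat(0.01, stop))
    await asyncio.sleep(0.05)  # let the heartbeat start ticking
    if offload:
        await asyncio.to_thread(retrieve_context, "hi")  # loop stays responsive
    else:
        retrieve_context("hi")  # blocks the whole event loop
    stop.set()
    return await monitor


async def main() -> tuple[float, float]:
    return await measure(offload=False), await measure(offload=True)


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"blocking call: {blocked:.3f}s late, to_thread: {offloaded:.3f}s late")
```

Against the real server, asyncio's debug mode (PYTHONASYNCIODEBUG=1, or asyncio.run(..., debug=True)) also logs any callback that runs longer than loop.slow_callback_duration (100 ms by default), which may help confirm the same stall.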
Anything else?
This is the same class of bug that was fixed for get_session(), which now has a get_session_async() wrapper using asyncio.to_thread(). The pattern is already established in the codebase; it just wasn't applied to retrieve_context().
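Following that precedent, a wrapper could look like the sketch below. The name retrieve_context_async() is hypothetical (chosen to mirror get_session_async()), and the retrieve_context() body is again a stand-in. The synchronous function stays available for non-async callers, while async code such as get_chatbot_reply_stream() would await the wrapper.

```python
import asyncio
import time


def retrieve_context(user_input: str) -> str:
    """Hypothetical stand-in for the existing synchronous retrieval."""
    time.sleep(0.2)  # simulate blocking embedding inference + FAISS search
    return f"context for {user_input!r}"


async def retrieve_context_async(user_input: str) -> str:
    """Async wrapper (hypothetical name, mirroring get_session_async()).

    Runs retrieve_context() in the loop's default thread pool via
    asyncio.to_thread(), so the caller yields to the event loop while waiting.
    """
    return await asyncio.to_thread(retrieve_context, user_input)


async def main() -> list[str]:
    # Usage at an async call site:
    #     context = await retrieve_context_async(user_input)
    # Three concurrent calls finish in roughly one retrieval's time.
    return await asyncio.gather(*(retrieve_context_async(q) for q in ("a", "b", "c")))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because asyncio.to_thread() runs calls on a shared thread pool, concurrent requests may now execute retrieve_context() in parallel; it is worth confirming that the embedding model and FAISS index it uses tolerate concurrent read access.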
Are you interested in contributing a fix?
Yes...