Releases: robertziel/python_workflows_queue
v0.4.0 — per-machine ollama/vLLM backends + PAR-driven GPU two-lane concurrency
- LLM backend abstraction (ollama/vLLM) with idle supervisor + config-synced
  factory and capability advertisement (migrations 0013/0014).
- PAR-driven GPU two-lane worker: inline diffusion (concurrency-1) alongside a
  PAR-sized VLM pool; fill-before-spill packing; total node-jobs capped at PAR.
- Docs: ollama vs vLLM request-flow graphs + how a diffusion model shares the
  GPU; README Ansible deployment example.
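
The two-lane admission rule and fill-before-spill packing listed above could be sketched roughly as follows. This is an illustration, not the project's code: `TwoLaneGate`, `try_admit`, `release`, and `fill_before_spill` are hypothetical names. It assumes PAR is an integer per-node limit on concurrently running jobs, and that fill-before-spill means filling one node to capacity before placing jobs on the next.

```python
from dataclasses import dataclass


@dataclass
class TwoLaneGate:
    """Hypothetical admission gate: one diffusion slot, a PAR-sized VLM pool,
    and a cap of PAR on total running jobs for the node (assumed semantics)."""

    par: int
    diffusion_running: int = 0
    vlm_running: int = 0

    def try_admit(self, lane: str) -> bool:
        """Return True and reserve a slot if the job may start now."""
        if lane not in ("diffusion", "vlm"):
            raise ValueError(f"unknown lane: {lane}")
        # Total node-jobs are capped at PAR across both lanes.
        if self.diffusion_running + self.vlm_running >= self.par:
            return False
        if lane == "diffusion":
            # The diffusion lane runs inline with concurrency 1.
            if self.diffusion_running >= 1:
                return False
            self.diffusion_running += 1
            return True
        # The VLM pool is PAR-sized, so the total cap above is the only limit.
        self.vlm_running += 1
        return True

    def release(self, lane: str) -> None:
        """Free the slot held by a finished job."""
        if lane == "diffusion":
            self.diffusion_running = max(0, self.diffusion_running - 1)
        elif lane == "vlm":
            self.vlm_running = max(0, self.vlm_running - 1)
        else:
            raise ValueError(f"unknown lane: {lane}")


def fill_before_spill(job_count: int, capacities: list[int]) -> tuple[list[int], int]:
    """Place jobs on nodes in order, filling each node before using the next.

    Returns the per-node job counts and the number of jobs left unplaced.
    """
    placed = []
    remaining = job_count
    for capacity in capacities:
        take = min(capacity, remaining)
        placed.append(take)
        remaining -= take
    return placed, remaining
```

With `par=3`, the gate admits one diffusion job and two VLM jobs, then refuses further work until a slot is released. `fill_before_spill(5, [2, 2, 2])` places two jobs on each of the first two nodes and one on the third.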