Releases · robertziel/python_workflows_queue

v0.4.0 — per-machine ollama/vLLM backends + PAR-driven GPU two-lane concurrency

LLM backend abstraction (ollama/vLLM) with idle supervisor + config-synced
factory and capability advertisement (migrations 0013/0014).
PAR-driven GPU two-lane worker: inline diffusion (concurrency-1) alongside a
PAR-sized VLM pool; fill-before-spill packing; total node-jobs capped at PAR.
Docs: ollama vs vLLM request-flow graphs + how a diffusion model shares the
GPU; README Ansible deployment example.
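
The two-lane admission rule and fill-before-spill packing described above can be sketched roughly as follows. This is an illustrative reading of the release notes, not the project's actual code: `TwoLaneGate`, `try_admit`, `release`, and `fill_before_spill` are hypothetical names, and the sketch assumes PAR is a per-node integer limit on concurrent jobs and that "fill-before-spill" means filling one node up to its PAR before placing jobs on the next.

```python
from dataclasses import dataclass, field


@dataclass
class TwoLaneGate:
    """Hypothetical admission gate: one diffusion slot plus a PAR-sized VLM pool,
    with the combined number of running jobs never exceeding PAR."""
    par: int
    running: dict = field(default_factory=lambda: {"diffusion": 0, "vlm": 0})

    def _lane_capacity(self, lane: str) -> int:
        # Assumption: diffusion runs inline at concurrency 1; VLM pool is PAR-sized.
        return 1 if lane == "diffusion" else self.par

    def try_admit(self, lane: str) -> bool:
        """Admit a job only if both the lane limit and the node-wide PAR cap allow it."""
        if lane not in self.running:
            raise ValueError(f"unknown lane: {lane}")
        total = sum(self.running.values())
        if total >= self.par or self.running[lane] >= self._lane_capacity(lane):
            return False
        self.running[lane] += 1
        return True

    def release(self, lane: str) -> None:
        """Mark one job in the given lane as finished."""
        if self.running.get(lane, 0) <= 0:
            raise ValueError(f"no running job in lane: {lane}")
        self.running[lane] -= 1


def fill_before_spill(job_count: int, node_pars: list[int]) -> tuple[list[int], int]:
    """Place jobs on nodes in order, filling each node to its PAR before using the next.

    Returns the per-node job counts and the number of jobs left unplaced.
    """
    placed = []
    remaining = job_count
    for par in node_pars:
        take = min(par, remaining)
        placed.append(take)
        remaining -= take
    return placed, remaining
```

With `par=2`, for example, the gate admits one diffusion job and one VLM job, then refuses further work until a slot is released, since the node-wide total has reached PAR.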
