Releases: robertziel/python_workflows_queue

v0.4.0 — ollama/vLLM backends + PAR-driven GPU two-lane concurrency

29 May 22:48

v0.4.0 — per-machine ollama/vLLM backends + PAR-driven GPU two-lane concurrency

  • LLM backend abstraction (ollama/vLLM) with an idle supervisor, a config-synced
    factory, and capability advertisement (migrations 0013/0014).
  • PAR-driven GPU two-lane worker: inline diffusion (concurrency 1) alongside a
    PAR-sized VLM pool, with fill-before-spill packing and total jobs per node capped at PAR.
  • Docs: request-flow graphs for ollama vs vLLM, how a diffusion model shares the
    GPU, and an Ansible deployment example in the README.
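The backend bullet combines three ideas: a factory that rebuilds the backend only when the synced config changes, an idle supervisor that stops an unused backend, and capability advertisement. The sketch below illustrates how those pieces could fit together. It is not the project's code: `Backend`, `make_backend`, `BackendFactory`, `IdleSupervisor`, and the config keys `llm_backend` and `models` are all hypothetical, and no real ollama or vLLM API is called.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

SUPPORTED = {"ollama", "vllm"}  # backend kinds named in the release notes


@dataclass
class Backend:
    """Hypothetical stand-in for a per-machine LLM server process."""
    kind: str
    models: tuple
    running: bool = False

    def capabilities(self) -> dict:
        # Advertise what this machine can serve (shape is an assumption).
        return {"backend": self.kind, "models": list(self.models)}


def make_backend(config: dict) -> Backend:
    """Build a backend from config; the keys used here are hypothetical."""
    kind = config.get("llm_backend", "ollama")
    if kind not in SUPPORTED:
        raise ValueError(f"unknown llm_backend: {kind!r}")
    return Backend(kind=kind, models=tuple(config.get("models", ())))


class BackendFactory:
    """Config-synced factory: rebuilds only when the config actually changes."""

    def __init__(self) -> None:
        self._config: Optional[dict] = None
        self._backend: Optional[Backend] = None

    def sync(self, config: dict) -> Backend:
        if config != self._config:
            if self._backend is not None:
                self._backend.running = False  # stop the backend being replaced
            self._backend = make_backend(config)
            self._config = dict(config)
        return self._backend


class IdleSupervisor:
    """Starts the backend on use and stops it after idle_timeout seconds."""

    def __init__(self, backend: Backend, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_used = clock()

    def touch(self) -> None:
        # Called on every request: lazily start, then record activity time.
        self.backend.running = True
        self.last_used = self.clock()

    def tick(self) -> None:
        # Called periodically: stop a running backend that has sat idle too long.
        if self.backend.running and self.clock() - self.last_used >= self.idle_timeout:
            self.backend.running = False
```

Injecting `clock` keeps the supervisor testable without sleeping. Comparing whole config dicts is the simplest sync rule. A real implementation might instead diff only the backend-relevant keys.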
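The two-lane bullet describes three rules: at most one inline diffusion job per node, a VLM pool sized by PAR, and a node-wide total capped at PAR. Jobs fill one node before spilling to the next. The sketch below encodes those rules under one stated assumption: PAR is treated as a per-node integer job budget. `Node` and `place` are hypothetical names, not the worker's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    par: int            # assumption: PAR as a per-node integer job budget
    diffusion: int = 0  # jobs in the inline diffusion lane
    vlm: int = 0        # jobs in the VLM pool

    @property
    def total(self) -> int:
        return self.diffusion + self.vlm

    def can_take(self, kind: str) -> bool:
        if self.total >= self.par:
            return False  # node-wide cap: diffusion + VLM never exceed PAR
        if kind == "diffusion":
            return self.diffusion < 1  # diffusion lane has concurrency 1
        return True  # VLM pool is bounded only by the node-wide PAR cap


def place(nodes: List[Node], kind: str) -> Optional[str]:
    """Fill-before-spill: give the job to the first node in order with room."""
    for node in nodes:
        if node.can_take(kind):
            if kind == "diffusion":
                node.diffusion += 1
            else:
                node.vlm += 1
            return node.name
    return None  # every node is full; the job stays queued
```

For example, with two nodes at `par=2`, the sequence vlm, vlm, diffusion, vlm fills `gpu-a` with two VLM jobs and then spills the diffusion job and the third VLM job onto `gpu-b`. Filling in a fixed order keeps later nodes free for as long as possible. Spreading jobs evenly would balance load instead.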