A production-grade fleet of 35 RunPod serverless inference workers — image generation, video, audio, speech, vision, OCR, document AI, NLP, multimodal, and utility — vendored as git submodules.
Each worker is its own standalone repository and ships with a tested handler, a CUDA-or-CPU Dockerfile, pinned dependencies, and a RunPod Hub manifest with curated presets and marketplace integration tests.
- 35 workers across 6 categories, one git submodule each.
- Production-ready out of the box: every worker exposes `handler.py`, `test_handler.py`, `Dockerfile`, `requirements.txt`, and `.runpod/{hub.json, tests.json}`.
- CPU-mockable test suites: validate handler logic locally before paying for a GPU pod.
- RunPod Hub-ready: every worker registers with curated model presets and marketplace tests for one-click publishing.
- Conventional Commits + clean linear history in every submodule.
- Independent submodules: fork, customize, and deploy any single worker without touching the rest.
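
As an illustration of what "CPU-mockable" can look like in practice, here is a minimal sketch of a handler whose model loader is injected, so a test can replace it with a mock. The `make_handler` factory, the `text` input field, and the response shape are hypothetical and are not taken from any worker in this fleet; each worker's own `test_handler.py` is the reference.

```python
from unittest.mock import MagicMock


def make_handler(load_model):
    """Build a handler around a lazily loaded model (hypothetical pattern)."""
    cache = {}

    def handler(job):
        # RunPod passes the request payload under the "input" key of the job.
        text = job.get("input", {}).get("text")  # "text" is an assumed field
        if not text:
            return {"error": "missing 'text' in input"}
        if "model" not in cache:
            # Loaded once, on first use; a real loader would pull weights here.
            cache["model"] = load_model()
        return {"output": cache["model"](text)}

    return handler


def test_handler_with_mocked_model():
    # The mocks stand in for the real model, so no GPU or weights are needed.
    fake_model = MagicMock(return_value="HELLO")
    load_model = MagicMock(return_value=fake_model)
    handler = make_handler(load_model)

    assert handler({"input": {"text": "hello"}}) == {"output": "HELLO"}
    assert handler({"input": {}}) == {"error": "missing 'text' in input"}

    # A second valid job reuses the cached model instead of reloading it.
    handler({"input": {"text": "again"}})
    load_model.assert_called_once()
```

Injecting the loader keeps the validation and dispatch logic testable on a laptop, while the expensive model load only happens inside the deployed container.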
| Worker | Model(s) | Highlights |
|---|---|---|
| runpod-animatediff | AnimateDiff (SD1.5 + SDXL MotionAdapter) | motion-LoRA stacking, AnimateLCM fast sampling, mp4 / gif / frames output |
| runpod-controlnet | 11 ControlNets on SD1.5 + SDXL | canny, depth, openpose, scribble, softedge, mlsd, seg, lineart, normal, tile, multi-net stacking |
| runpod-flux | FLUX.1 schnell + FLUX.1 dev | txt2img, img2img, inpaint, Canny/Depth ControlNet, LoRA |
| runpod-sdxl | SDXL Base, Turbo, Lightning, Juggernaut XL, Playground V2 | txt2img, img2img, inpaint, refiner pass, LoRA |
| runpod-svd | Stable Video Diffusion + SVD-XT 1.1 | img2vid with motion_bucket_id, mp4 / gif / frames |
| Worker | Model(s) | Highlights |
|---|---|---|
| runpod-bark | Suno Bark (Full + Small) | en_speaker, fa_speaker, nonverbal markers, wav / mp3 / flac |
| runpod-deepfilternet | DeepFilterNet3 / DFN2 / DFN | denoise, dereverb, LUFS normalize, webrtcvad |
| runpod-demucs | Demucs v4 (htdemucs, mdx_extra) | 4-stem and 6-stem separation, vocals, karaoke |
| runpod-diarize | pyannote.audio 3.1 | speaker diarization, RTTM, whisper transcribe-align |
| runpod-musicgen | AudioCraft MusicGen + AudioGen | text-to-music, melody conditioning, audio continuation |
| runpod-tts | Coqui XTTS-v2 + Piper TTS | multilingual, voice cloning, SSML-lite |
| runpod-whisper | faster-whisper (tiny → large-v3) | VAD, word-level timestamps, SRT / VTT subtitles |
| Worker | Model(s) | Highlights |
|---|---|---|
| runpod-clip | OpenCLIP ViT-B / L / bigG + EVA02-L | embeddings, similarity, zero-shot, image search ranking |
| runpod-depth | Depth-Anything-V2 (Small / Base / Large + Metric) | depth, colormap, normals, disparity, point cloud |
| runpod-face | InsightFace buffalo_s/l + antelopev2 | detect, ArcFace 512-d embed, age/gender, align, match |
| runpod-image-tag | WD14 SwinV2 + EVA02 + ConvNeXt + ViT-L | dual-engine tagger (ONNX + HF) for tags, classify, SD-prompts |
| runpod-mediapipe | MediaPipe Tasks (CPU) | pose, hands, face mesh, segmentation, detection, gestures |
| runpod-realesrgan | Real-ESRGAN + GFPGAN | 2× / 4× super-resolution, tile mode, optional face restore |
| runpod-rembg | rembg (U2Net, IS-Net, SAM, BiRefNet) | cutout, mask, composite background |
| runpod-sam | SAM2 hiera (tiny / base / large) | bbox, point, auto-mask, optional GroundingDINO text mode |
| runpod-yolo | Ultralytics YOLOv8 / YOLO11 | detect, segment, pose, OBB, classify |
| Worker | Model(s) | Highlights |
|---|---|---|
| runpod-donut | NAVER Donut (CORD, DocVQA, RVL-CDIP) | OCR-free document understanding via VisionEncoderDecoder |
| runpod-easyocr-main | EasyOCR + PyMuPDF + Tesseract | end-to-end OCR, tables, hOCR, searchable PDF, translate |
| runpod-marker | Marker + Surya OCR + Tabled | PDF → markdown / JSON / HTML with OCR fallback |
| runpod-paddleocr | PaddleOCR + PP-Structure | multilingual OCR, layout analysis, table recognition |
| runpod-pdf-extract | pymupdf + pdfplumber + pymupdf4llm | text, tables, images, links, page renders (CPU) |
| runpod-trocr | Microsoft TrOCR (printed + handwritten) | recognize, recognize_lines, batch, annotate |
| Worker | Model(s) | Highlights |
|---|---|---|
| runpod-llm | vLLM (Mistral 7B, Llama 3 8B, Qwen 2.5 7B, Phi-3 Mini 128k, Gemma 2 9B) | OpenAI-compatible chat / complete with outlines-guided JSON |
| runpod-ner | spaCy + HuggingFace pipelines | NER, sentiment, POS, language ID, tokenization |
| runpod-sbert | sentence-transformers (BGE, E5, MiniLM, mxbai) | embed, rerank, search, cluster |
| runpod-summarize | BART + Pegasus + LongT5 | chunked map-reduce with TextRank fallback |
| runpod-translate | NLLB-200, M2M-100, MarianMT | langid auto-detect across FLORES-200 codes |
| runpod-vlm | LLaVA, Qwen2-VL, MiniCPM-V | VQA, caption, grounding, OpenAI-style image messages |
| runpod-zero-shot | BART-MNLI, DeBERTa, mDeBERTa, XLM-R XNLI | NLI entailment, multi-label, multilingual |
| Worker | Tooling | Highlights |
|---|---|---|
| runpod-ffmpeg | ffmpeg + ffprobe (CPU) | 13 tasks: probe, metadata, thumbnail, gif, transcode, trim, concat, extract_audio, frames, scene_detect, waveform, spectrogram, subtitles_burn |
Clone the fleet with all 35 submodules in one shot:

    git clone --recurse-submodules https://github.com/dwin-gharibi/runpod-serverless-workers
    cd runpod-serverless-workers

If you cloned without `--recurse-submodules`:

    git submodule update --init --recursive

Each submodule is a fully standalone worker, so you can build and deploy one at a time:

    cd runpod-whisper
    docker build -t my-org/runpod-whisper:latest .
    docker push my-org/runpod-whisper:latest
    # then point a RunPod Serverless endpoint at my-org/runpod-whisper:latest

Or publish to the RunPod Hub (uses `.runpod/hub.json` for presets and `.runpod/tests.json` for marketplace tests):

    cd runpod-whisper
    runpod release

Fleet index layout:

    runpod-serverless-workers/
    ├── .gitmodules      # 35 submodule entries → github.com/dwin-gharibi/runpod-*
    ├── README.md        # this file
    └── runpod-<name>    # gitlink only; no working-tree copy in this index
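
Since `.gitmodules` holds one entry per worker, a short script can enumerate the fleet without cloning any submodule. The sketch below is one possible way to do that with a small hand-written parser; the function names are hypothetical, and it only handles the plain `[submodule "name"]` / `key = value` layout that `git submodule add` writes.

```python
import re
from pathlib import Path

# Matches section headers such as: [submodule "runpod-whisper"]
SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
# Matches simple options such as: url = https://...
OPTION = re.compile(r"^(?P<key>\w+)\s*=\s*(?P<value>.*)$")


def parse_gitmodules(text):
    """Return {submodule_name: {key: value}} parsed from .gitmodules content."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section["name"], {})
            continue
        option = OPTION.match(line)
        if option and current is not None:
            current[option["key"]] = option["value"].strip()
    return modules


def list_submodules(path=".gitmodules"):
    """Read a .gitmodules file and return its submodule names, sorted."""
    return sorted(parse_gitmodules(Path(path).read_text()))
```

Calling `len(list_submodules())` from the repository root gives a quick check that the number of entries matches the worker count stated in this README.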
Per worker (inside each submodule):

    runpod-<name>/
    ├── handler.py          # runpod.serverless entrypoint
    ├── test_handler.py     # CPU-mockable test suite
    ├── Dockerfile          # CUDA or CPU container
    ├── requirements.txt    # pinned dependencies
    ├── README.md           # API contract, parameters, examples
    ├── .gitignore
    └── .runpod/
        ├── hub.json        # curated model presets for the Hub
        └── tests.json      # marketplace integration tests
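
For orientation, a `handler.py` built on the RunPod Python SDK generally follows the shape sketched below. This is a generic skeleton and not the code of any worker listed above: the `task` and `text` fields and the `echo` behavior are placeholders, and each worker's README documents its real API contract.

```python
def handler(job):
    """Process one job; RunPod passes the request payload under "input"."""
    job_input = job.get("input") or {}
    task = job_input.get("task", "echo")  # "task" is a hypothetical parameter
    if task != "echo":
        # Returning an "error" key is a common way to report a failed job.
        return {"error": f"unsupported task: {task}"}
    return {"echo": job_input.get("text", "")}


def main():
    # Imported lazily so tests can import this module without the SDK installed.
    import runpod

    # Hands control to the RunPod serverless loop, which calls handler per job.
    runpod.serverless.start({"handler": handler})


# A real handler.py would finish with:
#     if __name__ == "__main__":
#         main()
# The call is left out here so the sketch can run without the runpod package.
```

Keeping `handler` a plain function of a dict is what lets `test_handler.py` call it directly, with no container or endpoint involved.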
Bring every worker to its latest main and record the bump in the fleet:

    git submodule update --remote --merge
    git add .
    git commit -m "chore(submodules): bump all workers to latest main"

Or update a single worker:

    git submodule update --remote runpod-whisper
    git add runpod-whisper
    git commit -m "chore(submodules): bump runpod-whisper to latest main"

The fleet index is MIT-licensed. Each submodule carries its own license, so check the README inside each one. Notable cases:

- `runpod-marker` is GPL-3.0 (inherits from `marker-pdf`).
- `runpod-animatediff` motion modules ship under the guoyww research license; base diffusion checkpoints follow each HF model card.
- `runpod-diarize` (pyannote) requires a HuggingFace token and acceptance of the model gate.
- `runpod-bark` Suno weights are research/non-commercial; check Suno terms before commercial use.
- `runpod-sam` SAM2 weights follow Meta's SAM2 license.
Built and maintained by Dwin Gharibi.