dwin-gharibi/runpod-serverless-workers

RunPod Serverless Workers

A production-grade fleet of 35 RunPod serverless inference workers covering image generation, video, audio, speech, vision, OCR, document AI, NLP, multimodal, and utility tasks, each tracked in this repository as a git submodule.

Each worker is a standalone repository that ships with a tested handler, a CUDA or CPU Dockerfile, pinned dependencies, and a RunPod Hub manifest with curated presets and marketplace integration tests.


Highlights

  • 35 workers across 6 categories, one git submodule each.
  • Production-ready out of the box: every worker includes handler.py, test_handler.py, Dockerfile, requirements.txt, and .runpod/{hub.json, tests.json}.
  • CPU-mockable test suites: validate handler logic locally before paying for a GPU pod.
  • RunPod Hub-ready: every worker registers with curated model presets and marketplace tests for one-click publishing.
  • Conventional Commits + clean linear history in every submodule.
  • Independent submodules: fork, customize, and deploy any single worker without touching the rest.
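The CPU-mockable idea can be sketched as follows: keep input validation and response shaping in plain Python and inject the model, so a test can substitute a `unittest.mock.Mock` and run without a GPU. This is a minimal sketch, not code taken from any worker; `load_model`, the `model_factory` parameter, and the `input.text` field are hypothetical.

```python
"""Sketch of a CPU-mockable handler test; all names are hypothetical."""
from unittest import mock


def load_model():
    # Stand-in for an expensive GPU model load (hypothetical).
    raise RuntimeError("no GPU model available here")


def handler(job, model_factory=load_model):
    """Validate job["input"], run the model, and shape the response."""
    job_input = job.get("input") or {}
    text = job_input.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"error": "input.text must be a non-empty string"}
    model = model_factory()
    return {"output": model(text)}


def test_rejects_missing_text():
    # Validation runs before any model is loaded, so no mock is needed.
    assert "error" in handler({"input": {}})


def test_runs_with_mocked_model():
    # Replace the GPU model with a Mock and check the handler's wiring.
    fake_model = mock.Mock(return_value="ok")
    result = handler({"input": {"text": "hello"}}, model_factory=lambda: fake_model)
    assert result == {"output": "ok"}
    fake_model.assert_called_once_with("hello")


if __name__ == "__main__":
    test_rejects_missing_text()
    test_runs_with_mocked_model()
    print("all handler tests passed")
```

Injecting the factory through a default argument keeps the production call signature `handler(job)` unchanged while letting tests swap the model in one line.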

The 35 workers

Image and video generation

| Worker | Model(s) | Highlights |
| --- | --- | --- |
| runpod-animatediff | AnimateDiff (SD1.5 + SDXL MotionAdapter) | motion-LoRA stacking, AnimateLCM fast sampling, mp4 / gif / frames output |
| runpod-controlnet | 11 ControlNets on SD1.5 + SDXL | canny, depth, openpose, scribble, softedge, mlsd, seg, lineart, normal, tile, multi-net stacking |
| runpod-flux | FLUX.1 schnell + FLUX.1 dev | txt2img, img2img, inpaint, Canny/Depth ControlNet, LoRA |
| runpod-sdxl | SDXL Base, Turbo, Lightning, Juggernaut XL, Playground V2 | txt2img, img2img, inpaint, refiner pass, LoRA |
| runpod-svd | Stable Video Diffusion + SVD-XT 1.1 | img2vid with motion_bucket_id, mp4 / gif / frames |

Audio

| Worker | Model(s) | Highlights |
| --- | --- | --- |
| runpod-bark | Suno Bark (Full + Small) | en_speaker, fa_speaker, nonverbal markers, wav / mp3 / flac |
| runpod-deepfilternet | DeepFilterNet3 / DFN2 / DFN | denoise, dereverb, LUFS normalize, webrtcvad |
| runpod-demucs | Demucs v4 (htdemucs, mdx_extra) | 4-stem and 6-stem separation, vocals, karaoke |
| runpod-diarize | pyannote.audio 3.1 | speaker diarization, RTTM, whisper transcribe-align |
| runpod-musicgen | AudioCraft MusicGen + AudioGen | text-to-music, melody conditioning, audio continuation |
| runpod-tts | Coqui XTTS-v2 + Piper TTS | multilingual, voice cloning, SSML-lite |
| runpod-whisper | faster-whisper (tiny → large-v3) | VAD, word-level timestamps, SRT / VTT subtitles |
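For the SRT / VTT subtitle output listed for runpod-whisper, the two formats differ mainly in the millisecond separator, cue numbering, and the `WEBVTT` header. The sketch below assumes transcription segments arrive as `(start_seconds, end_seconds, text)` tuples; it illustrates the formats and is not the worker's actual implementation.

```python
"""Sketch: render (start, end, text) segments as SRT or WebVTT cues."""


def format_timestamp(seconds, sep=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, sep=".")."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments):
    """SRT: numbered cues separated by blank lines, comma before milliseconds."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{i}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: WEBVTT header, dot before milliseconds, cue numbers omitted."""
    cues = [
        f"{format_timestamp(s, '.')} --> {format_timestamp(e, '.')}\n{t.strip()}\n"
        for s, e, t in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    segs = [(0.0, 2.5, " Hello there."), (2.5, 3661.042, " One hour later.")]
    print(to_srt(segs))
    print(to_vtt(segs))
```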

Vision

| Worker | Model(s) | Highlights |
| --- | --- | --- |
| runpod-clip | OpenCLIP ViT-B / L / bigG + EVA02-L | embeddings, similarity, zero-shot, image search ranking |
| runpod-depth | Depth-Anything-V2 (Small / Base / Large + Metric) | depth, colormap, normals, disparity, point cloud |
| runpod-face | InsightFace buffalo_s/l + antelopev2 | detect, ArcFace 512-d embed, age/gender, align, match |
| runpod-image-tag | WD14 SwinV2 + EVA02 + ConvNeXt + ViT-L | dual-engine tagger (ONNX + HF) for tags, classify, SD-prompts |
| runpod-mediapipe | MediaPipe Tasks (CPU) | pose, hands, face mesh, segmentation, detection, gestures |
| runpod-realesrgan | Real-ESRGAN + GFPGAN | 2× / 4× super-resolution, tile mode, optional face restore |
| runpod-rembg | rembg (U2Net, IS-Net, SAM, BiRefNet) | cutout, mask, composite background |
| runpod-sam | SAM2 hiera (tiny / base / large) | bbox, point, auto-mask, optional GroundingDINO text mode |
| runpod-yolo | Ultralytics YOLOv8 / YOLOv11 | detect, segment, pose, OBB, classify |

Document AI and OCR

| Worker | Model(s) | Highlights |
| --- | --- | --- |
| runpod-donut | NAVER Donut (CORD, DocVQA, RVL-CDIP) | OCR-free document understanding via VisionEncoderDecoder |
| runpod-easyocr-main | EasyOCR + PyMuPDF + Tesseract | end-to-end OCR, tables, hOCR, searchable PDF, translate |
| runpod-marker | Marker + Surya OCR + Tabled | PDF → markdown / JSON / HTML with OCR fallback |
| runpod-paddleocr | PaddleOCR + PP-Structure | multilingual OCR, layout analysis, table recognition |
| runpod-pdf-extract | pymupdf + pdfplumber + pymupdf4llm | text, tables, images, links, page renders (CPU) |
| runpod-trocr | Microsoft TrOCR (printed + handwritten) | recognize, recognize_lines, batch, annotate |

NLP and multimodal

| Worker | Model(s) | Highlights |
| --- | --- | --- |
| runpod-llm | vLLM (Mistral 7B, Llama 3 8B, Qwen 2.5 7B, Phi-3 Mini 128k, Gemma 2 9B) | OpenAI-compatible chat / complete with outlines-guided JSON |
| runpod-ner | spaCy + HuggingFace pipelines | NER, sentiment, POS, language ID, tokenization |
| runpod-sbert | sentence-transformers (BGE, E5, MiniLM, mxbai) | embed, rerank, search, cluster |
| runpod-summarize | BART + Pegasus + LongT5 | chunked map-reduce with TextRank fallback |
| runpod-translate | NLLB-200, M2M-100, MarianMT | langid auto-detect across FLORES-200 codes |
| runpod-vlm | LLaVA, Qwen2-VL, MiniCPM-V | VQA, caption, grounding, OpenAI-style image messages |
| runpod-zero-shot | BART-MNLI, DeBERTa, mDeBERTa, XLM-R XNLI | NLI entailment, multi-label, multilingual |
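The embed / rerank / search highlights of runpod-sbert rest on comparing vectors. A minimal sketch of cosine-similarity ranking, assuming the embeddings have already come back from the worker as plain lists of floats (the toy vectors in the example are made up); the worker's own search path may normalize or batch differently.

```python
"""Sketch: rank document embeddings against a query by cosine similarity."""
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs, best match first, truncated to top_k."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    docs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
    print(rank([1.0, 0.0], docs, top_k=2))
```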

Utility

| Worker | Tooling | Highlights |
| --- | --- | --- |
| runpod-ffmpeg | ffmpeg + ffprobe (CPU) | 13 tasks: probe, metadata, thumbnail, gif, transcode, trim, concat, extract_audio, frames, scene_detect, waveform, spectrogram, subtitles_burn |

Quick start

Clone the fleet with all 35 submodules in one shot:

git clone --recurse-submodules https://github.com/dwin-gharibi/runpod-serverless-workers
cd runpod-serverless-workers

If you cloned without --recurse-submodules:

git submodule update --init --recursive

Each submodule is a fully standalone worker — build and deploy one at a time:

cd runpod-whisper
docker build -t my-org/runpod-whisper:latest .
docker push my-org/runpod-whisper:latest
# then point a RunPod Serverless endpoint at my-org/runpod-whisper:latest

Or publish to the RunPod Hub (uses .runpod/hub.json for presets and .runpod/tests.json for marketplace tests):

cd runpod-whisper
runpod release
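Once an endpoint is running, it can be called over HTTPS. Below is a minimal stdlib-only sketch of a synchronous call to the `/runsync` route, assuming the standard RunPod Serverless request shape (`{"input": {...}}` sent with a Bearer API key). The environment variable names and the `audio` field are hypothetical placeholders; each worker's README defines its real input contract.

```python
"""Sketch: call a RunPod Serverless endpoint's /runsync route with the stdlib."""
import json
import os
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id, api_key, payload):
    """Wrap payload as {"input": ...} and return a ready-to-send POST request."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_sync(endpoint_id, api_key, payload, timeout=120):
    """Send the request and decode the JSON response (id, status, output, ...)."""
    req = build_runsync_request(endpoint_id, api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical env var names and input field; adjust to the worker's README.
    endpoint = os.environ.get("RUNPOD_ENDPOINT_ID")
    key = os.environ.get("RUNPOD_API_KEY")
    if endpoint and key:
        print(run_sync(endpoint, key, {"audio": "https://example.com/sample.wav"}))
```

Splitting request construction from sending keeps the payload shape testable offline.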

Repository layout

runpod-serverless-workers/
├── .gitmodules                 # 35 submodule entries → github.com/dwin-gharibi/runpod-*
├── README.md                   # this file
└── runpod-<name>               # one gitlink (pinned commit) per worker; files live in the submodule repo
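To script against the whole fleet, the submodule paths can be read from `.gitmodules`. A sketch, assuming each entry has a `path = <dir>` line (the sample text uses placeholder URLs), that turns each path into a `docker build` argv without running it; `my-org` mirrors the Quick start example.

```python
"""Sketch: list submodule paths from .gitmodules and build docker argv lists."""
import re

# Matches `path = <dir>` lines inside [submodule "..."] sections.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(\S+)\s*$", re.MULTILINE)


def submodule_paths(gitmodules_text):
    """Return submodule paths in the order they appear in .gitmodules."""
    return PATH_LINE.findall(gitmodules_text)


def docker_build_commands(paths, registry="my-org", tag="latest"):
    """Build one `docker build` argv per worker directory (not executed here)."""
    return [
        ["docker", "build", "-t", f"{registry}/{path}:{tag}", path]
        for path in paths
    ]


if __name__ == "__main__":
    sample = """
[submodule "runpod-whisper"]
\tpath = runpod-whisper
\turl = https://example.com/runpod-whisper.git
[submodule "runpod-yolo"]
\tpath = runpod-yolo
\turl = https://example.com/runpod-yolo.git
"""
    for argv in docker_build_commands(submodule_paths(sample)):
        print(" ".join(argv))
```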

Per worker (inside each submodule):

runpod-<name>/
├── handler.py                  # runpod.serverless entrypoint
├── test_handler.py             # CPU-mockable test suite
├── Dockerfile                  # CUDA or CPU container
├── requirements.txt            # pinned dependencies
├── README.md                   # API contract, parameters, examples
├── .gitignore
└── .runpod/
    ├── hub.json                # curated model presets for the Hub
    └── tests.json              # marketplace integration tests
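A minimal sketch of what a `handler.py` entrypoint can look like with the RunPod Python SDK, which calls the handler once per job with the request payload under `job["input"]`. The `text` input and the returned fields are hypothetical placeholders; the SDK import is deferred so the module can be imported and tested on a machine without `runpod` installed.

```python
"""Minimal handler.py sketch for a RunPod serverless worker (hypothetical schema)."""


def handler(job):
    """Called once per job; job["input"] holds the request payload."""
    job_input = job.get("input") or {}
    text = job_input.get("text")
    if not isinstance(text, str):
        # RunPod's SDK treats a returned "error" key as a failed job.
        return {"error": "input.text (string) is required"}
    # Placeholder work; a real worker would run its model here.
    return {"length": len(text), "upper": text.upper()}


if __name__ == "__main__":
    try:
        import runpod  # RunPod Python SDK: pip install runpod
    except ImportError:
        print("runpod SDK not installed; handler defined but not started")
    else:
        runpod.serverless.start({"handler": handler})
```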

Updating submodules

Bring every worker to its latest main and record the bump in the fleet:

git submodule update --remote --merge
git add .
git commit -m "chore(submodules): bump all workers to latest main"

Or update a single worker:

git submodule update --remote runpod-whisper
git add runpod-whisper
git commit -m "chore(submodules): bump runpod-whisper to latest main"
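Before committing a bump, `git submodule status` shows which workers actually moved: a `+` prefix marks a submodule whose checked-out commit differs from the one recorded in the superproject, `-` marks one that is not initialized, and `U` marks merge conflicts. A small parser sketch, assuming paths without spaces, lists the moved paths so only they need to be staged.

```python
"""Sketch: find submodules whose checkout differs from the recorded gitlink."""
import subprocess

# Prefix characters documented for `git submodule status`.
STATE = {" ": "in sync", "+": "moved", "-": "not initialized", "U": "merge conflict"}


def parse_submodule_status(output):
    """Parse `git submodule status` text into (state, sha, path) tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:]
        sha, path = rest.split()[:2]  # assumes paths contain no spaces
        rows.append((STATE.get(prefix, "unknown"), sha, path))
    return rows


def moved_submodules(output):
    """Paths whose checkout no longer matches the superproject index."""
    return [path for state, _, path in parse_submodule_status(output) if state == "moved"]


if __name__ == "__main__":
    try:
        text = subprocess.run(
            ["git", "submodule", "status"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        text = ""  # not a git checkout, or git is unavailable
    print(moved_submodules(text))
```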

License

The fleet index is MIT-licensed. Each submodule carries its own license — check the README inside each one. Notable cases:

  • runpod-marker is GPL-3.0 (inherits from marker-pdf).
  • runpod-animatediff motion modules ship under the guoyww research license; base diffusion checkpoints follow each HF model card.
  • runpod-diarize (pyannote) requires a HuggingFace token and acceptance of the model gate.
  • runpod-bark: Suno Bark weights carry their own license; check Suno's current terms before commercial use.
  • runpod-sam SAM2 weights follow Meta's SAM2 license.

Author

Built and maintained by Dwin Gharibi.
