RunPod Serverless Workers

A production-grade fleet of 35 RunPod serverless inference workers — image generation, video, audio, speech, vision, OCR, document AI, NLP, multimodal, and utility — vendored as git submodules.

Each worker is its own standalone repository and ships with a tested handler, a CUDA or CPU Dockerfile, pinned dependencies, and a RunPod Hub manifest with curated presets and marketplace integration tests.

Highlights

35 workers across 6 categories, one git submodule each.
Production-ready out of the box: every worker exposes handler.py, test_handler.py, Dockerfile, requirements.txt, and .runpod/{hub.json, tests.json}.
CPU-mockable test suites: validate handler logic locally before paying for a GPU pod.
RunPod Hub-ready: every worker registers with curated model presets and marketplace tests for one-click publishing.
Conventional Commits + clean linear history in every submodule.
Independent submodules: fork, customize, and deploy any single worker without touching the rest.
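
As an illustration of the CPU-mockable pattern, here is a minimal sketch and not any worker's actual code. The `make_handler` factory, the `text` input field, and the uppercase stand-in model are all assumptions made for the example; the only RunPod call used is `runpod.serverless.start`, and it is imported lazily so the test runs without the SDK or a GPU.

```python
from unittest import mock


def make_handler(load_model):
    """Build a handler that loads its model lazily, on the first valid job."""
    model = None  # cached across jobs handled by the same warm worker

    def handler(job):
        nonlocal model
        # RunPod delivers the request payload under job["input"].
        text = (job.get("input") or {}).get("text")  # "text" is a hypothetical field
        if not text:
            # Validate before loading so bad requests never touch the model.
            return {"error": "missing required field: text"}
        if model is None:
            model = load_model()
        return {"output": model(text)}

    return handler


def serve(load_model):
    """Container entrypoint; imports the runpod SDK only when actually serving."""
    import runpod

    runpod.serverless.start({"handler": make_handler(load_model)})


def test_handler_on_cpu():
    """Replace the expensive loader with a mock so no weights are needed."""
    loader = mock.Mock(return_value=str.upper)  # stand-in for a real model
    handler = make_handler(loader)
    assert handler({"input": {"text": "hello"}}) == {"output": "HELLO"}
    assert handler({"input": {"text": "again"}}) == {"output": "AGAIN"}
    loader.assert_called_once()  # model was loaded once and then reused
    assert handler({"input": {}}) == {"error": "missing required field: text"}
```

Passing the loader in as an argument keeps the test free of module-level patching; a real worker may structure this differently.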

The 35 workers

Image and video generation

Worker	Model(s)	Highlights
runpod-animatediff	AnimateDiff (SD1.5 + SDXL MotionAdapter)	motion-LoRA stacking, AnimateLCM fast sampling, mp4 / gif / frames output
runpod-controlnet	11 ControlNets on SD1.5 + SDXL	canny, depth, openpose, scribble, softedge, mlsd, seg, lineart, normal, tile, multi-net stacking
runpod-flux	FLUX.1 schnell + FLUX.1 dev	txt2img, img2img, inpaint, Canny/Depth ControlNet, LoRA
runpod-sdxl	SDXL Base, Turbo, Lightning, Juggernaut XL, Playground V2	txt2img, img2img, inpaint, refiner pass, LoRA
runpod-svd	Stable Video Diffusion + SVD-XT 1.1	img2vid with `motion_bucket_id`, mp4 / gif / frames

Audio

Worker	Model(s)	Highlights
runpod-bark	Suno Bark (Full + Small)	`en_speaker`, `fa_speaker`, nonverbal markers, wav / mp3 / flac
runpod-deepfilternet	DeepFilterNet3 / DFN2 / DFN	denoise, dereverb, LUFS normalize, webrtcvad
runpod-demucs	Demucs v4 (htdemucs, mdx_extra)	4-stem and 6-stem separation, vocals, karaoke
runpod-diarize	pyannote.audio 3.1	speaker diarization, RTTM, whisper transcribe-align
runpod-musicgen	AudioCraft MusicGen + AudioGen	text-to-music, melody conditioning, audio continuation
runpod-tts	Coqui XTTS-v2 + Piper TTS	multilingual, voice cloning, SSML-lite
runpod-whisper	faster-whisper (tiny → large-v3)	VAD, word-level timestamps, SRT / VTT subtitles

Vision

Worker	Model(s)	Highlights
runpod-clip	OpenCLIP ViT-B / L / bigG + EVA02-L	embeddings, similarity, zero-shot, image search ranking
runpod-depth	Depth-Anything-V2 (Small / Base / Large + Metric)	depth, colormap, normals, disparity, point cloud
runpod-face	InsightFace buffalo_s/l + antelopev2	detect, ArcFace 512-d embed, age/gender, align, match
runpod-image-tag	WD14 SwinV2 + EVA02 + ConvNeXt + ViT-L	dual-engine tagger (ONNX + HF) for tags, classify, SD-prompts
runpod-mediapipe	MediaPipe Tasks (CPU)	pose, hands, face mesh, segmentation, detection, gestures
runpod-realesrgan	Real-ESRGAN + GFPGAN	2× / 4× super-resolution, tile mode, optional face restore
runpod-rembg	rembg (U2Net, IS-Net, SAM, BiRefNet)	cutout, mask, composite background
runpod-sam	SAM2 hiera (tiny / base / large)	bbox, point, auto-mask, optional GroundingDINO text mode
runpod-yolo	Ultralytics YOLOv8 / YOLOv11	detect, segment, pose, OBB, classify

Document AI and OCR

Worker	Model(s)	Highlights
runpod-donut	NAVER Donut (CORD, DocVQA, RVL-CDIP)	OCR-free document understanding via VisionEncoderDecoder
runpod-easyocr-main	EasyOCR + PyMuPDF + Tesseract	end-to-end OCR, tables, hOCR, searchable PDF, translate
runpod-marker	Marker + Surya OCR + Tabled	PDF → markdown / JSON / HTML with OCR fallback
runpod-paddleocr	PaddleOCR + PP-Structure	multilingual OCR, layout analysis, table recognition
runpod-pdf-extract	pymupdf + pdfplumber + pymupdf4llm	text, tables, images, links, page renders (CPU)
runpod-trocr	Microsoft TrOCR (printed + handwritten)	recognize, recognize_lines, batch, annotate

NLP and multimodal

Worker	Model(s)	Highlights
runpod-llm	vLLM (Mistral 7B, Llama 3 8B, Qwen 2.5 7B, Phi-3 Mini 128k, Gemma 2 9B)	OpenAI-compatible chat / complete with outlines-guided JSON
runpod-ner	spaCy + HuggingFace pipelines	NER, sentiment, POS, language ID, tokenization
runpod-sbert	sentence-transformers (BGE, E5, MiniLM, mxbai)	embed, rerank, search, cluster
runpod-summarize	BART + Pegasus + LongT5	chunked map-reduce with TextRank fallback
runpod-translate	NLLB-200, M2M-100, MarianMT	langid auto-detect across FLORES-200 codes
runpod-vlm	LLaVA, Qwen2-VL, MiniCPM-V	VQA, caption, grounding, OpenAI-style image messages
runpod-zero-shot	BART-MNLI, DeBERTa, mDeBERTa, XLM-R XNLI	NLI entailment, multi-label, multilingual

Utility

Worker	Tooling	Highlights
runpod-ffmpeg	ffmpeg + ffprobe (CPU)	13 tasks: probe, metadata, thumbnail, gif, transcode, trim, concat, extract_audio, frames, scene_detect, waveform, spectrogram, subtitles_burn

Quick start

Clone the fleet with all 35 submodules in one shot:

git clone --recurse-submodules https://github.com/dwin-gharibi/runpod-serverless-workers
cd runpod-serverless-workers

If you cloned without --recurse-submodules:

git submodule update --init --recursive

Each submodule is a fully standalone worker — build and deploy one at a time:

cd runpod-whisper
docker build -t my-org/runpod-whisper:latest .
docker push  my-org/runpod-whisper:latest
# then point a RunPod Serverless endpoint at my-org/runpod-whisper:latest
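
Once an endpoint is live, a job can be submitted over HTTPS. The sketch below uses only the Python standard library and RunPod's synchronous `/runsync` route; the endpoint ID, the API key, and the `audio` input field are placeholders, and each worker's own README defines its real input contract.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id, api_key, payload):
    """Build (but do not send) a synchronous job request for one endpoint."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/runsync",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def run_sync(endpoint_id, api_key, payload, timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_runsync_request(endpoint_id, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (placeholders, not real credentials or a confirmed schema):
# run_sync("YOUR_ENDPOINT_ID", "YOUR_API_KEY", {"audio": "https://example.com/a.wav"})
```

Splitting request construction from sending makes the payload shape easy to inspect without spending GPU time.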

Or publish to the RunPod Hub (uses .runpod/hub.json for presets and .runpod/tests.json for marketplace tests):

cd runpod-whisper
runpod release

Repository layout

runpod-serverless-workers/
├── .gitmodules                 # 35 submodule entries → github.com/dwin-gharibi/runpod-*
├── README.md                   # this file
└── runpod-<name>               # gitlink only — no working-tree copy in this index
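
Because the fleet index holds only gitlinks, `.gitmodules` is the place to look up where each worker lives. The following is a small sketch of a reader for that file; it handles the common `[submodule "name"]` plus `key = value` shape and is not a full git-config parser. The sample entry at the bottom is illustrative.

```python
import re

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def parse_gitmodules(text):
    """Return {submodule name: {key: value}} from .gitmodules content."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        match = SECTION.match(line)
        if match:
            current = modules.setdefault(match.group("name"), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


# Illustrative entry in the shape git writes:
SAMPLE = '''[submodule "runpod-whisper"]
\tpath = runpod-whisper
\turl = https://github.com/dwin-gharibi/runpod-whisper
'''

for name, entry in parse_gitmodules(SAMPLE).items():
    print(name, "->", entry["url"])
```

To run it against the real file, read `.gitmodules` from the repository root and pass its text to `parse_gitmodules`.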

Per worker (inside each submodule):

runpod-<name>/
├── handler.py                  # runpod.serverless entrypoint
├── test_handler.py             # CPU-mockable test suite
├── Dockerfile                  # CUDA or CPU container
├── requirements.txt            # pinned dependencies
├── README.md                   # API contract, parameters, examples
├── .gitignore
└── .runpod/
    ├── hub.json                # curated model presets for the Hub
    └── tests.json              # marketplace integration tests
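
The per-worker layout above lends itself to an automated check. This sketch (the function names are made up for the example) reports which of the listed files are missing from each `runpod-*` directory; it assumes the submodules have already been initialized so that their working trees exist.

```python
from pathlib import Path

# Files taken from the per-worker tree in this README.
REQUIRED_FILES = (
    "handler.py",
    "test_handler.py",
    "Dockerfile",
    "requirements.txt",
    "README.md",
    ".gitignore",
    ".runpod/hub.json",
    ".runpod/tests.json",
)


def missing_files(worker_dir):
    """List the required files that are absent from one worker directory."""
    root = Path(worker_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def check_fleet(fleet_dir):
    """Map each incomplete runpod-* directory to its missing files."""
    report = {}
    for worker in sorted(Path(fleet_dir).glob("runpod-*")):
        if worker.is_dir():
            missing = missing_files(worker)
            if missing:
                report[worker.name] = missing
    return report


# Example: print problems for the fleet checked out in the current directory.
for name, missing in check_fleet(".").items():
    print(f"{name}: missing {', '.join(missing)}")
```

An empty report means every checked-out worker matches the layout; an uninitialized submodule shows up with all files missing.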

Updating submodules

Bring every worker to its latest main and record the bump in the fleet:

git submodule update --remote --merge
git add .
git commit -m "chore(submodules): bump all workers to latest main"

Or update a single worker:

git submodule update --remote runpod-whisper
git add runpod-whisper
git commit -m "chore(submodules): bump runpod-whisper to latest main"
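
After `git submodule update --remote`, it can help to see which workers actually moved before committing. The sketch below parses the output of `git submodule status`, where the first character of each line is a flag: a space means the checkout matches the recorded commit, `+` means it differs, `-` means the submodule is not initialized, and `U` marks a merge conflict. The helper names and the sample hashes are made up for the example, and paths containing spaces are not handled.

```python
from typing import NamedTuple

STATES = {
    " ": "in sync",
    "+": "differs from recorded commit",
    "-": "not initialized",
    "U": "merge conflict",
}


class SubmoduleStatus(NamedTuple):
    path: str
    sha: str
    state: str


def parse_status(output):
    """Parse `git submodule status` output; do not strip it first,
    because the leading flag character (possibly a space) is significant."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        parts = rest.split(" ", 2)  # sha, path, optional "(describe)" suffix
        rows.append(SubmoduleStatus(parts[1], parts[0], STATES.get(flag, "unknown")))
    return rows


def pending_bumps(rows):
    """Paths whose checked-out commit is not yet recorded in the fleet index."""
    return [row.path for row in rows if row.state == STATES["+"]]


# Made-up sample output; real hashes are 40 hex characters like these.
SAMPLE = (
    " 1111111111111111111111111111111111111111 runpod-clip (heads/main)\n"
    "+2222222222222222222222222222222222222222 runpod-whisper (heads/main)\n"
    "-3333333333333333333333333333333333333333 runpod-yolo\n"
)
print(pending_bumps(parse_status(SAMPLE)))
```

For live data, feed it the `stdout` of `subprocess.run(["git", "submodule", "status"], capture_output=True, text=True, check=True)`.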

License

The fleet index is MIT-licensed. Each submodule carries its own license — check the README inside each one. Notable cases:

runpod-marker is GPL-3.0 (inherits from marker-pdf).
runpod-animatediff motion modules ship under the guoyww research license; base diffusion checkpoints follow each HF model card.
runpod-diarize (pyannote) requires a HuggingFace token and acceptance of the model gate.
runpod-bark: Suno Bark was first released under a non-commercial license and was later relicensed under MIT; check the current Suno terms and model card before commercial use.
runpod-sam SAM2 weights follow Meta's SAM2 license.

Author

Built and maintained by Dwin Gharibi.
