dwin-gharibi/runpod-depth

Runpod Depth-Anything-V2 (Monocular Depth)

Serverless GPU monocular depth estimation on Runpod. Backed by the Depth-Anything-V2 family (and DPT as an alternative). Given a single image you get back any of:

  • depth - the raw depth map (compressed 16-bit PNG, with optional float32 .npz)
  • colorized - the depth map rendered with a matplotlib-style colormap
  • normals - surface normals derived from the depth gradient
  • disparity - reciprocal depth ($1/(d+\varepsilon)$), normalized
  • point_cloud - a top-down (x, depth) preview rendered to a PNG (no raw XYZ)
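To make the normals output concrete, here is a minimal sketch of deriving surface normals from a depth gradient with NumPy. The function name depth_to_normals, the strength parameter, and the RGB encoding are assumptions for illustration; the handler's actual implementation may differ.

```python
import numpy as np


def depth_to_normals(depth: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Estimate per-pixel normals from a 2D depth map (hypothetical helper).

    The depth map is treated as a height field; normals come from its
    finite-difference gradient and are packed into an RGB uint8 image.
    """
    depth = depth.astype(np.float32)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.dstack((-dz_dx * strength, -dz_dy * strength, np.ones_like(depth)))
    # Normalize each vector to unit length, guarding against division by zero.
    length = np.linalg.norm(normals, axis=2, keepdims=True)
    normals /= np.maximum(length, 1e-8)
    # Map components from [-1, 1] to [0, 255] for storage as an image.
    return ((normals * 0.5 + 0.5) * 255.0).round().astype(np.uint8)
```

A perfectly flat depth map yields normals pointing straight at the camera, so the blue channel saturates at 255.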

Features

  • Six supported models, from the very fast Depth-Anything-V2-Small to the high-accuracy Large, plus two metric variants (indoor + outdoor) and a DPT fallback.
  • 1, 2, or N images per request via image_url, image_urls, image_b64, or a heterogeneous images: [{type, data}, ...] list.
  • Per-image errors so one bad URL never blocks a batch.
  • Pipeline LRU cache keyed by (model, device, dtype), so warm GPU containers serve subsequent requests without reloading the model.
  • JSON-safe outputs (no raw numpy in the response unless you set include_raw: true).
  • Six colormaps: viridis, magma, inferno, plasma, turbo, gray.
  • Output formats: png, webp, jpg.
  • Automatic FP16 on GPU, FP32 on CPU. Override with dtype: "float16", "bfloat16", or "float32".
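The pipeline cache idea can be sketched as a small LRU keyed by (model, device, dtype). This is an illustration only: the class name PipelineCache, the max_entries limit, and the factory callback are hypothetical and not taken from the handler.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class PipelineCache:
    """Tiny LRU cache (hypothetical stand-in for the handler's cache)."""

    def __init__(self, max_entries: int = 2) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: build the (expensive) pipeline once and remember it.
        value = factory()
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            # Evict the least recently used entry to bound GPU memory use.
            self._entries.popitem(last=False)
        return value
```

A caller would pass a key such as (model_id, "cuda", "float16") together with a factory that constructs the pipeline, so only the first request for that combination pays the load cost.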

Supported Models

| model | Type | Size (parameters) | Best for |
| --- | --- | --- | --- |
| depth-anything/Depth-Anything-V2-Small-hf | Relative | ~25M | Fast preview, default |
| depth-anything/Depth-Anything-V2-Base-hf | Relative | ~97M | Balanced quality |
| depth-anything/Depth-Anything-V2-Large-hf | Relative | ~335M | Best generic depth |
| depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf | Metric | ~335M | Real-world meters, indoor scenes |
| depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf | Metric | ~335M | Real-world meters, outdoor scenes |
| Intel/dpt-large | Relative | ~344M | DPT alternative, well-tested |

Metric models return depth in approximate meters and set metric: true on the response. Relative models return arbitrary-scale, inverse-depth-like floats; depth_min and depth_max are reported on every result so you can rescale on the client.
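Client-side rescaling might look like the sketch below. It assumes the uint16 PNG stores depth mapped linearly from [depth_min, depth_max] to [0, 65535]; the README does not state the encoding, so treat that mapping and the function name dequantize_depth as assumptions to verify against the handler.

```python
import numpy as np


def dequantize_depth(raw_u16: np.ndarray, depth_min: float, depth_max: float) -> np.ndarray:
    """Map uint16 pixel values back to float depth (assumed linear encoding)."""
    # ASSUMPTION: 0 corresponds to depth_min and 65535 to depth_max.
    scale = (depth_max - depth_min) / 65535.0
    return raw_u16.astype(np.float32) * scale + depth_min
```

If the handler uses a different quantization, request include_raw: true and read the float32 array directly instead.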

Tasks

| Task | Output key | Notes |
| --- | --- | --- |
| depth | depth_png_b64 (uint16 PNG) + optional depth_npz_b64 | The raw map. Set include_raw: true for float32 .npz. |
| colorized | colorized_b64 + colormap | Normalized depth -> RGB via the selected colormap. |
| normals | normals_b64 | Surface normals from depth gradient, encoded as RGB. |
| disparity | disparity_b64 | $1/(d+\varepsilon)$, clipped to the 1st-99th percentile range and normalized to uint8. |
| point_cloud | point_cloud_b64 + point_cloud_grid | Top-down (x, depth) projection rendered to a PNG preview. |
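The disparity recipe (reciprocal, percentile clip, uint8 normalization) can be sketched in a few lines of NumPy. The function name, the eps default, and the rounding mode are assumptions; only the overall steps come from the description above.

```python
import numpy as np


def depth_to_disparity_u8(depth: np.ndarray, eps: float = 1e-6,
                          lo_pct: float = 1.0, hi_pct: float = 99.0) -> np.ndarray:
    """Convert depth to a normalized uint8 disparity image (illustrative)."""
    # Reciprocal depth; eps avoids division by zero for d == 0.
    disparity = 1.0 / (depth.astype(np.float64) + eps)
    # Clip outliers to the chosen percentile range before normalizing.
    lo, hi = np.percentile(disparity, [lo_pct, hi_pct])
    disparity = np.clip(disparity, lo, hi)
    span = max(float(hi - lo), 1e-12)  # guard against a constant image
    return ((disparity - lo) / span * 255.0).round().astype(np.uint8)
```

Because disparity is the reciprocal of depth, the nearest pixels end up brightest in the result.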

Input Schema

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| image_url | string | - | Single image URL. |
| image_urls | string[] | - | Multiple image URLs. |
| image_b64 | string | - | Single raw base64 (or data URI) image. |
| images | object[] | - | List of {"type": "url" or "b64", "data": "..."}. |
| model | string | depth-anything/Depth-Anything-V2-Small-hf (or $DEPTH_MODEL) | One of the supported models. |
| tasks | string[] | ["depth"] | Any subset of depth, colorized, normals, disparity, point_cloud. |
| colormap | string | viridis | One of viridis, magma, inferno, plasma, turbo, gray. |
| normalize | bool | true | Normalize depth to [0, 1] for visualization. |
| output_format | string | png | png, webp, or jpg. |
| quality | int | 95 | JPEG/WebP quality. |
| max_size | int | 1024 | Longest edge, in pixels, used for inference (larger images are downscaled before the forward pass). |
| invert | bool | false | Invert depth (near<->far swap before colorization). |
| include_raw | bool | false | Include float32 depth as base64-encoded .npz. |
| point_cloud_grid | int | 512 | Grid size, in pixels, of the point_cloud preview. |
| dtype | string | float16 on GPU, float32 on CPU | float16, bfloat16, or float32. |
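Since four different fields can carry images, a handler needs to fold them into one list. The sketch below shows one way to do that; the function name collect_image_specs, the ordering of the merged list, and the error raised for an empty request are all assumptions rather than documented behavior.

```python
from typing import Any, Dict, List


def collect_image_specs(job_input: Dict[str, Any]) -> List[Dict[str, str]]:
    """Merge image_url, image_urls, image_b64 and images into one list."""
    specs: List[Dict[str, str]] = []
    if job_input.get("image_url"):
        specs.append({"type": "url", "data": job_input["image_url"]})
    for url in job_input.get("image_urls") or []:
        specs.append({"type": "url", "data": url})
    if job_input.get("image_b64"):
        specs.append({"type": "b64", "data": job_input["image_b64"]})
    for item in job_input.get("images") or []:
        # Heterogeneous entries already carry their own type tag.
        specs.append({"type": item["type"], "data": item["data"]})
    if not specs:
        # ASSUMPTION: an empty request is treated as a client error.
        raise ValueError("no image supplied")
    return specs
```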

Output Shape

{
  "results": [
    {
      "index": 0,
      "input": {"kind": "url", "url": "..."},
      "model": "depth-anything/Depth-Anything-V2-Small-hf",
      "metric": false,
      "width": 1024,
      "height": 768,
      "inference_size": [1024, 768],
      "depth_min": 0.124,
      "depth_max": 28.5,
      "depth_mean": 4.31,
      "tasks": ["depth", "colorized"],
      "depth_png_b64": "<uint16 PNG, base64>",
      "depth_png_mime": "image/png",
      "colorized_b64": "<PNG/WebP/JPG, base64>",
      "colorized_mime": "image/png",
      "colormap": "viridis",
      "elapsed_sec": 0.412
    }
  ],
  "model": "depth-anything/Depth-Anything-V2-Small-hf",
  "tasks": ["depth", "colorized"],
  "colormap": "viridis",
  "output_format": "png",
  "normalize": true,
  "invert": false,
  "include_raw": false,
  "max_size": 1024,
  "point_cloud_grid": 512,
  "device": "cuda",
  "cuda_available": true,
  "count": 1,
  "elapsed_sec": 0.418
}
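On the client, the base64 fields in each result can be decoded as in this sketch. The helper name decode_outputs is hypothetical, and skipping results that lack the requested key is an assumption about how failed or partial results should be handled, since the error shape is not shown here.

```python
import base64
from typing import Any, Dict, List, Optional, Tuple


def decode_outputs(response: Dict[str, Any],
                   key: str = "colorized") -> List[Tuple[Optional[int], str, bytes]]:
    """Return (index, mime, image_bytes) for every result carrying `<key>_b64`."""
    decoded = []
    for result in response.get("results", []):
        payload = result.get(f"{key}_b64")
        if payload is None:
            # Task not requested for this result, or the image failed.
            continue
        mime = result.get(f"{key}_mime", "application/octet-stream")
        decoded.append((result.get("index"), mime, base64.b64decode(payload)))
    return decoded
```

Passing key="depth_png" reads depth_png_b64 and depth_png_mime in the same way.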

Examples

1. Quick depth map (default, Small model)

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["depth"]
  }
}
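A payload like this is typically sent to the endpoint's synchronous run route. The sketch below uses only the standard library; the endpoint ID and API key are placeholders you supply, and the helper names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict


def build_runsync_request(endpoint_id: str, api_key: str,
                          job_input: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for a Runpod serverless /runsync call."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    body = json.dumps({"input": job_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def run_job(request: urllib.request.Request, timeout: float = 120.0) -> Dict[str, Any]:
    """Send the request and parse the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, run_job(build_runsync_request("<endpoint-id>", "<api-key>", {"image_url": "https://example.com/photo.jpg", "tasks": ["depth"]})) should return Runpod's job envelope, with the handler's response expected under its output field.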

2. Colorized depth with viridis

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["colorized"],
    "colormap": "viridis"
  }
}

3. Depth + colorized + normals + disparity together

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["depth", "colorized", "normals", "disparity"],
    "colormap": "turbo",
    "model": "depth-anything/Depth-Anything-V2-Large-hf"
  }
}

4. Metric depth (indoor) with raw float32 export

{
  "input": {
    "image_url": "https://example.com/room.jpg",
    "model": "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf",
    "tasks": ["depth"],
    "include_raw": true,
    "normalize": false
  }
}

The returned depth_npz_b64 decodes to a np.savez_compressed archive with a single depth array (float32, in meters).
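Decoding that archive on the client could look like the following sketch. It relies on the array being stored under the name depth, as described above; the function name is illustrative.

```python
import base64
import io

import numpy as np


def decode_depth_npz(depth_npz_b64: str) -> np.ndarray:
    """Decode a base64 .npz payload and return its 'depth' array."""
    raw = base64.b64decode(depth_npz_b64)
    # np.load on an .npz returns a lazy archive; read the array before closing.
    with np.load(io.BytesIO(raw)) as archive:
        return archive["depth"]
```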

5. Batch via images (mix of URL + base64)

{
  "input": {
    "images": [
      {"type": "url", "data": "https://example.com/a.jpg"},
      {"type": "b64", "data": "iVBORw0KGgo..."}
    ],
    "tasks": ["colorized", "normals"],
    "colormap": "magma",
    "output_format": "webp"
  }
}
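To produce a b64 entry from local image bytes, something like this small helper would do. The function name b64_image_entry is hypothetical; the entry shape follows the images field shown above.

```python
import base64
from typing import Dict


def b64_image_entry(image_bytes: bytes) -> Dict[str, str]:
    """Wrap raw image bytes as an entry for the `images` list."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "b64", "data": encoded}
```

Any PNG file encodes to a string beginning with iVBORw0KGgo, which is why the example payload starts that way.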

6. Point cloud preview only

{
  "input": {
    "image_url": "https://example.com/scene.jpg",
    "tasks": ["point_cloud"],
    "point_cloud_grid": 768
  }
}

The result's point_cloud_b64 is a 768x768 PNG showing each pixel projected onto an (x, depth) plane.
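One way to picture this projection is the NumPy sketch below: each pixel's column is scaled to the grid width and its depth picks the grid row. The function name, the choice of row 0 for the smallest depth value, and the plain white-on-black rendering are assumptions; the handler's preview may be drawn differently.

```python
import numpy as np


def top_down_preview(depth: np.ndarray, grid: int = 512) -> np.ndarray:
    """Rasterize an (x, depth) scatter of every pixel into a grid x grid RGB image."""
    h, w = depth.shape
    d = depth.astype(np.float64)
    span = max(float(d.max() - d.min()), 1e-12)  # guard against constant depth
    # Row index: depth rescaled to [0, grid - 1] (row 0 = smallest depth value).
    rows = ((d - d.min()) / span * (grid - 1)).round().astype(np.int64)
    # Column index: image x coordinate rescaled to [0, grid - 1].
    col_line = np.round(np.linspace(0, grid - 1, w)).astype(np.int64)
    cols = np.broadcast_to(col_line, (h, w))
    canvas = np.zeros((grid, grid), dtype=np.uint8)
    canvas[rows.ravel(), cols.ravel()] = 255
    # Repeat the single channel to get an HxWx3 uint8 image.
    return np.repeat(canvas[..., None], 3, axis=2)
```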

7. Inverted depth for a near=dark, far=bright look

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["colorized"],
    "colormap": "magma",
    "invert": true
  }
}
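The effect of invert can be illustrated with a small normalization sketch. It assumes inversion is applied as 1 - d on depth normalized to [0, 1]; the README only says near and far are swapped before colorization, so the exact formula is an assumption.

```python
import numpy as np


def normalize_depth(depth: np.ndarray, invert: bool = False) -> np.ndarray:
    """Scale depth to [0, 1] and optionally flip it (assumed 1 - d inversion)."""
    d = depth.astype(np.float64)
    span = max(float(d.max() - d.min()), 1e-12)  # guard against constant depth
    out = (d - d.min()) / span
    return 1.0 - out if invert else out
```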

Performance Notes

  • The first call to a given (model, device, dtype) triplet pays a cold-start cost: HF weights are downloaded and a pipeline is constructed. Subsequent calls in the same container reuse the cached pipeline via _PIPELINE_CACHE.
  • max_size controls inference resolution. The depth map is upsampled back to the original image size for all output renderings, so increasing max_size trades latency for sharpness.
  • FP16 (dtype: "float16") on GPU is ~2x faster than FP32 with negligible quality impact on this family.
  • The point-cloud preview is intentionally rasterized (HxWx3 uint8) rather than returned as raw XYZ; full XYZ for a 1024x768 image is about 9.4 MB of raw float32 data (786,432 points x 3 values x 4 bytes) per output, and more once encoded as JSON.
  • Metric models are roughly the size of the Large relative model. Use the Small relative model when you only need ordinal depth.
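Two of the numbers above are easy to reproduce. The sketch below computes an inference resolution from max_size and the raw XYZ size for a given image; both helper names are hypothetical, and leaving images smaller than max_size untouched is an assumption based on the "downscaled" wording.

```python
from typing import Tuple


def inference_size(width: int, height: int, max_size: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so the longest edge is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        # ASSUMPTION: small images are not upscaled.
        return width, height
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def raw_xyz_bytes(width: int, height: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for one float32 XYZ triple per pixel."""
    return width * height * 3 * bytes_per_value
```

For a 4000x3000 photo, inference_size gives 1024x768, and raw_xyz_bytes(1024, 768) is 9,437,184 bytes.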

Colormap Reference

| Name | Style | Good for |
| --- | --- | --- |
| viridis | dark blue -> green -> yellow | Perceptually uniform, default |
| magma | black -> purple -> orange -> yellow | High-contrast scenes |
| inferno | black -> red -> orange -> yellow | Heat-map look |
| plasma | dark blue -> magenta -> orange | Vibrant alternative to viridis |
| turbo | blue -> green -> yellow -> red | Maximum perceptual range |
| gray | linear grayscale | Raw-looking, no false color |

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| DEPTH_MODEL | depth-anything/Depth-Anything-V2-Small-hf | Default model when model is not supplied. |
| HF_HOME | /root/.cache/huggingface | HuggingFace cache directory. |
| PYTHONUNBUFFERED | 1 | Stdout flushing. |
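The model fallback order implied by the table (request field, then DEPTH_MODEL, then the built-in default) can be sketched as follows. The function name resolve_model and the injectable environ parameter are illustrative, not taken from the handler.

```python
import os
from typing import Any, Mapping

DEFAULT_MODEL = "depth-anything/Depth-Anything-V2-Small-hf"


def resolve_model(job_input: Mapping[str, Any],
                  environ: Mapping[str, str] = os.environ) -> str:
    """Pick the model: explicit request field, then $DEPTH_MODEL, then default."""
    return job_input.get("model") or environ.get("DEPTH_MODEL") or DEFAULT_MODEL
```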

Local Testing

python3 test_handler.py

The test harness injects light-weight mocks for torch, transformers, cv2, and matplotlib.cm, so no GPU and no model download is needed. The mock pipeline returns a radial-gradient grayscale "depth" image that the helper functions process end-to-end.
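Mock injection of this kind usually works by placing a stand-in module in sys.modules before the code under test imports it. The sketch below shows the general technique for torch only; the attributes on the fake module are assumptions, and test_handler.py may build its mocks differently.

```python
import sys
import types
from unittest import mock


def fake_torch_module() -> types.ModuleType:
    """Build a minimal stand-in for torch (hypothetical attribute set)."""
    fake = types.ModuleType("torch")
    fake.float16 = "float16"
    fake.float32 = "float32"
    # Report no GPU so code under test takes the CPU path.
    fake.cuda = types.SimpleNamespace(is_available=lambda: False)
    return fake


# patch.dict restores sys.modules when the block exits.
with mock.patch.dict(sys.modules, {"torch": fake_torch_module()}):
    import torch  # resolves to the fake module, no real install needed

    gpu_available = torch.cuda.is_available()
```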

Expected output: ALL TESTS PASSED.

Building the container

docker build -t runpod-depth .

The Docker image is built on nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04, pre-installs PyTorch 2.3.1 + CUDA 12.1 wheels, then layers in transformers, opencv-python-headless, matplotlib, accelerate, runpod, and the rest of requirements.txt. The container runs python3 handler.py, which starts the Runpod serverless worker.
