Runpod Depth-Anything-V2 (Monocular Depth)

Serverless GPU monocular depth estimation on Runpod, backed by the Depth-Anything-V2 family (with DPT as an alternative). Given a single image, you get back any of:

depth - the raw depth map (compressed 16-bit PNG, with optional float32 .npz)
colorized - the depth map rendered with a matplotlib-style colormap
normals - surface normals derived from the depth gradient
disparity - reciprocal depth ($1/(d+\varepsilon)$), normalized
point_cloud - a top-down (x, depth) preview rendered to a PNG (no raw XYZ)

Features

Six supported models, from the very fast Depth-Anything-V2-Small to the high-accuracy Large, plus two metric variants (indoor + outdoor) and a DPT fallback.
1, 2, or N images per request via image_url, image_urls, image_b64, or a heterogeneous images: [{type, data}, ...] list.
Per-image errors so one bad URL never blocks a batch.
Pipeline LRU cache keyed by (model, device, dtype), so warm GPU containers serve subsequent requests without reloading the model.
JSON-safe outputs (no raw numpy in the response unless you set include_raw: true).
Six colormaps: viridis, magma, inferno, plasma, turbo, gray.
Output formats: png, webp, jpg.
Automatic FP16 on GPU, FP32 on CPU. Override with dtype: "float16"|"bfloat16"|"float32".

Supported Models

`model`	Type	Size	Best for
`depth-anything/Depth-Anything-V2-Small-hf`	Relative	~25M	Fast preview, default
`depth-anything/Depth-Anything-V2-Base-hf`	Relative	~97M	Balanced quality
`depth-anything/Depth-Anything-V2-Large-hf`	Relative	~335M	Best generic depth
`depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf`	Metric	~335M	Real-world meters, indoor scenes
`depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf`	Metric	~335M	Real-world meters, outdoor scenes
`Intel/dpt-large`	Relative	~344M	DPT alternative, well-tested

Metric models return depth in meters (or close to it) and metric: true is set on the response. Relative models return arbitrary-scale inverse-depth-like floats; min/max are reported on every response so you can rescale on the client.
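As a rough illustration of client-side rescaling, the sketch below maps uint16 codes back to floats using the reported depth_min and depth_max. It assumes the uint16 PNG encodes depth linearly between those two values, which this README does not state explicitly; the function name rescale_depth is hypothetical.

```python
import numpy as np

def rescale_depth(depth_u16: np.ndarray, depth_min: float, depth_max: float) -> np.ndarray:
    """Map uint16 codes back to float depth.

    Assumption: code 0 corresponds to depth_min and code 65535 to depth_max,
    with a linear ramp in between.
    """
    scale = (depth_max - depth_min) / 65535.0
    return depth_u16.astype(np.float32) * scale + depth_min

# Three example codes: lowest, roughly mid-range, highest.
codes = np.array([[0, 32768, 65535]], dtype=np.uint16)
restored = rescale_depth(codes, depth_min=0.124, depth_max=28.5)
```

For full precision, the float32 .npz returned with include_raw: true avoids the quantization step entirely.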

Tasks

Task	Output key	Notes
`depth`	`depth_png_b64` (uint16 PNG) + optional `depth_npz_b64`	The raw map. Set `include_raw: true` for float32 `.npz`.
`colorized`	`colorized_b64` + `colormap`	Normalized depth -> RGB via the selected colormap.
`normals`	`normals_b64`	Surface normals from depth gradient, encoded as RGB.
`disparity`	`disparity_b64`	$1/(d+\varepsilon)$, clipped to the 1st-99th percentile range and normalized to uint8.
`point_cloud`	`point_cloud_b64` + `point_cloud_grid`	Top-down (x, depth) projection rendered to a PNG preview.
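The disparity and normals rows can be approximated on the client from a float depth array. The following is a minimal NumPy sketch of the operations described in the table, not the handler's actual code; the function names, the value of eps, and the choice of central differences for the gradient are assumptions.

```python
import numpy as np

def disparity_u8(depth: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reciprocal depth, clipped to the 1st-99th percentile, scaled to uint8."""
    disp = 1.0 / (depth.astype(np.float64) + eps)
    lo, hi = np.percentile(disp, [1, 99])
    disp = np.clip(disp, lo, hi)
    span = max(hi - lo, 1e-12)  # guard against a constant image
    return np.round((disp - lo) / span * 255.0).astype(np.uint8)

def normals_rgb(depth: np.ndarray) -> np.ndarray:
    """Surface normals from the depth gradient, encoded as RGB in [0, 255]."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(dz_dx)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    # Map each component from [-1, 1] to [0, 255].
    return np.round((n + 1.0) * 0.5 * 255.0).astype(np.uint8)

# A small synthetic ramp: depth grows from left to right.
depth = np.fromfunction(lambda y, x: 1.0 + 0.1 * x, (4, 6))
disp = disparity_u8(depth)
normals = normals_rgb(depth)
```

On this ramp the nearest column (smallest depth) gets the largest disparity value, and the normals tilt slightly against the x axis while staying mostly forward-facing.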

Input Schema

Field	Type	Default	Description
`image_url`	string	-	Single image URL.
`image_urls`	string[]	-	Multiple image URLs.
`image_b64`	string	-	Single raw base64 (or data URI) image.
`images`	object[]	-	List of `{"type": "url"\|"b64", "data": "..."}`.
`model`	string	`depth-anything/Depth-Anything-V2-Small-hf` (or `$DEPTH_MODEL`)	One of the supported models.
`tasks`	string[]	`["depth"]`	Any subset of `depth`, `colorized`, `normals`, `disparity`, `point_cloud`.
`colormap`	string	`viridis`	One of `viridis`, `magma`, `inferno`, `plasma`, `turbo`, `gray`.
`normalize`	bool	`true`	Normalize depth to [0, 1] for visualization.
`output_format`	string	`png`	`png`, `webp`, or `jpg`.
`quality`	int	95	JPEG/WebP quality.
`max_size`	int	1024	Longest edge for inference (downscaled before forward pass).
`invert`	bool	`false`	Invert depth (near<->far swap before colorization).
`include_raw`	bool	`false`	Include float32 depth as `.npz` b64.
`point_cloud_grid`	int	512	Grid size for `point_cloud` preview.
`dtype`	string	`float16` on GPU, `float32` on CPU	`float16`, `bfloat16`, or `float32`.
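A request built from these fields might be sent as in the sketch below. It assumes the worker is deployed as a Runpod serverless endpoint and reached through Runpod's synchronous /runsync route; the endpoint ID, API key, and the helper names build_payload and run_sync are placeholders for illustration.

```python
import json
import urllib.request

ALLOWED_TASKS = {"depth", "colorized", "normals", "disparity", "point_cloud"}

def build_payload(image_url, tasks=("depth",), **options):
    """Wrap schema fields in the {"input": {...}} envelope used by Runpod."""
    unknown = set(tasks) - ALLOWED_TASKS
    if unknown:
        raise ValueError(f"unsupported tasks: {sorted(unknown)}")
    return {"input": {"image_url": image_url, "tasks": list(tasks), **options}}

def run_sync(endpoint_id, api_key, payload, timeout=120):
    """POST the payload and return the decoded JSON reply (requires network access)."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Building the payload needs no network; run_sync is only defined here, not called.
payload = build_payload(
    "https://example.com/photo.jpg", tasks=["colorized"], colormap="turbo"
)
```

Checking task names locally is optional; it only saves a round trip when a name is misspelled.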

Output Shape

{
  "results": [
    {
      "index": 0,
      "input": {"kind": "url", "url": "..."},
      "model": "depth-anything/Depth-Anything-V2-Small-hf",
      "metric": false,
      "width": 1024,
      "height": 768,
      "inference_size": [1024, 768],
      "depth_min": 0.124,
      "depth_max": 28.5,
      "depth_mean": 4.31,
      "tasks": ["depth", "colorized"],
      "depth_png_b64": "<uint16 PNG, base64>",
      "depth_png_mime": "image/png",
      "colorized_b64": "<PNG/WebP/JPG, base64>",
      "colorized_mime": "image/png",
      "colormap": "viridis",
      "elapsed_sec": 0.412
    }
  ],
  "model": "depth-anything/Depth-Anything-V2-Small-hf",
  "tasks": ["depth", "colorized"],
  "colormap": "viridis",
  "output_format": "png",
  "normalize": true,
  "invert": false,
  "include_raw": false,
  "max_size": 1024,
  "point_cloud_grid": 512,
  "device": "cuda",
  "cuda_available": true,
  "count": 1,
  "elapsed_sec": 0.418
}
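A client would typically walk results, separate failed images from successful ones, and base64-decode the image fields. The sketch below assumes a failed image carries an error key in its result entry; the exact shape of per-image errors is not documented here, so that key and the helper names are hypothetical.

```python
import base64

def split_results(response):
    """Partition per-image results into (succeeded, failed).

    Assumption: a failed image is marked by an "error" key in its entry.
    """
    ok, failed = [], []
    for item in response.get("results", []):
        (failed if "error" in item else ok).append(item)
    return ok, failed

def decode_image(item, key="colorized_b64"):
    """Return the raw image bytes stored under a *_b64 field."""
    return base64.b64decode(item[key])

# A hand-made response with one good and one failed image.
sample = {
    "results": [
        {"index": 0, "colorized_b64": base64.b64encode(b"\x89PNG").decode("ascii")},
        {"index": 1, "error": "download failed"},
    ],
    "count": 2,
}
ok, failed = split_results(sample)
image_bytes = decode_image(ok[0])
```

The decoded bytes can be written straight to a file whose extension matches the corresponding *_mime field.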

Examples

1. Quick depth map (default, Small model)

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["depth"]
  }
}

2. Colorized depth with viridis

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["colorized"],
    "colormap": "viridis"
  }
}

3. Depth + colorized + normals + disparity together

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["depth", "colorized", "normals", "disparity"],
    "colormap": "turbo",
    "model": "depth-anything/Depth-Anything-V2-Large-hf"
  }
}

4. Metric depth (indoor) with raw float32 export

{
  "input": {
    "image_url": "https://example.com/room.jpg",
    "model": "depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf",
    "tasks": ["depth"],
    "include_raw": true,
    "normalize": false
  }
}

The returned depth_npz_b64 decodes to a np.savez_compressed archive with a single depth array (float32, in meters).
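Decoding that field on the client could look like the sketch below. It assumes the array is stored under the key depth, as the sentence above suggests; the function name is hypothetical, and the round trip at the end only simulates what the worker would send.

```python
import base64
import io

import numpy as np

def decode_depth_npz(depth_npz_b64: str) -> np.ndarray:
    """Decode a base64 .npz archive and return its "depth" array."""
    raw = base64.b64decode(depth_npz_b64)
    with np.load(io.BytesIO(raw)) as archive:
        return archive["depth"]

# Simulate the worker side: compress a tiny float32 array and base64-encode it.
buf = io.BytesIO()
np.savez_compressed(buf, depth=np.array([[0.5, 2.0]], dtype=np.float32))
encoded = base64.b64encode(buf.getvalue()).decode("ascii")

depth = decode_depth_npz(encoded)
```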

5. Batch via `images` mix of URL + base64

{
  "input": {
    "images": [
      {"type": "url", "data": "https://example.com/a.jpg"},
      {"type": "b64", "data": "iVBORw0KGgo..."}
    ],
    "tasks": ["colorized", "normals"],
    "colormap": "magma",
    "output_format": "webp"
  }
}
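Entries for the images list can be assembled with two small helpers, sketched below. The helper names are hypothetical; in practice the bytes passed to b64_entry would come from reading a local image file.

```python
import base64

def url_entry(url: str) -> dict:
    """Entry for an image the worker should download."""
    return {"type": "url", "data": url}

def b64_entry(image_bytes: bytes) -> dict:
    """Entry for an image sent inline as base64."""
    return {"type": "b64", "data": base64.b64encode(image_bytes).decode("ascii")}

# The 8-byte PNG signature stands in for a real file's contents.
png_signature = b"\x89PNG\r\n\x1a\n"
images = [
    url_entry("https://example.com/a.jpg"),
    b64_entry(png_signature),
]
payload = {"input": {"images": images, "tasks": ["colorized", "normals"]}}
```

Every base64-encoded PNG begins with iVBORw0KGgo, which is why that prefix appears in the example request.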

6. Point cloud preview only

{
  "input": {
    "image_url": "https://example.com/scene.jpg",
    "tasks": ["point_cloud"],
    "point_cloud_grid": 768
  }
}

The result's point_cloud_b64 is a 768x768 PNG showing each pixel projected onto an (x, depth) plane.
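One way to picture such a projection is the sketch below: each pixel's column is binned along the horizontal axis of the grid, its depth along the vertical axis, and the cell counts become brightness. This is only an illustration under assumed binning and scaling rules, and it produces a single-channel image for brevity; the handler's actual rendering may differ.

```python
import numpy as np

def top_down_preview(depth: np.ndarray, grid: int = 512) -> np.ndarray:
    """Rasterize (x, depth) pairs into a grid x grid uint8 occupancy image."""
    h, w = depth.shape
    d = depth.astype(np.float64)
    span = max(d.max() - d.min(), 1e-12)  # guard against a flat depth map
    # Horizontal bin: image column scaled to the grid width.
    cols = np.broadcast_to(np.arange(w), (h, w))
    gx = np.minimum((cols * grid) // w, grid - 1)
    # Vertical bin: depth scaled linearly to the grid height.
    gz = np.rint((d - d.min()) / span * (grid - 1)).astype(np.int64)
    counts = np.zeros((grid, grid), dtype=np.int64)
    np.add.at(counts, (gz.ravel(), gx.ravel()), 1)  # accumulate duplicate cells
    return (255 * counts / counts.max()).astype(np.uint8)

# A ramp whose depth equals 1 + column index lands on the grid diagonal.
depth = np.fromfunction(lambda y, x: 1.0 + x, (4, 8))
preview = top_down_preview(depth, grid=8)
```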

7. Inverted depth for a near=dark, far=bright look

{
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "tasks": ["colorized"],
    "colormap": "magma",
    "invert": true
  }
}

Performance Notes

The first call to a given (model, device, dtype) triplet pays a cold-start cost: HF weights are downloaded and a pipeline is constructed. Subsequent calls in the same container reuse the cached pipeline via _PIPELINE_CACHE.
max_size controls inference resolution. The depth map is upsampled back to the original image size for all output renderings, so increasing max_size trades latency for sharpness.
FP16 (dtype: "float16") on GPU is ~2x faster than FP32 with negligible quality impact on this family.
The point-cloud preview is intentionally rasterized (HxWx3 uint8) rather than returned as raw XYZ; full XYZ for a 1024x768 image would be ~9MB per output as float32 (786,432 points x 3 coordinates x 4 bytes), before any base64 or JSON overhead.
Metric models are roughly the size of the Large relative model. Use the Small relative model when you only need ordinal depth.
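The caching behavior described in the first note can be pictured with a small LRU keyed by (model, device, dtype). This is a sketch of the general technique, not the contents of _PIPELINE_CACHE; the class name, the loader callback, and the max_entries limit are all assumptions.

```python
from collections import OrderedDict

class PipelineCache:
    """Least-recently-used cache keyed by (model, device, dtype)."""

    def __init__(self, loader, max_entries=2):
        self._loader = loader          # called on a miss; stands in for pipeline construction
        self._max = max_entries
        self._items = OrderedDict()

    def get(self, model, device, dtype):
        key = (model, device, dtype)
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key]
        pipe = self._loader(model, device, dtype)  # cold start: pay the load cost once
        self._items[key] = pipe
        if len(self._items) > self._max:
            self._items.popitem(last=False)        # evict the least recently used entry
        return pipe

loads = []

def fake_loader(model, device, dtype):
    """Placeholder loader that records each cold start."""
    loads.append(model)
    return object()

cache = PipelineCache(fake_loader, max_entries=2)
first = cache.get("small", "cuda", "float16")
again = cache.get("small", "cuda", "float16")  # served from the cache, no second load
```

Bounding the number of cached pipelines keeps GPU memory in check when requests alternate between several models.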

Colormap Reference

Name	Style	Good for
`viridis`	dark blue -> green -> yellow	Perceptually uniform, default
`magma`	black -> purple -> orange -> yellow	High-contrast scenes
`inferno`	black -> red -> orange -> yellow	Heat-map look
`plasma`	dark blue -> magenta -> orange	Vibrant alternative to viridis
`turbo`	blue -> green -> yellow -> red	Maximum perceptual range
`gray`	linear grayscale	Raw-looking, no false color

Environment Variables

Variable	Default	Purpose
`DEPTH_MODEL`	`depth-anything/Depth-Anything-V2-Small-hf`	Default model when `model` is not supplied.
`HF_HOME`	`/root/.cache/huggingface`	HuggingFace cache directory.
`PYTHONUNBUFFERED`	`1`	Stdout flushing.
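The precedence implied by DEPTH_MODEL (request field first, then the environment variable, then the built-in default) can be sketched as follows. The function name and the environ parameter are hypothetical and exist only to make the lookup easy to exercise.

```python
import os

FALLBACK_MODEL = "depth-anything/Depth-Anything-V2-Small-hf"

def resolve_model(job_input, environ=os.environ):
    """Pick the model: explicit request field, else $DEPTH_MODEL, else the default."""
    return job_input.get("model") or environ.get("DEPTH_MODEL", FALLBACK_MODEL)

# With no request field and an empty environment, the built-in default wins.
chosen = resolve_model({}, environ={})
```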

Local Testing

python3 test_handler.py

The test harness injects lightweight mocks for torch, transformers, cv2, and matplotlib.cm, so neither a GPU nor a model download is needed. The mock pipeline returns a radial-gradient grayscale "depth" image that the helper functions process end-to-end.

Expected output: ALL TESTS PASSED.

Building the container

docker build -t runpod-depth .

The Docker image is built on nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04, pre-installs PyTorch 2.3.1 + CUDA 12.1 wheels, then layers in transformers, opencv-python-headless, matplotlib, accelerate, runpod, and the rest of requirements.txt. The container runs python3 handler.py, which starts the Runpod serverless worker.
