PACE: Pose Annotations in Cluttered Environments
(ECCV 2024)

Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W Harley, Leonidas Guibas, Cewu Lu

PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark designed to advance pose estimation in challenging, cluttered scenarios. PACE provides comprehensive real-world and simulated datasets for both instance-level and category-level tasks, featuring:

55K frames with 258K annotations across 300 videos
238 objects from 43 categories (rigid and articulated)
An innovative annotation system using a calibrated 3-camera setup
PACESim: 100K photo-realistic simulated frames with 2.4M annotations across 931 objects

We evaluate state-of-the-art algorithms on PACE for both pose estimation and object pose tracking, highlighting the benchmark's challenges and research opportunities.

Why a New Dataset?

PACE rigorously tests the generalization of state-of-the-art methods in complex, real-world environments, enabling exploration and quantification of the 'simulation-to-reality' gap for practical applications.

🔥News

Try our latest pose estimator CPPF++ (TPAMI), which achieves state-of-the-art performance on PACE.

Update Log

2024/07/22: PACE v1.1 uploaded to HuggingFace. Benchmark evaluation code released.
2024/03/01: PACE v1.0 released.

Dataset Download

Download the dataset from HuggingFace. Unzip all tar.gz files and place them under dataset/pace for evaluation. Large files are split into chunks; merge them with, e.g., cat test_chunk_* > test.tar.gz. We also provide a convenient download script at download_pace.ipynb.
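
For environments without a shell, the merge-and-unpack step above can be sketched in Python. This is a minimal sketch, not the repository's download script: the helper names merge_chunks and extract_archive are hypothetical, and it assumes the chunks sort into the correct order by file name, as they do for the cat glob.

```python
import glob
import shutil
import tarfile
from pathlib import Path


def merge_chunks(chunk_pattern, merged_path):
    """Concatenate all files matching chunk_pattern, in sorted name order.

    Returns the number of chunks that were merged.
    """
    chunk_paths = sorted(glob.glob(chunk_pattern))
    if not chunk_paths:
        raise FileNotFoundError(f"no files match {chunk_pattern!r}")
    with open(merged_path, "wb") as merged:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)  # stream, do not load into RAM
    return len(chunk_paths)


def extract_archive(archive_path, dest_dir):
    """Unpack a tar.gz archive into dest_dir, creating it if needed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
```

Usage would look like merge_chunks("test_chunk_*", "test.tar.gz") followed by extract_archive("test.tar.gz", "dataset/pace").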

Dataset Format

PACE follows the BOP format with the following structure (regex syntax):

camera_pbr.json
models(_eval|_nocs)?
├─ models_info.json
├─ (artic_info.json)?
├─ obj_${OBJ_ID}.ply
model_splits
├─ category
│  ├─ ${category}_(train|val|test).txt
│  ├─ (train|val|test).txt
├─ instance
│  ├─ (train|val|test).txt
(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)
├─ ${SCENE_ID}
│  ├─ scene_camera.json
│  ├─ scene_gt.json
│  ├─ scene_gt_info.json
│  ├─ scene_gt_coco_det_modal(_partcat|_inst)?.json
│  ├─ depth
│  ├─ mask
│  ├─ mask_visib
│  ├─ rgb
│  ├─ (rgb_nocs)?
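
Because the layout above is written as regular expressions, the same patterns can be used to check an unpacked copy of the dataset. The sketch below transcribes three of them (with the dot in .json escaped); the classify helper is hypothetical and only illustrates how the patterns read. Note that plain train and val do not match, since those splits always carry a suffix.

```python
import re

# Patterns transcribed from the directory layout above.
SPLIT_DIR = re.compile(r"(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)")
MODEL_DIR = re.compile(r"models(_eval|_nocs)?")
COCO_FILE = re.compile(r"scene_gt_coco_det_modal(_partcat|_inst)?\.json")


def classify(name):
    """Return 'split', 'models', 'coco', or None for a file or folder name."""
    # fullmatch requires the whole name to match, not just a prefix.
    if SPLIT_DIR.fullmatch(name):
        return "split"
    if MODEL_DIR.fullmatch(name):
        return "models"
    if COCO_FILE.fullmatch(name):
        return "coco"
    return None
```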

Key components:

camera_pbr.json: Camera parameters for PBR rendering; real camera parameters are in each scene's scene_camera.json.
models(_eval|_nocs)?: 3D object models. models contains original scanned meshes; models_eval has uniformly sampled point clouds for evaluation (e.g., Chamfer distance); all models (except articulated parts, ID 545–692) are recentered and normalized to a unit bounding box. models_nocs recolors vertices by NOCS coordinates.
- models_info.json: Mesh metadata (diameter, bounds, scales in mm), and mapping from obj_id to object identifier. Articulated objects have multiple parts, each with a unique obj_id; associations are in artic_info.json.
- artic_info.json: Part information for articulated objects, keyed by identifier.
- obj_${OBJ_ID}.ply: Mesh file for object ${OBJ_ID}.
model_splits: Model IDs for train/val/test splits. Instance-level splits share IDs; category-level splits differ per category.
train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test: Synthetic and real data for category/instance-level training and validation; real-world test data for both.
- ${SCENE_ID}: Each scene in a separate folder (e.g., 000011).
  - scene_camera.json: Camera parameters.
  - scene_gt.json: Ground-truth annotations (BOP format).
  - scene_gt_info.json: Meta info about ground-truth poses (BOP format).
  - scene_gt_coco_det_modal(_partcat|_inst)?.json: 2D bounding box and instance segmentation in COCO format.
    - scene_gt_coco_det_modal_partcat.json: Treats articulated parts as separate categories (for category-level evaluation).
    - scene_gt_coco_det_modal_inst.json: Treats each object instance as a separate category (for instance-level evaluation). Note: There may be more categories than reported in the paper, as some objects appear only in synthetic data.
  - rgb: Color images.
  - rgb_nocs: Normalized object coordinates stored as RGB. Coordinates are normalized by the longest side of the object bounding box and shifted from [-0.5, 0.5] to [0, 1]. Example normalization:
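```
import numpy as np
import trimesh

mesh = trimesh.load_mesh(ply_fn)  # ply_fn: path to an obj_${OBJ_ID}.ply file
bbox = mesh.bounds  # (2, 3) array holding the min and max corners
center = (bbox[0] + bbox[1]) / 2
mesh.apply_translation(-center)  # recenter at the bounding-box center
extent = bbox[1] - bbox[0]
# Divide by the longest side so coordinates fall in [-0.5, 0.5]
colors = np.array(mesh.vertices) / extent.max()
# Shift to [0, 1] so the coordinates can be stored as RGB
colors = np.clip(colors + 0.5, 0, 1.)
```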
    See this paper for the disambiguation method.
  - depth: 16-bit depth images. Convert to meters by dividing by 10,000 (PBR) or 1,000 (real).
  - mask: Object masks.
  - mask_visib: Visible part masks.
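
The depth divisors and the BOP-format ground truth listed above can be turned into metric quantities with a few lines of NumPy. This is a minimal sketch: the helper names are hypothetical, the cam_R_m2c and cam_t_m2c keys follow the general BOP convention (rotation flattened row by row, translation in millimeters) rather than anything PACE-specific, and loading the 16-bit image itself is left to whichever image library you use.

```python
import numpy as np

# Raw depth units per meter, taken from the depth description above.
DEPTH_UNITS_PER_METER = {"pbr": 10_000.0, "real": 1_000.0}


def depth_to_meters(depth_raw, source):
    """Convert a raw 16-bit depth array to float32 meters.

    source is "pbr" for rendered frames or "real" for captured frames.
    """
    return depth_raw.astype(np.float32) / DEPTH_UNITS_PER_METER[source]


def pose_from_bop(annotation):
    """Build a 4x4 model-to-camera transform from one scene_gt.json entry.

    The translation stays in millimeters, as in the BOP convention.
    """
    pose = np.eye(4)
    rotation = np.asarray(annotation["cam_R_m2c"], dtype=np.float64)
    pose[:3, :3] = rotation.reshape(3, 3)
    pose[:3, 3] = np.asarray(annotation["cam_t_m2c"], dtype=np.float64)
    return pose
```

Keeping the divisor in a lookup keyed by data source avoids silently mixing the PBR and real scales when both kinds of frames are processed in one loop.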

Dataset Visualization

A visualization script is provided to display ground-truth pose annotations and rendered 3D models. Run visualizer.ipynb to generate the visualizations.

Benchmark Evaluation

Unzip all tar.gz files from HuggingFace and place them under dataset/pace for evaluation.

Instance-Level Pose Estimation

Ensure the bop_toolkit submodule is cloned: after git clone, run git submodule update --init, or use git clone --recurse-submodules git@github.com:qq456cvb/PACE.git.
Place prediction results at prediction/instance/${METHOD_NAME}_pace-test.csv (baseline results for CosyPose, GDRNPP, PPF and SurfEmb available on Hugging Face).

Run:

cd eval/instance
sh eval.sh ${METHOD_NAME}
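
The prediction file named in the steps above is a CSV. A minimal sketch of writing one follows, assuming the evaluation script reads the standard BOP results layout (one row per estimate with columns scene_id, im_id, obj_id, score, R, t, time, where R and t are space-separated and t is in millimeters). The helper names are hypothetical; compare against the released baseline CSVs before relying on this.

```python
from pathlib import Path

BOP_HEADER = "scene_id,im_id,obj_id,score,R,t,time"


def format_bop_row(scene_id, im_id, obj_id, score, R, t, time=-1.0):
    """Format one pose estimate as a BOP results row.

    R is a 3x3 rotation given as three rows, t a 3-vector in millimeters,
    and time the runtime in seconds (-1 when it was not measured).
    """
    r_values = [float(v) for row in R for v in row]  # flatten row by row
    t_values = [float(v) for v in t]
    if len(r_values) != 9 or len(t_values) != 3:
        raise ValueError("R must have 9 entries and t must have 3")
    r_str = " ".join(f"{v:.6f}" for v in r_values)
    t_str = " ".join(f"{v:.6f}" for v in t_values)
    return f"{scene_id},{im_id},{obj_id},{score:.6f},{r_str},{t_str},{time:.6f}"


def write_bop_results(path, rows):
    """Write the header line followed by pre-formatted rows."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join([BOP_HEADER, *rows]) + "\n")
```

For example, write_bop_results("prediction/instance/mymethod_pace-test.csv", rows) would produce a file that sh eval.sh mymethod can pick up, where mymethod is a placeholder method name.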

Category-Level Pose Estimation

Place prediction results at prediction/category/${METHOD_NAME}_pred.pkl (baseline results for ANCSH, CPPF++, HS-Pose, NOCS, SAR-Net and SGPA available on Hugging Face).
Download ground-truth labels in compatible pkl format from Hugging Face and place at eval/category/catpose_gts_test.pkl.

Run:

cd eval/category
sh eval.sh ${METHOD_NAME}

Note: There are more categories (55) in category_names.txt than reported in the paper, as some categories lack real-world test images. The actual evaluation categories (47) are in category_names_test.txt (parts are counted separately). Ground-truth class IDs in catpose_gts_test.pkl use indices 1–55, matching category_names.txt.
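
The relationship between the two name files can be made explicit in code. The sketch below assumes each file lists one category name per line and that class ID i refers to line i of category_names.txt (1-based); both are assumptions about the file layout, and the helper names are hypothetical.

```python
def load_category_names(path):
    """Map 1-based class IDs to names, reading one name per line."""
    with open(path, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    return {class_id: name for class_id, name in enumerate(names, start=1)}


def evaluated_class_ids(all_names_path, test_names_path):
    """Return the class IDs (from the full list) that appear in the test list."""
    all_names = load_category_names(all_names_path)
    test_names = set(load_category_names(test_names_path).values())
    return sorted(cid for cid, name in all_names.items() if name in test_names)
```

Under these assumptions, evaluated_class_ids("category_names.txt", "category_names_test.txt") would yield the subset of IDs in 1–55 that the evaluation actually scores.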

Annotation Tools

We release the full annotation pipeline used to build PACE: a 3-camera RGB-D annotation system with an ArUco-marker-based capture rig. See the annotation tool documentation for a complete step-by-step guide covering hardware setup, camera calibration, capture, object model preparation, interactive pose annotation (with keyboard shortcuts and label propagation), and mask generation.

The source code is organized as follows:

annotation_tool/
├─ inpainting      # video capture GUI + live marker inpainting preview
├─ obj_align       # align scanned meshes to category-canonical frames
├─ obj_sym         # annotate object symmetries
├─ pose_annotate   # main interactive pose annotation GUI
├─ postprocessing  # extrinsic refinement, marker removal, mask generation
├─ TFT_vs_Fund     # third-party MATLAB toolbox for 3-camera extrinsic refinement
├─ utils           # camera calibration, annotation I/O, rendering helpers

A typical annotation session follows this pipeline:

Calibrate the 3-camera rig with the ArUco marker (utils/calc_extrin.py)
Capture RGB-D videos plus background/relative-pose reference clips (inpainting/inpaint.py)
Refine the camera extrinsics via trifocal-tensor optimization (postprocessing/refine_extrinsic.py)
Remove the marker from the images by inpainting (postprocessing/remove_marker.py)
Prepare object models: canonical alignment and symmetry annotation (obj_align/, obj_sym/)
Annotate object poses interactively: PnP initialization, keyboard refinement, and automatic propagation across frames (pose_annotate/mainwindow.py)
Generate instance masks from the annotated poses (postprocessing/generate_seg.py)

License

MIT license for all contents except:

Models with IDs 693–1260 are from SketchFab under CC BY. Original posts: https://sketchfab.com/3d-models/${OBJ_IDENTIFIER} (find the identifier in models_info.json).
Models 1165 and 1166 are from GrabCAD (identical geometry, different colors). See GrabCAD license.

Citation

@inproceedings{you2024pace,
    title={PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments},
    author={You, Yang and Xiong, Kai and Yang, Zhening and Huang, Zhengxiang and Zhou, Junwei and Shi, Ruoxi and Fang, Zhou and Harley, Adam W. and Guibas, Leonidas and Lu, Cewu},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2024},
    organization={Springer}
}
