qq456cvb/PACE

PACE: Pose Annotations in Cluttered Environments
(ECCV 2024)

PACE Teaser


PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark designed to advance pose estimation in challenging, cluttered scenarios. PACE provides comprehensive real-world and simulated datasets for both instance-level and category-level tasks, featuring:

  • 55K frames with 258K annotations across 300 videos
  • 238 objects from 43 categories (rigid and articulated)
  • An innovative annotation system using a calibrated 3-camera setup
  • PACESim: 100K photo-realistic simulated frames with 2.4M annotations across 931 objects

We evaluate state-of-the-art algorithms on PACE for both pose estimation and object pose tracking, highlighting the benchmark's challenges and research opportunities.


Why a New Dataset?

  • PACE rigorously tests the generalization of state-of-the-art methods in complex, real-world environments, enabling exploration and quantification of the 'simulation-to-reality' gap for practical applications.

🔥 News

  • Try our latest pose estimator CPPF++ (TPAMI), which achieves state-of-the-art performance on PACE.

Update Log

  • 2024/07/22: PACE v1.1 uploaded to HuggingFace. Benchmark evaluation code released.
  • 2024/03/01: PACE v1.0 released.

Dataset Download

Download the dataset from HuggingFace. Unzip all tar.gz files and place them under dataset/pace for evaluation. Large files are split into chunks; merge them with, e.g., cat test_chunk_* > test.tar.gz. We also provide a convenient download script at download_pace.ipynb.
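
For environments without a shell, the merge-and-extract step can be sketched in Python. This is a minimal sketch, not part of the repository: the helper names `merge_chunks` and `extract_archive` are hypothetical, and it assumes the chunk suffixes sort in the intended order, as they do with the `cat test_chunk_*` glob.

```python
import glob
import shutil
import tarfile
from pathlib import Path


def merge_chunks(chunk_pattern, merged_path):
    """Concatenate chunk files in sorted order, like `cat test_chunk_* > test.tar.gz`."""
    chunks = sorted(glob.glob(chunk_pattern))
    if not chunks:
        raise FileNotFoundError(f"no chunks match {chunk_pattern!r}")
    merged = Path(merged_path)
    with merged.open("wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                shutil.copyfileobj(part, out)  # stream, so large chunks are not held in memory
    return merged


def extract_archive(archive_path, dest_dir="dataset/pace"):
    """Unpack a .tar.gz archive into dest_dir, creating the directory if needed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

For example, `merge_chunks("test_chunk_*", "test.tar.gz")` followed by `extract_archive("test.tar.gz")` would place the test split under dataset/pace.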


Dataset Format

PACE follows the BOP format with the following structure (regex syntax):

camera_pbr.json
models(_eval|_nocs)?
├─ models_info.json
├─ (artic_info.json)?
├─ obj_${OBJ_ID}.ply
model_splits
├─ category
│  ├─ ${category}_(train|val|test).txt
│  ├─ (train|val|test).txt
├─ instance
│  ├─ (train|val|test).txt
(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)
├─ ${SCENE_ID}
│  ├─ scene_camera.json
│  ├─ scene_gt.json
│  ├─ scene_gt_info.json
│  ├─ scene_gt_coco_det_modal(_partcat|_inst)?.json
│  ├─ depth
│  ├─ mask
│  ├─ mask_visib
│  ├─ rgb
│  ├─ (rgb_nocs)?

Key components:

  • camera_pbr.json: Camera parameters for PBR rendering; real camera parameters are in each scene's scene_camera.json.
  • models(_eval|_nocs)?: 3D object models. models contains original scanned meshes; models_eval has uniformly sampled point clouds for evaluation (e.g., Chamfer distance); all models (except articulated parts, IDs 545–692) are recentered and normalized to a unit bounding box. models_nocs recolors vertices by NOCS coordinates.
    • models_info.json: Mesh metadata (diameter, bounds, scales in mm), and mapping from obj_id to object identifier. Articulated objects have multiple parts, each with a unique obj_id; associations are in artic_info.json.
    • artic_info.json: Part information for articulated objects, keyed by identifier.
    • obj_${OBJ_ID}.ply: Mesh file for object ${OBJ_ID}.
  • model_splits: Model IDs for train/val/test splits. Instance-level splits share IDs; category-level splits differ per category.
  • train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test: Synthetic and real data for category/instance-level training and validation; real-world test data for both.
    • ${SCENE_ID}: Each scene in a separate folder (e.g., 000011).
      • scene_camera.json: Camera parameters.
      • scene_gt.json: Ground-truth annotations (BOP format).
      • scene_gt_info.json: Meta info about ground-truth poses (BOP format).
      • scene_gt_coco_det_modal(_partcat|_inst)?.json: 2D bounding box and instance segmentation in COCO format.
        • scene_gt_coco_det_modal_partcat.json: Treats articulated parts as separate categories (for category-level evaluation).
        • scene_gt_coco_det_modal_inst.json: Treats each object instance as a separate category (for instance-level evaluation). Note: There may be more categories than reported in the paper, as some objects appear only in synthetic data.
      • rgb: Color images.
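      • rgb_nocs: Normalized object coordinates stored as RGB, normalized w.r.t. the object bounding box. In the example below, the coordinates lie in [-0.5, 0.5] after normalization and are shifted to [0, 1]:
        import numpy as np
        import trimesh
        mesh = trimesh.load_mesh(ply_fn)  # ply_fn: path to obj_${OBJ_ID}.ply
        bbox = mesh.bounds  # (2, 3) array: min and max corners
        center = (bbox[0] + bbox[1]) / 2
        mesh.apply_translation(-center)  # center the bounding box at the origin
        extent = bbox[1] - bbox[0]
        colors = np.array(mesh.vertices) / extent.max()  # longest side spans [-0.5, 0.5]
        colors = np.clip(colors + 0.5, 0, 1.)  # shift into [0, 1] for RGB storage
        See this paper for the disambiguation method.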
      • depth: 16-bit depth images. Convert to meters by dividing by 10,000 (PBR) or 1,000 (real).
      • mask: Object masks.
      • mask_visib: Visible part masks.
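
As a rough illustration of how the per-scene files above might be read, the sketch below converts raw depth values to meters using the divisors stated for the depth folder and unpacks scene_gt.json into 4x4 poses. The helper names are hypothetical. The keys `cam_R_m2c`, `cam_t_m2c` and `obj_id` are the standard BOP ones; that PACE uses them unchanged, with translations in millimeters, is an assumption based on the statement that it follows the BOP format.

```python
import json

# Divisors stated in this README: raw 16-bit depth / divisor = meters.
DEPTH_DIVISOR = {"pbr": 10000.0, "real": 1000.0}


def depth_to_meters(raw_depth, source):
    """Convert raw depth (a number or a NumPy array) to meters."""
    return raw_depth / DEPTH_DIVISOR[source]


def parse_scene_gt(scene_gt):
    """Turn a BOP scene_gt dict into {image_id: [(obj_id, 4x4 pose), ...]}."""
    poses = {}
    for im_id, annotations in scene_gt.items():
        entries = []
        for ann in annotations:
            r = ann["cam_R_m2c"]  # 9 values, row-major 3x3 rotation (model to camera)
            t = ann["cam_t_m2c"]  # 3 values, translation (assumed to be in mm, as in BOP)
            pose = [
                [r[0], r[1], r[2], t[0]],
                [r[3], r[4], r[5], t[1]],
                [r[6], r[7], r[8], t[2]],
                [0.0, 0.0, 0.0, 1.0],
            ]
            entries.append((ann["obj_id"], pose))
        poses[int(im_id)] = entries
    return poses


def load_scene_gt(path):
    """Read a scene_gt.json file and return its parsed poses."""
    with open(path) as f:
        return parse_scene_gt(json.load(f))
```

Because `depth_to_meters` only divides, it works unchanged on a depth image loaded as a NumPy array, e.g. `depth_to_meters(depth_image, "real")`.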

Dataset Visualization

A visualization script is provided to display ground-truth pose annotations and rendered 3D models. Run visualizer.ipynb to generate the visualizations.


Benchmark Evaluation

Unzip all tar.gz files from HuggingFace and place them under dataset/pace for evaluation.

Instance-Level Pose Estimation

  • Ensure the bop_toolkit submodule is cloned: after git clone, run git submodule update --init, or use git clone --recurse-submodules git@github.com:qq456cvb/PACE.git.
  • Place prediction results at prediction/instance/${METHOD_NAME}_pace-test.csv (baseline results for CosyPose, GDRNPP, PPF and SurfEmb available on Hugging Face).
  • Run:
    cd eval/instance
    sh eval.sh ${METHOD_NAME}

Category-Level Pose Estimation

  • Place prediction results at prediction/category/${METHOD_NAME}_pred.pkl (baseline results for ANCSH, CPPF++, HS-Pose, NOCS, SAR-Net and SGPA available on Hugging Face).
  • Download ground-truth labels in compatible pkl format from Hugging Face and place at eval/category/catpose_gts_test.pkl.
  • Run:
    cd eval/category
    sh eval.sh ${METHOD_NAME}

Note: There are more categories (55) in category_names.txt than reported in the paper, as some categories lack real-world test images. The actual evaluation categories (47) are in category_names_test.txt (parts are counted separately). Ground-truth class IDs in catpose_gts_test.pkl use indices 1–55, matching category_names.txt.
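
The 1-based indexing described in this note can be sketched as follows. This assumes both files list one category name per line and that names in category_names_test.txt are spelled exactly as in category_names.txt; neither is confirmed here, and the helper names are hypothetical.

```python
def load_names(path):
    """Read one category name per non-empty line (assumed file layout)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_category_index(all_names, test_names):
    """Map 1-based class IDs to names and collect the IDs that are evaluated."""
    id_to_name = {i: name for i, name in enumerate(all_names, start=1)}
    test_set = set(test_names)
    evaluated_ids = {i for i, name in id_to_name.items() if name in test_set}
    return id_to_name, evaluated_ids
```

With the real files, `build_category_index(load_names("category_names.txt"), load_names("category_names_test.txt"))` should yield 55 IDs in total, of which 47 are evaluated.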


Annotation Tools

We release the full annotation pipeline used to build PACE: a 3-camera RGB-D annotation system with an ArUco-marker-based capture rig. See the annotation tool documentation for a complete step-by-step guide, covering hardware setup, camera calibration, capture, object model preparation, interactive pose annotation (with keyboard shortcuts and label propagation), and mask generation.

The source code is organized as follows:

annotation_tool/
├─ inpainting      # video capture GUI + live marker inpainting preview
├─ obj_align       # align scanned meshes to category-canonical frames
├─ obj_sym         # annotate object symmetries
├─ pose_annotate   # main interactive pose annotation GUI
├─ postprocessing  # extrinsic refinement, marker removal, mask generation
├─ TFT_vs_Fund     # third-party MATLAB toolbox for 3-camera extrinsic refinement
├─ utils           # camera calibration, annotation I/O, rendering helpers

A typical annotation session follows this pipeline:

  1. Calibrate the 3-camera rig with the ArUco marker (utils/calc_extrin.py)
  2. Capture RGB-D videos plus background/relative-pose reference clips (inpainting/inpaint.py)
  3. Refine the camera extrinsics via trifocal-tensor optimization (postprocessing/refine_extrinsic.py)
  4. Remove the marker from the images by inpainting (postprocessing/remove_marker.py)
  5. Prepare object models: canonical alignment and symmetry annotation (obj_align/, obj_sym/)
  6. Annotate object poses interactively: PnP initialization, keyboard refinement, and automatic propagation across frames (pose_annotate/mainwindow.py)
  7. Generate instance masks from the annotated poses (postprocessing/generate_seg.py)

License

MIT license for all contents except:

  • Models with IDs 693–1260 are from SketchFab under CC BY. Original posts: https://sketchfab.com/3d-models/${OBJ_IDENTIFIER} (find the identifier in models_info.json).
  • Models 1165 and 1166 are from GrabCAD (identical geometry, different colors). See GrabCAD license.

Citation

@inproceedings{you2024pace,
    title={PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments},
    author={You, Yang and Xiong, Kai and Yang, Zhening and Huang, Zhengxiang and Zhou, Junwei and Shi, Ruoxi and Fang, Zhou and Harley, Adam W. and Guibas, Leonidas and Lu, Cewu},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2024},
    organization={Springer}
}
