This is the official implementation of HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (ECCV 2024, Paper) and HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving (Paper).
HENet is an end-to-end multi-task 3D perception framework. It reduces training costs through hybrid image encoding and mitigates multi-task conflicts through independent BEV feature encoding.
Visualization results of HENet and baselines on end-to-end multi-task perception. HENet estimates occluded objects better by using long-term temporal information, and it makes more accurate predictions by using high-resolution information.
HENet++ extends HENet to end-to-end planning. It simultaneously extracts both dense and sparse features, providing more suitable representations for different tasks, reducing cumulative errors, and delivering more comprehensive information to the planning module.
Through improvements in model architecture and pre-training based on model merging, HENet++ achieves superior multi-task and single-task performance.
This repository provides a sample model for hybrid encoding and multi-task decoding:
| Method | mAP | NDS | mIoU | Config | Checkpoint |
|---|---|---|---|---|---|
| HENet | 49.8 | 59.8 | 58.0 | HENet | Google Drive |
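The NDS column uses the standard nuScenes detection score, which is not specific to HENet. It combines mAP with the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE) as NDS = (5 · mAP + Σ (1 − min(1, err))) / 10. The sketch below simply restates that definition; `nuscenes_nds` is a hypothetical helper, not part of this repository or the nuScenes devkit.

```python
from typing import Sequence

# The five nuScenes true-positive error metrics, in the order expected below.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_nds(m_ap: float, tp_errors: Sequence[float]) -> float:
    """Return NDS from mAP and the five TP errors (all as fractions, not %)."""
    if len(tp_errors) != len(TP_METRICS):
        raise ValueError(
            f"expected {len(TP_METRICS)} TP errors, got {len(tp_errors)}"
        )
    # Each error is clipped at 1, so its contribution lies in [0, 1].
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors)
    return (5.0 * m_ap + tp_score) / 10.0
```

The table reports values as percentages, so multiply the result by 100 to compare. Because every TP error is clipped at 1, NDS can never drop below half of mAP.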
Additionally, this repository provides a distilled student model for HENet++ end-to-end autonomous driving. It was distilled from a high-precision HENet++ teacher model and achieves a comparable end-to-end collision rate. It serves as a baseline and has been used in the KnowVal and DrivingAgent frameworks.
| Method | UniAD L2 (m) | UniAD Col. | VAD L2 (m) | VAD Col. | Config | Checkpoint |
|---|---|---|---|---|---|---|
| HENet++ | 1.29 | 0.12% | 0.55 | 0.04% | HENet++ | Google Drive |
KnowVal, HENet, R4Det, RCBEVDet, and TEOcc were developed under the same framework, so you can easily merge these repositories into one. If you have already set up the environment for any of them, you can reuse it instead of creating a new one.
The code has been tested in the following two environments:
- CUDA 12.1, PyTorch 2.0.1+cu118, GPUs: A800, A40. Because this PyTorch build targets CUDA 11.8, you need to manually comment out PyTorch's CUDA version check (in torch/utils/cpp_extension.py) when building CUDA extensions. For the full package list, see envs_list_cu121.txt.
- CUDA 11.3, PyTorch 1.12.1+cu113, GPUs: RTX 8000, RTX 3090, V100, P40. For the full package list, see envs_list_cu113.txt.
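To check whether a local setup matches one of the tested combinations above, a small sketch like the following may help. `find_tested_env` and `needs_cuda_check_patch` are hypothetical helpers, not part of this repository. The second one assumes that the check blocking compilation compares CUDA major versions. That is our reading of PyTorch's extension builder, not something this README states.

```python
from typing import Optional

# Tested combinations, copied from the list above.
TESTED_ENVS = {
    "cu121": {"cuda": "12.1", "torch": "2.0.1+cu118",
              "packages": "envs_list_cu121.txt"},
    "cu113": {"cuda": "11.3", "torch": "1.12.1+cu113",
              "packages": "envs_list_cu113.txt"},
}


def find_tested_env(torch_version: str, cuda_version: str) -> Optional[str]:
    """Return the key of the matching tested environment, or None."""
    for name, env in TESTED_ENVS.items():
        if env["torch"] == torch_version and env["cuda"] == cuda_version:
            return name
    return None


def needs_cuda_check_patch(torch_version: str, cuda_version: str) -> bool:
    """True if the local CUDA toolkit's major version differs from the one
    the PyTorch wheel was built for (e.g. toolkit 12.1 with a +cu118 wheel)."""
    if "+cu" not in torch_version:
        return False  # CPU-only or untagged build: nothing to compare
    tag = torch_version.split("+cu", 1)[1]           # e.g. "118"
    torch_cuda_major = int(tag[:-1])                 # "118" -> 11
    toolkit_major = int(cuda_version.split(".")[0])  # "12.1" -> 12
    return torch_cuda_major != toolkit_major
```

The first argument can be taken from `torch.__version__`. The toolkit version is the one reported by `nvcc --version`, not `torch.version.cuda`, which reflects the CUDA version PyTorch was built with.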
The recommended installation steps are:
- Create a Python environment and install the PyTorch build that matches your machine's CUDA version.
- Install the mmcv build that matches your PyTorch and CUDA versions.
- Install the other dependencies of mmdet, then install mmdet.
- Install the other dependencies of this project (change the spconv version in requirements.txt to match the CUDA version you are using), then set up this project:
python setup.py develop
- Compile some operators manually:
cd mmdet3d/ops/csrc
python setup.py build_ext --inplace
cd ../deformattn
python setup.py build install
- Install the other dependencies of detectron2, then install detectron2:
cd detr2
python setup.py develop
Please download nuScenes-v1.0-trainval and nuScenes-map-expansion-v1.3
from nuScenes.org, and CVPR23-Occupancy/gts.tar.gz from
CVPR2023-3D-Occupancy-Prediction.
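You can also script the manual operator-compilation step from the installation list. Below is a minimal sketch, assuming it runs from the repository root with the target environment active. `compile_ops` and `COMPILE_STEPS` are hypothetical names, not part of the repository. The detectron2 step is left out because the installation list does not say which directory `cd detr2` starts from.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# (working directory, setup.py arguments); the directories mirror the cd
# commands in the installation steps (../deformattn relative to csrc).
COMPILE_STEPS: List[Tuple[str, List[str]]] = [
    ("mmdet3d/ops/csrc", ["setup.py", "build_ext", "--inplace"]),
    ("mmdet3d/ops/deformattn", ["setup.py", "build", "install"]),
]


def compile_ops(repo_root: str = ".", dry_run: bool = False) -> List[str]:
    """Run (or, with dry_run=True, only print) each compilation step.

    Returns the equivalent shell commands in the order they are issued.
    """
    issued = []
    for rel_dir, args in COMPILE_STEPS:
        issued.append(f"cd {rel_dir} && python {' '.join(args)}")
        if dry_run:
            print(issued[-1])
            continue
        # sys.executable ensures the active environment's Python is used.
        subprocess.run([sys.executable, *args],
                       cwd=Path(repo_root) / rel_dir, check=True)
    return issued
```

Calling `compile_ops(dry_run=True)` prints the commands without running them. With `check=True`, the script stops at the first failing build instead of silently continuing.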
If your folder structure is different from the following, you may need to change the corresponding paths in config files.
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ │ ├── basemap
│ │ │ ├── expansion
│ │ │ ├── prediction
│ │ │ ├── *.png
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│ │ ├── v1.0-trainval
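Before preparing the data, a quick sketch like the following can report which directories from the tree above are missing. It assumes the default data root `data/nuscenes`, and `missing_paths` is a hypothetical helper. It does not check the occupancy ground truth from gts.tar.gz because the tree does not show where that goes.

```python
from pathlib import Path
from typing import List

# Directories taken from the tree above, relative to data/nuscenes.
EXPECTED = [
    "maps/basemap",
    "maps/expansion",
    "maps/prediction",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
]


def missing_paths(data_root: str = "data/nuscenes") -> List[str]:
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]
```

An empty list means the layout matches. Otherwise, either add the missing directories or update the paths in the config files.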
We recommend that you download the processed data index file directly via this Google Drive link.
Prepare nuScenes data by running:
python tools/create_data_nuscenes_C.py
Training:
./tools/dist_train.sh $config_path $gpus
Testing on validation set:
./tools/dist_test.sh $config_path $checkpoint_path $gpus --eval bbox
Testing on test set:
./tools/dist_test.sh $config_path $checkpoint_path $gpus --format-only --eval-options 'jsonfile_prefix=work_dirs'
mv work_dirs/pts_bbox/results_nusc.json work_dirs/pts_bbox/${name}.json
If you have any other questions, please refer to the mmdet3d documentation.
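You can also do the final test-set step (renaming results_nusc.json) from Python, adding a light sanity check first. This is a sketch, and `finalize_submission` is a hypothetical helper. The check assumes the nuScenes detection submission format, a JSON object with top-level `meta` and `results` keys. The output path follows the `jsonfile_prefix=work_dirs` setting used above.

```python
import json
from pathlib import Path


def finalize_submission(work_dir: str, name: str) -> Path:
    """Rename <work_dir>/pts_bbox/results_nusc.json to <name>.json.

    Raises ValueError if the file does not look like a nuScenes submission.
    """
    src = Path(work_dir) / "pts_bbox" / "results_nusc.json"
    with src.open() as f:
        data = json.load(f)
    # A detection submission is a JSON object with "meta" and "results".
    if not isinstance(data, dict) or not {"meta", "results"} <= set(data):
        raise ValueError(f"{src} is missing the 'meta' or 'results' key")
    dst = src.with_name(f"{name}.json")
    src.rename(dst)  # same effect as the mv command above
    return dst
```

For example, `finalize_submission("work_dirs", "henet_test")` produces work_dirs/pts_bbox/henet_test.json.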
We sincerely thank these excellent open-source projects:
If this work is helpful for your research, please consider citing our papers, HENet++ and HENet:
@article{xia2025henet++,
title={HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving},
author={Xia, Zhongyu and Lin, Zhiwei and Wang, Yongtao and Yang, Ming-Hsuan},
journal={arXiv preprint arXiv:2511.07106},
year={2025}
}
@inproceedings{xia2024henet,
title={HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras},
author={Xia, Zhongyu and Lin, Zhiwei and Wang, Xinhao and Wang, Yongtao and Xing, Yun and Qi, Shengxiang and Dong, Nan and Yang, Ming-Hsuan},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2024}
}
This project is free for academic research purposes only; commercial use requires authorization. For commercial permission, please contact [email protected].




