VaViT

PyTorch code and models for VaViT.

Vanilla ViT for Automotive Point Cloud Semantic Segmentation
Gilles Puy1, Nermin Samet1, Alexandre Boulch1, Spyros Gidaris1, Tuan-Hung Vu1, Renaud Marlet1,2

1valeo.ai, France.
2LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, France.

VaViT architecture

If you find this code or work useful, please cite the following paper:

@article{vavit,
  title={Vanilla ViT for Automotive Point Cloud Semantic Segmentation},
  author={Puy, Gilles and Samet, Nermin and Boulch, Alexandre and Gidaris, Spyros and Vu, Tuan-Hung and Marlet, Renaud},
  journal={arXiv},
  year={2026}
}

Overview

Installation

Environment

The results were obtained with the environment below.

conda create -n vavit
conda activate vavit
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch
pip install pyaml==25.7.0 tqdm==4.67.1 scipy==1.15.3 tensorboard==2.20.0
git clone https://github.com/valeoai/VaViT
cd VaViT

Download and untar the following file:

wget https://github.com/valeoai/WaffleIron/files/10294733/info_datasets.tar.gz
tar -xvzf info_datasets.tar.gz
rm info_datasets.tar.gz

Datasets

We use the following datasets: nuScenes, SemanticKITTI and Waymo Open Dataset (WOD).

Please download nuScenes and SemanticKITTI. The folder structure must be:

/path/to/datasets/
|
|- nuscenes/
|  |- lidarseg/
|  | ...
|  |- v1.0-trainval
|
|- semantic_kitti/
|  |- dataset/

For WOD, you must preprocess it as described in Pointcept. The folder structure will be:

/path/to/datasets/
|
|- waymo_open_dataset_v_1_4_3_processed/
|  |- training/
|  |  |-segment-***
|
|  |- validation/
|  |  |-segment-***
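
As a quick sanity check after preprocessing, one can count the `segment-*` entries in each split. This is only a sketch under the layout shown above; the helper name `count_segments` is hypothetical, and no expected counts are asserted because the source does not state them.

```python
from pathlib import Path


def count_segments(wod_root):
    """Count the `segment-*` entries in the training and validation splits."""
    wod_root = Path(wod_root)
    counts = {}
    for split in ("training", "validation"):
        split_dir = wod_root / split
        # A missing split is reported as 0 rather than raising an error.
        if split_dir.is_dir():
            counts[split] = sum(1 for _ in split_dir.glob("segment-*"))
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    root = "/path/to/datasets/waymo_open_dataset_v_1_4_3_processed"
    print(count_segments(root))
```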

Available models

We provide the following models.

Model      Dataset          mIoU (last / best)
VaViT-B    nuScenes         81.3 % / 81.3 %
VaViT-B    SemanticKITTI    67.6 % / 68.0 %
VaViT-B    WOD              70.5 % / 70.9 %

Download them and unzip them in the VaViT/ folder. The checkpoints will appear in a subfolder ./checkpoints/.
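
To confirm that the archives were unpacked in the right place, the sketch below looks for `ckpt_last.pth` under `./checkpoints/<dataset>/<config>/`, which is the path pattern used by the prediction-saving commands in this README. The helper names are hypothetical, and the (dataset, config) pairs are copied from the evaluation commands; whether each archive actually contains these exact files is an assumption.

```python
from pathlib import Path

# (dataset, config) pairs as they appear in the evaluation commands of this README.
RELEASED = (
    ("nuscenes", "VaViT_B-drop_0.3"),
    ("semantic_kitti", "VaViT_B-drop_0.5"),
    ("waymo", "VaViT_B-drop_0.3"),
)


def last_checkpoint_path(dataset, config, root="./checkpoints"):
    """Build the path where the last checkpoint is expected."""
    return Path(root) / dataset / config / "ckpt_last.pth"


def report_checkpoints(root="./checkpoints"):
    """Map each expected checkpoint path to True if the file exists, else False."""
    report = {}
    for dataset, config in RELEASED:
        path = last_checkpoint_path(dataset, config, root)
        report[str(path)] = path.is_file()
    return report


if __name__ == "__main__":
    for path, found in report_checkpoints().items():
        print("found  " if found else "missing", path)
```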

Notes:

  • The nuScenes models are released under the following terms.
  • The SemanticKITTI models are released under the following terms.
  • The WOD models are released under the following terms.

Evaluation

nuScenes

To evaluate the provided checkpoint:

config="VaViT_B-drop_0.3"
dataset="nuscenes"
PATH_DATASETS=/path/to/datasets

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--log_path ./checkpoints/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--gpu 0 \
--restart \
--eval last

You can adjust the batch size in the config file configs/${dataset}/${config}.yaml if needed.

Replace --eval last with --eval best to re-evaluate the best checkpoint.

You can also save the predictions on disk and use the official nuScenes devkit for evaluation.

config="VaViT_B-drop_0.3"
dataset="nuscenes"
PATH_DATASETS=/path/to/datasets/

python save_nuscenes_predictions.py \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--config configs/${dataset}/${config}.yaml \
--ckpt ./checkpoints/${dataset}/${config}/ckpt_last.pth \
--result_folder ./predictions/${dataset}/${config}

pip install nuscenes-devkit
git clone https://github.com/nutonomy/nuscenes-devkit.git

python nuscenes-devkit/python-sdk/nuscenes/eval/lidarseg/evaluate.py \
--result_path ./predictions/${dataset}/${config} \
--eval_set val \
--version v1.0-trainval \
--dataroot ${PATH_DATASETS}/${dataset}/ \
--verbose True

SemanticKITTI

To evaluate the last checkpoint:

config="VaViT_B-drop_0.5"
dataset="semantic_kitti"
PATH_DATASETS=/path/to/datasets/

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--log_path ./checkpoints/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--restart \
--eval last

You can adjust the batch size in the config file configs/${dataset}/${config}.yaml if needed.

Replace --eval last with --eval best to re-evaluate the best checkpoint.

Remark: On SemanticKITTI, the code above extracts object instances from the train set, although this is not necessary for validation. The instances are saved automatically in /tmp/kitti_instances/. This step can be bypassed by editing the yaml config file and setting the entry instance_cutmix to False. Do not forget to re-enable this augmentation before training a new model on SemanticKITTI.
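
If you prefer to toggle the entry programmatically rather than by hand, the sketch below flips a key wherever it occurs in a nested config. It is an illustration only: the helper name `set_key_everywhere` is hypothetical, and where `instance_cutmix` sits inside the yaml file is not stated here, which is why the function searches the whole structure. The commented usage assumes PyYAML is importable in the environment above; note that rewriting a file with `yaml.safe_dump` discards comments and may reorder keys.

```python
def set_key_everywhere(cfg, key, value):
    """Set `key` to `value` wherever it appears in nested dicts/lists.

    Returns the number of entries that were updated.
    """
    updated = 0
    if isinstance(cfg, dict):
        for k in cfg:
            if k == key:
                cfg[k] = value
                updated += 1
            else:
                updated += set_key_everywhere(cfg[k], key, value)
    elif isinstance(cfg, list):
        for item in cfg:
            updated += set_key_everywhere(item, key, value)
    return updated


if __name__ == "__main__":
    # Hypothetical config layout, for illustration only.
    cfg = {"dataloader": {"instance_cutmix": True, "batch_size": 4}}
    print(set_key_everywhere(cfg, "instance_cutmix", False), cfg)

    # Possible usage on a real file (assumes PyYAML is available):
    # import yaml
    # with open("configs/semantic_kitti/VaViT_B-drop_0.5.yaml") as f:
    #     cfg = yaml.safe_load(f)
    # set_key_everywhere(cfg, "instance_cutmix", False)
```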

You can also save the predictions on disk and use the official SemanticKITTI API for evaluation.

config="VaViT_B-drop_0.5"
dataset="semantic_kitti"
PATH_DATASETS=/path/to/datasets/

python save_semkitti_predictions.py \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--config configs/${dataset}/${config}.yaml \
--ckpt ./checkpoints/${dataset}/${config}/ckpt_last.pth \
--result_folder ./predictions/${dataset}/${config}/

git clone https://github.com/PRBonn/semantic-kitti-api.git
cd semantic-kitti-api/

python evaluate_semantics.py \
--dataset ${PATH_DATASETS}/${dataset}/dataset \
--predictions ../predictions/${dataset}/${config}/ \
--split valid

WOD

To evaluate the last checkpoint:

config="VaViT_B-drop_0.3"
dataset="waymo"
PATH_DATASETS=/path/to/datasets/

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/waymo_open_dataset_v_1_4_3_processed/ \
--log_path ./checkpoints/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--restart \
--eval last

You can adjust the batch size in the config file configs/${dataset}/${config}.yaml if needed.

Replace --eval last with --eval best to re-evaluate the best checkpoint.
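
The three evaluation commands above differ only in the dataset name, the config name and the dataset sub-folder. The sketch below assembles them from a small table, which can help when re-evaluating all models in one go. The helper `build_eval_command` is hypothetical; it uses only flags that appear in the commands above and omits `--gpu 0`, which is shown for nuScenes only.

```python
import shlex

# dataset -> (config name, sub-folder under PATH_DATASETS), copied from the commands above.
EVAL_SETTINGS = {
    "nuscenes": ("VaViT_B-drop_0.3", "nuscenes"),
    "semantic_kitti": ("VaViT_B-drop_0.5", "semantic_kitti"),
    "waymo": ("VaViT_B-drop_0.3", "waymo_open_dataset_v_1_4_3_processed"),
}


def build_eval_command(dataset, path_datasets, which="last"):
    """Return the train.py evaluation command as a list of arguments."""
    if which not in ("last", "best"):
        raise ValueError("which must be 'last' or 'best'")
    config, subfolder = EVAL_SETTINGS[dataset]
    return [
        "python", "train.py",
        "--dataset", dataset,
        "--path_dataset", f"{path_datasets.rstrip('/')}/{subfolder}/",
        "--log_path", f"./checkpoints/{dataset}/{config}/",
        "--config", f"configs/{dataset}/{config}.yaml",
        "--restart",
        "--eval", which,
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute it.
    for name in EVAL_SETTINGS:
        print(shlex.join(build_eval_command(name, "/path/to/datasets", "best")))
```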

Training

nuScenes

To retrain VaViT-B on nuScenes:

config="VaViT_B-drop_0.3"
dataset="nuscenes"
PATH_DATASETS=/path/to/datasets/

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--log_path ./logs/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--gpu 0

Remarks:

  • We trained our networks in bfloat16 (option activated by default). float16 is also available with the flag --fp16 but we often ran into numerical instabilities with this option.
  • The final model can be re-evaluated by adding the flags --restart --eval last.
  • For multi-GPU training (on one node), replace the flag --gpu 0 with --multiprocessing-distributed.
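
One well-known difference between the two 16-bit formats, which may or may not be what caused the instabilities mentioned above, is their dynamic range: float16 saturates at 65504, whereas bfloat16 keeps the exponent range of float32 at the cost of fewer mantissa bits. The sketch below illustrates this with the standard library only. The helpers `to_float16` and `to_bfloat16` are hypothetical, and bfloat16 is emulated by truncating a float32 to its upper 16 bits, which is a simplification of how hardware rounds.

```python
import struct


def to_float16(x):
    """Round-trip x through IEEE half precision; raises OverflowError if out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x):
    """Emulate bfloat16 by keeping the upper 16 bits of the float32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    value = 70000.0  # larger than the float16 maximum of 65504
    print("bfloat16:", to_bfloat16(value))
    try:
        print("float16:", to_float16(value))
    except OverflowError as err:
        print("float16 overflow:", err)

    # Conversely, float16 resolves finer steps near 1.0 than bfloat16 does.
    step = 1.0 + 2.0 ** -10
    print(to_float16(step) == step, to_bfloat16(step) == 1.0)
```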

SemanticKITTI

To retrain VaViT-B on SemanticKITTI:

config="VaViT_B-drop_0.5"
dataset="semantic_kitti"
PATH_DATASETS=/path/to/datasets/

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/${dataset}/ \
--log_path ./logs/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--multiprocessing-distributed

At the beginning of the training, the instances for cutmix augmentation are saved in /tmp/kitti_instances/. If this process is interrupted before completion, please delete /tmp/kitti_instances/ and relaunch the training. You can disable the instance cutmix augmentations by editing the yaml config file to set instance_cutmix to False.
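
The clean-up step described above amounts to deleting one directory. A minimal sketch, with the hypothetical helper name `reset_instance_cache` and the cache location taken from the paragraph above:

```python
import shutil
from pathlib import Path


def reset_instance_cache(cache_dir="/tmp/kitti_instances/"):
    """Delete the instance cache if it exists; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


if __name__ == "__main__":
    if reset_instance_cache():
        print("Removed the instance cache; relaunch the training to rebuild it.")
    else:
        print("No instance cache found.")
```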

WOD

To retrain VaViT-B on WOD:

config="VaViT_B-drop_0.3"
dataset="waymo"
PATH_DATASETS=/path/to/datasets/

python train.py \
--dataset ${dataset} \
--path_dataset ${PATH_DATASETS}/waymo_open_dataset_v_1_4_3_processed/ \
--log_path ./logs/${dataset}/${config}/ \
--config configs/${dataset}/${config}.yaml \
--multiprocessing-distributed

Acknowledgements

We thank the authors of

@inproceedings{berman18lovasz,
  title = {The Lovász-Softmax Loss: A Tractable Surrogate for the Optimization of the Intersection-Over-Union Measure in Neural Networks},
  author = {Berman, Maxim and Triki, Amal Rannen and Blaschko, Matthew B.},
  booktitle = {CVPR},
  year = {2018},
}

for making their implementation of the Lovász loss publicly available.

License

VaViT is released under the Apache 2.0 license.

The implementation of the Lovász loss in utils/lovasz.py is released under the MIT License.

The SemanticKITTI config file in datasets/semantic-kitti.yaml is released under the MIT License.
