62 changes: 62 additions & 0 deletions DEST/LICENSE
@@ -0,0 +1,62 @@
NVIDIA Source Code License for DEST

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under this license, which may include software,
documentation, or other files, and (b) any additions to or derivative works thereof that are made available
under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning
as provided under U.S. copyright law; provided, however, that for the purposes of this license, derivative works
shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.

Works are “made available” under this license by including in or with the Work either (a) a copyright notice
referencing the applicability of this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual,
worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly
display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b)
you include a complete copy of this license with your distribution, and (c) you retain without modification any
copyright, patent, trademark, or attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and
distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use
limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that
are subject to Your Terms. Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use
non-commercially. Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any
derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim,
cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then
your rights under this license from such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor's or its affiliates' names, logos,
or trademarks, except as necessary to reproduce the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant
in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING
WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR
THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE),
CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK
(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR
MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
178 changes: 178 additions & 0 deletions DEST/README.md
@@ -0,0 +1,178 @@
# DEST: Depth Estimation with Simplified Transformer

<!-- ![image](resources/image.png) -->
<div align="center">
<img src="./resources/attentions.png" height="400">
</div>

***[DEST: Depth Estimation with Simplified Transformer](https://arxiv.org/abs/2204.13791)***<br />
John Yang, Le An, Anurag Dixit, Jinkyu Koo, Su Inn Park<br />
CVPR Workshop on [Transformers For Vision](https://sites.google.com/view/t4v-cvpr22), 2022

DEST uses a simplified, GPU-friendly design for the attention block in the transformer. Compared to state-of-the-art methods, our model reduces model size and computation by over 80% while being more accurate and faster. The proposed model was validated on both depth estimation and semantic segmentation tasks. This repository contains the official PyTorch model implementation and training configuration, which can be adapted to your training workflow.

<hr>

## Monocular Depth Estimation
For depth estimation, we employ the same setup as [PackNet-sfm](https://github.com/TRI-ML/packnet-sfm). For details on environment preparation, data download, and training/evaluation scripts, please refer to the original repo.

### Prerequisites

Run the following commands:

```bash
git clone https://github.com/TRI-ML/packnet-sfm.git
cd packnet-sfm

cp path/to/DEST/configs/train_kitti_dest.yaml configs/
cp path/to/DEST/models/*_dest.py packnet_sfm/models/
cp path/to/DEST/networks/DESTNet.py packnet_sfm/networks/depth/
mkdir packnet_sfm/networks/DEST
cp path/to/DEST/networks/DEST/*.py packnet_sfm/networks/DEST/
```

These commands place DEST and its config file within the [PackNet-sfm](https://github.com/TRI-ML/packnet-sfm) implementation, as shown below:

```yaml
packnet-sfm
├ configs
│   ...
│   └ train_kitti_dest.yaml
├ packnet_sfm
│   ...
│   ├ models
│   │   ...
│   │   ├ SfmModel_dest.py
│   │   ├ SemiSupModel_dest.py
│   │   └ SelfSupModel_dest.py
│   ├ networks
│   │   ...
│   │   ├ depth
│   │   │   ...
│   │   │   └ DESTNet.py
│   │   └ DEST
│   │       ├ __init__.py
│   │       ├ DEST_EncDec.py
│   │       ├ simplified_attention.py
│   │       └ simplified_joint_attention.py
...
```
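
Before building the image, it may help to confirm that every file landed where the layout above expects it. The following is a minimal sketch and not part of DEST or PackNet-sfm: the helper name `missing_dest_files` and the `EXPECTED_FILES` list are assumptions, transcribed from the copy commands and the tree above.

```python
from pathlib import Path

# Relative paths transcribed from the directory layout above (assumed complete).
EXPECTED_FILES = [
    "configs/train_kitti_dest.yaml",
    "packnet_sfm/models/SfmModel_dest.py",
    "packnet_sfm/models/SemiSupModel_dest.py",
    "packnet_sfm/models/SelfSupModel_dest.py",
    "packnet_sfm/networks/depth/DESTNet.py",
    "packnet_sfm/networks/DEST/__init__.py",
    "packnet_sfm/networks/DEST/DEST_EncDec.py",
    "packnet_sfm/networks/DEST/simplified_attention.py",
    "packnet_sfm/networks/DEST/simplified_joint_attention.py",
]


def missing_dest_files(repo_root):
    """Return the expected DEST files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_dest_files("path/to/packnet-sfm")` returns an empty list when all files are in place; any path it returns points to a copy step that did not take effect.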

### Modifications to the PackNet repo
Our work requires the `timm` library, so please add the following line to `docker/Dockerfile`.

```dockerfile
RUN pip install timm
```

Before building the Docker image, you also need to adjust the Python version, cuDNN version, NCCL version, etc. in the Dockerfile to match your machine. Note that the minimum supported Python version is 3.7. Base images can be found on [Docker Hub](https://hub.docker.com/r/nvidia/cuda/tags?page=1&ordering=last_updated).

After properly configuring the Dockerfile, please follow [the instructions](https://github.com/TRI-ML/packnet-sfm#install) to build your Docker image.

Also, due to [an issue in the PackNet repository](https://github.com/TRI-ML/packnet-sfm/issues/107) that affects evaluation,
you need to edit lines 295 and 302 of the file `packnet-sfm/packnet_sfm/models/model_wrapper.py`.

Change lines
```
[L295] depth = inv2depth(inv_depths[0])
...
[L301] inv_depth_pp = post_process_inv_depth(
[L302] inv_depths[0], inv_depths_flipped[0], method='mean')
```
to
```
[L295] depth = inv2depth(inv_depths)
...
[L301] inv_depth_pp = post_process_inv_depth(
[L302] inv_depths, inv_depths_flipped, method='mean')
```
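
If you prefer to script this edit, the sketch below applies the same two substitutions to the file's text. It is an illustration under assumptions, not part of either repository: `patch_model_wrapper` and `REPLACEMENTS` are hypothetical names, and the code assumes each fragment appears exactly once in `model_wrapper.py` with the spacing shown above.

```python
# (old, new) source fragments copied from the snippets above. The spacing
# inside each fragment must match model_wrapper.py for a match to be found.
REPLACEMENTS = [
    ("depth = inv2depth(inv_depths[0])",
     "depth = inv2depth(inv_depths)"),
    ("inv_depths[0], inv_depths_flipped[0], method='mean')",
     "inv_depths, inv_depths_flipped, method='mean')"),
]


def patch_model_wrapper(source):
    """Apply each replacement once; raise if a fragment is not found."""
    for old, new in REPLACEMENTS:
        if old not in source:
            raise ValueError(f"fragment not found: {old!r}")
        # Replace only the first occurrence to avoid touching other code.
        source = source.replace(old, new, 1)
    return source
```

A typical use is to read the file with `pathlib.Path.read_text()`, pass the text through `patch_model_wrapper`, and write the result back. Running it a second time raises `ValueError`, which signals that the file has already been patched or differs from the expected version.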


### Training

To train DEST from scratch on the KITTI dataset, run the following command:
```bash
python scripts/train.py configs/train_kitti_dest.yaml
```
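
The provided `configs/train_kitti_dest.yaml` selects a `StepLR` scheduler with `step_size: 10` and `gamma: 0.5`, starting from learning rates of `0.000007` for the depth network and `0.00001` for the pose network. Assuming this name maps to PyTorch's `torch.optim.lr_scheduler.StepLR` and that the scheduler is stepped once per epoch, both learning rates halve every 10 epochs. The sketch below reproduces that rule in plain Python; `step_lr` is a hypothetical helper for illustration, not a PackNet-sfm function.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate after `epoch` completed epochs under step decay."""
    # The rate is multiplied by gamma once for every full step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


# Print the depth and pose learning rates at a few epochs.
for epoch in (0, 9, 10, 20, 30):
    print(epoch, step_lr(7e-6, epoch), step_lr(1e-5, epoch))
```

This can be useful when changing `step_size` or the number of epochs, since it shows how small the learning rate becomes late in training.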

### Evaluation
To evaluate a DEST model on the KITTI dataset, run the following:

```bash
python scripts/eval.py --checkpoint <DEST.ckpt> [--config <config.yaml>]
```

You can also run inference directly on a single image or folder:

```bash
python scripts/infer.py --checkpoint <DEST.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
```

<hr>

## Semantic Segmentation
For semantic segmentation, our implementation can be readily integrated into the [OpenMMLab Semantic Segmentation Toolbox and Benchmark](https://github.com/open-mmlab/mmsegmentation) for training and evaluation.

Please refer to their instructions for [installation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md#installation) and [dataset preparation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets).
DEST is trained and evaluated on the [Cityscapes dataset](https://www.cityscapes-dataset.com/login/).

### Prerequisites
To follow the MMSegmentation instructions for training, copy the files located in `DEST/semseg/` into
the MMSegmentation repository by running the following commands:
```bash
git clone https://github.com/open-mmlab/mmsegmentation.git # first clone the MMSegmentation repo
cd mmsegmentation
mkdir configs/dest/

cp path/to/DEST/semseg/dest_simpatt-b0.py configs/_base_/models/
cp path/to/DEST/semseg/schedule_160k_adamw.py configs/_base_/schedules/
cp path/to/DEST/semseg/cityscapes_1024x1024_repeat.py configs/_base_/datasets/
cp path/to/DEST/semseg/dest_simpatt-*_1024x1024_160k_cityscapes.py configs/dest/
cp path/to/DEST/semseg/simplified_attention_mmseg.py mmseg/models/backbones/
cp path/to/DEST/semseg/dest_head.py mmseg/models/decode_heads/
```

Next, make DEST importable from MMSegmentation by appending its imports to the package `__init__.py` files:
```bash
echo 'from .simplified_attention_mmseg import SimplifiedTransformer' >> mmseg/models/backbones/__init__.py
echo 'from .dest_head import DestHead' >> mmseg/models/decode_heads/__init__.py
```
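
Note that `echo ... >>` appends a new line every time it runs, so repeating the setup would duplicate the imports. As a hedged alternative, the sketch below appends each import only if it is not already present. `register_dest` and `REGISTRATIONS` are hypothetical names introduced here; the paths and import lines are taken from the commands above.

```python
from pathlib import Path

# Import lines from the two echo commands above, keyed by target file.
REGISTRATIONS = {
    "mmseg/models/backbones/__init__.py":
        "from .simplified_attention_mmseg import SimplifiedTransformer",
    "mmseg/models/decode_heads/__init__.py":
        "from .dest_head import DestHead",
}


def register_dest(repo_root):
    """Append each import unless already present; return the edited files."""
    changed = []
    for rel_path, line in REGISTRATIONS.items():
        init_file = Path(repo_root) / rel_path
        text = init_file.read_text()
        if line in text.splitlines():
            continue  # already registered, nothing to do
        # Make sure the new import starts on its own line.
        if text and not text.endswith("\n"):
            text += "\n"
        init_file.write_text(text + line + "\n")
        changed.append(rel_path)
    return changed
```

Run from a checkout as `register_dest("path/to/mmsegmentation")`; a second call returns an empty list and leaves the files untouched.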

Then, you can start training/evaluating with a desired configuration of DEST.

### Training
For example, to train DEST-B1 on the Cityscapes dataset:

```bash
# Single-gpu training
python tools/train.py configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py
# Multi-gpu training
./tools/dist_train.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py <GPU_NUM>
```

### Evaluation
After training, you can evaluate the trained model (e.g. DEST-B1):

```bash
# Single-gpu testing
python tools/test.py configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file
# Multi-gpu testing
./tools/dist_test.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file <GPU_NUM>
# Multi-gpu, multi-scale testing
./tools/dist_test.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file <GPU_NUM> --aug-test
```


## License
The provided code may be used non-commercially, that is, for research or evaluation purposes only. For details, please check the [LICENSE](LICENSE) file.

## Citation
```
@article{YangDEST,
title={Depth Estimation with Simplified Transformer},
author={Yang, John and An, Le and Dixit, Anurag and Koo, Jinkyu and Park, Su Inn},
journal={arXiv preprint arXiv:2204.13791},
year={2022}
}
```
43 changes: 43 additions & 0 deletions DEST/configs/train_kitti_dest.yaml
@@ -0,0 +1,43 @@
model:
    name: 'SelfSupModel_dest'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.000007
        pose:
            lr: 0.00001
    scheduler:
        name: 'StepLR'
        step_size: 10
        gamma: 0.5
    depth_net:
        name: 'DESTNet'
        version: '1A'
    pose_net:
        name: 'PoseNet'
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (192, 640)
    train:
        batch_size: 10
        num_workers: 12
        dataset: ['KITTI']
        path: ['data/datasets/KITTI_raw']
        split: ['data_splits/eigen_zhou_files.txt']
        depth_type: ['velodyne']
        repeat: [5]
    validation:
        dataset: ['KITTI']
        path: ['data/datasets/KITTI_raw']
        split: ['data_splits/eigen_val_files.txt',
                'data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['data/datasets/KITTI_raw']
        split: ['data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
100 changes: 100 additions & 0 deletions DEST/models/SelfSupModel_dest.py
@@ -0,0 +1,100 @@
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

import torch
from packnet_sfm.models.SfmModel_dest import SfmModel_dest
from packnet_sfm.losses.multiview_photometric_loss import MultiViewPhotometricLoss
from packnet_sfm.models.model_utils import merge_outputs


class SelfSupModel_dest(SfmModel_dest):
    """
    Model that inherits a depth and pose network from SfmModel_dest and
    includes the photometric loss for self-supervised training.

    Parameters
    ----------
    kwargs : dict
        Extra parameters
    """
    def __init__(self, **kwargs):
        # Initializes SfmModel_dest
        super().__init__(**kwargs)
        # Initializes the photometric loss
        self._photometric_loss = MultiViewPhotometricLoss(**kwargs)

    @property
    def logs(self):
        """Return logs."""
        return {
            **super().logs,
            **self._photometric_loss.logs
        }

    def self_supervised_loss(self, image, ref_images, inv_depths, poses,
                             intrinsics, return_logs=False, progress=0.0):
        """
        Calculates the self-supervised photometric loss.

        Parameters
        ----------
        image : torch.Tensor [B,3,H,W]
            Original image
        ref_images : list of torch.Tensor [B,3,H,W]
            Reference images from context
        inv_depths : torch.Tensor [B,1,H,W]
            Predicted inverse depth maps from the original image
        poses : list of Pose
            List containing predicted poses between original and context images
        intrinsics : torch.Tensor [B,3,3]
            Camera intrinsics
        return_logs : bool
            True if logs are stored
        progress : float
            Training progress percentage

        Returns
        -------
        output : dict
            Dictionary containing a "loss" scalar and a "metrics" dictionary
        """
        # The same intrinsics are passed for the target and reference cameras
        return self._photometric_loss(
            image, ref_images, inv_depths, intrinsics, intrinsics, poses,
            return_logs=return_logs, progress=progress)

    def forward(self, batch, return_logs=False, progress=0.0):
        """
        Processes a batch.

        Parameters
        ----------
        batch : dict
            Input batch
        return_logs : bool
            True if logs are stored
        progress : float
            Training progress percentage

        Returns
        -------
        output : dict
            Dictionary containing a "loss" scalar and different metrics and predictions
            for logging and downstream usage.
        """
        # Calculate predicted depth and pose output
        output = super().forward(batch, return_logs=return_logs)
        if not self.training:
            # If not training, no need for self-supervised loss
            return output
        else:
            # Otherwise, calculate self-supervised loss
            self_sup_output = self.self_supervised_loss(
                batch['rgb_original'], batch['rgb_context_original'],
                output['inv_depths'], output['poses'], batch['intrinsics'],
                return_logs=return_logs, progress=progress)
            # Return loss and metrics
            return {
                'loss': self_sup_output['loss'],
                **merge_outputs(output, self_sup_output),
            }