NVIDIA · dependabot · Feb 11, 2022 · Jun 21, 2022 · Aug 27, 2022 · Aug 29, 2022
diff --git a/DEST/LICENSE b/DEST/LICENSE
@@ -0,0 +1,62 @@
+NVIDIA Source Code License for DEST
+
+1. Definitions
+
+“Licensor” means any person or entity that distributes its Work.
+
+“Work” means (a) the original work of authorship made available under this license, which may include software,
+documentation, or other files, and (b) any additions to or derivative works  thereof  that are made available
+under this license.
+
+The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning
+as provided under U.S. copyright law; provided, however, that for the purposes of this license, derivative works
+shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.
+
+Works are “made available” under this license by including in or with the Work either (a) a copyright notice
+referencing the applicability of this license to the Work, or (b) a copy of this license.
+
+2. License Grant
+
+2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual,
+worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly
+display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.
+
+3. Limitations
+
+3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b)
+you include a complete copy of this license with your distribution, and (c) you retain without modification any
+copyright, patent, trademark, or attribution notices that are present in the Work.
+
+3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and
+distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use
+limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that
+are subject to Your Terms. Notwithstanding Your Terms, this license (including the redistribution requirements in
+Section 3.1) will continue to apply to the Work itself.
+
+3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use
+non-commercially. Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any
+derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.
+
+3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim,
+cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then
+your rights under this license from such Licensor (including the grant in Section 2.1) will terminate immediately.
+
+3.5 Trademarks. This license does not grant any rights to use any Licensor's or its affiliates' names, logos,
+or trademarks, except as necessary to reproduce the notices described in this license.
+
+3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant
+in Section 2.1) will terminate immediately.
+
+4. Disclaimer of Warranty.
+
+THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING
+WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR
+THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.
+
+5. Limitation of Liability.
+
+EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE),
+CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL,
+INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK
+(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR
+MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
diff --git a/DEST/README.md b/DEST/README.md
@@ -0,0 +1,178 @@
+# DEST: Depth Estimation with Simplified Transformer
+
+<!-- ![image](resources/image.png) -->
+<div align="center">
+  <img src="./resources/attentions.png" height="400">
+</div>
+
+***[DEST: Depth Estimation with Simplified Transformer](https://arxiv.org/abs/2204.13791)***<br />
+John Yang, Le An, Anurag Dixit, Jinkyu Koo, Su Inn Park  
+CVPR Workshop on [Transformers For Vision](https://sites.google.com/view/t4v-cvpr22), 2022
+
+DEST leverages a simplified attention block design in the transformer that is GPU-friendly. Compared to state-of-the-art methods, our model achieves over 80% reduction in model size and computation, while being more accurate and faster. The proposed model was validated on both depth estimation and semantic segmentation tasks. This repository contains the official PyTorch model implementation and training configuration, which can be adapted to your training workflow.
+
+<hr>
+
+## Monocular Depth Estimation
+For depth estimation, we employ the same setup as [PackNet-sfm](https://github.com/TRI-ML/packnet-sfm). For details on environment preparation, data download, and training/evaluation scripts, please refer to the original repo.
+
+### Prerequisites
+
+Run the following commands
+
+```bash
+git clone https://github.com/TRI-ML/packnet-sfm.git
+cd packnet-sfm
+
+cp path/to/DEST/configs/train_kitti_dest.yaml configs/
+cp path/to/DEST/models/*_dest.py packnet_sfm/models/
+cp path/to/DEST/networks/DESTNet.py packnet_sfm/networks/depth/
+mkdir packnet_sfm/networks/DEST
+cp path/to/DEST/networks/DEST/*.py packnet_sfm/networks/DEST/
+```
+
+in order to place DEST and its config file within the [PackNet-sfm](https://github.com/TRI-ML/packnet-sfm) implementation as shown below:
+
+```text
+packnet-sfm
+ ├ configs
+ │ ...
+ │ └ train_kitti_dest.yaml
+ ├ packnet_sfm
+ │ ...
+ │ ├ models
+ │ │ ...
+ │ │ ├ SfmModel_dest.py
+ │ │ ├ SemiSupModel_dest.py
+ │ │ └ SelfSupModel_dest.py
+ │ ├ networks
+ │ │ ...
+ │ │ ├ depth
+ │ │ │ ...
+ │ │ │ └ DESTNet.py
+ │ │ └ DEST
+ │ │   ├ __init__.py
+ │ │   ├ DEST_EncDec.py
+ │ │   ├ simplified_attention.py
+ │ │   └ simplified_joint_attention.py
+...
+```
+
+### Modifications to make on PackNet repo
+Our work requires the `timm` library, so please add the following line to `docker/Dockerfile`.
+
+```dockerfile
+RUN pip install timm
+```
+
+Before building the docker image, you also need to adjust the Python version, cuDNN version, NCCL version, etc. in the Dockerfile to match your machine. Note that the minimum supported Python version is 3.7. Base images can be found on [Docker Hub](https://hub.docker.com/r/nvidia/cuda/tags?page=1&ordering=last_updated).
+
+After properly configuring Dockerfile, please follow [the instructions](https://github.com/TRI-ML/packnet-sfm#install) to build your docker image.
+
+Also, due to [an issue in the PackNet repository](https://github.com/TRI-ML/packnet-sfm/issues/107) during evaluation,
+you need to edit lines L295 and L302 of the file `packnet-sfm/packnet_sfm/models/model_wrapper.py`.
+
+Change lines
+```
+[L295] depth = inv2depth(inv_depths[0])
+...
+[L301] inv_depth_pp = post_process_inv_depth(
+[L302]     inv_depths[0], inv_depths_flipped[0], method='mean')
+```
+to
+```
+[L295] depth = inv2depth(inv_depths)
+...
+[L301] inv_depth_pp = post_process_inv_depth(
+[L302]     inv_depths, inv_depths_flipped, method='mean')
+```
+
+
+### Training
+
+To train DEST from scratch on the KITTI dataset, run the following command:
+```bash
+python scripts/train.py configs/train_kitti_dest.yaml
+```
+
+### Evaluation
+To evaluate a DEST model on the KITTI dataset, run the following:
+
+```bash
+python scripts/eval.py --checkpoint <DEST.ckpt> [--config <config.yaml>]
+```
+
+You can also run inference directly on a single image or folder:
+
+```bash
+python scripts/infer.py --checkpoint <DEST.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
+```
+
+<hr>
+
+## Semantic Segmentation
+For semantic segmentation, our implementation can be readily integrated into the [OpenMMLab Semantic Segmentation Toolbox and Benchmark](https://github.com/open-mmlab/mmsegmentation) for training and evaluation.
+
+Please refer to their instructions for [installation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md#installation) and [dataset preparation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets).
+Our DEST is trained/evaluated on [CityScapes Dataset](https://www.cityscapes-dataset.com/login/).
+
+### Prerequisites
+To follow the MMSegmentation training instructions, copy the files located in `DEST/semseg/` into the
+MMSegmentation repository by running the following commands:
+```bash
+git clone https://github.com/open-mmlab/mmsegmentation.git # first clone the MMSegmentation env
+cd mmsegmentation
+mkdir configs/dest/
+
+cp path/to/DEST/semseg/dest_simpatt-b0.py configs/_base_/models/
+cp path/to/DEST/semseg/schedule_160k_adamw.py configs/_base_/schedules/
+cp path/to/DEST/semseg/cityscapes_1024x1024_repeat.py configs/_base_/datasets/
+cp path/to/DEST/semseg/dest_simpatt-*_1024x1024_160k_cityscapes.py configs/dest/
+cp path/to/DEST/semseg/simplified_attention_mmseg.py mmseg/models/backbones/
+cp path/to/DEST/semseg/dest_head.py mmseg/models/decode_heads/
+```
+
+Next, register the DEST backbone and decode head in the MMSegmentation library:
+```bash
+echo 'from .simplified_attention_mmseg import SimplifiedTransformer' >> mmseg/models/backbones/__init__.py
+echo 'from .dest_head import DestHead' >> mmseg/models/decode_heads/__init__.py
+```
+
+Then, you can start training/evaluating with a desired configuration of DEST.
+
+### Training
+Example: train DEST-B1 on CityScapes Dataset:
+
+```bash
+# Single-gpu training
+python tools/train.py configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py
+# Multi-gpu training
+./tools/dist_train.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py <GPU_NUM>
+```
+
+### Evaluation
+After training, you can evaluate the trained model (e.g. DEST-B1):
+
+```bash
+# Single-gpu testing
+python tools/test.py configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file
+# Multi-gpu testing
+./tools/dist_test.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file <GPU_NUM>
+# Multi-gpu, multi-scale testing
+./tools/dist_test.sh configs/dest/dest_simpatt-b1_1024x1024_160k_cityscapes.py /path/to/checkpoint_file <GPU_NUM> --aug-test
+```
+
+
+## License
+The provided code can be used non-commercially, that is, for research or evaluation purposes only. For details, please check the [LICENSE](LICENSE) file.
+
+## Citation
+```
+@article{YangDEST,
+  title={Depth Estimation with Simplified Transformer},
+  author={Yang, John and An, Le and Dixit, Anurag and Koo, Jinkyu and Park, Su Inn},
+  journal={arXiv preprint arXiv:2204.13791},
+  year={2022}
+}
+```
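The two `echo ... >>` commands in the semantic segmentation prerequisites append a line every time they run, so repeating the setup would leave duplicate imports in each `__init__.py`. Below is a minimal sketch of an idempotent alternative. `append_line_once` is a hypothetical helper written for illustration; it is not part of DEST or MMSegmentation.

```python
from pathlib import Path


def append_line_once(path, line):
    """Append `line` to the file at `path` unless an identical line is already there.

    Hypothetical helper for illustration; returns True if the file was modified.
    """
    target = Path(path)
    text = target.read_text() if target.exists() else ""
    if line in text.splitlines():
        # The import is already registered, so leave the file untouched.
        return False
    # Make sure the new line starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    target.write_text(text + separator + line + "\n")
    return True
```

Assuming it is called from the MMSegmentation root, `append_line_once("mmseg/models/backbones/__init__.py", "from .simplified_attention_mmseg import SimplifiedTransformer")` and `append_line_once("mmseg/models/decode_heads/__init__.py", "from .dest_head import DestHead")` would mirror the two `echo` commands while staying safe to re-run.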
diff --git a/DEST/configs/train_kitti_dest.yaml b/DEST/configs/train_kitti_dest.yaml
@@ -0,0 +1,43 @@
+model:
+    name: 'SelfSupModel_dest' 
+    optimizer:
+        name: 'Adam'
+        depth:
+            lr: 0.000007
+        pose:
+            lr: 0.00001 
+    scheduler:
+        name: 'StepLR'
+        step_size: 10   
+        gamma: 0.5
+    depth_net:
+        name: 'DESTNet'
+        version: '1A'
+    pose_net:
+        name: 'PoseNet'  
+    params:
+        crop: 'garg'
+        min_depth: 0.0
+        max_depth: 80.0
+datasets:
+    augmentation:
+        image_shape: (192, 640)
+    train:
+        batch_size: 10
+        num_workers: 12
+        dataset: ['KITTI']
+        path: ['data/datasets/KITTI_raw']
+        split: ['data_splits/eigen_zhou_files.txt']
+        depth_type: ['velodyne']
+        repeat: [5]
+    validation:
+        dataset: ['KITTI']
+        path: ['data/datasets/KITTI_raw']
+        split: ['data_splits/eigen_val_files.txt',
+                'data_splits/eigen_test_files.txt']
+        depth_type: ['velodyne']
+    test:
+        dataset: ['KITTI']
+        path: ['data/datasets/KITTI_raw']
+        split: ['data_splits/eigen_test_files.txt']
+        depth_type: ['velodyne']
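The `scheduler` block in the config above uses `StepLR` with `step_size: 10` and `gamma: 0.5`, which halves both learning rates every 10 epochs. Below is a minimal sketch of the resulting schedule, assuming the scheduler is stepped once per epoch and follows the usual rule `lr = base_lr * gamma ** (epoch // step_size)`. `step_lr` is a hypothetical helper for illustration, and the base rates are copied from `train_kitti_dest.yaml`.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate at `epoch` under a StepLR-style schedule (illustrative helper)."""
    return base_lr * gamma ** (epoch // step_size)


# Base learning rates from the config above.
DEPTH_LR = 0.000007
POSE_LR = 0.00001

for epoch in (0, 9, 10, 20, 30):
    print(f"epoch {epoch:2d}: depth lr = {step_lr(DEPTH_LR, epoch):.3e}, "
          f"pose lr = {step_lr(POSE_LR, epoch):.3e}")
```

Under these assumptions the depth and pose networks keep their initial rates for epochs 0 to 9, then drop by half at epoch 10 and again at each later multiple of 10.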
diff --git a/DEST/models/SelfSupModel_dest.py b/DEST/models/SelfSupModel_dest.py
@@ -0,0 +1,100 @@
+# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+import torch
+from packnet_sfm.models.SfmModel_dest import SfmModel_dest
+from packnet_sfm.losses.multiview_photometric_loss import MultiViewPhotometricLoss
+from packnet_sfm.models.model_utils import merge_outputs
+
+
+class SelfSupModel_dest(SfmModel_dest):
+    """
+    Model that inherits a depth and pose network from SfmModel_dest and
+    includes the photometric loss for self-supervised training.
+
+    Parameters
+    ----------
+    kwargs : dict
+        Extra parameters
+    """
+    def __init__(self, **kwargs):
+        # Initializes SfmModel_dest
+        super().__init__(**kwargs)
+        # Initializes the photometric loss
+        self._photometric_loss = MultiViewPhotometricLoss(**kwargs)
+
+    @property
+    def logs(self):
+        """Return logs."""
+        return {
+            **super().logs,
+            **self._photometric_loss.logs
+        }
+
+
+    def self_supervised_loss(self, image, ref_images, inv_depths, poses,
+                             intrinsics, return_logs=False, progress=0.0):
+        """
+        Calculates the self-supervised photometric loss.
+
+        Parameters
+        ----------
+        image : torch.Tensor [B,3,H,W]
+            Original image
+        ref_images : list of torch.Tensor [B,3,H,W]
+            Reference images from context
+        inv_depths : torch.Tensor [B,1,H,W]
+            Predicted inverse depth maps from the original image
+        poses : list of Pose
+            List containing predicted poses between original and context images
+        intrinsics : torch.Tensor [B,3,3]
+            Camera intrinsics
+        return_logs : bool
+            True if logs are stored
+        progress : float
+            Training progress percentage
+
+        Returns
+        -------
+        output : dict
+            Dictionary containing a "loss" scalar and a "metrics" dictionary
+        """
+        return self._photometric_loss(
+            image, ref_images, inv_depths, intrinsics, intrinsics, poses,
+            return_logs=return_logs, progress=progress)
+
+
+    def forward(self, batch, return_logs=False, progress=0.0):
+        """
+        Processes a batch.
+
+        Parameters
+        ----------
+        batch : dict
+            Input batch
+        return_logs : bool
+            True if logs are stored
+        progress : float
+            Training progress percentage
+
+        Returns
+        -------
+        output : dict
+            Dictionary containing a "loss" scalar and different metrics and predictions
+            for logging and downstream usage.
+        """
+        # Calculate predicted depth and pose output
+        output = super().forward(batch, return_logs=return_logs)
+        if not self.training:
+            # If not training, no need for self-supervised loss
+            return output
+        else:
+            # Otherwise, calculate self-supervised loss
+            self_sup_output = self.self_supervised_loss(
+                    batch['rgb_original'], batch['rgb_context_original'],
+                    output['inv_depths'], output['poses'], batch['intrinsics'],
+                    return_logs=return_logs, progress=progress)
+            # Return loss and metrics
+            return {
+                'loss': self_sup_output['loss'],
+                **merge_outputs(output, self_sup_output),
+            }
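The docstring of `self_supervised_loss` above types `inv_depths` as a single `[B,1,H,W]` tensor, which is consistent with the README's change from `inv_depths[0]` to `inv_depths` in `model_wrapper.py`. Below is a minimal sketch of a guard that accepts either convention, assuming multi-scale depth networks return a list or tuple with the full-resolution map first. `full_res_inv_depth` is a hypothetical helper and is not part of PackNet-SfM or DEST.

```python
def full_res_inv_depth(inv_depths):
    """Return the full-resolution inverse depth map.

    Hypothetical helper: accepts either a list/tuple of per-scale maps
    (first entry assumed to be full resolution) or a single map.
    """
    if isinstance(inv_depths, (list, tuple)):
        return inv_depths[0]
    return inv_depths
```

A guard like this could let the same evaluation code handle both single-output and multi-scale depth networks, at the cost of hiding which convention a given model follows.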