Code for our project "Multimodal Range View Based Semantic Segmentation", developed for the course "Deep Learning for 3D Perception" at the Technical University of Munich under the supervision of Prof. Angela Dai.
Download SemanticKITTI from the official website.
-
Lidar backbone with Range Augmentations (RA):
- 512 x 64 range-view (RV) resolution:
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/cenet_512.yml -n cenet_512_RA
- 1024 x 64 RV resolution (retrained from the 512 x 64 checkpoint, as the CENet authors recommend):
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/cenet_1024.yml -p /path/to/cenet_512_RA -n cenet_1024_RA
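Range-view models like CENet operate on a spherical projection of the LiDAR point cloud rather than on raw 3D points. As an illustration only (not this repo's actual preprocessing code), a minimal sketch of such a projection, assuming the +3° to -25° vertical field of view of the Velodyne HDL-64E sensor used to record SemanticKITTI:

```python
import numpy as np

def range_projection(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR point cloud (N, 3) onto an H x W range image.

    Standard spherical projection used by range-view methods; the
    default field of view matches the Velodyne HDL-64E (SemanticKITTI).
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    depth = np.linalg.norm(points, axis=1)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = -np.arctan2(y, x)                       # horizontal angle
    pitch = np.arcsin(np.clip(z / depth, -1, 1))  # vertical angle

    # normalize angles to [0, 1] and scale to image coordinates
    u = 0.5 * (yaw / np.pi + 1.0) * W
    v = (1.0 - (pitch - fov_down_rad) / fov) * H

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # -1 marks pixels with no returned point
    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v, u] = depth
    return range_image
```

The 1024 x 64 configuration uses the same projection with W=1024, halving the horizontal angle covered per pixel.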
-
RGB backbone fine-tuning on SemanticKITTI dataset with range-view labels:
- for usage with 512 x 64 model:
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/mask2former_512.yml -n mask2former_512
- for usage with 1024 x 64 model:
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/mask2former_1024.yml -n mask2former_1024
-
Fusion Model:
- 512 x 64 range-view (RV) resolution:
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/fusion_512.yml -n fusion_512
- 1024 x 64 RV resolution:
python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/fusion_1024.yml -n fusion_1024
-
Infer:
python infer.py -d /path/to/SemanticKITTI/dataset -l /path/to/save/predictions/in -m /path/to/trained_model
-
Evaluation:
- Lidar and fusion models:
python evaluate_iou.py -d /path/to/SemanticKITTI/dataset -p /path/to/predictions
- RGB models:
python evaluate_iou_rgb.py -d /path/to/SemanticKITTI/dataset -p /path/to/predictions
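The evaluation scripts report mean IoU over the SemanticKITTI classes. A simplified sketch of the metric itself (not the scripts' actual implementation), assuming integer class ids with 0 as the ignored "unlabeled" class:

```python
import numpy as np

def miou(pred, gt, num_classes, ignore=0):
    """Mean intersection-over-union from flat prediction/label arrays."""
    # confusion matrix: rows = ground truth, columns = prediction
    conf = np.bincount(num_classes * gt + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp   # predicted as class c but wrong
    fn = conf.sum(axis=1) - tp   # class c missed by the prediction
    denom = tp + fp + fn

    # skip the ignored class and classes absent from both pred and gt
    valid = np.arange(num_classes) != ignore
    valid &= denom > 0
    iou = tp[valid] / denom[valid]
    return iou.mean() if valid.any() else 0.0
```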
-
Visualize GT:
python visualize.py -w kitti -d /path/to/SemanticKITTI/dataset -s which_sequences
-
Visualize Predictions:
python visualize.py -w kitti -d /path/to/SemanticKITTI/dataset -p /path/to/predictions -s which_sequences
Our pre-trained models can be found here.
Our codebase originates from CENet. For the fusion model we use code from SwinFusion, and we follow the Hugging Face implementation of Mask2Former as the RGB backbone. For initialization, we use Mask2Former weights pre-trained for semantic segmentation on the Cityscapes dataset.