A comparative study of Capsule Networks, Vision Transformers, and hybrid architectures for CIFAR-100 image classification.
This repository contains implementations and experiments comparing different deep learning architectures:
- CNN Baseline: Standard convolutional neural network used as the reference point
- Vision Transformer (ViT): Transformer-based image classification model
- CapsNet: Capsule Network implementation with dynamic routing
- CapsViT Hybrid: Novel hybrid architecture combining Capsule Networks with Vision Transformer features
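For readers unfamiliar with dynamic routing, the following is a minimal, dependency-free sketch of routing-by-agreement between two capsule layers. It is illustrative only and is not the code in this repository, which would operate on PyTorch tensors. The function names (`squash`, `softmax`, `dynamic_routing`) and the nested-list layout of `u_hat` are assumptions made for this example.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Route predictions to output capsules.

    u_hat[i][j] is the prediction vector that input capsule i makes
    for output capsule j (hypothetical layout chosen for this sketch).
    Returns one output vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each input capsule distributes its vote.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions, then squash, for every output capsule.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output (dot product).
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v
```

In this sketch, output capsules whose incoming predictions agree end up with longer vectors, which is how a capsule layer signals that an entity is present.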
- `0. CNN_Baseline/` - Baseline CNN implementation
- `1. vit_run/` - Vision Transformer training code and results
- `2. CapsNet/` - Capsule Network baseline implementation
- `3. capsvit/` - Hybrid CapsNet-ViT architecture
- `data/` - CIFAR-100 dataset
- `Evaluation/` - Model evaluation scripts and metrics
- PyTorch
- torchvision
- einops
- GPUtil
- numpy
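Before training, it can help to confirm that the packages listed above are importable. The snippet below is a small sketch using only the standard library. The import names in `REQUIRED` are assumptions (PyTorch is imported as `torch`), and `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED = ["torch", "torchvision", "einops", "GPUtil", "numpy"]

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed with your usual package manager.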
Each model can be trained independently using the Python scripts in its directory. All models are trained on the CIFAR-100 dataset, which has 100 classes.
Results and trained models are stored in each architecture's output directory.