Unofficial PyTorch Implementation of Exploring Plain Vision Transformer Backbones for Object Detection
Results | Updates | Usage | Todo | Acknowledge
This branch contains the unofficial PyTorch implementation of Exploring Plain Vision Transformer Backbones for Object Detection. Thanks for their wonderful work!
The models are trained on 4 A100 machines with 2 images per GPU, giving a total batch size of 64 (i.e., 32 GPUs in total).
Model | Pretrain | Machine | Framework | Box mAP | Mask mAP | config | log | weight |
---|---|---|---|---|---|---|---|---|
ViT-Base | IN1K+MAE | TPU | Mask RCNN | 51.1 | 45.5 | config | log | OneDrive |
ViT-Base | IN1K+MAE | GPU | Mask RCNN | 51.1 | 45.4 | config | log | OneDrive |
ViTAE-Base | IN1K+MAE | GPU | Mask RCNN | 51.6 | 45.8 | config | log | OneDrive |
ViTAE-Small | IN1K+Sup | GPU | Mask RCNN | 45.6 | 40.1 | config | log | OneDrive |
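The batch size quoted for these runs can be sanity-checked with a little arithmetic. The sketch below is not part of this repository: the function names are hypothetical, and it assumes every machine holds the same number of GPUs and that no gradient accumulation is used.

```python
def global_batch_size(num_machines: int, gpus_per_machine: int, images_per_gpu: int) -> int:
    """Images processed per training iteration across all GPUs."""
    return num_machines * gpus_per_machine * images_per_gpu


def implied_gpus_per_machine(total_batch: int, num_machines: int, images_per_gpu: int) -> int:
    """Solve total_batch = num_machines * gpus * images_per_gpu for gpus."""
    per_gpu_slot = num_machines * images_per_gpu
    if total_batch % per_gpu_slot != 0:
        raise ValueError("total batch size is not divisible by machines * images per GPU")
    return total_batch // per_gpu_slot


# Numbers taken from the README: 4 machines, 2 images per GPU, batch size 64.
gpus = implied_gpus_per_machine(total_batch=64, num_machines=4, images_per_gpu=2)
print(gpus)                           # 8
print(global_batch_size(4, gpus, 2))  # 64
```

If you train on fewer GPUs, the same formula gives the batch size you will actually get, which is worth knowing before comparing your numbers with the table.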
[2022-04-18] Explore using small models (20M parameters) with supervised ImageNet-1K pretraining for ViTDet (45.6 mAP). For comparison, multi-stage structures reach 46.0 mAP for Swin-T and 47.8 mAP for ViTAEv2-S with Mask RCNN on COCO.
[2022-04-17] Release the pretrained weights and logs for ViT-B and ViTAE-B on MS COCO. The models are trained entirely with PyTorch on GPU.
[2022-04-16] Release the initial unofficial implementation of ViTDet with ViT-Base model! It obtains 51.1 mAP and 45.5 mAP on detection and segmentation, respectively. The weights and logs will be uploaded soon.
Applications of ViTAE Transformer include: image classification | object detection | semantic segmentation | animal pose segmentation | remote sensing | matting
We use PyTorch 1.9.0 or NGC docker 21.06, and mmcv 1.3.9 for the experiments.
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.3.9
MMCV_WITH_OPS=1 pip install -e .
cd ..
git clone https://github.com/ViTAE-Transformer/ViTDet.git
cd ViTDet
pip install -v -e .
After installing the two repos, install timm and einops:
pip install timm==0.4.9 einops
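Since the results depend on specific library versions, it may help to confirm the environment before training. The following is a minimal sketch rather than a script shipped with this repository: `EXPECTED`, `ALIASES`, and the function names are hypothetical, and it assumes a source build of mmcv with `MMCV_WITH_OPS=1` may be registered under the distribution name `mmcv-full`. The comparison is deliberately loose so that local build tags (for example a `+cu111` suffix) do not count as mismatches.

```python
from importlib import metadata

# Versions named in this README; einops is not pinned, so any version passes.
EXPECTED = {"torch": "1.9.0", "mmcv": "1.3.9", "timm": "0.4.9", "einops": None}

# Assumption: the mmcv build with ops may be installed as "mmcv-full".
ALIASES = {"mmcv": ("mmcv", "mmcv-full")}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    for dist in ALIASES.get(name, (name,)):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


def version_matches(installed, expected):
    """Loose check: ignore '+local' tags and non-numeric suffixes such as 'a0'."""
    if installed is None:
        return False
    if expected is None:
        return True
    base = installed.split("+")[0]
    if not base.startswith(expected):
        return False
    rest = base[len(expected):]
    # Reject e.g. "0.4.90" when "0.4.9" is expected.
    return rest == "" or not rest[0].isdigit()


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each mismatching package to (installed, expected)."""
    problems = {}
    for name, want in expected.items():
        have = lookup(name)
        if not version_matches(have, want):
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    for name, (have, want) in check_environment().items():
        print(f"{name}: installed={have}, expected={want or 'any'}")
```

Running the file prints one line per missing or mismatching package and nothing when the environment agrees with the versions above.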
Download the pretrained models from MAE or ViTAE, then launch training with:
# for single machine
bash tools/dist_train.sh <Config PATH> <NUM GPUs