An Empirical Study of Remote Sensing Pretraining

Updates | Introduction | Usage | Results & Models | Statement

Current applications

Scene Recognition: Please see Usage for a quick start;

Semantic Segmentation: Please see Remote Sensing Pretraining for Semantic Segmentation;

Object Detection: Please see Remote Sensing Pretraining for Object Detection;

Change Detection: Please see Remote Sensing Pretraining for Change Detection;

ViTAE: Please see ViTAE-Transformer;

Matting: Please see ViTAE-Transformer for matting;

Updates

11/04/2022

The Baidu Yun links for the scene recognition models are provided.

07/04/2022

The paper is posted on arXiv!

06/04/2022

The pretrained models for ResNet-50, Swin-T, and ViTAEv2-S are released. The code for pretraining and the scene recognition task is also provided for reference.

Introduction

This repository contains the code, models, and test results for the paper "An Empirical Study of Remote Sensing Pretraining".

Aerial images are usually captured from a bird's-eye view by cameras mounted on planes or satellites and cover a large extent of land uses and land covers. Such scenes are often difficult to interpret because of interference from scene-irrelevant regions and the complicated spatial distribution of land objects. Although deep learning has largely reshaped remote sensing research for aerial image understanding and achieved great success, most existing deep models are initialized with ImageNet pretrained weights, and natural images inevitably present a large domain gap relative to aerial images, probably limiting the finetuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP). To this end, we train different networks from scratch on MillionAID, the largest remote sensing scene recognition dataset to date, to obtain remote sensing pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. Then, we investigate the impact of ImageNet pretraining (IMP) and RSP on a series of downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using these CNN and vision transformer backbones.

Results and Models

UCM (8:2)

| Backbone | Input size | Acc@1 (μ±σ) | Model |
| --- | --- | --- | --- |
| RSP-ResNet-50-E300 | 224 × 224 | 99.48 ± 0.10 | google & baidu |
| RSP-Swin-T-E300 | 224 × 224 | 99.52 ± 0.00 | google & baidu |
| RSP-ViTAEv2-S-E100 | 224 × 224 | 99.90 ± 0.13 | google & baidu |

AID (2:8)

| Backbone | Input size | Acc@1 (μ±σ) | Model |
| --- | --- | --- | --- |
| RSP-ResNet-50-E300 | 224 × 224 | 96.81 ± 0.03 | google & baidu |
| RSP-Swin-T-E300 | 224 × 224 | 96.89 ± 0.08 | google & baidu |
| RSP-ViTAEv2-S-E100 | 224 × 224 | 96.91 ± 0.06 | google & baidu |

AID (5:5)

| Backbone | Input size | Acc@1 (μ±σ) | Model |
| --- | --- | --- | --- |
| RSP-ResNet-50-E300 | 224 × 224 | 97.89 ± 0.08 | google & baidu |
| RSP-Swin-T-E300 | 224 × 224 | 98.30 ± 0.04 | google & baidu |
| RSP-ViTAEv2-S-E100 | 224 × 224 | 98.22 ± 0.09 | google & baidu |

NWPU-RESISC (1:9)

| Backbone | Input size | Acc@1 (μ±σ) | Model |
| --- | --- | --- | --- |
| RSP-ResNet-50-E300 | 224 × 224 | 93.93 ± 0.10 | google & baidu |
| RSP-Swin-T-E300 | 224 × 224 | 93.02 ± 0.12 | google & baidu |
| RSP-ViTAEv2-S-E100 | 224 × 224 | 94.41 ± 0.11 | google & baidu |

NWPU-RESISC (2:8)

| Backbone | Input size | Acc@1 (μ±σ) | Model |
| --- | --- | --- | --- |
| RSP-ResNet-50-E300 | 224 × 224 | 95.02 ± 0.06 | google & baidu |
| RSP-Swin-T-E300 | 224 × 224 | 94.51 ± 0.05 | google & baidu |
| RSP-ViTAEv2-S-E100 | 224 × 224 | 95.60 ± 0.06 | google & baidu |
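The released checkpoints are intended to serve as backbones for downstream finetuning. As a rough sketch (not the repository's official loading code), and assuming the ResNet-50 checkpoint stores its weights under a `model` key with torchvision-compatible parameter names, loading could look like this; inspect the downloaded file and adapt the remapping as needed:

```python
import torch
from torchvision.models import resnet50

# Hypothetical file name for a downloaded RSP-ResNet-50 checkpoint.
ckpt = torch.load("rsp-resnet-50-e300-ckpt.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints wrap the weights under a "model" key

# MillionAID pretraining uses 51 third-level classes (see Data Preparation below);
# the classification head may or may not be present in the checkpoint.
backbone = resnet50(num_classes=51)
missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```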

Usage

Installation

1. Create a conda virtual environment and activate it:

```bash
conda create -n rsp python=3.8 -y
conda activate rsp
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
pip install timm==0.4.12
```

- Install apex (optional):

```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

- Install other requirements:

```bash
pip install pyyaml yacs pillow
```

2. Clone this repo:

```bash
git clone https://github.com/ViTAE-Transformer/RSP.git
```
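Optionally, you can run a quick sanity check that the expected versions are installed and CUDA is visible (a minimal snippet, not part of the repository):

```python
import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expected 1.10.1
print("torchvision:", torchvision.__version__)  # expected 0.11.2
print("timm:", timm.__version__)                # expected 0.4.12
print("CUDA available:", torch.cuda.is_available())
```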

Data Preparation

We use the MillionAID dataset for pretraining and fine-tune the pretrained models on the UCM/AID/NWPU-RESISC45 datasets. For each dataset, we first merge all images together and then split them into training and validation sets, whose information is separately recorded in train_label.txt and valid_label.txt. Note that we only consider the third-level categories (51 classes in total) for the MillionAID dataset. The format of train_label.txt is exemplified as follows:

```
P0960374.jpg dry_field 0
P0973343.jpg dry_field 0
P0235595.jpg dry_field 0
P0740591.jpg dry_field 0
P0099281.jpg dry_field 0
P0285964.jpg dry_field 0
...
```

Here, 0 is the training id of the category for the corresponding image.
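For illustration only (the repository ships its own data loading code), a minimal PyTorch `Dataset` that reads this `<image> <class_name> <class_id>` label-file format might look as follows; the image directory layout is an assumption:

```python
import os

from PIL import Image
from torch.utils.data import Dataset


class SceneLabelFileDataset(Dataset):
    """Minimal reader for the whitespace-separated label files shown above."""

    def __init__(self, image_dir, label_file, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        self.samples = []
        with open(label_file) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 3:
                    continue  # skip empty or malformed lines
                name, _class_name, class_id = parts
                self.samples.append((name, int(class_id)))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, label = self.samples[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```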

Training

- For pretraining, take ResNet-50 as an example; train on the MillionAID dataset with 4 GPUs and a total batch size of 512 (128 per GPU):

```bash
python -m torch.distributed.launch --nproc_per_node 4 --master_port 6666 main.py \
--dataset 'millionAID' --model 'resnet' --exp_num 1 \
--batch-size 128 --epochs 300 --img_size 224 --split 100 \
--lr 5e-4 --weight_decay 0.05 --gpu_num 4 \
--output [model save path]
```
- To repeatedly finetune the pretrained ViTAE model on the AID dataset with the (2:8) setting, 5 times:

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 7777 main.py \
--dataset 'aid' --model 'vitae_win' --ratio 28 --exp_num 5 \
--batch-size 64 --epochs 200 --img_size 224 --split 1 \
--lr 5e-4 --weight_decay 0.05 --gpu_num 1 \
--output [model save path] \
--pretrained [pretrained vitae path]
```

Inference

- To evaluate an existing model:

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 8888 main.py \
--dataset 'nwpuresisc' --model 'vitae_win' --ratio 28 --exp_num 5 \
--batch-size 64 --epochs 200 --img_size 224 --split 100 \
--lr 5e-4 --weight_decay 0.05 --gpu_num 1 \
--output [log save path] \
--resume [model load path] \
--eval
```
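The Acc@1 (μ±σ) values in the tables above are presumably aggregated over the repeated runs (e.g. `--exp_num 5`). As a purely illustrative sketch with made-up accuracy values, that aggregation could look like:

```python
# Hypothetical Acc@1 values from 5 repeated runs; substitute the real ones.
import statistics

acc_runs = [96.85, 96.95, 96.88, 96.93, 96.94]

mu = statistics.mean(acc_runs)
sigma = statistics.stdev(acc_runs)  # sample std; the paper's exact convention is not stated here
print(f"Acc@1: {mu:.2f} ± {sigma:.2f}")
```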

Note: When pretraining the Swin model, please uncomment `_update_config_from_file(config, args.cfg)` in `config.py`, and add

```bash
--cfg configs/swin_tiny_patch4_window7_224.yaml
```

Other Links

Semantic Segmentation: Please see Remote Sensing Pretraining for Semantic Segmentation;

Object Detection: Please see Remote Sensing Pretraining for Object Detection;

Change Detection: Please see Remote Sensing Pretraining for Change Detection;

ViTAE: Please see ViTAE-Transformer;

Matting: Please see ViTAE-Transformer for matting;

Statement

This project is for research purposes only. For any other questions, please contact di.wang at gmail.com .

References

The code of the Pretraining & Recognition part is mainly based on Swin Transformer.