[Enhance] Mask2Former Instance Segm Only #7571

Merged: 49 commits, May 27, 2022

Commits
4b5b997  Mask2Former/MaskFormer instance only training/eval (PeterVennerstrom, Mar 29, 2022)
b1e03bb  obsolete config names (PeterVennerstrom, Mar 29, 2022)
9f6de5f  if cond is None fix (PeterVennerstrom, Mar 29, 2022)
fb92360  white space (PeterVennerstrom, Mar 29, 2022)
1082de9  fix tests (PeterVennerstrom, Mar 29, 2022)
db10161  yapf formatting fix (PeterVennerstrom, Mar 29, 2022)
e509281  semantic_seg None docstring (PeterVennerstrom, Mar 31, 2022)
ea80145  original config names (PeterVennerstrom, Mar 31, 2022)
5ff7bb6  pan/ins unit test (PeterVennerstrom, Mar 31, 2022)
d06bf2e  show_result comment (PeterVennerstrom, Mar 31, 2022)
066d2b7  pan/ins head unit test (PeterVennerstrom, Mar 31, 2022)
acca83a  redundant test (PeterVennerstrom, Mar 31, 2022)
c755f4f  inherit configs (PeterVennerstrom, Mar 31, 2022)
99dfe4a  correct gpu # (PeterVennerstrom, Mar 31, 2022)
8df82f0  revert version (PeterVennerstrom, Apr 1, 2022)
30ee9d7  BaseDetector.show_result comment (PeterVennerstrom, Apr 1, 2022)
3925920  revert more versions (PeterVennerstrom, Apr 1, 2022)
ca5b67f  clarify comment (PeterVennerstrom, Apr 1, 2022)
a52285c  clarify comment (PeterVennerstrom, Apr 1, 2022)
3964689  add FilterAnnotations to data pipeline (PeterVennerstrom, Apr 1, 2022)
f46ee3e  more complete Returns docstring (PeterVennerstrom, Apr 1, 2022)
abb13f3  use pytest.mark.parametrize decorator (PeterVennerstrom, Apr 1, 2022)
390f7f5  fix docstring formatting (PeterVennerstrom, Apr 1, 2022)
d7f61ee  lint (PeterVennerstrom, Apr 2, 2022)
de7d48c  Include instances passing mask area test (PeterVennerstrom, Apr 11, 2022)
0d49650  Make FilterAnnotations generic for masks or bboxes (PeterVennerstrom, Apr 18, 2022)
3efadfb  Duplicate assertion (PeterVennerstrom, Apr 18, 2022)
62da928  Add pad config (PeterVennerstrom, Apr 19, 2022)
556593d  Less hard coded padding setting (PeterVennerstrom, Apr 19, 2022)
e09562b  Clarify test arguments (PeterVennerstrom, Apr 20, 2022)
c84ad22  Additional inst_seg configs (PeterVennerstrom, Apr 25, 2022)
83cc338  delete configs (PeterVennerstrom, May 23, 2022)
792f38f  Include original dev branch configs (PeterVennerstrom, May 23, 2022)
235061e  Fix indent (PeterVennerstrom, May 25, 2022)
cddde11  fix lint error from merge conflict (PeterVennerstrom, May 26, 2022)
5e45ea6  Update .pre-commit-config.yaml (chhluo, May 27, 2022)
db0b039  Rename mask2former_r50_lsj_8x2_50e_coco.py to mask2former_r50_lsj_8x2… (chhluo, May 27, 2022)
025c7af  Update and rename mask2former_r101_lsj_8x2_50e_coco.py to mask2former… (chhluo, May 27, 2022)
d201566  Update and rename mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coc… (chhluo, May 27, 2022)
bb96ebb  Update and rename mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco.py t… (chhluo, May 27, 2022)
135ba25  Update and rename mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_c… (chhluo, May 27, 2022)
4cc6e42  Update and rename mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py to… (chhluo, May 27, 2022)
2ff2d09  Update and rename mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py to… (chhluo, May 27, 2022)
8b93b11  Create mask2former_r50_lsj_8x2_50e_coco.py (chhluo, May 27, 2022)
e01bf9e  Create mask2former_r101_lsj_8x2_50e_coco.py (chhluo, May 27, 2022)
c34cebe  Create mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py (chhluo, May 27, 2022)
7ce5c08  Create mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py (chhluo, May 27, 2022)
487a8ac  Update test_forward.py (chhluo, May 27, 2022)
6ab15c2  remove gt_sem_seg (chhluo, May 27, 2022)
@@ -0,0 +1,7 @@
_base_ = './mask2former_r50_lsj_8x2_50e_coco-panoptic.py'

model = dict(
    backbone=dict(
        depth=101,
        init_cfg=dict(type='Pretrained',
                      checkpoint='torchvision://resnet101')))
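The tiny config above works because mmcv's `_base_` mechanism deep-merges the child dict over the inherited base: only `depth` and `init_cfg` are overridden, everything else comes from the r50 panoptic config. A minimal sketch of that merge rule (illustrative; not mmcv's actual implementation, which also handles `_delete_` keys and list bases):

```python
def merge_cfg(base: dict, child: dict) -> dict:
    """Recursively merge a child config over a base config (sketch)."""
    out = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_cfg(out[key], value)  # deep-merge nested dicts
        else:
            out[key] = value  # child value overrides the base value
    return out


base = dict(model=dict(backbone=dict(type='ResNet', depth=50)))
child = dict(model=dict(backbone=dict(depth=101)))
merged = merge_cfg(base, child)
# depth is overridden to 101 while type='ResNet' survives from the base
```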
2 changes: 1 addition & 1 deletion configs/mask2former/mask2former_r101_lsj_8x2_50e_coco.py
@@ -1,4 +1,4 @@
-_base_ = './mask2former_r50_lsj_8x2_50e_coco.py'
+_base_ = ['./mask2former_r50_lsj_8x2_50e_coco.py']

model = dict(
    backbone=dict(
253 changes: 253 additions & 0 deletions configs/mask2former/mask2former_r50_lsj_8x2_50e_coco-panoptic.py
@@ -0,0 +1,253 @@
_base_ = [
    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
]
num_things_classes = 80
num_stuff_classes = 53
num_classes = num_things_classes + num_stuff_classes
model = dict(
    type='Mask2Former',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    panoptic_head=dict(
        type='Mask2FormerHead',
        in_channels=[256, 512, 1024, 2048],  # pass to pixel_decoder inside
        strides=[4, 8, 16, 32],
        feat_channels=256,
        out_channels=256,
        num_things_classes=num_things_classes,
        num_stuff_classes=num_stuff_classes,
        num_queries=100,
        num_transformer_feat_level=3,
        pixel_decoder=dict(
            type='MSDeformAttnPixelDecoder',
            num_outs=3,
            norm_cfg=dict(type='GN', num_groups=32),
            act_cfg=dict(type='ReLU'),
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        num_heads=8,
                        num_levels=3,
                        num_points=4,
                        im2col_step=64,
                        dropout=0.0,
                        batch_first=False,
                        norm_cfg=None,
                        init_cfg=None),
                    ffn_cfgs=dict(
                        type='FFN',
                        embed_dims=256,
                        feedforward_channels=1024,
                        num_fcs=2,
                        ffn_drop=0.0,
                        act_cfg=dict(type='ReLU', inplace=True)),
                    operation_order=('self_attn', 'norm', 'ffn', 'norm')),
                init_cfg=None),
            positional_encoding=dict(
                type='SinePositionalEncoding', num_feats=128, normalize=True),
            init_cfg=None),
        enforce_decoder_input_project=False,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True),
        transformer_decoder=dict(
            type='DetrTransformerDecoder',
            return_intermediate=True,
            num_layers=9,
            transformerlayers=dict(
                type='DetrTransformerDecoderLayer',
                attn_cfgs=dict(
                    type='MultiheadAttention',
                    embed_dims=256,
                    num_heads=8,
                    attn_drop=0.0,
                    proj_drop=0.0,
                    dropout_layer=None,
                    batch_first=False),
                ffn_cfgs=dict(
                    embed_dims=256,
                    feedforward_channels=2048,
                    num_fcs=2,
                    act_cfg=dict(type='ReLU', inplace=True),
                    ffn_drop=0.0,
                    dropout_layer=None,
                    add_identity=True),
                feedforward_channels=2048,
                operation_order=('cross_attn', 'norm', 'self_attn', 'norm',
                                 'ffn', 'norm')),
            init_cfg=None),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=2.0,
            reduction='mean',
            class_weight=[1.0] * num_classes + [0.1]),
        loss_mask=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            reduction='mean',
            loss_weight=5.0),
        loss_dice=dict(
            type='DiceLoss',
            use_sigmoid=True,
            activate=True,
            reduction='mean',
            naive_dice=True,
            eps=1.0,
            loss_weight=5.0)),
    panoptic_fusion_head=dict(
        type='MaskFormerFusionHead',
        num_things_classes=num_things_classes,
        num_stuff_classes=num_stuff_classes,
        loss_panoptic=None,
        init_cfg=None),
    train_cfg=dict(
        num_points=12544,
        oversample_ratio=3.0,
        importance_sample_ratio=0.75,
        assigner=dict(
            type='MaskHungarianAssigner',
            cls_cost=dict(type='ClassificationCost', weight=2.0),
            mask_cost=dict(
                type='CrossEntropyLossCost', weight=5.0, use_sigmoid=True),
            dice_cost=dict(
                type='DiceCost', weight=5.0, pred_act=True, eps=1.0)),
        sampler=dict(type='MaskPseudoSampler')),
    test_cfg=dict(
        panoptic_on=True,
        # For now, the dataset does not support
        # evaluating the semantic segmentation metric.
        semantic_on=False,
        instance_on=True,
        # max_per_image is for instance segmentation.
        max_per_image=100,
        iou_thr=0.8,
        # In Mask2Former's panoptic postprocessing,
        # masks whose score is less than 0.5 are filtered out.
        filter_low_score=True),
    init_cfg=None)
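# Note: in loss_cls above, class_weight=[1.0] * num_classes + [0.1] appends a
# down-weighted entry for the extra "no object" class, so unmatched queries
# contribute only 0.1x to the classification loss (as in MaskFormer).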

# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(
        type='LoadPanopticAnnotations',
        with_bbox=True,
        with_mask=True,
        with_seg=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    # large scale jittering
    dict(
        type='Resize',
        img_scale=image_size,
        ratio_range=(0.1, 2.0),
        multiscale_mode='range',
        keep_ratio=True),
    dict(
        type='RandomCrop',
        crop_size=image_size,
        crop_type='absolute',
        recompute_bbox=True,
        allow_negative_crop=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=image_size),
    dict(type='DefaultFormatBundle', img_to_float=True),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data_root = 'data/coco/'
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(
        pipeline=test_pipeline,
        ins_ann_file=data_root + 'annotations/instances_val2017.json',
    ),
    test=dict(
        pipeline=test_pipeline,
        ins_ann_file=data_root + 'annotations/instances_val2017.json',
    ))

embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1, decay_mult=1.0),
            'query_embed': embed_multi,
            'query_feat': embed_multi,
            'level_embed': embed_multi,
        },
        norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    gamma=0.1,
    by_epoch=False,
    step=[327778, 355092],
    warmup='linear',
    warmup_by_epoch=False,
    warmup_ratio=1.0,  # no warmup
    warmup_iters=10)

max_iters = 368750
runner = dict(type='IterBasedRunner', max_iters=max_iters)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook', by_epoch=False)
    ])
interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
    by_epoch=False, interval=interval, save_last=True, max_keep_ckpts=3)

# Before the 365001st iteration, evaluation runs every 5000 iterations.
# From the 365001st iteration on, the interval becomes 368750 iterations,
# which means evaluation runs once more at the end of training.
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
evaluation = dict(
    interval=interval,
    dynamic_intervals=dynamic_intervals,
    metric=['PQ', 'bbox', 'segm'])
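The instance-only configs created at the end of this PR (e.g. mask2former_r50_lsj_8x2_50e_coco.py) inherit the panoptic config above and switch off the stuff/panoptic parts. A condensed, illustrative sketch of the key overrides (field names follow the config above; the files added in the PR are more complete, e.g. they also swap in instance-annotation loading):

```python
_base_ = ['./mask2former_r50_lsj_8x2_50e_coco-panoptic.py']

num_things_classes = 80
num_stuff_classes = 0  # instance-only: no stuff classes
num_classes = num_things_classes + num_stuff_classes
model = dict(
    panoptic_head=dict(
        num_things_classes=num_things_classes,
        num_stuff_classes=num_stuff_classes,
        loss_cls=dict(class_weight=[1.0] * num_classes + [0.1])),
    panoptic_fusion_head=dict(
        num_things_classes=num_things_classes,
        num_stuff_classes=num_stuff_classes),
    test_cfg=dict(panoptic_on=False))  # evaluate instances, not panoptic

# instance annotations instead of panoptic ones, and bbox/segm metrics only
data = dict(
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json'))
evaluation = dict(metric=['bbox', 'segm'])
```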