
Try to train fast (grouped-conv) versions of csdarknet53 and csdarknet19 #6

Open · AlexeyAB opened this issue Jan 4, 2020 · 109 comments

AlexeyAB commented Jan 4, 2020

@WongKinYiu Hi,

Since CSPDarkNet53 is better than CSPResNeXt50 for the Detector, try to train these 4 models (the two original configs are listed for reference):

| Model | GPU | 256x256 (FPS) | 512x512 (FPS) | 608x608 (FPS) |
|---|---|---|---|---|
| darknet53.cfg (original) | RTX 2070 | 113 | 56 | 38 |
| csdarknet53.cfg (original) | RTX 2070 | 101 | 57 | 41 |
| csdarknet53g.cfg.txt | RTX 2070 | 122 | 64 | 46 |
| csdarknet53ghr.cfg.txt | RTX 2070 | 100 | 75 | 57 |
| spinenet49.cfg.txt (low priority) | RTX 2070 | 49 | 44 | 43 |
| csdarknet19-fast.cfg.txt | RTX 2070 | 213 | 149 | 116 |

csdarknet19-fast.cfg contains DropBlock, so use the latest version of Darknet, which uses fast random functions for DropBlock.

@WongKinYiu

@AlexeyAB Thanks,

I will have free GPUs after I finish training the local_avgpool models.

AlexeyAB commented Jan 30, 2020

@WongKinYiu Hi,

So you should combine the two networks: more layers and more parameters (CSPDarknet-53) + more outputs per layer (CSPResNeXt-50).


  1. It seems that CSPResNeXt50 has higher Top1/Top5 because it has more outputs per layer out_w * out_h * out_c (i.e. it has higher filters= in its [conv] layers): 258 291 (1.1x) for CSPResNeXt50 vs 233 348 (1.0x) for CSPDarknet53.

  2. It seems that although a small number of parameters, ~21M (CSPResNet50/CSPResNeXt50, [conv] groups=32, 84 layers), is sufficient for the 256x256 Classifier, a much larger number of parameters, ~27M (CSPDarkNet53, [conv] groups=1, 108 layers), is needed for the 512x512 Detector.

Suggestion:

  • Either increase the number of filters in CSPDarknet53 (and also increase groups= from 1 to 2-8 for layers with a high filters= value),
  • or increase the number of layers in CSPResNeXt50 (and also decrease groups= from 32 to 1-8).
| Model | Layers | Groups | Parameters (M) | Average outputs (out_w x out_h x out_c) | RTX 2070 FPS | BFLOPs | Top1 / Top5 | Top1 / Top5 (mosaic + label smooth + mish) | AP (Detector) 512x512 |
|---|---|---|---|---|---|---|---|---|---|
| CSPDarkNet53 256x256 | 108 | 1 | 27 | 233 348 (1.0x) | 125 | 13 | 77.2% / 93.6% | 78.7% / 94.8% | 38.7% |
| CSPResNeXt50 256x256 | 84 | 32 | 20 | 258 291 (1.1x) | 72 | 8 | 77.9% / 94.0% | 79.8% / 95.2% | 38.0% |
| CSPResNet50 256x256 | 84 | 1 | 21 | 203 665 (0.87x) | 168 | 9 | 76.6% / 93.3% | 78.1% / 94.2% | 38.0% |

Did you try training with DropBlock? Does it work well?

@WongKinYiu

@AlexeyAB

Yes. If I change the output channels of CSPResNet50 and CSPDarknet53 to 2048, I think they can achieve better results, but with a large amount of computation.

Do you need an ImageNet pre-trained model which has more layers + more parameters + more outputs? If yes, I can train a model. Or if you have a cfg file, I will get 2 free GPUs tomorrow for training it.

The DropBlock models are still training. Currently they get slightly lower accuracy than the models without DropBlock at the same epoch, but that may be because DropBlock needs more epochs to converge.

@WongKinYiu

@AlexeyAB Hello,

The model with DropBlock gets lower accuracy than the one without it (79.1 vs 79.8).
I think we need to follow what EfficientNet does: reduce the drop probability during training.

AlexeyAB commented Feb 8, 2020

@WongKinYiu Hi,

This is already done: https://github.com/AlexeyAB/darknet/blob/d51d89053afc4b7f50a30ace7b2fcf1b2ddd7598/src/dropout_layer_kernels.cu#L28-L31

  • Maybe we should increase the drop probability over the whole training run instead of only the first half of it (see the sketch below)

  • Or maybe DropBlock requires more parameters in the model, since drop-block/drop-out/drop-connect divides the model into an ensemble of many models, each of which turns out to be too small

So we should try:

  • new models with more layers + more parameters + more outputs
  • a fast CUDA implementation of DropBlock
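
For illustration, a minimal sketch of such a schedule in Python (hypothetical helper, not the actual darknet CUDA code linked above), where the drop probability ramps linearly from 0 to its target value over a configurable fraction of training:

```python
# Hypothetical sketch: linearly ramp the DropBlock drop probability from 0 to
# `target_prob` over the first `ramp_fraction` of training, then keep it constant.
# This mirrors the idea discussed above (ramping over half vs. the whole run);
# it is not the actual darknet implementation.
def drop_probability(cur_iter, max_iters, target_prob=0.1, ramp_fraction=0.5):
    ramp_iters = max(1, int(max_iters * ramp_fraction))
    return target_prob * min(1.0, cur_iter / ramp_iters)

# e.g. ramp over the whole run instead of only the first half:
# p = drop_probability(it, max_batches, target_prob=0.1, ramp_fraction=1.0)
```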

AlexeyAB commented Feb 10, 2020

@WongKinYiu Hi,

> Do you need an ImageNet pre-trained model which has more layers + more parameters + more outputs? If yes, I can train a model. Or if you have a cfg file, I will get 2 free GPUs tomorrow for training it.

Please try to train these 2 models - both use MISH + mosaic=1 cutmix=1 label_smooth_eps=0.1 + reduced groups= for faster inference:

  1. csresnext50morelayers.cfg.txt - added more layers between the 1st and 2nd subsampling

  2. csresnext50sub.cfg.txt - added more layers between the 1st and 2nd subsampling, and concatenated 2 subsamplings: [conv] stride=2 and [maxpool] stride=2


Also did you try to train CSPResNeXt-50+Elastic with MISH + mosaic=1 cutmix=1 label_smooth_eps=0.1 ?

Also did you try to train spinenet49.cfg.txt ?

@WongKinYiu

OK, will train these two models.

No, the inference speed of CSPResNeXt-50 with Elastic is too slow;
I think it cannot run real-time object detection.

AlexeyAB commented Feb 10, 2020

@WongKinYiu

All with MISH-activation and 608x608 network resolution on GeForce RTX 2070:

  • csresnext50.cfg - 51.2 FPS

  • csdarknet53.cfg - 53.6 FPS

  • csresnext50morelayers.cfg.txt - 44.0 FPS

  • csresnext50sub.cfg.txt - 43.3 FPS

  • spinenet49.cfg - 43.2 FPS (640x640 network resolution)

  • elastic-csresnext50.cfg - 34.7 FPS (576x576 network resolution)

So it may make sense to train spinenet49.cfg with MISH + mosaic=1 cutmix=1 label_smooth_eps=0.1: spinenet49.cfg.txt

@WongKinYiu

@AlexeyAB

I currently have only two free GPUs, so I will train csresnext50morelayers and spinenet49 first.

@WongKinYiu

@AlexeyAB

| Model | CutMix | Mosaic | Label Smoothing | Mish | Top-1 | Top-5 |
|---|---|---|---|---|---|---|
| SpineNet-49 | ✔️ | ✔️ | ✔️ | ✔️ | 78.3% | 94.6% |

AlexeyAB commented Mar 2, 2020

@WongKinYiu Thanks!

So SpineNet-49 is worse than csdarknet53 and csresnext50, at least on ImageNet:

  • spinenet49.cfg.txt - 43.2 FPS - 78.3% | 94.6%
  • csdarknet53.cfg - 53.6 FPS - 78.7% | 94.8%
  • csresnext50.cfg - 51.2 FPS - 79.8% | 95.2%

Also, I fixed label_smoothing for the Detector (not for the Classifier) in AlexeyAB/darknet@81290b0,
in the same way as here: https://github.com/david8862/keras-YOLOv3-model-set/blob/6cc297434e0604e2f6c34a8a2557b342468f083a/yolo3/loss.py#L225-L227
using this probability transformation: http://fooplot.com/#W3sidHlwZSI6MCwiZXEiOiJ4KjAuOSswLjA1IiwiY29sb3IiOiIjMDAwMDAwIn0seyJ0eXBlIjoxMDAwLCJ3aW5kb3ciOlsiMCIsIjEiLCIwIiwiMSJdfV0-

So you can try to train Detector with new label_smoothing.

Usage

[yolo]
label_smooth_eps=0.1

for each [yolo] layer

The old label_smoothing worked well for the Classifier, but worked badly for the Detector.
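
For reference, a minimal Python sketch of the plotted probability transformation (x*0.9 + 0.05, generalized to an eps parameter; illustrative only, not the darknet source):

```python
# Smooth a sigmoid target y in [0, 1] toward the middle of the range.
# With eps = 0.1 this is exactly the plotted mapping x*0.9 + 0.05,
# so y=1 -> 0.95 and y=0 -> 0.05.
def smooth_sigmoid_target(y, eps=0.1):
    return y * (1.0 - eps) + 0.5 * eps
```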

@WongKinYiu

The results from the original paper:

[image: results table from the SpineNet paper]

CSPDarkNet-53 has more parameters and FLOPs.

AlexeyAB commented Mar 2, 2020

Yes, SpineNet-49 has fewer params and FLOPs, but CSPDarkNet-53 is faster and more accurate as a Classifier.
But maybe SpineNet-49 is more accurate as a backbone for the Detector.

WongKinYiu commented Mar 7, 2020

@AlexeyAB

| Model | CutMix | Mosaic | Label Smoothing | Mish | Top-1 | Top-5 |
|---|---|---|---|---|---|---|
| CSPResNeXt-50-morelayers | ✔️ | ✔️ | ✔️ | ✔️ | 79.4% | 95.2% |

AlexeyAB commented Mar 8, 2020

@WongKinYiu Thanks! Do you mean csresnext50morelayers.cfg or CSPDarkNet-53-morelayers? #6 (comment)

@WongKinYiu

@AlexeyAB Oh, sorry, it is csresnext50morelayers.cfg.

AlexeyAB commented Mar 8, 2020

@WongKinYiu
So csresnext50morelayers.cfg is worse than csresnext50.cfg (Top1 79.4% vs 79.8%) on ImageNet. https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/imagenet/results.md

But I think csresnext50morelayers.cfg will be a better backbone for the Detector.

@WongKinYiu

@AlexeyAB

Yes, csresnext50 performs better on ImageNet.

I will get a free GPU in about 4 days.
However, I currently do not have results for a backbone with mish activation on MS COCO; could you help design the cfg for a detector with the csresnext50morelayers backbone?

Thanks.

AlexeyAB commented Mar 8, 2020

@WongKinYiu
Ok, I can make 2 cfg-files, with [net] mosaic=1 dynamic_minibatch=1 and mish-activation:

  1. csresnext50morelayers + SPP_PAN
  2. csresnext50morelayers + SPP+ASFF+BiFPN

Shall we try to test the new label_smoothing for the Detector?
Approximately when will the CBN, DropBlock, ASFF and BiFPN model trainings end?

@WongKinYiu

@AlexeyAB

For the classifier:
CBN will finish in one week,
CBN+DropBlock is still very slow; I think it needs more than one month to finish training.

For the detector:
RFB+BN needs about two weeks,
CBN needs about two weeks,
BiFPN needs about three to four weeks, but the training may stop for several days or weeks,
ASFF has not yet started.

I will also do an ablation study for dynamic_minibatch and the new label_smoothing.

glenn-jocher commented Mar 8, 2020

@AlexeyAB @WongKinYiu have you had any success with label smoothing? I just learned about it recently, but was confused about a few things:

  • Can it be applied in both classification models and object detection models?
  • Is it always applied to both negative samples (i.e. 0.1) and positive samples (i.e. 0.9), or could it be applied only to negatives, etc.?
  • For object detection, should it be applied to both the objectness loss and the classification loss?
  • Can it be applied to both CE loss and BCE loss criteria?

@WongKinYiu

@glenn-jocher

  • Yes, it can be applied to the classification head of detectors - BoF.
  • I think both are okay, because it can be applied to both YOLOv3 and FasterRCNN.
  • In the BoF paper, it seems to be applied only to the classification head.
  • I think yes; the paper mentions "In the case of sigmoid outputs of range 0 to 1.0 as in YOLOv3 [16], label smoothing is even simpler by correcting the upper and lower limit of the range of targets as in Eq. 3."

[image: Eq. 3 from the BoF paper]

But unfortunately, mixup, cosine lr, and label smoothing all gave worse results in my experiments.

@glenn-jocher

@WongKinYiu ah thanks, that's super informative!

That solves a big mystery for me then. I tried to apply it to both obj loss and class loss at the same time, and it destroyed my NMS because every single anchor was above the threshold (of 0.001).

I implemented a cosine lr scheduler a couple of weeks ago; it worked well (+0.3 mAP), though I noticed it worked better if I raised the initial LR. Before, with the traditional step scheduler, I was using about lr0=0.006; now, with the cosine scheduler, I use lr0=0.010 to get that +0.3 increase on COCO.

| Name | mAP@0.5 | mAP@0.5:0.95 | Comments |
|---|---|---|---|
| (288-640)-608 to 273 bs16a4 yolov3-spp.cfg | 61.6 | 41.6 | step lr |
| (288-640)-608 to 273 bs16a4 yolov3-spp.cfg | 61.8 | 41.9 | cos lr0=0.01 |
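
For context, a generic Python sketch of the two schedules being compared (illustrative only; the lr0 values and step points are assumptions, not the exact ultralytics implementation):

```python
import math

# Cosine decay from lr0 down to lr_final over `epochs` epochs (generic form).
def cosine_lr(epoch, epochs, lr0=0.010, lr_final=0.0):
    return lr_final + 0.5 * (lr0 - lr_final) * (1.0 + math.cos(math.pi * epoch / epochs))

# Step schedule for comparison: constant lr0, then /10 at ~80% and /100 at ~90% of training.
def step_lr(epoch, epochs, lr0=0.006):
    if epoch < 0.8 * epochs:
        return lr0
    return lr0 * (0.1 if epoch < 0.9 * epochs else 0.01)
```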

@glenn-jocher

@WongKinYiu see ultralytics/yolov3#238 (comment) for the cosine scheduler implementation. These are the training plots for the two runs (step and cos lr). Interestingly, the val losses are better at the end with step, and you can see the cos obj loss is starting to overtrain at the end, but the cos final mAP is still slightly higher. I'm not quite sure what that means.

[image: results.png training plots]

glenn-jocher commented Mar 8, 2020

@WongKinYiu do you know what the value of epsilon should be in Eq. 3 of the BoF paper? If I assume epsilon=0.1, the classification target values (after a sigmoid) would be:

  • positive: (1 - 0.1) = 0.9
  • negative: 0.1/(80-1) = 0.0013

Does that seem right?

[screenshot: Eq. 3 from the BoF paper]

@glenn-jocher

In their case they seem to be using epsilon as smooth_weight, with a constraint to keep it from getting too large if the class count is low. OK, I'll start from there.
smooth_weight = min(1. / self._num_class, 1. / 40)
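
A quick numeric check of those targets in Python (illustrative; assumes the Eq. 3 style on/off values discussed above and num_classes=80 as in COCO):

```python
# Positive/negative classification targets under label smoothing:
# on = 1 - eps, off = eps / (num_classes - 1).
def smoothed_targets(eps, num_classes=80):
    return 1.0 - eps, eps / (num_classes - 1)

print(smoothed_targets(0.1))       # (0.9, ~0.00127) -- the numbers above
eps = min(1.0 / 80, 1.0 / 40)      # gluoncv-style smooth_weight = 0.0125 for 80 classes
print(smoothed_targets(eps))       # (~0.9875, ~0.000158)
```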

@WongKinYiu

It seems only YOLOv3 can apply label smoothing.
None of SSD, CenterNet, FasterRCNN, or MaskRCNN has a label smoothing function.

AlexeyAB commented Mar 19, 2020

@WongKinYiu

> update: still gets all zero iou.

After how many iterations?

Try to train without CBN. I noticed that CBN worsens accuracy on most of my models.

I trained this cfg-file for 2300 iterations on MS COCO and didn't get iou=0 or NaN loss: csresnext50sub-spp-asff-bifpn-rfb-db.cfg.txt (note that it uses max_batches=50050, steps=40000,45000 instead of max_batches=500500, steps=400000,450000)

label_smooth_eps=0.1, dynamic_minibatch=1, mosaic=1, BiFPN, ASFF, RFB, DropBlock - do not cause problems.

[image: training chart]

WongKinYiu commented Mar 19, 2020

After about 40 iterations. Now I have changed 608/64/64 back to 416/64/32, and it still performs normally at 1500 iterations.

Update: it becomes all zero at 3xxx iterations.

@AlexeyAB

@WongKinYiu Nice! Are you currently training CSResNext50-PANet and CSDarknet53-PANet with Mosaic, Genetic, Mish... based on the best of these models https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/imagenet/results.md ?

@WongKinYiu

Yes, the training of CSPResNext50-PANet-SPP with the csresnext50-gamma.cfg pretrained model will finish in 1~2 weeks.

AlexeyAB commented Mar 20, 2020

@WongKinYiu Try to train with 608/64/64 + mosaic=1 dynamic_minibatch=1 label_smooth_eps=0.1 but without CBN, i.e. with batch_normalize=1

I successfully trained such a model without NaN or zero IoU: csresnext50sub-spp-asff-bifpn-rfb-db.cfg.txt

[images: chart_csresnext50sub-spp-asff-bifpn-rfb-db]

@WongKinYiu

@AlexeyAB Started training.

@AlexeyAB

@WongKinYiu

Is the BiFPN+ASFF+RFB+DB training going well, without NaN/IoU=0?

@WongKinYiu

@AlexeyAB

I resumed training from 2k iterations several times when NaN/IoU=0 occurred; now the training has reached 7k iterations without NaN/IoU=0.

@AlexeyAB

@WongKinYiu This is strange, since I didn't get NaN/IoU=0 at all.

@WongKinYiu

@AlexeyAB Hmm... I got IoU=0 three times with this cfg. I already tested the previous cfg on CUDA 9.0/10.0/10.1/10.2, and all of those trainings hit the same situation.

@AlexeyAB

@WongKinYiu Maybe this is a temporary phenomenon that will correct itself and can be ignored until you reach ~10,000 iterations?

@syjeon121

@AlexeyAB @WongKinYiu Hi, I want to use cspdarknet53-panet-spp from this repo's readme for custom object training.

[screenshot: model table from the repo readme]

How many layers should I extract from the weights file using partial?

@WongKinYiu

@sctrueew

@WongKinYiu Hi,

Which pre-trained weights should I use for the CSPDarknet53-PANet-SPP model?

Thanks

WongKinYiu commented Apr 12, 2020

Hello,

Which cfg do you want to use?
And is your dataset larger than MS COCO?

sctrueew commented Apr 12, 2020

@WongKinYiu Hi,

> which cfg do you want to use?

I already used CSPResNeXt50-PANet-SPP and got a good result, but the training time is high, so I am going to use CSPDarknet53-PANet-SPP.

> and is your dataset larger than mscoco?

Yes, I have a big dataset: about 300 classes and 1M images. My dataset contains traffic signs.

Which cfg is good for this case? Accuracy is important to me.

Thanks

WongKinYiu commented Apr 12, 2020

For this case, you can use:

  1. CSPDarknet53-PANet-SPP: 512x512 input/42.4 AP/64.5 AP50
    [imagenet pretrained] [coco pretrained]

  2. CSPDarknet53-PANet-SPP(Mish): 512x512 input/43.0 AP/64.9 AP50
    [imagenet pretrained] [coco pretrained]

If your dataset is larger than MS COCO, you can consider using the ImageNet pretrained model (partial 104). If you hope the model converges quickly, you can use the MS COCO pretrained model (partial 135).

@sctrueew

@WongKinYiu Hi,

My dataset is larger than mscoco. Can I use a 608 network size to get higher accuracy?

Thanks

@WongKinYiu

Does your dataset contain many small objects?
If yes, training with a 608 network size will definitely give higher accuracy.

@sctrueew

@WongKinYiu Hi,

Yes, some objects are small. Where can I download the pre-trained weights for CSPDarknet53-PANet-SPP(Mish)?

@WongKinYiu

here #6 (comment)

@sctrueew

@WongKinYiu Hi,

Thanks for the reply. Are these commands right for generating the pre-trained weights?

ImageNet dataset

darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega_final.weights.conv.104 104

COCO dataset

darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega_final.weights.conv.135 135

@WongKinYiu

No, the weights file of the ImageNet pretrained model is csdarknet53-omega_final.weights.

@sctrueew

@WongKinYiu Hi,

Sorry, I thought I had to generate the pre-trained weights with that command.

Thanks a lot

@sctrueew

@WongKinYiu Hi,

I have a problem: sometimes some pictures are not detected, or are detected incorrectly. I attached my model and some images for testing. Could you please check it and guide me? I have about 2K images per class. Please give me some information about the hyperparameters for my case.

[attached file]

Thanks in advance

@WongKinYiu

Hello, how do you calculate anchors?
Could you show the object count for each class?

sctrueew commented Apr 21, 2020

@WongKinYiu Hi, Thanks for the reply

Did you test it?

> hello, how u calculate anchors?

darknet detector calc_anchors a.obj -num_of_clusters 9 -width 608 -height 608

> could u show object number of each classes?

1.txt

please rename 1.txt to 1.zip.
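
As an aside, a rough Python sketch of IoU-based k-means anchor clustering in the spirit of the calc_anchors command above (illustrative, under assumed behavior; not the actual darknet implementation):

```python
import numpy as np

def iou_wh(boxes, anchors):
    # boxes: (N, 2), anchors: (K, 2), both (w, h) scaled to the network resolution
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0] * boxes[:, 1]
    union = union[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, anchors), axis=1)      # best-matching anchor per box
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)    # move anchor to cluster mean
    return anchors[np.argsort(anchors.prod(axis=1))]         # sort by area, small to large

# wh: ground-truth (w, h) pairs already scaled to 608x608, e.g. np.array([[w1, h1], ...])
# print(kmeans_anchors(wh, k=9).round(0))
```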

@Shraddha767

Where can we add DropBlock in the yolov4 cfg?
