
Proper Initialization in Conv2d() biases into YOLO Layers #460

Closed
glenn-jocher opened this issue Aug 18, 2019 · 13 comments
Labels
question (Further information is requested) · Stale · tutorial (Tutorial or example)

Comments

@glenn-jocher
Member

On review of section 3.3 of the Focal Loss paper, I discovered a trick we have not exploited yet:

3.3. Class Imbalance and Model Initialization
Binary classification models are by default initialized to have equal probability of outputting either y = −1 or 1. Under such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate total loss and cause instability in early training. To counter this, we introduce the concept of a ‘prior’ for the value of p estimated by the model for the rare class (foreground) at the start of training. We denote the prior by π and set it so that the model’s estimated p for examples of the rare class is low, e.g. 0.01. We note that this is a change in model initialization (see §4.1) and not of the loss function. We found this to improve training stability for both the cross entropy and focal loss in the case of heavy class imbalance.

This involves initializing the bias terms of the Conv2d() modules directly preceding the YOLO layers for the neurons involved in detection and classification (but not regression). A bias of -5 corresponds roughly to a 0.01 probability (i.e. torch.sigmoid(-5) ≈ 0.01), so in this manner all detection neurons will output null detections in the first few batches, which is what we want. The default initialization is a Gaussian about 0, which means half of all neurons will report detections in the first batch (i.e. 5000 detections per image), massively impacting the gradient and causing instabilities.
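For illustration, a minimal sketch of the idea (the layer size, prior value, and variable names here are illustrative, not the exact repo code):

import math
import torch.nn as nn

na, nc = 3, 80                          # anchors per YOLO layer, classes (COCO)
conv = nn.Conv2d(1024, na * (nc + 5), kernel_size=1)   # Conv2d feeding a YOLO layer

prior = 0.01                            # desired initial detection probability
b = -math.log((1 - prior) / prior)      # sigmoid(b) == prior  ->  b ≈ -4.6

bias = conv.bias.view(na, nc + 5)       # 255 -> 3 x 85
bias.data[:, 4] += b                    # objectness neurons
bias.data[:, 5:] += b                   # class neurons (box regression neurons untouched)
conv.bias = nn.Parameter(bias.view(-1), requires_grad=True)

With b ≈ -4.6 the objectness and class probabilities start near 0.01, so almost nothing is reported as a detection in the first batches.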

A test implementation on coco_16img.data showed improvements across the board, including final mAP, final P and R, and final F1. Oscillations in the losses are reduced substantially, and results are generally smoother and better behaved. I need to formalize the implementation and will hopefully commit this change soon.

[training results plot for coco_16img.data]

@glenn-jocher
Member Author

glenn-jocher commented Aug 19, 2019

Repo has been updated! New training results for coco_16img and coco_64img:

[training results plots for coco_16img and coco_64img]

@ZhxJia

ZhxJia commented Sep 7, 2019

What is the purpose of prebias?

@glenn-jocher
Member Author

@ZhxJia to initialize the biases on the inputs to the yolo layers.

@glenn-jocher
Member Author

@ZhxJia the Focal Loss paper section 3.3 quoted above contains good information on the topic. Basically, the objectness logits fed into the YOLO layers should be highly negative at initialization, i.e. -5 to -10, since most neurons will not detect anything in a given picture. The PyTorch default bias initialization is centered about zero, so the --prebias argument in train.py freezes the entire network and aggressively trains these biases for one epoch before starting training. It improves performance in the first few epochs, though the long-term benefit (i.e. after 100 epochs) is unclear.
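Roughly, the prebias step amounts to something like the following sketch (assuming `model` is the Darknet model and `yolo_input_convs` is a list of the Conv2d modules feeding the YOLO layers; both names are illustrative, not the repo's actual API):

def set_prebias(model, yolo_input_convs, prebias=True):
    # Freeze (or unfreeze) every parameter in the network
    for p in model.parameters():
        p.requires_grad = not prebias
    # During prebias, train only the biases of the convs feeding the YOLO layers
    if prebias:
        for conv in yolo_input_convs:
            conv.bias.requires_grad = True

After one epoch, calling set_prebias(model, yolo_input_convs, prebias=False) would unfreeze everything so normal training can proceed.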

@lyrgwlr

lyrgwlr commented Nov 30, 2019

@glenn-jocher
If I use yolov3-spp.weights as the initial weights for my custom data, is the bias initialization proper? Because the initial weights are not the "PyTorch default bias initialization".

I think these lines in models.py should be commented out if yolov3-spp.weights is loaded as the initial weights:

bias = module_list[-1][0].bias.view(len(mask), -1)  # 255 to 3x85
bias[:, 4] += b[0] - bias[:, 4].mean()  # obj
bias[:, 5:] += b[1] - bias[:, 5:].mean()  # cls
module_list[-1][0].bias = torch.nn.Parameter(bias.view(-1))

What do you think about it?

@glenn-jocher
Member Author

glenn-jocher commented Nov 30, 2019

@lyrgwlr yes, if a fully trained model like yolov3-spp.weights is loaded before training, then these lines can be commented out.

In some cases, such as --weights yolov3-spp.pt --cfg yolov3-spp-1cls.cfg, these fully trained models serve as a backbone for other models whose output layers have different shapes. In that case only the layers with exactly matching sizes and names are loaded, and the above lines should stay, as they are still biasing the randomly initialized layers.

The idea, of course, is to push the obj and cls biases more negative, since those neurons very rarely correspond to an object or to the correct class.
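Partial backbone loading of this kind can be sketched by keeping only the checkpoint entries whose shapes match the new model (a rough illustration, assuming `model` is the new Darknet model and the checkpoint stores its state_dict under a 'model' key; file name and key are examples, not a guaranteed format):

import torch

ckpt = torch.load('yolov3-spp.pt', map_location='cpu')['model']   # trained backbone
new_sd = model.state_dict()                                       # e.g. the 1-cls model
matched = {k: v for k, v in ckpt.items()
           if k in new_sd and new_sd[k].shape == v.shape}
model.load_state_dict(matched, strict=False)
# Output layers with different shapes keep their random initialization,
# so the obj/cls bias offsets applied at model init still matter for them.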

glenn-jocher added the question (Further information is requested) and tutorial (Tutorial or example) labels Nov 30, 2019
@glenn-jocher
Member Author

glenn-jocher commented Nov 30, 2019

@lyrgwlr ah I completely forgot: the lines in models.py that you highlighted run when the model is initialized, which must always happen.

If --weights are specified, they are loaded afterwards, replacing the initialized model's parameter values. So the lines above should never be commented out; doing so would serve no purpose.

To run the model with randomly initialized weights use --weights ''.

@lyrgwlr

lyrgwlr commented Dec 1, 2019

@glenn-jocher
You're right. I have another question.
I have a dataset with 2 classes. I have tried two approaches:

  1. Change the filters before each yolo layer from 255 to 21 (3*(2+5)), and use "darknet53.conv.74.weights" as my initial weights (in this case I can't use "yolov3-spp.weights" because the last 3 conv weights do not match).
  2. Make no change to the .cfg, just create my own .data and change rows 0 and 1 in coco.names to my class names. In this case I use "yolov3-spp.weights" as my initial weights.

The hyperparameters and everything else are exactly the same.

I found the performance of the second method is much better than the first (by about 5% mAP).
Does this result indicate that:

  1. "yolov3-spp.weights" is a better starting point than "darknet53.conv.74.weights" for the detection task?
  2. the default hyperparameters are only suited to the 255-filter network?

I want to figure this out because I need a baseline for further improvement work.

@glenn-jocher
Member Author

glenn-jocher commented Dec 1, 2019

@lyrgwlr for the fastest results use the command below. See #106 (comment)

python3 train.py --weights ultralytics49.pt --cfg cfg/yourfile.cfg

glenn-jocher changed the title from "Proper Bias Initialization in Conv2d modules preceding YOLO Layers" to "Proper Initialization in Conv2d() biases into YOLO Layers" Dec 11, 2019
@glenn-jocher
Member Author

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

@HongshanLi

@glenn-jocher, thanks for discovering this property. I trained on my custom data with both setups. I found that the one without prebias produced NaN outputs early in training. Did you notice the same thing?

@glenn-jocher
Member Author

glenn-jocher commented Jan 27, 2020

@HongshanLi no, the reason we implement a prebias stage is to improve mAP, not to prevent optimizer instability.

prebias is now incorporated into the main repo automatically, so running train.py with no options will always use a prebias step in the first few epochs.

glenn-jocher reopened this Oct 8, 2020
@github-actions

github-actions bot commented Nov 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
