
Proper Initialization in Conv2d() biases into YOLO Layers #460

Closed
glenn-jocher opened this issue Aug 18, 2019 · 13 comments
Labels
question (Further information is requested) · Stale · tutorial (Tutorial or example)

Comments

@glenn-jocher
Member

On review of section 3.3 of the Focal Loss paper, I discovered a trick we have not exploited yet:

3.3. Class Imbalance and Model Initialization
Binary classification models are by default initialized to have equal probability of outputting either y = −1 or 1. Under such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate total loss and cause instability in early training. To counter this, we introduce the concept of a ‘prior’ for the value of p estimated by the model for the rare class (foreground) at the start of training. We denote the prior by π and set it so that the model’s estimated p for examples of the rare class is low, e.g. 0.01. We note that this is a change in model initialization (see §4.1) and not of the loss function. We found this to improve training stability for both the cross entropy and focal loss in the case of heavy class imbalance.

This involves initializing the bias terms of the Conv2d() modules directly preceding the YOLO layers for the neurons involved in detection and classification (but not regression). A bias of -5 corresponds roughly to a 0.01 probability (i.e. torch.sigmoid(-5) ≈ 0.01), so in this manner all detection neurons will output null detections in the first few batches, which is what we want. The default initialization is a Gaussian about 0, which means half of all neurons will report detections in the first batch (i.e. 5000 detections per image), massively impacting the gradient and causing instabilities.
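For illustration, a minimal sketch of the idea (the layer size, prior value, and variable names here are illustrative, not the exact repo code):

import math
import torch.nn as nn

na, nc = 3, 80                          # anchors per YOLO layer, classes (COCO)
conv = nn.Conv2d(1024, na * (nc + 5), kernel_size=1)   # Conv2d feeding a YOLO layer

prior = 0.01                            # desired initial detection probability
b = -math.log((1 - prior) / prior)      # sigmoid(b) == prior  ->  b ≈ -4.6

bias = conv.bias.view(na, nc + 5)       # 255 -> 3 x 85
bias.data[:, 4] += b                    # objectness neurons
bias.data[:, 5:] += b                   # class neurons (box regression neurons untouched)
conv.bias = nn.Parameter(bias.view(-1), requires_grad=True)

With b ≈ -4.6 the objectness and class probabilities start near 0.01, so almost nothing is reported as a detection in the first batches.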

A test implementation on coco_16img.data showed improvements across the board, including final mAP, final P and R, and final F1. Oscillations in the losses are reduced substantially, and results are generally smoother and better behaved. I need to formalize the implementation and will hopefully commit this change soon.

[training results plot for coco_16img.data]

@glenn-jocher
Member Author

glenn-jocher commented Aug 19, 2019

Repo has been updated! New training results for coco_16img and coco_64img:

[training results plots for coco_16img and coco_64img]

@ZhxJia

ZhxJia commented Sep 7, 2019

What is the purpose of prebias?

@glenn-jocher
Member Author

@ZhxJia to initialize the biases on the inputs to the yolo layers.

@glenn-jocher
Member Author

@ZhxJia the Focal Loss paper section 3.3 quoted above contains good information on the topic. Basically, the objectness logits fed into the YOLO layers should be highly negative at initialization, i.e. -5 to -10, since most neurons will not detect anything in a given picture. The PyTorch default bias initialization is centered about zero, so the --prebias argument in train.py freezes the entire network and aggressively trains these biases for one epoch before starting training. It improves performance in the first few epochs, though the long-term benefit (i.e. after 100 epochs) is unclear.
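Roughly, the prebias step amounts to something like the following sketch (assuming `model` is the Darknet model and `yolo_input_convs` is a list of the Conv2d modules feeding the YOLO layers; both names are illustrative, not the repo's actual API):

def set_prebias(model, yolo_input_convs, prebias=True):
    # Freeze (or unfreeze) every parameter in the network
    for p in model.parameters():
        p.requires_grad = not prebias
    # During prebias, train only the biases of the convs feeding the YOLO layers
    if prebias:
        for conv in yolo_input_convs:
            conv.bias.requires_grad = True

After one epoch, calling set_prebias(model, yolo_input_convs, prebias=False) would unfreeze everything so normal training can proceed.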

@lyrgwlr

lyrgwlr commented Nov 30, 2019

@glenn-jocher
If I use yolov3-spp.weights as the initial weights for my custom data, is the bias initialization proper? Because the initial weights are not the "PyTorch default bias initialization".

I think these lines in models.py should be commented out if yolov3-spp.weights is loaded as the initial weights:

bias = module_list[-1][0].bias.view(len(mask), -1)  # 255 to 3x85
bias[:, 4] += b[0] - bias[:, 4].mean()  # obj
bias[:, 5:] += b[1] - bias[:, 5:].mean()  # cls
module_list[-1][0].bias = torch.nn.Parameter(bias.view(-1))

What do you think about it?

@glenn-jocher
Member Author

glenn-jocher commented Nov 30, 2019

@lyrgwlr yes, if a fully trained model like yolov3-spp.weights is loaded before training, then these lines can be commented out.

In some cases, such as --weights yolov3-spp.pt --cfg yolov3-spp-1cls.cfg, these fully trained models serve as a backbone for other models whose output layers have different shapes. In that case only the layers with exactly matching sizes and names are loaded, and the above lines should stay, as they are still biasing the randomly initialized layers.

The idea, of course, is to push the obj and cls biases more negative, since those neurons very rarely correspond to an object or to the correct class.
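Partial backbone loading of this kind can be sketched by keeping only the checkpoint entries whose shapes match the new model (a rough illustration, assuming `model` is the new Darknet model and the checkpoint stores its state_dict under a 'model' key; file name and key are examples, not a guaranteed format):

import torch

ckpt = torch.load('yolov3-spp.pt', map_location='cpu')['model']   # trained backbone
new_sd = model.state_dict()                                       # e.g. the 1-cls model
matched = {k: v for k, v in ckpt.items()
           if k in new_sd and new_sd[k].shape == v.shape}
model.load_state_dict(matched, strict=False)
# Output layers with different shapes keep their random initialization,
# so the obj/cls bias offsets applied at model init still matter for them.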

glenn-jocher added the question (Further information is requested) and tutorial (Tutorial or example) labels Nov 30, 2019
@glenn-jocher
Member Author

glenn-jocher commented Nov 30, 2019

@lyrgwlr ah I completely forgot: the lines in models.py that you highlighted run when the model is initialized, which must always happen.

If --weights are specified, they are loaded afterwards, replacing the initialized model's parameter values. So the lines above should never be commented out; doing so would serve no purpose.

To run the model with randomly initialized weights use --weights ''.

@lyrgwlr

lyrgwlr commented Dec 1, 2019

@glenn-jocher
You're right. I have another question.
I have a dataset with 2 classes. I have tried two approaches:

  1. Change the filters before each yolo layer from 255 to 21 (3*(2+5)), and use "darknet53.conv.74.weights" as my initial weights (in this case I can't use "yolov3-spp.weights" because the last 3 conv weights do not match).
  2. Make no change to the .cfg, just create my own .data and change rows 0 and 1 in coco.names to my class names. In this case I use "yolov3-spp.weights" as my initial weights.

The hyperparameters and everything else are exactly the same.

I found the performance of the second method is much better than the first (by about 5% mAP).
Does this result indicate that:

  1. "yolov3-spp.weights" is a better starting point than "darknet53.conv.74.weights" for the detection task?
  2. the default hyperparameters are only suited to the 255-filter network?

I want to figure this out because I need a baseline for further improvement work.

@glenn-jocher
Member Author

glenn-jocher commented Dec 1, 2019

@lyrgwlr for the fastest results use the command below. See #106 (comment)

python3 train.py --weights ultralytics49.pt --cfg cfg/yourfile.cfg

glenn-jocher changed the title from "Proper Bias Initialization in Conv2d modules preceding YOLO Layers" to "Proper Initialization in Conv2d() biases into YOLO Layers" Dec 11, 2019
@glenn-jocher
Member Author

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

@HongshanLi

@glenn-jocher, thanks for discovering this property. I trained on my custom data with both setups. I found that the one without prebias produced NaN outputs early in training. Did you notice the same thing?

@glenn-jocher
Member Author

glenn-jocher commented Jan 27, 2020

@HongshanLi no, the reason we implement a prebias stage is to improve mAP, not to prevent optimizer instability.

prebias is now incorporated into the main repo automatically, so running train.py with no options will always use a prebias step in the first few epochs.

glenn-jocher reopened this Oct 8, 2020
@github-actions

github-actions bot commented Nov 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
