Proper Initialization in Conv2d() biases into YOLO Layers #460
I discovered a trick we have not exploited yet on review of the Focal Loss paper, section 3.3:

This involves initializing the bias terms of the Conv2d() modules directly preceding the YOLO layers for the neurons involved in detection and classification (but not regression). A bias of -5 corresponds roughly to a 0.01 probability (torch.sigmoid(-5) ≈ 0.0067), so in this manner all detection neurons will output null detections in the first few batches, which is what we want. The default initialization is a Gaussian about 0, which means half of all neurons will report detections in the first batch (i.e. ~5000 detections per image), massively impacting the gradient and causing instabilities.

A test implementation on coco_16img.data showed improvements across the board, including final mAP, final P and R, and final F1. Oscillations in the losses are reduced substantially and results are generally smoother and better behaved. I need to formalize the implementation and will hopefully commit this change soon.
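To make the arithmetic above concrete, here is a minimal standalone sketch (mine, not repo code) that computes the bias yielding a given prior detection probability via the inverse sigmoid, per Focal Loss section 3.3:

```python
import math

# Minimal sketch: solve sigmoid(b) = p for the bias b that yields a
# target initial detection probability p (Focal Loss sec. 3.3 uses p = 0.01).
def bias_for_prior(p):
    return -math.log((1 - p) / p)  # inverse sigmoid (logit function)

print(bias_for_prior(0.01))  # ~ -4.595
# Conversely, a bias of -5 gives sigmoid(-5) ~ 0.0067, versus 0.5 for a
# zero bias, so detection neurons start out overwhelmingly "off".
```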
What's the purpose of prebias?

@ZhxJia to initialize the biases on the inputs to the YOLO layers.

@ZhxJia the Focal Loss paper section 3.3 detailed above contains good information on the topic. Basically the logits input to the objectness neurons should be highly negative on initialization, i.e. -5 to -10, since most neurons will not detect anything in a given picture. PyTorch's default bias initialization is centered about zero, so roughly half of the objectness neurons report detections at the start of training.
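As a quick numeric check of those figures (a standalone snippet, not from the repo):

```python
import torch

# Initial objectness confidence as a function of the bias: a near-zero
# bias fires on ~half the neurons, while -5 to -10 is effectively "off".
for b in (0.0, -5.0, -10.0):
    print(f"bias {b:>6}: sigmoid = {torch.sigmoid(torch.tensor(b)).item():.6f}")
# bias    0.0: sigmoid = 0.500000
# bias   -5.0: sigmoid = 0.006693
# bias  -10.0: sigmoid = 0.000045
```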
@glenn-jocher I think these lines in models.py should be commented out if yolov3-spp.weights is loaded for the initial weights:

```python
bias = module_list[-1][0].bias.view(len(mask), -1)  # 255 to 3x85
bias[:, 4] += b[0] - bias[:, 4].mean()  # obj
bias[:, 5:] += b[1] - bias[:, 5:].mean()  # cls
module_list[-1][0].bias = torch.nn.Parameter(bias.view(-1))
```

What do you think about it?
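For readers outside the repo: the snippet above references names defined elsewhere in models.py (module_list, mask, and the target bias values b). Below is a self-contained sketch of the same shift; the 512 input channels, the anchor mask, and the b targets of -5.0 are assumed illustrative values, not the repo's actual configuration.

```python
import torch
import torch.nn as nn

# Self-contained sketch of the bias shift above; channel count, anchor
# mask, and b targets are assumed values for illustration only.
num_classes = 80
mask = [0, 1, 2]   # anchor indices assigned to this YOLO layer
b = [-5.0, -5.0]   # target mean biases for obj and cls (assumed)
conv = nn.Conv2d(512, len(mask) * (5 + num_classes), kernel_size=1)

with torch.no_grad():
    bias = conv.bias.view(len(mask), -1)       # (255,) -> (3, 85)
    bias[:, 4] += b[0] - bias[:, 4].mean()     # shift obj biases to mean b[0]
    bias[:, 5:] += b[1] - bias[:, 5:].mean()   # shift cls biases to mean b[1]
# The in-place ops above already modified conv.bias, so no reassignment
# is needed in this standalone version.
```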
@lyrgwlr yes, if a fully trained model like yolov3-spp.weights is loaded before training, then these lines can be commented out. The idea of course is to shift the obj and cls biases more negatively, as those neurons very rarely contain an object, or the correct class.
@lyrgwlr ah, I completely forgot: the lines in models.py that you highlighted run when the model is initialized, which must always happen. To run the model with randomly initialized weights, use …
@glenn-jocher
The hyperparameters and everything else are exactly the same. I found the performance of the second method is much better than the first (about 5% higher mAP).
I want to figure out why, because I need a baseline for further improvement work.
@lyrgwlr for the fastest results use this below. See #106 (comment)
I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.
@glenn-jocher, thanks for discovering this property. I trained on my custom data with both setups. I found that the one without prebias produced NaN outputs at the start of training. Did you notice the same thing?
@HongshanLi no, the reason we implement a prebias stage is to improve mAP, not to prevent optimizer instability. prebias is automatically incorporated into the main repo now, so it is applied automatically when you run training.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.