Test models with good hyperparameters - +4.8% [email protected] on MS COCO test-dev #4430
Hi @AlexeyAB So, we have to change uc_normalizer from 1.0 to 0.1 in Gaussian_yolov3_BDD.cfg. Is that right?
@zpmmehrdad Yes.
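For concreteness, a minimal sketch of the change being discussed, as it would appear in the cfg file (assuming the layer otherwise keeps its existing settings; only the relevant keys are shown):

```ini
[Gaussian_yolo]
# uncertainty-loss weight, lowered from 1.0 to 0.1 as discussed above
uc_normalizer=0.1
```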
@glenn-jocher Hi, do you use a static learning rate = 0.00261?
@AlexeyAB I use the original darknet LR scheduler, with drops of *=0.1 at 80% and 90% of total epochs. It's true that a smoother drop may have a slight benefit (I think the BoF paper showed this), but it's likely a very minimal effect. See ultralytics/yolov3#238
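As a hedged sketch (not darknet's actual implementation), the step schedule described here, multiplying the LR by 0.1 at 80% and 90% of total training, could be written as:

```python
def step_lr(epoch, total_epochs, base_lr=0.00261):
    """Step schedule: LR drops by a factor of 10 at 80% and 90% of training."""
    lr = base_lr
    if epoch >= 0.8 * total_epochs:
        lr *= 0.1
    if epoch >= 0.9 * total_epochs:
        lr *= 0.1
    return lr
```

For a 300-epoch run this keeps the base LR until epoch 240, runs 10x lower until epoch 270, and 100x lower after that.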
At least for some datasets, it seems that
@AlexeyAB yes, you should definitely examine the value of each loss component to ensure that the balancing parameters produce roughly equal loss between the 3 components. In ultralytics/yolov3 they produce magnitudes of about 5, 5, 5 for GIoU, obj, cls on COCO epoch 0. If they produce different magnitudes here, you should adjust accordingly. BTW, one thing that has always bothered me about the ultralytics/yolov3 loss function is that each of the yolo layers is treated equally (because we take the mean of all the elements in each layer), whereas I think here you sum all the elements in each layer instead. Is this correct? In all of the papers I always see mAP_small underperform mAP_large and mAP_medium, and the small-object output grid points far outnumber the large-object output grid points, so it makes sense to me that the small-object layer should generate more loss (yet this is not currently the case at ultralytics). I experimented with this change in the past, unfortunately unsuccessfully. What do you think?
What do you mean?
It is just because smaller objects have fewer pixels, especially after resizing to the network size 416x416.
@glenn-jocher Also, did you think about rotation/scale-invariant features like SIFT/SURF (rotation/scale-invariant conv layers or something else)?
@AlexeyAB yes, I've worked a lot with SURF and SIFT, but don't confuse these with object detection. SURF is a faster version of SIFT; they are not AI algorithms. Their purpose is to match points in one image to points in a second image by comparing feature vectors between possible point pairs. This is useful in Structure From Motion (SFM) applications like AR, where it's necessary to know the camera motion between frames to reconstruct a 3D scene, or simply to find an object in a second image that exists in a first image. But it does not generalize at all: for example, SURF points from a blue car will never match SURF points on a red car, so in this sense it is completely separate from object detection. Yes, a more targeted strategy for the lower layers is a good idea. But the point I was making is that I think the darknet loss function (if there are no balancers) treats each element the same, whereas the ultralytics loss treats each layer the same (i.e. for 416 there would be 507 + 2028 + 8112 = 10647 loss elements in the 3 layers). The current ultralytics loss reduces the value of the lower-layer elements because it takes a mean() of each layer for the total loss. I'm thinking that if I take the mean of the entire 10647 anchors instead, this would result in more effective training of the small-object layers. I tried this before with poor effect, but maybe I should try again. The current COCO results are:
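A toy calculation (pure Python, assuming a 416x416 input and 3 anchors per grid cell) of how the two reductions weight the layers: a per-layer mean() gives each layer 1/3 of the total regardless of size, while a single mean over all elements weights each layer by its element count, so the small-object 52x52 layer dominates:

```python
counts = [13 * 13 * 3, 26 * 26 * 3, 52 * 52 * 3]  # 507, 2028, 8112 elements
total = sum(counts)                                # 10647

# Assume every prediction element contributes the same loss value.
# Share of the total loss coming from each layer under each reduction:
share_per_layer_mean = [1 / 3] * 3               # mean() per layer, then sum
share_global_mean = [c / total for c in counts]  # one mean over all elements
```

Under the global mean, the 52x52 layer carries 8112/10647 of the loss, roughly 76%, instead of 33%.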
Does this calculation only affect the display of the total loss on the screen? Or does it somehow affect the value of each F.e.
Both are not color-invariant. A DNN can achieve color/scale/rotation invariance only due to a large number of filters (millions of parameters).
It depends on whether we want to detect only red cars or cars of any color. Otherwise we would either have to add SURF descriptors for the blue car etc., or use color-invariant SURF descriptors: https://link.springer.com/chapter/10.1007/978-3-642-35740-4_6 SURF is not an object detection method; it is a method of matching areas in an image that can be used for detecting/tracking objects with rotation/scale invariance. In all SURF tutorials, SURF is demonstrated as a method for comparing whole images rather than individual objects. The reason is just that SURF has different efficiencies for different key points in the image: it is assumed that a separate object may or may not have good points, but across the whole image good points are much more likely. Many years ago I successfully used SURF after detection: the area with the object was rotated and scaled, and then other refinement algorithms for object recognition/detection/comparison were applied asynchronously, like
@nyj-ocean oh, that's an impressive difference! @AlexeyAB I don't think we should get too hung up on exactly the best normalizer for every situation, because I think they will all be different depending on many factors, including the custom data, number of classes, class frequency, etc. I think a robust balancing method would probably sacrifice epoch 0 simply to see what the default balancers produce (i.e. 1, 1, 1), and then restart training using those results to balance the loss components. The steps would roughly be:
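The balancing idea described above, restarting with multipliers derived from an epoch-0 run with default balancers, could be sketched like this (hypothetical helper, not code from either repository; `target` is the ~5 magnitude mentioned earlier):

```python
def derive_balancers(epoch0_losses, target=5.0):
    """Given mean per-component losses from an epoch-0 run with balancers
    (1, 1, 1), return multipliers that rescale each component to `target`."""
    return {name: target / value for name, value in epoch0_losses.items()}

# e.g. an imbalanced epoch-0 run:
balancers = derive_balancers({"giou": 2.0, "obj": 10.0, "cls": 5.0})
# applying them rescales every component to 5.0:
#   2.0 * 2.5 == 10.0 * 0.5 == 5.0 * 1.0 == 5.0
```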
@glenn-jocher Yes, I just think we should keep 1, 0.1, 0.1 for box, obj, cls, at least for high AP@75 and maybe for AP@50 too
From your cfg.zip, those two cfg files both use [yolo] layers rather than [Gaussian_yolo] layers.
@nyj-ocean Thanks. In your cfg-file there are [yolo] layers instead of [Gaussian_yolo].
Tested: https://github.com/AlexeyAB/darknet/blob/master/cfg/csresnext50-panet-spp-original-optimal.cfg +4.8% [email protected] on MS COCO test-dev
@nyj-ocean Open a new issue. Maybe I will benchmark the SE-module and check whether I can improve SE speed.
@AlexeyAB @nyj-ocean The "good hyperparameters" are effective. But why does the loss function not converge normally?
Test models with good hyperparameters: #3114 (comment) and #4147 (comment)

- iou_normalizer=1 for [yolo]
- iou_normalizer=0.07 for [yolo] + C/D/GIoU
- iou_normalizer=0.1 and uc_normalizer=0.1 for [Gaussian_yolo]
- iou_normalizer=0.07 and uc_normalizer=0.07 for [Gaussian_yolo] + C/D/GIoU
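For illustration, a hedged sketch of how one of these combinations might look in a cfg file (the iou_loss key is the AlexeyAB/darknet setting that selects the C/D/GIoU regression loss; all other layer settings are omitted):

```ini
[yolo]
# CIoU regression loss with the reduced normalizer from the list above
iou_loss=ciou
iou_normalizer=0.07
```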