
Test models with good hyperparameters - +4.8% mAP@0.5 on MS COCO test-dev #4430

Closed · AlexeyAB opened this issue Dec 2, 2019 · 22 comments

AlexeyAB (Owner) commented Dec 2, 2019

Test models with good hyperparameters: #3114 (comment) and #4147 (comment)

batch=64
subdivisions=8
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261 # or 0.122 (so iou~=3.29 and cls & obj ~= 47) as in @glenn-jocher yolov3
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1

mosaic=1
[yolo] # or  [Gaussian_yolo]
...

jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
scale_x_y = 1.05   # 1.05, 1.10, 1.20
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
uc_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
beta1=0.6

  • set iou_normalizer=1 for [yolo]
  • set iou_normalizer=0.07 for [yolo] + C/D/GIoU
  • set iou_normalizer=0.1 and uc_normalizer=0.1 for [Gaussian_yolo]
  • and set iou_normalizer=0.07 and uc_normalizer=0.07 for [Gaussian_yolo] + C/D/GIoU
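For example, a sketch (not a verified config) of a [Gaussian_yolo] section combining the CIoU settings listed above:

[Gaussian_yolo]
...
iou_thresh=0.213
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
uc_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6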
sctrueew commented Dec 3, 2019

Hi @AlexeyAB

So we have to change uc_normalizer from 1.0 to 0.1 in Gaussian_yolov3_BDD.cfg - is that right?

AlexeyAB (Owner) commented Dec 3, 2019

@zpmmehrdad Yes.

AlexeyAB (Owner) commented Dec 3, 2019

@glenn-jocher Hi,

Do you use a static learning rate = 0.00261?
Or do you use SGDR (cosine) - a constantly decreasing learning rate?

learning_rate=0.00261
momentum=0.949

glenn-jocher commented
@AlexeyAB I use the original darknet LR scheduler, with drops of *=0.1 at 80% and 90% of total epochs. It's true that a smoother drop may have a slight benefit (I think the BoF paper showed this), but it's likely a very minimal effect. See ultralytics/yolov3#238
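For reference, a minimal PyTorch sketch of that step schedule (the model and total-epoch count are placeholders, not ultralytics code):

import torch

model = torch.nn.Linear(10, 10)   # placeholder model
epochs = 300                      # placeholder total epochs

# SGD with the cfg values above; LR drops of *=0.1 at 80% and 90% of training
optimizer = torch.optim.SGD(model.parameters(), lr=0.00261,
                            momentum=0.949, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.8 * epochs), int(0.9 * epochs)], gamma=0.1)

for epoch in range(epochs):
    ...                           # train one epoch
    scheduler.step()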

AlexeyAB (Owner) commented Dec 8, 2019

At least for some datasets, it seems that iou_normalizer=0.07 is too low a value for GIoU, and iou_normalizer=0.5 is much better: #3874 (comment)

glenn-jocher commented Dec 8, 2019

@AlexeyAB yes, you should definitely examine the value of each loss component to ensure that the balancing parameters produce roughly equal loss between the 3 components. In ultralytics/yolov3 they produce magnitudes of about 5, 5, 5 for GIoU, obj, cls on COCO epoch 0. If they produce different magnitudes here, you should adjust accordingly.

BTW, one thing that has always bothered me about the ultralytics/yolov3 loss function is that each of the yolo layers is treated equally (because we take the mean of all the elements in each layer), and I think here you sum all the elements in each layer instead. Is this correct?

In all of the papers I always see mAP_small underperform mAP_large and mAP_medium, and the small-object output grid points far outnumber the large-object output grid points, so it makes sense to me that the small-object layer should generate more loss (yet this is not currently the case at ultralytics). Unfortunately, I experimented with this change in the past without success. What do you think?

AlexeyAB (Owner) commented Dec 8, 2019

@glenn-jocher

> BTW, one thing that has always bothered me about the ultralytics/yolov3 loss function is that each of the yolo layers is treated equally (because we take the mean of all the elements in each layer), and I think here you sum all the elements in each layer instead. Is this correct?

What do you mean?
Each final activation produces a separate delta, which is backpropagated without changes.

> In all of the papers I always see mAP_small underperform mAP_large and mAP_medium, and the small-object output grid points far outnumber the large-object output grid points, so it makes sense to me that the small-object layer should generate more loss (yet this is not currently the case at ultralytics). Unfortunately, I experimented with this change in the past without success. What do you think?

It is just because smaller objects have fewer pixels, especially after resizing to the network size of 416x416.
I think for small objects we should use more anchors, routes to the lower layers, and special blocks (which have many layers but don't lose detailed information).

AlexeyAB (Owner) commented Dec 8, 2019

@glenn-jocher Also, have you thought about rotation/scale-invariant features like SIFT/SURF (rotation/scale-invariant conv layers or something else)?

glenn-jocher commented Dec 8, 2019

@AlexeyAB yes, I've worked a lot with SURF and SIFT, but don't confuse them with object detection. SURF is a faster version of SIFT; they are not AI algorithms. Their purpose is to match points in one image to points in a second image by comparing feature vectors between possible point pairs. This is useful in Structure from Motion (SfM) applications like AR, where it's necessary to know the camera motion between frames to reconstruct a 3D scene, or simply to find an object in a second image that exists in a first image.

But it does not generalize at all: SURF points from a blue car will never match SURF points on a red car, so in this sense it is completely separate from object detection.

Yes, a more targeted strategy for the lower layers is a good idea. But the point I was making is that I think the darknet loss function (if there are no balancers) treats each element the same, whereas the ultralytics loss treats each layer the same (i.e. for 416 there would be 507 + 2028 + 8112 = 10647 loss elements in the 3 layers).

The current ultralytics loss reduces the value of the lower layer elements because it takes a mean() of each layer for the total loss:
loss = mean(layer0_loss) + mean(layer1_loss) + mean(layer2_loss)

I'm thinking that if I take the mean over the entire 10647 anchors instead, this would result in more effective training of the smaller-object layers. I tried this before with poor effect, but maybe I should try again. The current COCO results are:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.243 <--
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.450
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.514

 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.422 <--
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.640
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.707
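
A minimal PyTorch sketch of the two reductions being compared (layer sizes from the 416 example above; random stand-ins for the per-element losses):

import torch

sizes = [507, 2028, 8112]                # 13x13x3, 26x26x3, 52x52x3 -> 10647 total
losses = [torch.rand(n) for n in sizes]  # stand-in per-element losses

# Per-layer mean: each layer contributes equally, so one small-object
# element carries ~1/8112 of its layer's share vs ~1/507 for large objects.
loss_a = sum(l.mean() for l in losses)

# Global mean over all 10647 elements: the small-object layer now
# contributes in proportion to its element count.
loss_b = torch.cat(losses).mean()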

AlexeyAB (Owner) commented Dec 9, 2019

@glenn-jocher

> The current ultralytics loss reduces the value of the lower layer elements because it takes a mean() of each layer for the total loss:
> loss = mean(layer0_loss) + mean(layer1_loss) + mean(layer2_loss)

Does this calculation only affect the display of the total loss on the screen? Or does it somehow affect the value of each delta that will be backpropagated?

For example:

  • if output[i] = 0.2 (for x=2, y=3, anchor=1, yolo-layer-3)
  • and delta_class[i] = 1 - p = 1 - 0.2 = 0.8 (for the same x=2, y=3, anchor=1, yolo-layer-3),
    then after this loss calculation, loss = mean(layer0_loss) + mean(layer1_loss) + mean(layer2_loss), what value will be backpropagated in ultralytics-yolo - is it 0.8, or what? (See the sketch below.)
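
A minimal sketch of why the reduction matters for the backpropagated value: a mean() over N elements scales each element's gradient by 1/N, so a raw delta of 0.8 would not reach the weights unchanged (toy loss, hypothetical layer size):

import torch

n = 8112                            # e.g. the 52x52x3 output layer
x = torch.zeros(n, requires_grad=True)
loss = (x - 0.8).pow(2).mean()      # toy per-element loss with mean reduction
loss.backward()
print(x.grad[0])                    # -2 * 0.8 / n, not -2 * 0.8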

  • SURF: translation/scale/rotation invariant
  • CONV filter: translation invariant

Both are not color invariant.

A DNN can achieve color/scale/rotation invariance only due to a large number of filters (millions of parameters).
SURF is very demanding on resources, so we will not be able to build the network entirely out of millions of SURFs. But perhaps we can apply a certain number of them in some layers, for example with subsampling. Or something else - there are many algorithms: SIFT, SURF, BRIEF, ORB...

> But it does not generalize at all: SURF points from a blue car will never match SURF points on a red car, so in this sense it is completely separate from object detection.

It depends on whether we want to detect only red cars or cars of any color. Otherwise, we will either have to add SURF descriptors for the blue car etc., or use color-invariant SURF descriptors: https://link.springer.com/chapter/10.1007/978-3-642-35740-4_6


SURF is not an object detection method; it is a method of matching areas in an image that can be used for detecting/tracking objects, with rotation/scale invariance.

In all SURF tutorials, SURF is demonstrated as a method for comparing whole images rather than individual objects. The reason is just that SURF has different efficiencies for different key points in the image: a separate object may or may not contain good points, but the whole image is much more likely to.

Many years ago I successfully used the SURF extractor (Ptr<SurfDescriptorExtractor> extractor = new SurfDescriptorExtractor();) to track an object with rotation and scale invariance, with occlusion and long disappearances, and with instant training - because we can calculate SURF descriptors for any point in any area of the image (including where the object is) and save them to a file.

After detection using the SURF extractor, the area with the object was rotated and scaled, and then other refinement algorithms for object recognition/detection/comparison were applied asynchronously, like similarity checks (PSNR and SSIM) on the GPU, Haar cascades / Viola-Jones object detection, ...
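
For reference, a minimal SURF-matching sketch with OpenCV-Python (SURF is patented and lives in opencv-contrib; the image paths are placeholders):

import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(img1, None)           # keypoints + descriptors
kp2, des2 = surf.detectAndCompute(img2, None)

# Match descriptors and keep pairs that pass Lowe's ratio test
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]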

nyj-ocean commented

In my case, iou_normalizer=0.07 seems better than iou_normalizer=0.5 in yolov3+Gaussian+CIoU.

iou_normalizer=0.07: [training chart]

iou_normalizer=0.5: [training chart]

AlexeyAB (Owner) commented

@nyj-ocean

  • Can you share both cfg-files in a zip-archive?
  • Did you use the same other params?
  • How many classes are in your dataset?

glenn-jocher commented Dec 14, 2019

@nyj-ocean oh, that's an impressive difference! @AlexeyAB I don't think we should get too hung up on exactly the best normalizer for every situation, because they will all be different depending on many factors, including the custom data, number of classes, class frequency, etc.

I think a robust balancing method would probably sacrifice epoch 0 simply to see what the default balancers produce (i.e. 1, 1, 1), and then restart training using those results to balance the loss components. The steps would roughly be:

  1. Set balancers to 1, 1, 1 for box, obj, cls
  2. Train up to 1 epoch / 10 minutes / 1000 iterations, saving loss component means.
  3. Set balancers to the inverse loss component means (see the sketch after this list).
  4. Train normally.
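
A sketch of that procedure in Python (the measured means are made-up numbers for illustration):

# Steps 1-2: train briefly with balancers 1, 1, 1 and record component means
means = {"box": 5.2, "obj": 0.9, "cls": 12.4}       # made-up measurements

# Step 3: set each balancer to the inverse of its observed mean, rescaled
# so the three components start with roughly equal magnitudes
balancers = {k: 1.0 / v for k, v in means.items()}
scale = len(means) / sum(balancers.values())
balancers = {k: v * scale for k, v in balancers.items()}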

AlexeyAB (Owner) commented

@glenn-jocher Yes, though I think we should keep 1, 0.1, 0.1 for box, obj, cls - at least for high AP@75, and maybe for AP@50 too.

tuteming commented

From your cfg.zip, both cfgs have [yolo] layers rather than [Gaussian_yolo] layers.
Please confirm, thanks.

AlexeyAB (Owner) commented

@nyj-ocean Thanks. In your cfg-file there are [yolo] layers instead of [Gaussian_yolo].

@AlexeyAB changed the title from "Test models with good hyperparameters" to "Test models with good hyperparameters - +4.8% mAP@0.5 on MS COCO test-dev" Jan 2, 2020
@AlexeyAB added the "enhancement" label and removed the "want enhancement (Want to improve accuracy, speed or functionality)" label Jan 2, 2020
nyj-ocean commented

@AlexeyAB

  • Is there a need to add an SE (Squeeze-and-Excitation) module to YOLOv3?

[attached images: Squeeze-and-Excitation block diagrams]
Squeeze-and-Excitation Networks.pdf
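
For reference, a minimal PyTorch sketch of the SE block from the paper (reduction ratio 16 is the paper's default):

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: global-average-pool, bottleneck MLP,
    # sigmoid gate, then channel-wise rescaling of the input.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze + excite
        return x * w                                      # rescale channels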

AlexeyAB (Owner) commented Jan 8, 2020

@nyj-ocean
Squeeze-and-Excitation blocks were already implemented in enet-coco.cfg (EfficientNetB0-Yolov3) 4 months ago, but it is very slow: https://github.com/AlexeyAB/darknet#pre-trained-models

Open a new issue. Maybe I will benchmark the SE-module and check whether I can improve its speed.

nyj-ocean commented

@AlexeyAB

> Squeeze-and-Excitation blocks were already implemented in enet-coco.cfg (EfficientNetB0-Yolov3) 4 months ago

I added Squeeze-and-Excitation blocks to yolov3.cfg, then trained on my dataset:

  model           mAP
  yolov3          86.03
  yolov3+senet    85.78

The mAP of yolov3+senet is lower than that of plain yolov3.
The result is strange.

AlexeyAB (Owner) commented

@nyj-ocean
If you add SE-blocks to the darknet53 backbone, then you should retrain the classifier to produce a new pre-trained weights file.

becauseofAI commented

> In my case, iou_normalizer=0.07 seems better than iou_normalizer=0.5 in yolov3+Gaussian+CIoU.
>
> iou_normalizer=0.07: [training chart]
>
> iou_normalizer=0.5: [training chart]

@AlexeyAB @nyj-ocean The "good hyperparameters" are effective. But why does the loss function not converge normally?
