
Why I can't get good result in my own datasets? #864

Closed
BBuf opened this issue Feb 22, 2020 · 13 comments
Labels
bug Something isn't working

Comments

@BBuf

BBuf commented Feb 22, 2020

@glenn-jocher Hello, I recently trained two of my private detection datasets with both AlexeyAB DarkNet and your project; one dataset has two categories and the other has one, and my target sizes are all regular. With AlexeyAB DarkNet I achieve mAP values of 99.5%+ and 96%+ respectively, but with your project I can only reach 80%+ and 70%+. I also use GIoU loss in the AlexeyAB code, and I use the default parameters in your project. In addition, I use the YOLOv3-tiny network. I wonder if there might be some problem with this code?

@BBuf BBuf added the bug Something isn't working label Feb 22, 2020
@BBuf BBuf changed the title from "Why I can't get good result In my own two datasets?" to "Why I can't get good result in my own datasets?" Feb 22, 2020
@glenn-jocher
Member

@BBuf this repo trains yolov3-spp.cfg on COCO to the highest mAP of any reported results we know of. See https://github.com/ultralytics/yolov3#map

You may want to tune your hyperparameters (see #392) or switch from tiny to yolov3-spp.cfg. Other than that general guidance, we don't offer free support or feedback on training custom datasets. I'll leave the issue open for community feedback.

@glenn-jocher
Member

glenn-jocher commented Feb 22, 2020

@BBuf two other thoughts: you may want to try better tiny derivatives like https://github.com/ultralytics/yolov3/blob/master/cfg/yolov3-tiny3.cfg, and you should also test your darknet-trained models here to get an apples-to-apples mAP comparison:

python3 test.py --data ... --weights ... --cfg ...

And lastly, you need to look at your results.png for training feedback.

@BBuf
Author

BBuf commented Feb 23, 2020

OK, I have now set the batch_size of both projects to be the same and retrained YOLOv3-tiny. The results are as follows:

In AlexeyAB DarkNet:

Train loss and mAP:

[image]

Testing the best weights model with your code:

[image]

In your project:

results.png:

[image]

Testing the best .pt model with your code:

[image]

Apparently, the mAP and F1 scores of the AlexeyAB-trained model are higher than those from your project. I want to know why. In addition, when I tested, --conf-thres was set to 0.1.

@glenn-jocher
Member

@BBuf ah ok, this is a lot more info now. Yes, the darknet training is working better for you. The biggest problem I see with the ultralytics results is that the classification loss is about 10X larger than the GIoU and objectness losses. The three should all be roughly in balance, so you probably want to reduce your hyp['cls'] by up to 5X or 10X to bring it in line:

'cls': 37.4, # cls loss gain

This is probably because the hyperparameters are tuned for an 80-class dataset, while you have a 2-class dataset. This is very interesting; maybe there's a way to automate this balancing in the future, perhaps using the mean losses after the first few epochs to adjust the hyps.
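A minimal sketch of that idea, purely illustrative and not code from this repo (the helper name and the mean-loss inputs are hypothetical): rescale the obj and cls gains so the observed mean loss components line up with the GIoU component.

# Hypothetical helper: rebalance loss gains from observed mean losses.
# mean_losses = running means of the (GIoU, obj, cls) loss components
# over the first few epochs; hyp is the hyperparameter dict in train.py.
def rebalance_gains(hyp, mean_losses):
    giou_mean, obj_mean, cls_mean = mean_losses
    target = giou_mean                          # use GIoU as the reference scale
    hyp['obj'] *= target / max(obj_mean, 1e-9)  # guard against division by zero
    hyp['cls'] *= target / max(cls_mean, 1e-9)
    return hyp

# Example: cls loss running ~10x hotter than GIoU -> cut hyp['cls'] by ~10x
hyp = {'giou': 3.54, 'obj': 64.3, 'cls': 37.4}  # 'cls' as quoted above; other values illustrative
hyp = rebalance_gains(hyp, mean_losses=(1.0, 1.1, 10.0))
print(hyp['cls'])  # ~3.74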

I would also cut the training time here down by half, as it looks like after 100 epochs you've already reached a steady state solution. Can you retrain with those two changes and see if it helps?

@glenn-jocher
Member

glenn-jocher commented Feb 23, 2020

@BBuf about the batch size, I recommend --batch-size 64 --accum 1 if possible, or if you run out of CUDA memory you can reduce the batch size, i.e.

python3 train.py --batch-size 64 --accum 1
python3 train.py --batch-size 32 --accum 2
python3 train.py --batch-size 16 --accum 4

are all roughly equivalent, but use less GPU memory as you go down the list. --accum is the number of batches whose gradients are accumulated before each optimizer update.
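For reference, here is a minimal sketch of what gradient accumulation does inside a training loop (a toy model and toy data for illustration, not this repo's actual train.py):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
accum = 4                                 # e.g. --batch-size 16 --accum 4 ~ effective batch 64

optimizer.zero_grad()
for i in range(32):                       # pretend these come from a DataLoader
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    loss = loss_fn(model(x), y) / accum   # scale so accumulated gradients average correctly
    loss.backward()                       # gradients add up across batches
    if (i + 1) % accum == 0:              # optimizer update every `accum` batches
        optimizer.step()
        optimizer.zero_grad()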

@BBuf
Author

BBuf commented Feb 24, 2020

OK, I will try it, thank you.

@BBuf
Author

BBuf commented Feb 24, 2020

I changed 'cls': 37.4 to 'cls': 5.0 and got the following results:

[image]

[image]

It looks like this hasn't improved significantly.

@glenn-jocher
Member

@BBuf yes, the losses are much better balanced now. You should reduce cls by half again to about 2.5, and depending on your dataset, you may also want to train with --multi, which turns on multi-scale training. This is how we train COCO, and I think darknet has it on by default as well.

The other thing is that you are training too long. As you can see, all of your results have already converged by epoch 100, so you should use --epochs 100:

python3 train.py --epochs 100 --multi ...
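For context, --multi (multi-scale training) simply resizes each batch to a randomly chosen image size on the 32-pixel grid stride; a rough sketch of the idea, not this repo's exact implementation:

import random
import torch
import torch.nn.functional as F

grid = 32                                      # YOLO grid stride
img_size = 416                                 # nominal training size
imgs = torch.randn(16, 3, img_size, img_size)  # dummy batch for illustration

# Pick a new size roughly in the +/-50% range, rounded to the grid stride
lo, hi = img_size // 2 // grid, img_size * 3 // 2 // grid
new_size = random.randint(lo, hi) * grid       # e.g. 192..608 in steps of 32
if new_size != img_size:
    imgs = F.interpolate(imgs, size=new_size, mode='bilinear', align_corners=False)
print(imgs.shape)                              # e.g. torch.Size([16, 3, 320, 320])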

One last thing you could try is a cosine LR scheduler, which shows improvement in COCO training. See #238 (comment). You do this by commenting out the current scheduler and uncommenting L139-140:

yolov3/train.py, lines 139 to 142 at a3671bd:

# lf = lambda x: 0.5 * (1 + math.cos(x * math.pi / epochs)) # cosine https://arxiv.org/pdf/1812.01187.pdf
# scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
# scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=range(59, 70, 1), gamma=0.8) # gradual fall to 0.1*lr0
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[round(epochs * x) for x in [0.8, 0.9]], gamma=0.1)
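For illustration, here is a self-contained sketch of the cosine schedule those commented lines define, using a dummy SGD optimizer rather than this repo's training loop:

import math
import torch
from torch.optim import lr_scheduler

epochs, lr0 = 100, 0.01
optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=lr0)

# Same lambda as above: LR falls from lr0 toward 0 over `epochs` on a half-cosine
lf = lambda x: 0.5 * (1 + math.cos(x * math.pi / epochs))
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

for epoch in range(epochs):
    # ... train for one epoch here ...
    optimizer.step()                       # placeholder step for the dummy optimizer
    scheduler.step()
    if epoch % 20 == 0:
        print(epoch, round(optimizer.param_groups[0]['lr'], 5))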

@BBuf
Author

BBuf commented Feb 24, 2020

I used all the improvements you mentioned above and got the following results:

[image]

[image]

@glenn-jocher
Member

@BBuf then you are all up to date with this repo. Class 2 exceeds the darknet mAP, but class 1 does not, and neither does the overall mAP. I'd use the darknet-trained results for your custom dataset.

@BBuf
Author

BBuf commented Feb 25, 2020

OK, thank you for your patience. I will use the AlexeyAB DarkNet results as the results for my custom dataset.

@BBuf BBuf closed this as completed Feb 25, 2020
@nanhui69

nanhui69 commented Sep 7, 2020

(Quoting @BBuf's original comment above.)

Which repo did you use: the yolov3 darknet repo or the yolov4 darknet repo?

@glenn-jocher
Member

@nanhui69 hi there! It seems you are comparing the performance of YOLOv3 trained on your private datasets with Ultralytics YOLOv3 versus AlexeyAB's DarkNet. I maintain the YOLOv3 repo at Ultralytics, and we appreciate the comparison. It's great to hear that you achieved excellent mAP values with AlexeyAB DarkNet.

It's important to note that each repository may have different default configurations, including the hyperparameters, architecture, and training settings, which can affect the training results. We continually strive to provide optimal default settings for a wide range of use cases, but there may be specific adjustments needed for individual scenarios.

If you'd like, we can investigate your specific case further to help optimize the training process. Additionally, utilizing the latest YOLOv4 repository may also provide improved results, as it incorporates various advancements over YOLOv3.

Let me know if you need any assistance, and we're here to help!
