HYPERPARAMETER EVOLUTION #392

Closed
glenn-jocher opened this issue Jul 25, 2019 · 106 comments

@glenn-jocher
Member

glenn-jocher commented Jul 25, 2019

Training hyperparameters in this repo are defined in train.py, including augmentation settings:

yolov3/train.py

Lines 35 to 54 in df4f25e

# Training hyperparameters
hyp = {'giou': 1.2,  # giou loss gain
       'xy': 4.062,  # xy loss gain
       'wh': 0.1845,  # wh loss gain
       'cls': 15.7,  # cls loss gain
       'cls_pw': 3.67,  # cls BCELoss positive_weight
       'obj': 20.0,  # obj loss gain
       'obj_pw': 1.36,  # obj BCELoss positive_weight
       'iou_t': 0.194,  # iou training threshold
       'lr0': 0.00128,  # initial learning rate
       'lrf': -4.,  # final LambdaLR learning rate = lr0 * (10 ** lrf)
       'momentum': 0.95,  # SGD momentum
       'weight_decay': 0.000201,  # optimizer weight decay
       'hsv_s': 0.8,  # image HSV-Saturation augmentation (fraction)
       'hsv_v': 0.388,  # image HSV-Value augmentation (fraction)
       'degrees': 1.2,  # image rotation (+/- deg)
       'translate': 0.119,  # image translation (+/- fraction)
       'scale': 0.0589,  # image scale (+/- gain)
       'shear': 0.401}  # image shear (+/- deg)

We began with darknet defaults before evolving the values using the result of our hyp evolution code:

python3 train.py --data data/coco.data --weights '' --img-size 320 --epochs 1 --batch-size 64 --accumulate 1 --evolve

The process is simple: for each new generation, the prior generation with the highest fitness (out of all previous generations) is selected for mutation. All parameters are mutated simultaneously based on a normal distribution with about 20% 1-sigma:

yolov3/train.py

Lines 390 to 396 in df4f25e

# Mutate
init_seeds(seed=int(time.time()))
s = [.15, .15, .15, .15, .15, .15, .15, .15, .15, .00, .05, .20, .20, .20, .20, .20, .20, .20]  # sigmas
for i, k in enumerate(hyp.keys()):
    x = (np.random.randn(1) * s[i] + 1) ** 2.0  # plt.hist(x.ravel(), 300)
    hyp[k] *= float(x)  # vary by sigmas
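
As a rough, dependency-free illustration of the mutation step above, the sketch below applies the same multiplicative gain, (N(0, 1) * sigma + 1) ** 2, to a toy subset of the hyperparameters. The `mutate` helper and the three-key `parent` dictionary are hypothetical names introduced here for illustration; the sigmas are the ones at the matching positions of the list above. It uses the standard `random` module instead of NumPy, so it is not the repository's code.

```python
import random


def mutate(hyp, sigmas, rng):
    """Return a mutated copy of hyp; each value is scaled by (N(0,1) * sigma + 1) ** 2."""
    mutated = {}
    for (key, value), sigma in zip(hyp.items(), sigmas):
        gain = (rng.gauss(0.0, 1.0) * sigma + 1.0) ** 2  # never negative, centred near 1
        mutated[key] = value * gain
    return mutated


# Toy subset of the dictionary above (hypothetical selection for illustration)
parent = {'giou': 1.2, 'lrf': -4.0, 'momentum': 0.95}
sigmas = [0.15, 0.00, 0.05]  # a sigma of 0 leaves that hyperparameter untouched

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
child = mutate(parent, sigmas, rng)
print(child)
```

Because the gain is squared it can never flip the sign of a value, and a sigma of 0 (as used for lrf) freezes that hyperparameter across generations.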

Fitness is defined as a weighted mAP and F1 combination at the end of epoch 0, under the assumption that better epoch 0 results correlate to better final results, which may or may not be true.

yolov3/utils/utils.py

Lines 605 to 608 in bd92457

def fitness(x):
    # Returns fitness (for use with results.txt or evolve.txt)
    return 0.5 * x[:, 2] + 0.5 * x[:, 3]  # fitness = 0.5 * mAP + 0.5 * F1
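
To make the parent-selection step concrete, here is a minimal sketch that scores a few made-up generations with the same 0.5 * mAP + 0.5 * F1 weighting and picks the fittest one as the parent for the next mutation. The result rows are invented for illustration, and the assumption that columns 0 and 1 hold precision and recall is mine, not something stated above.

```python
def fitness(row):
    # Same weighting as the snippet above: 0.5 * mAP + 0.5 * F1 (columns 2 and 3)
    return 0.5 * row[2] + 0.5 * row[3]


# Hypothetical results, one row per generation: [P, R, mAP, F1]
results = [
    [0.20, 0.30, 0.10, 0.24],
    [0.25, 0.35, 0.16, 0.29],
    [0.22, 0.40, 0.14, 0.28],
]

# Index of the generation with the highest fitness so far; this one gets mutated next
best = max(range(len(results)), key=lambda i: fitness(results[i]))
print(best, fitness(results[best]))
```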

An example snapshot of the results is shown here. Fitness is on the y axis (higher is better).
from utils.utils import *; plot_evolution_results(hyp)
[Image: evolve]

@glenn-jocher glenn-jocher added enhancement New feature or request help wanted Extra attention is needed tutorial Tutorial or example labels Jul 25, 2019
@glenn-jocher glenn-jocher self-assigned this Jul 25, 2019
@YRunner

YRunner commented Jul 26, 2019

I ran into this error: 'shape '[16, 3, 85, 13, 13]' is invalid for input of size 56784'. It is raised by this line: 'p = p.view(bs, self.na, self.nc + 5, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction'. I'm a beginner and would appreciate any advice.

@glenn-jocher
Member Author

@YRunner this issue is dedicated only to hyperparameter evolution. Is your post in reference to this topic?

@Chida15

Chida15 commented Aug 5, 2019

I got a result like this. Is it normal?
[Image: evolve]
The fitness is very low. Should I train for more epochs?

@glenn-jocher
Member Author

@Chida15 haha, yes, well good job, you've run two different models here, the orange points, and it's showing you the best result highlighted in blue. For this to be effective you want to evolve hundreds of mutations. So I would change the for loop here to at least 200 generations.

yolov3/train.py

Line 371 in e77ca7e

for _ in range(1): # generations to evolve

@Chida15

Chida15 commented Aug 5, 2019

ok, thanks a lot!

@sanazss

sanazss commented Aug 14, 2019

Hi. I am trying to plot the evolution results but get an error that hyp is not defined. I am using the latest version of the repo. Any hints? Thanks.

@sanazss

sanazss commented Aug 14, 2019

I solved it.

@glenn-jocher
Member Author

@sanazss ah yes, you need to define hyp before running: from utils.utils import *; plot_evolution_results(hyp)

@varghesealex90

varghesealex90 commented Aug 20, 2019

momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 120200
policy=steps
steps=70000,100000
scales=.1,.1

I see these params in the cfg file and would like to use the same values. In that case, what would the updated hyp look like?

hyp = {'giou': 1.582,  # giou loss gain
       'xy': 4.688,  # xy loss gain
       'wh': 0.1857,  # wh loss gain
       'cls': 27.76,  # cls loss gain  (CE should be around ~1.0)
       'cls_pw': 1.446,  # cls BCELoss positive_weight
       'obj': 21.35,  # obj loss gain
       'obj_pw': 3.941,  # obj BCELoss positive_weight
       'iou_t': 0.2635,  # iou training threshold
       'lr0': 0.002324,  # initial learning rate
       'lrf': -4.,  # final LambdaLR learning rate = lr0 * (10 ** lrf)
       'momentum': 0.97,  # SGD momentum
       'weight_decay': 0.0004569,  # optimizer weight decay
       'hsv_s': 0.5703,  # image HSV-Saturation augmentation (fraction)
       'hsv_v': 0.3174,  # image HSV-Value augmentation (fraction)
       'degrees': 1.113,  # image rotation (+/- deg)
       'translate': 0.06797,  # image translation (+/- fraction)
       'scale': 0.1059,  # image scale (+/- gain)
       'shear': 0.5768}  # image shear (+/- deg)

@glenn-jocher
Member Author

@varghesealex90 the hyp dictionary is pretty self explanatory. The key names are the same in many cases to what you have above, i.e. hyp['momentum'] etc.

The parameters we do not use are angle, hue, burn_in. The LR scheduler hyps are set to reduce at 80% and 90% of total epochs with scales of 0.1 and 0.1 already.

In any case, the hyps have been evolved to their present state because they improve performance over what you have, so I would not change them unless you are experimenting.
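
For readers mapping the darknet steps/scales settings onto the schedule described above, the following is a minimal sketch of a step schedule that multiplies the learning rate by 0.1 at 80% and again at 90% of the total epochs. The `lr_at_epoch` helper, the rounding of the milestones, and the "epoch >= milestone" convention are assumptions made for illustration; the repository's scheduler may differ in those details.

```python
def lr_at_epoch(epoch, epochs, lr0, scale=0.1):
    """Step schedule: multiply lr0 by `scale` at 80% and again at 90% of total epochs."""
    milestones = [round(epochs * 0.8), round(epochs * 0.9)]
    drops = sum(epoch >= m for m in milestones)  # how many milestones have been passed
    return lr0 * scale ** drops


epochs = 100
for epoch in (0, 79, 80, 90, 99):
    print(epoch, lr_at_epoch(epoch, epochs, lr0=0.002324))
```

With 100 epochs this keeps lr0 through epoch 79, drops by 10x at epoch 80, and by another 10x at epoch 90, which plays the same role as steps=70000,100000 with scales=.1,.1 in the cfg, only expressed as fractions of the run.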

This was referenced Sep 8, 2019
@DanielChungYi

I have a problem here:
p = p.view(bs, self.na, self.nc + 5, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction
RuntimeError: shape '[6, 3, 10, 13, 13]' is invalid for input of size 18252
I still can't fix it. Can anyone help me?

@glenn-jocher
Member Author

@DanielChungYi is your error reproducible in a new git clone?

@DanielChungYi

@glenn-jocher I did clone the latest version of the code, but the problem is still there. Please help me.

@DanielChungYi

[Image: screenshot]

@glenn-jocher
Member Author

@DanielChungYi ok I see. It's likely an issue with your custom dataset, as we cannot reproduce this on the COCO data. Unless you can supply a minimal reproducible example on the COCO dataset, there is not much we can do.

@glenn-jocher
Member Author

@DanielChungYi also check your cfg and your number of classes, as you might have a mismatch.
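
One way to check for such a mismatch is to work backwards from the numbers in the error message. The sketch below is only arithmetic on the values quoted in this thread; it assumes the tensor entering view() is laid out as batch x channels x ny x nx, which is what the view call implies, and `diagnose_view` is a hypothetical helper name.

```python
def diagnose_view(shape, numel):
    """Compare the channels a YOLO layer output actually has with what view() expects."""
    bs, na, no, ny, nx = shape          # no = nc + 5 outputs per anchor
    expected = bs * na * no * ny * nx   # elements required by the requested shape
    actual_channels = numel / (bs * ny * nx)
    return expected, na * no, actual_channels


# Numbers taken from two of the error messages quoted in this thread
for shape, numel in [((16, 3, 85, 13, 13), 56784), ((6, 3, 10, 13, 13), 18252)]:
    expected, want, got = diagnose_view(shape, numel)
    print(f"view needs {expected} elements ({want} channels), tensor has {numel} ({got:g} channels)")
```

In the first case the layer produces 21 channels where 255 are expected. 21 = 3 x (2 + 5) would be consistent with a cfg whose final convolution was written for 2 classes while the model is built for 80. In the second case 18 = 3 x (1 + 5) versus an expected 30 = 3 x (5 + 5) would be consistent with a 1-class cfg and a 5-class dataset definition.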

@millermuttu

I am getting the same error too.

@millermuttu

p = p.view(bs, self.na, self.nc + 5, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous()  # prediction

RuntimeError: shape '[64, 3, 8, 10, 10]' is invalid for input of size 19200

@millermuttu

@DanielChungYi also check your cfg and your number of classes, as you might have a mismatch.

[Image: screenshot]

Did you solve this problem?

@glenn-jocher glenn-jocher removed the help wanted Extra attention is needed label Aug 17, 2020
@goldwater668

@glenn-jocher How do I use the parameters from evolve.txt in training?

@glenn-jocher
Member Author

@hande6688 see the yolov5 tutorials. A new hyp evolution yaml is created that you can point to when training yolov5: python train.py --hyp hyp.evolve.yaml

@github-actions

github-actions bot commented Oct 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

github-actions bot commented Nov 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Nov 8, 2020
@glenn-jocher glenn-jocher removed the tutorial Tutorial or example label Nov 26, 2020
@glenn-jocher glenn-jocher unpinned this issue Nov 26, 2020
@nanhui69

@glenn-jocher why does my evolve result look like the one from user nayyersaahil28? I have modified the code as shown in figures 1 and 2:
[Image: 1]
[Image: 2]
[Image: evolve]

@glenn-jocher glenn-jocher added tutorial Tutorial or example and removed Stale labels Mar 10, 2021
@glenn-jocher
Member Author

glenn-jocher commented Mar 10, 2021

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

@nanhui69

nanhui69 commented Mar 11, 2021

@glenn-jocher but I have changed the code to "for _ in range(265): # generations to evolve". What is the problem?

@nanhui69

nanhui69 commented Mar 11, 2021

@glenn-jocher I am running: python3 train.py --epochs 30 --cache-images --evolve &
Should I specify my custom weights rather than the pretrained weights?

@glenn-jocher
Member Author

@nanhui69 you can evolve any base scenario, including starting from any pretrained weights. There are no constraints.

@nanhui69

I only want to solve the problem shown in the evolve png above, where there is only one point. What should I do?

@glenn-jocher
Member Author

glenn-jocher commented Mar 11, 2021

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

@nanhui69

nanhui69 commented Mar 11, 2021

Oh, I will check it again.

@nanhui69

nanhui69 commented Mar 11, 2021

When I follow the YOLOv5 steps, evolve.txt has only one line. Why is that?

@glenn-jocher
Member Author

@nanhui69 YOLOv5 evolution will create an evolve.txt file with 300 lines, one for each generation.

@nanhui69

nanhui69 commented Mar 11, 2021

So what am I doing wrong? Also, when evolve.txt already exists and I run in evolve mode, this error appears:
hyp[k] = x[i + 7] * v[i] # mutate
IndexError: index 18 is out of bounds for axis 0 with size 18

@ananda1996ai

ananda1996ai commented Jun 3, 2021

Hi @glenn-jocher. Can you please indicate how I might modify the fitness function?
I'd like to put more weight on recall, followed by precision, rather than equal weight on mAP and F1.

@glenn-jocher
Member Author

glenn-jocher commented Jun 3, 2021

@ananda1996ai you can update your fitness function to any weighting you'd like here before evolving:

yolov3/utils/metrics.py

Lines 12 to 16 in ab7ff9d

def fitness(x):
    # Model fitness as a weighted combination of metrics
    w = [0.0, 0.0, 0.1, 0.9]  # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
    return (x[:, :4] * w).sum(1)

Though I would recommend the YOLOv5 🚀 Hyperparameter Evolution tutorial for best results:

YOLOv5 Tutorials
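
As an illustration of how a different weighting changes which generation is selected, the sketch below is a pure-Python equivalent of (x[:, :4] * w).sum(1) applied to two invented result rows. The metric values and the recall-heavy weights are hypothetical and only show the mechanics; they are not recommended settings.

```python
def fitness(rows, w):
    """Weighted sum of the first four metrics in each row: [P, R, mAP@0.5, mAP@0.5:0.95]."""
    return [sum(m * wi for m, wi in zip(row[:4], w)) for row in rows]


# Hypothetical metrics for two candidate generations
rows = [
    [0.80, 0.50, 0.60, 0.40],  # higher precision and mAP
    [0.60, 0.75, 0.55, 0.35],  # higher recall
]

default_w = [0.0, 0.0, 0.1, 0.9]  # weights from the snippet above
recall_w = [0.3, 0.7, 0.0, 0.0]   # illustrative only: recall first, then precision

print(fitness(rows, default_w))
print(fitness(rows, recall_w))
```

With the default weights the first row scores higher, while the recall-heavy weights prefer the second row, so the evolution would mutate a different parent.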

@ananda1996ai

ananda1996ai commented Jun 5, 2021

Thanks @glenn-jocher, but I need to work with YOLOv3 for my project, and the hyperparameter evolution tutorial above indicates that the fitness function used for evolution is the one defined in yolov3/utils/utils.py:

yolov3/utils/utils.py

Lines 605 to 608 in bd92457

def fitness(x):
    # Returns fitness (for use with results.txt or evolve.txt)
    return 0.5 * x[:, 2] + 0.5 * x[:, 3]  # fitness = 0.5 * mAP + 0.5 * F1

This is different from the utils/metrics.py you mention above. So in the current yolov3 master branch, just updating the fitness function in metrics.py is enough to change the evolution fitness measure, right?

@sjhsbhqf

sjhsbhqf commented Apr 6, 2022

How can the anchors hyperparameter have a mutation rate of 2? Shouldn't it be between zero and one?
[Image]

@glenn-jocher
Member Author

@sjhsbhqf good question. This is a gain that is applied to the base mutation gain, so values > 1 do not cause any errors despite the comment showing the 0-1 range. This helps test different anchor counts: since an anchor count is an integer, small changes will not modify its value.

@qutyyds

qutyyds commented Apr 24, 2022

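Hello. I am trying to plot the evolution results but get an error that hyp is not defined. I am using the latest version of your repository. Any hints on this? Thanks.
What should I do about it?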
