
Issue with vggish checkpoint #13

Open
luc-leonard opened this issue Feb 3, 2022 · 9 comments
Comments

@luc-leonard

Hello.

the vggishish_lpaps checkpoint is used here:

Errors are ignored in the code, but neither LPAPS nor VGGishish manages to load it.
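A minimal sketch of the reported anti-pattern (hypothetical names, not the repository's exact code): a broad `try/except` around `load_state_dict` hides the key mismatch, so the model silently keeps its random initialisation.

```python
import torch
import torch.nn as nn

def load_silently(model: nn.Module, state_dict: dict) -> None:
    """Load weights, swallowing any mismatch error (the reported anti-pattern)."""
    try:
        model.load_state_dict(state_dict)
    except Exception:
        pass  # error ignored: the model silently keeps its random init

model = nn.Linear(2, 2)
before = model.weight.clone()
load_silently(model, {"bogus_key": torch.zeros(1)})  # keys don't match
assert torch.equal(model.weight, before)  # weights unchanged, no error raised
```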

The checkpoint URL is here:

'vggishish_lpaps': 'https://a3s.fi/swift/v1/AUTH_a235c0f452d648828f745589cde1219a/specvqgan_public/vggishish16.pt',

The VGGish weights can be found under the 'model' key, but I cannot find the LPAPS weights anywhere in here. Are they not required?
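For anyone checking their own copy, a hedged sketch of how one might verify this (the helper name and the `'lin0'`-prefix convention are assumptions based on LPIPS-style state_dicts; load the file above with `torch.load(..., map_location="cpu")` and pass the result in):

```python
def has_lpaps_weights(ckpt: dict) -> bool:
    """True if the checkpoint contains LPAPS 'lin0'..'lin4' linear-layer weights."""
    state = ckpt.get("model", ckpt)  # VGGish weights sit under the 'model' key
    return any(key.startswith("lin") for key in state)

# Illustration on minimal stand-in state_dicts:
vggish_only = {"model": {"features.0.weight": None}}
with_lpaps = {"model": {"lin0.model.1.weight": None}}
assert not has_lpaps_weights(vggish_only)
assert has_lpaps_weights(with_lpaps)
```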

Best regards,

@v-iashin
Owner

v-iashin commented Feb 3, 2022

Hi, I checked the code and I think you are right! Thanks a lot for the catch! I will commit the fixes.

@luc-leonard
Author

Thank you very much for the very quick answer and fix :D

@jwliu-cc

The loss goes to 'nan' when I load the correct ckpt. Do you have this problem? I trained on the VAS dataset.

@yangdongchao

yangdongchao commented Apr 18, 2022

> Hi, I checked the code and I think you are right! Thanks a lot for the catch! I will commit the fixes.

Hi, I want to ask about the parameters of LPAPS. The vggishish16 model is trained on VGGSound. I want to know how you got the parameters of the following layers. Did you directly use the pre-trained model from Taming Transformers?

```python
self.lin0 = NetLinLayer(self.chns[0], use_dropout=use_dropout)
self.lin1 = NetLinLayer(self.chns[1], use_dropout=use_dropout)
self.lin2 = NetLinLayer(self.chns[2], use_dropout=use_dropout)
self.lin3 = NetLinLayer(self.chns[3], use_dropout=use_dropout)
self.lin4 = NetLinLayer(self.chns[4], use_dropout=use_dropout)
```
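For context, a sketch of what such a layer typically looks like in LPIPS-style implementations (following richzhang/PerceptualSimilarity; channel counts here are illustrative): each `lin_i` is just an optional dropout followed by a bias-free 1x1 convolution that collapses one VGG feature stage to a single map, so each one does carry trainable weights.

```python
import torch
import torch.nn as nn

class NetLinLayer(nn.Module):
    """A single 'linear' layer as a 1x1 conv, LPIPS-style (sketch)."""
    def __init__(self, chn_in: int, chn_out: int = 1, use_dropout: bool = False):
        super().__init__()
        layers = [nn.Dropout()] if use_dropout else []
        # bias-free 1x1 conv: one trainable weight per input channel
        layers += [nn.Conv2d(chn_in, chn_out, kernel_size=1, stride=1,
                             padding=0, bias=False)]
        self.model = nn.Sequential(*layers)

# Each lin_i maps a (B, chns[i], H, W) feature stage to a (B, 1, H, W) map:
layer = NetLinLayer(64, use_dropout=True)
out = layer.model(torch.zeros(1, 64, 8, 8))
assert out.shape == (1, 1, 8, 8)
```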

@v-iashin
Owner

You may train them by adapting the https://github.com/richzhang/PerceptualSimilarity scripts.
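Roughly, those scripts fit the lin layers to human two-alternative forced-choice (2AFC) judgements: given the distances d0, d1 of two distorted patches to a reference, the layers are trained so the model ranks the patches as humans do. A simplified stand-in objective (the actual repository uses a small learned ranking network, so this BCE-over-distance-difference is only an assumption-laden sketch):

```python
import math
import torch
import torch.nn as nn

rank_loss = nn.BCEWithLogitsLoss()

def lpaps_rank_loss(d0: torch.Tensor, d1: torch.Tensor,
                    judge: torch.Tensor) -> torch.Tensor:
    # judge in [0, 1]: fraction of raters preferring patch 1 over patch 0;
    # the distance difference acts as the logit for that preference.
    return rank_loss(d1 - d0, judge)

# Equal distances with a split judgement give the maximum-uncertainty loss ln(2):
loss = lpaps_rank_loss(torch.tensor([1.0]), torch.tensor([1.0]), torch.tensor([0.5]))
assert math.isclose(loss.item(), math.log(2.0), rel_tol=1e-5)
```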

@yangdongchao

> You may train them by adapting the https://github.com/richzhang/PerceptualSimilarity scripts.

Can you share the code you used to train LPAPS on the VGGSound dataset?

@v-iashin
Owner

Ok, I managed to look into this issue a bit more.

Thanks to your questions, I discovered that this problem is actually deeper than I originally anticipated. It seems that I completely missed that the NetLinLayer layers have trainable parameters and only relied on training VGGishish. I think, because the code did not complain about loading the checkpoint (as the topic starter noticed), I just moved on.

What happens is that these layers are actually randomly initialised and, luckily, the model could still train to such great quality thanks to the GAN loss. This means that you can just drop the perceptual loss from the model and it will train much faster and to the same performance. On the practical side, it seems that keeping this dorky loss may still give you a bit of a boost in quality.

@yangdongchao

> Ok, I managed to look into this issue for a bit more.
>
> Thanks to your questions I discovered that this problem is actually deeper than I originally anticipated. It seems that I completely missed that NetLinLayer layers have trainable parameters and only relied on training VGGishish. I think because the code did not complain about loading the checkpoint, as the topic starter noticed, I just moved on.
>
> What happens is that these layers are actually randomly inited and, luckily, the model could even train to such great quality — thanks to the GAN loss. This means, that you can just drop the perceptual loss from the model and it will train much faster and to the same performance. On the practical side, it seems that having this dorky loss you may still get a bit of a boost in quality.

Thanks for your reply. I understand it.

@v-iashin
Owner

Today I had a chance to inspect the issue a bit more thanks to @jhyau.

It seems that @jwliu-cc was right and these fixes let codebook training diverge to nans. For this reason, I am resetting the commits mentioned in this issue to the initial well-tested state despite having this nasty bug with vggish and lpaps checkpoint loading 🙁 .

Current solution: `perceptual_weight=0.0`

This means that those who want to build upon SpecVQGAN could turn off the perceptual loss by setting the weight to zero and benefit from a significant speedup during training. This, however, would yield slightly different results which, according to our ablations, are still strong.
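To make the effect concrete, a minimal sketch (hypothetical function, not the repository's actual loss code) of how a zero `perceptual_weight` lets the LPAPS term drop out of the total loss, so the randomly initialised NetLinLayers no longer influence training:

```python
def total_loss(rec_loss: float, lpaps_loss: float, gan_loss: float,
               perceptual_weight: float = 0.0, disc_weight: float = 1.0) -> float:
    # With perceptual_weight=0.0 the LPAPS term contributes nothing, and its
    # (potentially expensive) forward pass can be skipped entirely.
    return rec_loss + perceptual_weight * lpaps_loss + disc_weight * gan_loss

assert total_loss(1.0, 5.0, 0.5) == 1.5                          # LPAPS disabled
assert total_loss(1.0, 5.0, 0.5, perceptual_weight=1.0) == 6.5   # LPAPS enabled
```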

I also added a notice about it in the README for other people to see.

jhyau added a commit to jhyau/SpecVQGAN that referenced this issue Jun 12, 2022
This reverts commit 3894458.

Reverting due to seeing nans in loss during codebook training