I tried the pretrained model with mel spectrograms generated directly from singing samples, and it does not sound as good as HiFTNet (https://github.com/yl4579/HiFTNet).
The linked vocoder uses a neural source-filter (NSF) model (https://nii-yamagishilab.github.io/samples-nsf/), like some other singing voice conversion models.

So could the architecture be the reason for the difference in quality, or does Vocos just need to be additionally trained on singing to reach the same quality?
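One way to narrow this down is to first rule out a feature mismatch: the pretrained checkpoint expects mel spectrograms computed with its training configuration, and externally generated mels can easily differ in sample rate, bin count, hop size, or log scaling. Below is a minimal copy-synthesis sketch using the vocos package's published API (`Vocos.from_pretrained`, `feature_extractor`, `decode`) with the `charactr/vocos-mel-24khz` checkpoint; the file names are placeholders.

```python
import torch
import torchaudio
from vocos import Vocos

# Load the pretrained 24 kHz mel-spectrogram Vocos checkpoint.
vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

# "singing.wav" is a placeholder for your singing clip.
y, sr = torchaudio.load("singing.wav")
if y.size(0) > 1:
    y = y.mean(dim=0, keepdim=True)  # mix down to mono
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

# Copy-synthesis through the model's own feature extractor, so the mel
# parameters (sample rate, number of bins, hop size, log scaling) are
# guaranteed to match what the checkpoint was trained with.
with torch.no_grad():
    mel = vocos.feature_extractor(y)
    y_hat = vocos.decode(mel)

torchaudio.save("singing_vocos.wav", y_hat, 24000)
```

If copy-synthesis through the model's own feature extractor still sounds clearly worse than HiFTNet on the same clip, the gap points to the architecture or to the training data (speech rather than singing); if it sounds fine, the externally generated mels probably just don't match the checkpoint's feature settings.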