
32kHz Vocos Multi Speaker Model Training Log #48

Open
LEECHOONGHO opened this issue Feb 22, 2024 · 13 comments
Comments

@LEECHOONGHO

LEECHOONGHO commented Feb 22, 2024

Training Loss, Generated Outputs.

I hope this will be a reference for model training.

https://api.wandb.ai/links/xi-speech-team/k0kdfwch

@patriotyk

Do you have standard TensorBoard logs? It would be interesting to compare.

@LEECHOONGHO
Author

@patriotyk Sorry, I've changed the code to log to the WandB server. I have no local log files or TensorBoard logs.

@patriotyk

patriotyk commented Apr 16, 2024

What is your validation loss on the last checkpoint? It is encoded into the checkpoint file name. I have been training a 44100 Hz model for almost a week already and the loss is still going down.

@Jon-Zbw

Jon-Zbw commented Apr 22, 2024

> Training Loss, Generated Outputs.
>
> I hope this will be a reference for model training.
>
> https://api.wandb.ai/links/xi-speech-team/k0kdfwch

Thanks for your work. Could you share the 32 kHz model training details, such as your EnCodec model? I found pretrained models at 24 kHz and 48 kHz, so I guess you resample 32 kHz audio to 24 kHz or 48 kHz for the pretrained EnCodec model, then resample back to 32 kHz?

@LEECHOONGHO
Author

> Training Loss, Generated Outputs.
> I hope this will be a reference for model training.
> https://api.wandb.ai/links/xi-speech-team/k0kdfwch

> Thanks for your work. Could you share the 32 kHz model training details, such as your EnCodec model? I found pretrained models at 24 kHz and 48 kHz, so I guess you resample 32 kHz audio to 24 kHz or 48 kHz for the pretrained EnCodec model, then resample back to 32 kHz?

Sorry for the confusion.
I trained a mel vocoder, not a decoder for EnCodec.

But I do plan to train a "Mel-EnCodec" in the future (a mel-spectrogram-to-RVQ encoder, with a Vocos decoder, for varied speech data).

@LEECHOONGHO
Author

LEECHOONGHO commented Apr 23, 2024

> Do you have standard TensorBoard logs? It would be interesting to compare.

> What is your validation loss on the last checkpoint? It is encoded into the checkpoint file name. I have been training a 44100 Hz model for almost a week already and the loss is still going down.

I estimated the mel loss and the generator loss on a newly collected dataset; they were 0.0942 and 2.82, respectively.
Because of the dataset's size, evaluating on the eval dataset shows no difference from sampled training data.

How is your model's output quality? Any artifacts?
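For readers comparing numbers: the mel loss reported in threads like this is typically a mean absolute (L1) distance between the log-mel spectrograms of real and generated audio. A toy sketch of that distance term; `mel_l1_loss` is an illustrative name, not a function from the vocos codebase:

```python
def mel_l1_loss(mel_hat, mel):
    """Mean absolute error between two flattened (log-)mel spectrograms.

    Toy stand-in for the spectral distance a vocoder trainer reports.
    """
    if len(mel_hat) != len(mel):
        raise ValueError("spectrograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(mel_hat, mel)) / len(mel)

# Identical spectrograms give zero loss; diverging ones a positive value.
print(mel_l1_loss([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 0.0
```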

@patriotyk

> I estimated the mel loss and the generator loss on a newly collected dataset; they were 0.0942 and 2.82, respectively. Because of the dataset's size, evaluating on the eval dataset shows no difference from sampled training data.
>
> How is your model's output quality? Any artifacts?

I am still training (third week). It is very slow. I will update with my results when it finishes.

@Mahmoud-ghareeb

Mahmoud-ghareeb commented May 7, 2024

How much data do we need for training?

@patriotyk

patriotyk commented May 11, 2024

@LEECHOONGHO I have published my model here: https://huggingface.co/patriotyk/vocos-mel-hifigan-compat-44100khz
It sounds great, and there are metrics.
@Mahmoud-ghareeb My model was trained on 800+ hours of audio. A vocoder doesn't require text transcripts, so you can easily use audiobooks for training. You don't even need to cut them on silence, because vocos internally splits the provided audio into smaller segments anyway.
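The "no pre-cutting needed" point can be sketched like this: a dataloader crops fixed-length training segments at random offsets from each long recording. This is a hedged illustration of that idea, not the actual vocos implementation (`random_segments` is a made-up name):

```python
import random

def random_segments(waveform, segment_len, num_segments, seed=None):
    """Crop fixed-length training segments at random offsets,
    mimicking how a vocoder dataloader slices long recordings."""
    rng = random.Random(seed)
    if len(waveform) < segment_len:
        raise ValueError("recording shorter than one segment")
    max_start = len(waveform) - segment_len
    return [waveform[s:s + segment_len]
            for s in (rng.randint(0, max_start) for _ in range(num_segments))]

# Stand-in for ~1 second of samples; yields four 16384-sample crops.
audio = list(range(48_000))
segments = random_segments(audio, 16_384, 4, seed=0)
assert all(len(s) == 16_384 for s in segments)
```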

@Mahmoud-ghareeb

Great work, @patriotyk! Thank you so much.

@bzp83

bzp83 commented Jun 13, 2024

> @LEECHOONGHO I have published my model here: https://huggingface.co/patriotyk/vocos-mel-hifigan-compat-44100khz It sounds great, and there are metrics. @Mahmoud-ghareeb My model was trained on 800+ hours of audio. A vocoder doesn't require text transcripts, so you can easily use audiobooks for training. You don't even need to cut them on silence, because vocos internally splits the provided audio into smaller segments anyway.

I'm new to this... Could you please tell me what the purpose of sharing the model is? When I try to use it with a wav file, the output is very close to the original input file, so I'm confused here.

Thank you

@patriotyk

patriotyk commented Jun 13, 2024

This model generates audio from mel spectrograms. The functionality you tried just generates a mel from audio and then audio back from the mel. But real TTS systems generate mels directly from text, and then the vocoder generates the audio.

@bzp83

bzp83 commented Jun 13, 2024

Ah, OK, so generating a mel from audio is different from what TTS systems do? Is there any code snippet that would let me test the model you trained (and possibly others)? Thank you!
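For anyone landing here with the same question, a minimal round-trip sketch, assuming the pip-installable `vocos` package (with `Vocos.from_pretrained`, `feature_extractor`, and `decode`, as shown in the project README) plus `torch`/`torchaudio` installed; the repo id is patriotyk's checkpoint from above. In a real TTS stack the mel would come from an acoustic model instead of `feature_extractor`:

```python
def reconstruct(path, repo="patriotyk/vocos-mel-hifigan-compat-44100khz"):
    """Round-trip a wav through the vocoder: audio -> mel -> audio.

    Imports are deferred so the sketch is readable without the heavy
    dependencies installed.
    """
    import torch
    import torchaudio
    from vocos import Vocos

    vocos = Vocos.from_pretrained(repo)
    y, sr = torchaudio.load(path)
    if y.size(0) > 1:                    # mix down to mono
        y = y.mean(dim=0, keepdim=True)
    y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=44100)
    with torch.no_grad():
        mel = vocos.feature_extractor(y)  # audio -> mel spectrogram
        return vocos.decode(mel)          # mel -> audio
```

Comparing `reconstruct("some.wav")` against the input is exactly the "output close to the original" behaviour described above; it checks the vocoder, not a full TTS pipeline.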

5 participants