Hi @jasonppy, the loss computation during training looks like:
1. `outputs = model(audios, texts_enc, texts_dec)`, where `model` is the PENGI model, `audios` is a float32 tensor, and `texts_enc` and `texts_dec` are the tokenized text input and text output.
2. `logits = outputs.logits[:, total_prefix_length - 1: -1]`. This removes the outputs corresponding to the total prefix length, which equals the length of the audio projection plus the length of the input text.
3. `loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), texts_dec['input_ids'].flatten(), ignore_index=0)`. This computes the cross entropy per token and averages it. Swap 0 for whichever token index is used for padding.

For `texts_dec` in step 1, make sure to prepend a number of ones equal to the total prefix length to the attention mask of the tokenized text.
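Putting the three steps together, here is a minimal end-to-end sketch. It assumes `texts_enc` and `texts_dec` are Hugging Face tokenizer-style dicts with `input_ids` and `attention_mask` keys, that padding uses token index 0, and that `model`, `audios`, and `total_prefix_length` are defined as above:

```python
import torch
import torch.nn.functional as F

# Step 1 note: prepend ones for the prefix positions (audio projection +
# input text) to the decoder attention mask before the forward pass.
attn = texts_dec['attention_mask']
prefix_mask = torch.ones(
    attn.shape[0], total_prefix_length, dtype=attn.dtype, device=attn.device
)
texts_dec['attention_mask'] = torch.cat([prefix_mask, attn], dim=1)

# Forward pass through the PENGI model.
outputs = model(audios, texts_enc, texts_dec)

# Drop the logits for the prefix positions; the extra -1 shift makes
# position t predict decoder token t (next-token alignment).
logits = outputs.logits[:, total_prefix_length - 1: -1]

# Mean cross entropy over non-pad tokens (pad id assumed to be 0).
loss = F.cross_entropy(
    logits.reshape(-1, logits.shape[-1]),
    texts_dec['input_ids'].flatten(),
    ignore_index=0,
)
```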
Hi authors,
I'm trying to get the likelihood of Pengi on (audio, question, answer) tuples, but haven't been able to. Is it possible to get some help on this?

I think this forward function probably computes the loss: https://github.com/microsoft/Pengi/blob/main/models/pengi.py#L174, where `audio` should be the output of `preprocess_audio`, `texts_enc` should be the output of running `preprocess_text` on the question, and `texts_dec` should be the output of running `preprocess_text` on the answer. However, I wasn't able to get the loss from the output; even if I pass `label = texts_dec['input_ids']` (https://github.com/microsoft/Pengi/blob/main/models/decoder.py#L219), I still get dimension errors when computing the cross-entropy loss.

Your help is greatly appreciated.
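For reference, the quantity I'm after is roughly the summed log-probability of the answer tokens. A sketch of what I mean, reusing the names from the snippet above (the prefix handling and pad index 0 are my assumptions):

```python
import torch.nn.functional as F

outputs = model(audios, texts_enc, texts_dec)
logits = outputs.logits[:, total_prefix_length - 1: -1]

# Per-token negative log-likelihood; padding positions (assumed id 0)
# are ignored and contribute 0 to the sum.
token_nll = F.cross_entropy(
    logits.reshape(-1, logits.shape[-1]),
    texts_dec['input_ids'].flatten(),
    ignore_index=0,
    reduction='none',
).reshape(texts_dec['input_ids'].shape)

# Log-likelihood of the answer given (audio, question), one value per tuple.
answer_log_likelihood = -token_nll.sum(dim=1)
```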
Best,
Puyuan