
How to get the likelihood from the model? #15

Closed
jasonppy opened this issue Jun 16, 2024 · 1 comment
jasonppy commented Jun 16, 2024

Hi authors,

I'm trying to get the likelihood from Pengi on (audio, question, answer) tuples, but haven't been able to do so. Could I get some help with this?

I think this forward function probably computes the loss: https://github.com/microsoft/Pengi/blob/main/models/pengi.py#L174
where audio should be the output of preprocess_audio, texts_enc should be the output of running preprocess_text on the question, and texts_dec should be the output of running preprocess_text on the answer. However, I wasn't able to get the loss from the output. Even when I pass label = texts_dec['input_ids'] (https://github.com/microsoft/Pengi/blob/main/models/decoder.py#L219), I still get dimension errors when computing the cross-entropy loss.

Your help is greatly appreciated.

Best,
Puyuan


soham97 commented Jul 9, 2024

Hi @jasonppy, the loss computation during training looks like this:

  1. outputs = model(audios, texts_enc, texts_dec)
     where model is the Pengi model, audios is a float32 tensor, and texts_enc and texts_dec are the tokenized text input and text output.
  2. logits = outputs.logits[:, total_prefix_length - 1: -1]
     This removes the outputs corresponding to the total prefix length, which is equal to the length of the audio projection plus the length of the input text.
  3. loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), texts_dec['input_ids'].flatten(), ignore_index=0)
     This computes the cross entropy per token and averages it. Swap 0 with whichever token index is used for padding.

For texts_dec in step 1, make sure to prepend ones equal to the total prefix length to the attention mask of the tokenized text.
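
A minimal sketch putting these steps together, under stated assumptions: `model`, `audios`, `texts_enc`, `texts_dec`, and `total_prefix_length` (length of the audio projection plus the input text) are assumed to be prepared as described above, and the padding index 0 is an assumption to replace with your tokenizer's pad token id.

```python
import torch
import torch.nn.functional as F

# Assumed to be already prepared (see the steps above):
#   model               - the Pengi model
#   audios              - float32 audio tensor from preprocess_audio
#   texts_enc           - tokenized question (preprocess_text on the question)
#   texts_dec           - tokenized answer (preprocess_text on the answer)
#   total_prefix_length - audio projection length + input text length

# Prepend ones for the prefix positions to the decoder attention mask,
# so the prefix tokens are attended to.
prefix_mask = torch.ones(
    texts_dec['attention_mask'].shape[0],
    total_prefix_length,
    dtype=texts_dec['attention_mask'].dtype,
    device=texts_dec['attention_mask'].device,
)
texts_dec['attention_mask'] = torch.cat(
    [prefix_mask, texts_dec['attention_mask']], dim=1
)

# Step 1: forward pass.
outputs = model(audios, texts_enc, texts_dec)

# Step 2: drop logits for the prefix; logits at position t predict token t + 1,
# so slicing from total_prefix_length - 1 aligns the logits with the answer tokens.
logits = outputs.logits[:, total_prefix_length - 1: -1]

# Step 3: per-token cross entropy averaged over non-padding tokens
# (swap 0 for your padding token index).
loss = F.cross_entropy(
    logits.reshape(-1, logits.shape[-1]),
    texts_dec['input_ids'].flatten(),
    ignore_index=0,
)

# For the summed log-likelihood of the answer instead of a mean loss,
# pass reduction='sum' and negate the result.
```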

soham97 closed this as completed on Sep 3, 2024