About the audio-text pair of AudioSet dataset. #16

Open
blue-blue272 opened this issue Jun 29, 2024 · 1 comment

Comments

@blue-blue272

AudioSet contains only audio clips and event labels. How do you obtain caption descriptions for the audio in the AudioSet dataset?

@soham97
Contributor

soham97 commented Jul 9, 2024

Hi @blue-blue272, we use two ways to get captions for AudioSet:

  • We crawl the titles of the corresponding YouTube videos. A title may or may not be related to the audio, so we then use MS CLAP to filter out audio-title pairs that are not aligned. The filtered data comes to around 400k audio-text pairs.
  • We use the K2C (keyword-to-caption) augmentation proposed in the LAION paper (https://arxiv.org/abs/2211.06687) to generate captions for the rest of AudioSet.
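
The filtering step in the first bullet can be sketched roughly as below. This is only an illustration, not the exact pipeline: it assumes the audio and title embeddings have already been computed by a CLAP-style model, and the function names, the `clip_id` / `audio_emb` / `text_emb` keys, and the `0.3` threshold are all hypothetical (the actual cutoff is not stated in this thread).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "not aligned"
    return dot / (norm_a * norm_b)


def filter_aligned_pairs(pairs, threshold=0.3):
    """Split pairs into kept/rejected clip ids by audio-title similarity.

    `threshold` is a hypothetical value chosen for illustration only.
    """
    kept, rejected = [], []
    for pair in pairs:
        score = cosine_similarity(pair["audio_emb"], pair["text_emb"])
        if score >= threshold:
            kept.append(pair["clip_id"])
        else:
            rejected.append(pair["clip_id"])
    return kept, rejected


# Toy 2-D embeddings standing in for real CLAP outputs.
pairs = [
    {"clip_id": "clip_a", "audio_emb": [1.0, 0.0], "text_emb": [0.9, 0.1]},
    {"clip_id": "clip_b", "audio_emb": [1.0, 0.0], "text_emb": [0.0, 1.0]},
]
kept, rejected = filter_aligned_pairs(pairs)
```

Under this sketch, clips whose titles land in `rejected` would be the ones that fall through to the K2C step in the second bullet.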
