About the audio-text pair of AudioSet dataset. #16

Open

blue-blue272 opened this issue Jun 29, 2024 · 1 comment

blue-blue272 commented Jun 29, 2024

AudioSet contains only audio clips and event labels. How do you obtain the caption descriptions for the audio in the AudioSet dataset?

Contributor

soham97 commented Jul 9, 2024

Hi @blue-blue272, we use two ways to get captions for AudioSet:

1. We crawl the titles of the corresponding YouTube videos. A title may or may not be related to the audio, so we use MS CLAP to filter out pairs whose audio and title are not aligned. The data that remains after filtering comes to roughly 400k audio-text pairs.
2. We use the K2C (keyword-to-caption) augmentation proposed in the LAION paper (https://arxiv.org/abs/2211.06687) to generate captions for the rest of AudioSet.
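
The filtering step in the first item can be sketched roughly as below. This is only an illustration, not the maintainers' actual pipeline: the thread does not say how alignment is scored or which cutoff is used, so the sketch assumes cosine similarity between an audio embedding and a title embedding (both presumed to come from a CLAP-style model and computed beforehand), and the names `cosine_similarity`, `filter_aligned_pairs`, and the `threshold` value of 0.3 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Each record: (clip_id, youtube_title, audio_embedding, text_embedding).
# The embeddings are assumed to be precomputed by a CLAP-style model.
Pair = Tuple[str, str, Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_aligned_pairs(
    pairs: List[Pair], threshold: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Keep (clip_id, title, score) for pairs scoring at or above the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    kept = []
    for clip_id, title, audio_emb, text_emb in pairs:
        score = cosine_similarity(audio_emb, text_emb)
        if score >= threshold:
            kept.append((clip_id, title, score))
    return kept


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real model outputs.
    toy_pairs = [
        ("clip_a", "Dog barking in the park", [1.0, 0.0], [0.9, 0.1]),
        ("clip_b", "My vacation vlog part 3", [0.0, 1.0], [1.0, 0.0]),
    ]
    for clip_id, title, score in filter_aligned_pairs(toy_pairs):
        print(clip_id, title, round(score, 3))
```

In practice the cutoff would have to be tuned on held-out data, since a stricter value trades the number of retained pairs against caption quality.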
