Hi @blue-blue272, we use two ways to get captions for AudioSet:
1. We crawl the titles of the corresponding YouTube videos. A title may or may not be related to the audio, so we use MS CLAP to filter out pairs where the audio and the title are not aligned. The filtered data comes to around 400k audio-text pairs.
2. We use the keyword-to-caption (K2C) augmentation proposed in the LAION paper (https://arxiv.org/abs/2211.06687) to generate captions for the rest of AudioSet.
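
For readers wondering what the filtering step in the first approach might look like, here is a minimal sketch. It is not the authors' actual pipeline. It assumes the audio and title embeddings have already been computed with a CLAP-style model, and the function names, the input tuple layout, and the `0.3` threshold are all hypothetical choices for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "not aligned"
    return dot / (norm_a * norm_b)


def filter_aligned_pairs(pairs, threshold=0.3):
    """Keep (clip_id, title) pairs whose audio/text similarity meets the threshold.

    `pairs` is an iterable of (clip_id, title, audio_embedding, text_embedding).
    The embeddings are assumed to be precomputed by a CLAP-style model, and
    the threshold value is a hypothetical placeholder, not a published setting.
    """
    kept = []
    for clip_id, title, audio_emb, text_emb in pairs:
        if cosine_similarity(audio_emb, text_emb) >= threshold:
            kept.append((clip_id, title))
    return kept


# Toy example with made-up 2-D embeddings.
example_pairs = [
    ("clip_a", "Dog barking in the yard", [1.0, 0.0], [0.9, 0.1]),
    ("clip_b", "My vacation vlog part 3", [1.0, 0.0], [0.0, 1.0]),
]
aligned = filter_aligned_pairs(example_pairs)
```

In practice the threshold would need to be tuned on a small hand-checked sample, since it directly controls how many pairs survive the filter.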
AudioSet only contains audio and event labels. How do you obtain the caption descriptions for the audio clips in the AudioSet dataset?