Joint Detection and Classification Convolutional Neural Network on Weakly Labelled Bird Audio Detection

Kong, Qiuqiang, Xu, Yong and Plumbley, Mark D. (2017) Joint Detection and Classification Convolutional Neural Network on Weakly Labelled Bird Audio Detection. In: 25th European Signal Processing Conference (EUSIPCO) 2017, Aug 28 - Sep 2 2017, Kos Island, Greece.

Abstract Bird audio detection (BAD) aims to detect whether or not an audio recording contains a bird call. One difficulty of this task is that bird sound datasets are weakly labelled: only the presence or absence of a bird in a recording is known, not when the birds call. We propose applying a joint detection and classification (JDC) model to the weakly labelled data (WLD) to detect and classify an audio clip at the same time. First, we apply a VGG-like convolutional neural network (CNN) to the mel spectrogram as a baseline. Then we propose a JDC-CNN model with a VGG as the classifier and a CNN as the detector. We report that denoising methods, including optimally-modified log-spectral amplitude (OM-LSA), median filter and spectral spectrogram, worsen the classification accuracy, contrary to previous work. JDC-CNN can predict the time stamps of events from weakly labelled data, and so is able to perform sound event detection from WLD. We obtained an area under the curve (AUC) of 95.70% on the development data and 81.36% on the unseen evaluation data, which is nearly comparable to the baseline CNN model.
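
The abstract does not spell out how the classifier and detector outputs are combined, so the following is only a minimal sketch of one plausible reading: the detector produces a per-frame weight, the classifier produces a per-frame bird probability, and the clip-level prediction is the detector-weighted average of the classifier output. The function names, the weighted-average rule, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def jdc_clip_probability(classifier_out, detector_out, eps=1e-8):
    """Combine frame-level outputs into one clip-level probability.

    classifier_out: shape (T,), per-frame probability that a bird is present.
    detector_out:   shape (T,), per-frame weight in [0, 1] indicating how
                    informative each frame is.
    The detector-weighted average used here is an assumed aggregation rule.
    """
    c = np.asarray(classifier_out, dtype=float)
    w = np.asarray(detector_out, dtype=float)
    if c.shape != w.shape:
        raise ValueError("classifier and detector outputs must have the same shape")
    # eps guards against division by zero when the detector is silent everywhere.
    return float(np.sum(w * c) / (np.sum(w) + eps))


def active_frames(detector_out, threshold=0.5):
    """Return indices of frames whose detector weight exceeds the threshold.

    This is how frame-level time stamps could be read off a model that was
    trained only with clip-level (weak) labels. The threshold is hypothetical.
    """
    return np.flatnonzero(np.asarray(detector_out, dtype=float) > threshold)


# Toy example with four frames: the detector attends to frames 1 and 2.
classifier = [0.1, 0.9, 0.8, 0.2]
detector = [0.0, 1.0, 1.0, 0.0]
clip_prob = jdc_clip_probability(classifier, detector)
frames = active_frames(detector)
```

Under this reading only the clip-level probability is compared with the weak label during training, while the detector weights give approximate event locations as a by-product.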

