Kong, Qiuqiang, Sobieraj, Iwnoa, Wang, Wenwu and Plumbley, Mark D. (2016). “Deep neural network baseline for DCASE challenge 2016.”, DCASE 2016 Workshop, Budapest.

Abstract

The DCASE Challenge 2016 contains tasks for Acoustic Acene Classification (ASC), Acoustic Event Detection (AED), and audio tagging. Since 2006, Deep Neural Networks (DNNs) have been widely applied to computer visions, speech recognition and natural language processing tasks. In this paper, we provide DNN baselines for the DCASE Challenge 2016. For feature extraction, 40 Mel-filter bank features are used. Two kinds of Mel banks, same area bank and same height bank are discussed. Experimental results show that the same height bank is better than the same area bank. DNNs with the same structure are applied to all four tasks in the DCASE Challenge 2016. In Task 1 we obtained accuracy of 76.4% using Mel + DNN against 72.5% by using Mel Frequency Ceptral Coefficient (MFCC) + Gaussian Mixture Model (GMM). In Task 2 we obtained F value of 17.4% using Mel + DNN against 41.6% by using Constant Q Transform (CQT) + Nonnegative Matrix Factorization (NMF). In Task 3 we obtained F value of 38.1% using Mel + DNN against 26.6% by using MFCC + GMM. In task 4 we obtained Equal Error Rate (ERR) of 20.9% using Mel + DNN against 21.0% by using MFCC + GMM. Therefore the DNN improves the baseline in Task 1 and Task 3, and is similar to the baseline in Task 4, although is worse than the baseline in Task 2. This indicates that DNNs can be successful in many of these tasks, but may not always work.

Link to full paper ⤧  Next post Kong et al. (2016b) ⤧  Previous post Binge-watching videos teaches computers to recognise sounds