SINGING VOICE SEPARATION USING DEEP NEURAL NETWORKS AND F0 ESTIMATION

Abstract

Deep neural networks (DNNs) have become a popular approach to speech enhancement and singing voice separation. DNNs are typically trained on ground-truth examples to estimate a time-frequency mask. In this submission, we use a DNN-based estimate as a first step and then refine it with a traditional method based on F0 estimation, using the YINFFT algorithm.

Extended abstract
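To illustrate the general idea of refining a time-frequency mask with an F0 track, here is a minimal sketch in Python with NumPy. It is not the code of this submission: the function names (`harmonic_comb`, `refine_vocal_mask`), the `width_hz` parameter, the choice to multiply the DNN mask by a binary harmonic comb, and the choice to leave unvoiced frames untouched are all assumptions made for illustration. The F0 track is taken as given (for example, from a YINFFT pitch tracker), with 0 marking unvoiced frames.

```python
import numpy as np


def harmonic_comb(f0_hz, n_bins, sample_rate, n_fft, width_hz=40.0):
    """Build a binary comb of shape (n_bins, n_frames) around the harmonics of f0.

    f0_hz holds one F0 value per frame; values <= 0 mark unvoiced frames.
    Hypothetical helper, not part of the original software.
    """
    # Centre frequency of each STFT bin in Hz.
    bin_freqs = np.arange(n_bins) * sample_rate / n_fft
    # Assumption: unvoiced frames pass through unchanged (comb of ones).
    comb = np.ones((n_bins, len(f0_hz)))
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue
        # Index of the nearest harmonic (at least the fundamental) for each bin.
        harmonic_number = np.maximum(np.round(bin_freqs / f0), 1)
        distance = np.abs(bin_freqs - harmonic_number * f0)
        # Keep only bins within width_hz / 2 of a harmonic.
        comb[:, t] = distance <= width_hz / 2
    return comb


def refine_vocal_mask(dnn_mask, f0_hz, sample_rate, n_fft, width_hz=40.0):
    """Attenuate mask entries that lie far from the harmonics of the F0 track."""
    comb = harmonic_comb(f0_hz, dnn_mask.shape[0], sample_rate, n_fft, width_hz)
    return dnn_mask * comb
```

In such a scheme, the refined mask would be multiplied element-wise with the mixture's complex STFT and the result inverted to obtain the vocal estimate; the accompaniment could be obtained with the complementary mask.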

Implementation

This is our submission to the MIREX 2016 singing voice separation task (results). The software is distributed under the BSD license.

Download