Accepted Papers

Robust Multilinear Tensor Decomposition with Rank Estimation
Xu Han, Laurent Albera, Kachenoura Amar, Huazhong Shu, Lotfi Senhadji

The famous HOSVD (Higher-Order Singular Value Decomposition) method is sensitive to low signal-to-noise ratio values, even when the multilinear ranks are known. In this paper, we present a novel method for Multilinear Tensor Decomposition (MTD) which is more robust with respect to the presence of noise and which allows for rank estimation. Experiments on simulated noisy tensors and real-world data show the effectiveness of the proposed method compared with classical algorithms.
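For context, a minimal NumPy sketch of the truncated HOSVD that serves as the baseline here (the classical procedure, not the robust decomposition proposed in the paper; sizes and ranks below are illustrative):

```python
import numpy as np

def hosvd(T, ranks):
    """Truncated HOSVD of a third-order tensor T with multilinear ranks (r1, r2, r3)."""
    factors = []
    for mode, r in enumerate(ranks):
        # Mode-n unfolding: bring `mode` to the front and flatten the remaining modes.
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :r])
    # Core tensor: contract T with the transposed factor matrices along every mode.
    G = T
    for mode, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, mode, 0), axes=1), 0, mode)
    return G, factors

# Toy example: a noisy tensor with multilinear ranks (2, 2, 2).
rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((10, 2)), rng.standard_normal((8, 2)), rng.standard_normal((5, 2))
T = np.einsum('ir,jr,kr->ijk', A, B, C) + 0.01 * rng.standard_normal((10, 8, 5))
G, (U1, U2, U3) = hosvd(T, (2, 2, 2))
```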
Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition
Mathieu Fontaine, Fabian-Robert Stöter, Antoine Liutkus, Umut Simsekli, Roland Badeau, Romain Serizel

This paper introduces a new method for multichannel speech enhancement based on a versatile modeling of the residual noise spectrogram. Such a model has already been presented before in the single-channel case, where the noise component is assumed to follow an alpha-stable distribution for each time-frequency bin, whereas the speech spectrogram, supposed to be more regular, is modeled as Gaussian. In this paper, we describe a multichannel extension of this model, as well as a Monte Carlo Expectation-Maximisation algorithm for parameter estimation. In particular, a multichannel extension of the Itakura-Saito nonnegative matrix factorization is exploited to estimate the spectral parameters for speech, and a Metropolis-Hastings algorithm is proposed to estimate the noise contribution.
Sum Conditioned Poisson Factorization
Gokhan Capan, Semih Akbayrak, Taha Ceritli, Ali Taylan Cemgil

We develop an extension to Poisson factorization to model multinomial data using a moment parametrization. Our construction is an alternative to the canonical construction of generalized linear models. This is achieved by defining K component Poisson factorization models and constraining the sum of observation tensors across components. A family of fully conjugate tensor decomposition models for binary, ordinal or multinomial data is devised as a result, which can be used as a generic building block in hierarchical models for arrays of such data. We give parameter estimation and approximate inference procedures based on Expectation Maximization and variational inference. The flexibility of the resulting model on binary and ordinal matrix factorizations is illustrated. Empirical evaluation is performed for movie recommendation on an ordinal ratings matrix, and for knowledge graph completion on binary tensors. The model is tested for both prediction and producing ranked lists.
Curve Registered Coupled Low Rank Factorization
Jérémy Cohen, Rodrigo Cabral-Farias, Bertrand Rivet

We propose an extension of the canonical polyadic (CP) tensor model where one of the latent factors is allowed to vary through data slices in a constrained way. The components of the latent factors, which we want to retrieve from data, can vary from one slice to another up to a diffeomorphism. We suppose that the diffeomorphisms are also unknown, thus merging curve registration and tensor decomposition in one model, which we call registered CP. We present an algorithm to retrieve both the latent factors and the diffeomorphism, which is assumed to be in a parametrized form. At the end of the paper, we show simulation results comparing registered CP with other models from the literature.
Source Analysis using Block Term Decomposition in Atrial Fibrillation
Pedro Marinho, Vicente Zarzoso

Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia in clinical practice, and is becoming a major public health concern. The mechanisms of this arrhythmia are not completely understood, and a better understanding of them requires an accurate analysis of the atrial activity (AA) signal. The block term decomposition (BTD) has recently been proposed as a tool to extract the AA in AF signals. In this paper, a deep analysis of the sources estimated by BTD is made, showing that the classical method to select the atrial source among the other sources may not work in some cases, even for the matrix-based methods. An automated method is then proposed to precisely select the atrial source, considering a new parameter that has not been used before. Experimental results show the validity of this method not only for the BTD-estimated sources, but also for the matrix-based methods.
Some Issues in Computing the CP Decomposition of NonNegative Tensors
Mohamad Jouni, Mauro Dalla Mura, Pierre Comon

We point out a problem existing in nonnegative tensor decompositions, stemming from the representation of decomposable tensors by outer products of vectors, and propose approaches to solve it. In fact, a scaling indeterminacy appears even though it is not inherent in the decomposition; the choice of scaling factors has an impact during the execution of iterative algorithms and should not be overlooked. Computer experiments support the interest of the proposed greedy algorithm in the case of the CP decomposition.
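For context (our notation, not the paper's), the indeterminacy in question is the scaling freedom of each rank-one term of a CP model:
\[
a \otimes b \otimes c \;=\; (\lambda a) \otimes (\mu b) \otimes \Big(\tfrac{1}{\lambda\mu}\, c\Big), \qquad \lambda\mu \neq 0,
\]
so an iterative algorithm must still decide how to distribute the norm over the three factors, even when nonnegativity restricts the admissible scalings to \(\lambda, \mu > 0\).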
Nonnegative PARAFAC2: A Flexible Coupling Approach
Jérémy Cohen, Rasmus Bro

Modeling variability in tensor decomposition methods is one of the challenges of source separation. One possible solution to account for variations across jointly analysed data sets is to resort to the PARAFAC2 model. However, so far imposing constraints on the mode with variability has not been possible. In the following manuscript, a relaxation of the PARAFAC2 model is introduced that allows for imposing nonnegativity constraints on the varying mode. An algorithm to compute the proposed flexible PARAFAC2 model is derived, and its performance is studied on both synthetic and chemometrics data.
Applications of Polynomial Common Factor Computation in Signal Processing
Ivan Markovsky, Antonio Fazzi, Nicola Guglielmi

We consider the problem of computing the greatest common divisor of a set of univariate polynomials and present applications of this problem in system theory and signal processing. One application is blind system identification: given the responses of a system to unknown inputs, find the system. Assuming that the unknown system is a finite impulse response system and that at least two experiments are done with inputs that have finite support and whose Z-transforms have no common factors, the impulse response of the system can be computed up to a scaling factor as the greatest common divisor of the Z-transforms of the outputs. Other applications of the greatest common divisor problem in system theory and signal processing are finding the distance of a system to the set of uncontrollable systems and common dynamics estimation in a multi-channel sum-of-exponentials model.
Joint Nonnegative Matrix Factorization for Underdetermined Blind Source Separation in Nonlinear Mixtures
Ivica Kopriva

An approach is proposed for underdetermined blind separation of nonnegative dependent (overlapped) sources from their nonlinear mixtures. The method performs empirical kernel map-based mappings of the original data matrix onto reproducing kernel Hilbert spaces (RKHSs). Provided that the sources comply with a probabilistic model that is sparse in support and amplitude, the nonlinear underdetermined mixture model in the input space becomes an overdetermined linear mixture model in the RKHS, comprised of the original sources and their mostly second-order monomials. It is assumed that the linear mixture models in different RKHSs share the same representation, i.e. the matrix of sources. Thus, we propose a novel sparseness-regularized joint nonnegative matrix factorization method to separate the sources shared across different RKHSs. The method is validated comparatively on a numerical problem related to the extraction of eight overlapped sources from three nonlinear mixtures.
On the Number of Signals in Multivariate Time Series
Markus Matilainen, Klaus Nordhausen, Joni Virta

We assume a second-order source separation model where the observed multivariate time series is a linear mixture of latent, temporally uncorrelated time series, with some components being pure white noise. To avoid the modelling of noise, we extract the non-noise latent components using some standard method, allowing the modelling of the extracted univariate time series individually. An important question is the determination of which of the latent components are of interest in modelling and which can be considered as noise. Bootstrap-based methods have recently been used in determining the latent dimension in various methods of unsupervised and supervised dimension reduction, and we propose a set of similar estimation strategies for second-order stationary time series. Simulation studies and a sound wave example are used to show the method's effectiveness.
A Generative Model for Natural Sounds Based on Latent Force Modelling
William Wilkinson, Joshua Reiss, Dan Stowell

Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches don't incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective.
Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley

In deep neural networks with convolutional layers, all the neurons in each layer typically have the same size receptive fields (RFs) with the same resolution. Convolutional layers with neurons that have large RF capture global information from the input features, while layers with neurons that have small RF size capture local details with high resolution from the input features. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCN), where each layer has a range of neurons with different RF sizes to extract multi-resolution features that capture the global and local information from its input features. The proposed MR-FCN is applied to separate the singing voice from mixtures of music sources. Experimental results show that using MR-FCN improves the performance compared to feedforward deep neural networks (DNNs) and single resolution deep fully convolutional neural networks (FCNs) on the audio source separation problem.
Application of Independent Component Analysis to Tumor Transcriptomes Reveals Specific And Reproducible Immune-related Signals
Urszula Czerwinska, Laura Cantini, Ulykbek Kairov, Emmanuel Barillot, Andrei Zinovyev

Independent Component Analysis (ICA) can be used to model gene expression data as an action of a set of statistically independent hidden factors. ICA with a downstream component analysis was previously applied with success to decompose bulk transcriptomic data into interpretable hidden factors. Some of these factors reflect the presence of an immune infiltrate in the tumor environment. However, no previous studies have focused on the reproducibility of the ICA-based immune-related signal in the tumor transcriptome. In this work, we use ICA to detect immune signals in six independent transcriptomic datasets. We observe several strongly reproducible immune-related signals when ICA is applied in a sufficiently high-dimensional space (close to one hundred dimensions). Interestingly, we can interpret these signals as cell-type-specific signals reflecting the presence of T-cells, B-cells and myeloid cells, which are of high interest in the field of oncoimmunology. Further quantification of these signals in tumoral transcriptomes has therapeutic potential.
Probit Latent Variables Estimation for a Gaussian Process Classifier. Application to the Detection of High-Voltage Spindles
Rémi Souriau, Vincent Vigneron, Jean Lerbet, Hsin Chen

Deep Brain Stimulation (DBS) is a surgical procedure that is efficient in relieving symptoms of some neurodegenerative diseases such as Parkinson's disease (PD). However, applying deep brain stimulation permanently, due to the lack of possible control, leads to several side effects. Recent studies have shown that the detection of High-Voltage Spindles (HVS) in local field potentials is an interesting way to predict the arrival of symptoms in PD patients. The complexity of the signals and the short time lag between the appearance of HVS and the arrival of symptoms make it necessary to have a fast and robust model to classify the presence of HVS (\(Y=1\)) or not (\(Y=-1\)) and to apply the DBS only when needed. In this paper, we focus on a Gaussian process model. It consists in estimating the latent variable \(f\) of the probit model: \(\operatorname{Pr}(Y=1|input) = \Phi(f(input))\), with \(\Phi\) the distribution function of the standard normal distribution.
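As a sketch of the generative side of this probit Gaussian process model (prior draw plus probit link only; the estimation of \(f\), which is the paper's contribution, is not reproduced here, and the RBF kernel, 1-D inputs and variable names are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))          # hypothetical 1-D inputs (e.g. a feature of the LFP)

# Draw one latent function f ~ GP(0, k) at the inputs.
K = rbf_kernel(X, X) + 1e-8 * np.eye(len(X))
f = rng.multivariate_normal(np.zeros(len(X)), K)

# Probit link: Pr(Y = 1 | input) = Phi(f(input)), labels in {+1, -1}.
p_hvs = norm.cdf(f)
y = np.where(rng.uniform(size=len(X)) < p_hvs, 1, -1)
```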
Multichannel Audio Source Separation Exploiting NMF-based Generic Source Spectral Model in Gaussian Modeling Framework
Thanh Duong, Ngoc Duong, Cong-Phuong Nguyen, Quoc Cuong Nguyen

Nonnegative matrix factorization (NMF) is well known as a powerful spectral model for audio signals. Existing work, including ours, has investigated the use of generic source spectral models (GSSM) based on NMF for single-channel audio source separation and shown its efficiency in different settings. This paper extends the work to the multichannel case, where the GSSM is combined with the source spatial covariance model within a unified Gaussian modeling framework. In particular, unlike a conventional combination where the estimated variances of each source are further constrained by NMF separately, we propose to constrain the total variances of all sources altogether, which yields better separation performance. We present the expectation-maximization (EM) algorithm for the parameter estimation. We demonstrate the effectiveness of the proposed approach using a benchmark dataset provided within the 2016 Signal Separation Evaluation Campaign.
Orthogonality-Regularized Masked NMF with KL-Divergence for Learning on Weakly Labeled Audio Data
Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley

Non-negative Matrix Factorization (NMF) is a well established tool for audio analysis. However, it is not well suited for learning on weakly labeled data, i.e. data where the exact timestamp of the sound of interest is not known. To overcome this shortcoming of NMF, we recently proposed the Orthogonality-Regularized Masked NMF (ORM-NMF), which allows meaningful representations to be extracted from weakly labeled audio data. Here we extend the method to allow it to use the generalized Kullback-Leibler (KL) divergence as the cost function of NMF. We demonstrate that the proposed Orthogonality-Regularized Masked NMF with KL divergence can be used for Audio Event Detection of rare events and evaluate the method on the development data from Task 2 of the DCASE2017 Challenge.
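For reference, the standard Lee-Seung multiplicative updates for NMF under the generalized KL divergence (the plain, unregularized baseline; the orthogonality-regularized masking of ORM-NMF is not included in this sketch):

```python
import numpy as np

def kl_nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF minimizing the generalized KL divergence between V (>= 0) and W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # update dictionary
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # update activations
    return W, H
```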
Convergence of Jacobi-Type Algorithms for Simultaneous Approximate Diagonalization of Matrices or Tensors
Konstantin Usevich, Jianze Li and Pierre Comon

Approximate orthogonal/unitary diagonalization of matrices and tensors is at the core of many source separation algorithms. We consider a family of Jacobi-type algorithms for approximate diagonalization (including the JADE and CoM algorithms). We report recent results on local and global convergence of these algorithms.
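A toy illustration of the elementary step that such Jacobi-type algorithms iterate: one Givens rotation on a pair of indices, chosen to reduce the joint off-diagonality criterion. In JADE/CoM the optimal angle has a closed form; here it is found by grid search purely for readability, under illustrative data:

```python
import numpy as np

def off_cost(matrices, Q):
    """Sum of squared off-diagonal entries of Q^T A_k Q over all matrices A_k."""
    cost = 0.0
    for A in matrices:
        B = Q.T @ A @ Q
        cost += np.sum(B**2) - np.sum(np.diag(B)**2)
    return cost

def givens(n, p, q, theta):
    """Givens rotation in the (p, q) plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p] = G[q, q] = c
    G[p, q], G[q, p] = -s, s
    return G

# Two symmetric matrices sharing (approximately) the same eigenvectors.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
mats = [U @ np.diag(rng.random(4)) @ U.T + 1e-3 * rng.standard_normal((4, 4)) for _ in range(2)]
mats = [0.5 * (A + A.T) for A in mats]

# One elementary Jacobi step on the pair (0, 1): pick the angle that most reduces the cost.
Q = np.eye(4)
thetas = np.linspace(-np.pi / 4, np.pi / 4, 1000)
best = min(thetas, key=lambda t: off_cost(mats, Q @ givens(4, 0, 1, t)))
Q = Q @ givens(4, 0, 1, best)
```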
Accelerating Likelihood Optimization for ICA on Real Signals
Pierre Ablin, Jean-Francois Cardoso, Alexandre Gramfort

We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the unconstrained problem and the problem constrained to white signals. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as quasi-Newton methods, for which the Hessian approximation is cheap to compute. These algorithms are very fast on simulated data where the linear mixture assumption really holds. However, on real signals, we observe that their rate of convergence can be severely impaired. In this paper, we investigate the origins of this behavior, and show that the recently proposed Preconditioned ICA for Real Data (Picard) algorithm overcomes this issue on both constrained and unconstrained problems.
Orthogonally-Constrained Extraction of Independent Non-Gaussian Component from Non-Gaussian Background without ICA
Zbynek Koldovsky, Petr Tichavsky, Nobutaka Ono

We propose a new algorithm for Independent Component Extraction that extracts one non-Gaussian component and is capable of exploiting the non-Gaussianity of background signals without decomposing them into independent components. The algorithm is suitable for situations when the signal to be extracted is determined through initialization; it shows particularly stable convergence when the target component is dominant. In simulations, the proposed method is compared with Natural Gradient and One-unit FastICA, and it yields improved results in terms of the Signal-to-Interference Ratio and the number of successful extractions.
A New Link Between Joint Blind Source Separation Using Second Order Statistics and the Canonical Polyadic Decomposition
Dana Lahat, Christian Jutten

In this paper, we discuss the joint blind source separation (JBSS) of real-valued Gaussian stationary sources with uncorrelated samples from a new perspective. We show that the second-order statistics of the observations can be reformulated as a coupled decomposition of several tensors. The canonical polyadic decomposition (CPD) of each such tensor, if unique, results in the identification of one or two mixing matrices. The proposed new formulation implies that standard algorithms for joint diagonalization and CPD may be used to estimate the mixing matrices, although only in a sub-optimal manner. We discuss the uniqueness and identifiability of this new approach. We demonstrate how the proposed approach can bring new insights on the uniqueness of JBSS in the presence of underdetermined mixtures.
A Blind Source Separation Method based on Output Nonlinear Correlation for Bilinear Mixtures
Andréa Guerrero, Yannick Deville, Shahram Hosseini

In this paper, a blind source separation method for bilinear mixtures of two source signals is presented that relies on nonlinear correlation between separating system outputs. An estimate of each source is created by linearly combining observed mixtures and maximizing a cost function based on the correlation between the element-wise product of the estimated sources and the corresponding quadratic term. A proof of the method's separability, i.e. of the uniqueness of the solution to the cost function maximization problem, is also given. The algorithm used in this work is also presented. Its effectiveness is demonstrated through tests with artificial mixtures created with real Earth observation spectra. The proposed method is shown to yield much better performance than a state-of-the-art method.
Using Taylor Series Expansions and Second-Order Statistics for Blind Source Separation in Post-Nonlinear Mixtures
Denis Gustavo Fantinato, Leonardo Tomazeli Duarte, Yannick Deville, Christian Jutten, Romis Attux, Aline Neves

In the context of Post-Nonlinear (PNL) mixtures, source separation based on Second-Order Statistics (SOS) is a challenging topic due to the inherent difficulties when dealing with nonlinear transformations. Under the assumption that sources are temporally colored, the existing SOS-inspired methods require the use of Higher-Order Statistics (HOS) as a complement in certain stages of PNL demixing. However, a recent study has shown that the sole use of SOS is sufficient for separation if certain constraints on the separation system are obeyed. In this paper, we propose the use of a PNL separating model based on constrained Taylor series expansions which is able to satisfy the requirements that allow a successful SOS-based source separation. The simulation results corroborate the proposal's effectiveness.
New Classes of Blind Quantum Source Separation and Process Tomography Methods based on Spin Component Measurements along Two Directions
Yannick Deville, Alain Deville

We here present major extensions of the fields of blind quantum source separation (BQSS) and blind quantum process tomography (BQPT) for the Heisenberg Hamiltonian. They are based on a new type of spin component measurements performed directly for the available quantum states, which yields new nonlinear mixing models, with extended source signals and mixing parameters. The first two types of proposed BQSS and/or BQPT methods are based on quantum-source independent component analysis. They therefore require typically one thousand quantum states to estimate the mixing parameters, and some of them yield closed-form solutions. We then define a complementary, inversion-based, BQSS/BQPT method which requires only one quantum state, but which is based on solving nonlinear equations numerically.
A Grassmannian Minimum Enclosing Ball Approach for Common Subspace Extraction
Emilie Renard, Kyle A. Gallivan, Pierre-Antoine Absil

We study the problem of finding a subspace representative of multiple datasets by minimizing the maximal dissimilarity between this subspace and all the subspaces generated by those datasets. After arguing for the choice of the dissimilarity function, we derive some properties of the corresponding formulation. We propose an adaptation of an algorithm used for a similar problem on Riemannian manifolds. Experiments on synthetic data show that the subspace recovered by our algorithm is closer to the true common subspace than the solution obtained using an SVD.
Decoupling Multivariate Functions Using Second-Order Information and Tensors
Philippe Dreesen, Jeroen De Geeter, Mariya Ishteva

The power of multivariate functions is their ability to model a wide variety of phenomena, but they have the disadvantage of lacking an intuitive or interpretable representation and often requiring a (very) large number of parameters. We study decoupled representations of multivariate vector functions, which are linear combinations of univariate functions in linear combinations of the input variables. This model structure provides a description with fewer parameters and reveals the internal workings in a simpler way, as the nonlinearities are one-to-one functions. In earlier work, a tensor-based method was developed for performing this decomposition using first-order derivative information. In this article, we generalize this method and study how second-order derivative information can be incorporated. By doing this, we are able to push the method towards more involved configurations, while preserving uniqueness of the underlying tensor decompositions. Furthermore, even for some non-identifiable structures, the method seems to return a valid decoupled representation. These results are a step towards a more general data-driven and noise-robust tensor-based framework for computing decoupled function representations.
Blind Signal Separation by Synchronized Joint Diagonalization
Hiroshi Sawada

Joint Diagonalization (JD) is a well-known method for blind signal separation (BSS) that exploits the nonstationarity of signals. In this paper, we propose Synchronized Joint Diagonalization (SJD), which solves multiple JD problems simultaneously and tries to synchronize the activity of the same signal along the time axis over the multiple JD problems. SJD attains not only signal separation by the mechanism of JD but also permutation alignment by the synchronization when applied to frequency-domain BSS. Although the formulation of SJD starts from the minimization of multichannel Itakura-Saito divergences between a covariance matrix and a diagonal matrix, the simplified cost function with the finest time blocks becomes similar to that of Independent Vector/Component Analysis (IVA/ICA). We discuss the relationship between SJD and existing techniques. Experimental results on speech separation are shown to demonstrate the behavior of these methods.
Exploiting Structures of Temporal Causality for Robust Speaker Localization in Reverberant Environments
Christopher Schymura, Peng Guo, Yanir Maymon, Boaz Rafaely, Dorothea Kolossa

This paper introduces a framework for robust speaker localization in reverberant environments based on a causal analysis of the temporal relationship between direct sound and corresponding reflections. It extends previously proposed localization approaches for spherical microphone arrays based on a direct-path dominance test. So far, these methods are applied in the time-frequency domain without considering the temporal context of direction-of-arrival measurements. In this work, a causal analysis of the temporal structure of subsequent direction-of-arrival estimates based on the Granger causality test is proposed. The cause-effect relationship between estimated directions is modeled via a causal graph, which is used to distinguish the direction of the direct sound from corresponding reflections. An experimental evaluation in simulated acoustic environments shows that the proposed approach yields an improvement in localization performance, especially in highly reverberant conditions.
Relative Transfer Function Estimation From Speech Keywords
Ryan Corey, Andrew Singer

Far-field speech capture systems rely on microphone arrays to spatially filter sound, attenuating unwanted interference and noise and enhancing a speech signal of interest. To design effective spatial filters, we must first estimate the acoustic transfer functions between the source and the microphones. It is difficult to estimate these transfer functions if the source signals are unknown. However, in systems that are activated by a particular speech phrase, we can use that phrase as a pilot signal to estimate the relative transfer functions. Here, we propose a method to estimate relative transfer functions from known speech phrases in the presence of background noise and interference using template matching and time-frequency masking. We find that the proposed method can outperform conventional estimation techniques, but its performance depends on the characteristics of the speech phrase.
The 2018 Signal Separation Evaluation Campaign
Antoine Liutkus, Fabian-Robert Stöter, Nobutaka Ito

This paper reports the organization and results for the 2018 community-based Signal Separation Evaluation Campaign (SiSEC 2018). This year's edition was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning based systems. For this purpose, we prepared a new music separation database: MUSDB18, featuring close to 10 h of audio. Additionally, open-source software was released to automatically load, process and report performance on MUSDB18. Furthermore, a new official Python version for the BSS Eval toolbox was released, along with reference implementations for three oracle separation methods: ideal binary mask, ideal ratio mask, and multichannel Wiener filter.
Image Completion with Nonnegative Matrix Factorization under Separability Assumption
Tomasz Sadowski, Rafal Zdunek

Nonnegative matrix factorization is a well-known unsupervised learning method for part-based feature extraction and dimensionality reduction of a nonnegative matrix with a variety of applications. One of them is the matrix completion problem, in which missing entries in an observed matrix are recovered on the basis of partially known entries. In this study, we present a geometric approach to the low-rank image completion problem with separable nonnegative matrix factorization of incomplete data. The proposed method recursively selects extreme rays of a simplicial cone spanned by an observed image, and updates the latent factors with the hierarchical alternating least-squares algorithm. The numerical experiments performed on several images with missing entries demonstrate that the proposed method outperforms other algorithms in terms of computational time and accuracy.
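For illustration, a textbook successive-projection-style selection of extreme rays for separable NMF (the authors' recursive selection rule and the HALS updates may differ from this minimal sketch):

```python
import numpy as np

def spa(V, r):
    """Successive projection algorithm: pick r columns of V that (approximately)
    span the simplicial cone containing all columns of V."""
    R = V.astype(float).copy()
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R**2, axis=0)))   # column with the largest residual norm
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)                 # project residuals onto u's orthogonal complement
        indices.append(j)
    return indices
```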
Long-Term SNR Estimation Using Noise Residuals and a Two-Stage Deep-Learning Framework
Xuan Dong, Donald Williamson

Knowing the signal-to-noise ratio of a noisy speech signal is important since it can help improve speech applications. This paper presents a two-stage approach for estimating the long-term signal-to-noise ratio (SNR) of speech signals that are corrupted by background noise. The first stage produces noise residuals from a speech separation module. The second stage then uses the residuals and a deep neural network (DNN) to predict long-term SNR. Traditional SNR estimation approaches use signal processing, unsupervised learning, or computational auditory scene analysis (CASA) techniques. We propose a deep-learning based approach, since DNNs have outperformed other techniques in several speech processing tasks. We evaluate our approach across a variety of noise types and input SNR levels, using the TIMIT speech corpus and NOISEX-92 noise database. The results show that our approach generalizes well in unseen noisy environments, and it outperforms several existing methods.
Probabilistic Sparse Non-negative Matrix Factorization
Jesper L. Hinrich, Morten Mørup

In this paper, we propose a probabilistic sparse non-negative matrix factorization model that extends a recently proposed variational Bayesian non-negative matrix factorization model to explicitly account for sparsity. We assess the influence of imposing sparsity within a probabilistic framework on either the loading matrix, the score matrix, or both, and further contrast the influence of imposing an exponential or truncated normal distribution as prior. The probabilistic methods are compared to conventional maximum likelihood based NMF and sparse NMF on three image datasets: 1) a (synthetic) swimmer dataset, 2) the CBCL face dataset, and 3) the MNIST handwritten digits dataset. We find that the probabilistic sparse NMF is able to automatically learn the level of sparsity, and that both the existing probabilistic NMF and the proposed probabilistic sparse NMF prune inactive components and thereby automatically learn a suitable number of components. We further find that accounting for sparsity can provide more part-based representations, but for the probabilistic modeling the choice of priors and how sparsity is imposed can have a strong influence on the extracted representations.
Spatial Filtering of EEG Signals to Identify Periodic Brain Activity Patterns
Dounia Mulders, Cyril de Bodt, Nicolas Lejeune, André Mouraux, Michel Verleysen

Long-lasting periodic sensory stimulation is increasingly used in neuroscience to study, using electroencephalography (EEG), the cortical processes underlying perception in different modalities. This kind of stimulation can elicit synchronized periodic activity at the stimulation frequency in neuronal populations responding to the stimulus, referred to as a steady-state response (SSR). While the frequency analysis of EEG recordings is particularly well suited to capture this activity, it is limited by the intrinsic noisy nature of EEG signals and the low signal-to-noise ratio (SNR) of some responses. This paper compares and adapts spatial filtering methods for periodicity maximization to enhance the SNR of periodic EEG responses, a key condition to generalize their use as a research or clinical tool. This approach uncovers both temporal dynamics and spatial topographic patterns of SSRs, and is validated using EEG data from 15 healthy subjects exposed to periodic cool and warm stimuli.
Perceptual Evaluation of Blind Source Separation in Object-based Audio Production
Philip Coleman, Qingju Liu, Jon Francombe, Philip Jackson

Object-based audio has the potential to enable multimedia content to be tailored to individual listeners and their reproduction equipment. In general, object-based production assumes that the objects, the assets comprising the scene, are free of noise and interference. However, there are many applications in which signal separation could be useful to an object-based audio workflow, e.g., extracting individual objects from channel-based recordings or legacy content, or recording a sound scene with a single microphone array. This paper describes the application and evaluation of blind source separation (BSS) for sound recording in a hybrid channel-based and object-based workflow, in which BSS-estimated objects are mixed with the original stereo recording. A subjective experiment was conducted using simultaneously spoken speech recorded with omnidirectional microphones in a reverberant room. Listeners mixed a BSS-extracted speech object into the scene to make the quieter talker clearer, while retaining acceptable audio quality, compared to the raw stereo recording. Objective evaluations show that the relative short-term objective intelligibility and speech quality scores increase using BSS. Further objective evaluations are used to discuss the influence of the BSS method on the remixing scenario; the scenario shown by human listeners to be useful in object-based audio is shown to be a worst-case scenario.
Improving Single-Network Single-Channel Separation of Musical Audio with Convolutional Layers
Gerard Roma, Owen Green, Pierre Alexandre Tremblay

Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is considered to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state of the art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.
DNN-Based Music Source Separation Using MMDenseNet and BLSTM Architectures
Stefan Uhlich, Franck Giron, Michael Enenkl, Thomas Kemp, Naoya Takahashi, Yuki Mitsufuji

Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling
Ryan M. Corey, Andrew C. Singer

Source Separation with Long Time Dependency Gated Recurrent Units Neural Networks
Dominic Ward, Qiuqiang Kong, Mark D. Plumbley

Latent Mixture Models for Automatic Music Transcription
Cian O’Brien, Mark D. Plumbley

Polyphonic music transcription is a challenging problem, requiring the identification of a collection of latent pitches which can explain an observed music signal. Many state-of-the-art methods are based on the Non-negative Matrix Factorization (NMF) framework, which itself can be cast as a latent variable model. However, the basic NMF algorithm fails to consider many important aspects of music signals such as low-rank or hierarchical structure and temporal continuity. Here we propose a probabilistic model to address some of the shortcomings of NMF. Based on the Probabilistic Latent Component Analysis framework, we propose an algorithm which represents signals using a collection of low-rank dictionaries built from a base pitch dictionary. Experiments on a standard music transcription data set show that our method can successfully decompose signals into a hierarchical and smooth structure, improving the quality of the transcription.
Tensorlab 4.0 – A Preview
Michiel Vandecappelle, Martijn Boussé, Nico Vervliet, Matthieu Vendeville, Rob Zink and Lieven De Lathauwer

Since its initial release in 2013, Tensorlab has evolved into a powerful Matlab toolbox for the analysis of tensors and the computation of tensor decompositions. This upcoming release of Tensorlab, version 4.0, widens the applicability of the toolbox to a larger range of real-world applications. New β-divergence and low-rank weighted least squares (WLS) cost-functions are introduced for the canonical polyadic decomposition (CPD), offering higher flexibility to the user. Further, updating algorithms for the CPD allow both the tracking of streaming data and the incremental computation of the CPD of a large tensor. An LS-CPD algorithm is included to compute the CPD of a tensor that is only implicitly available as the solution of an underdetermined linear system.
Training Strategies for Deep Latent Models and Applications to Speech Presence Probability Estimation
Shlomo Chazan, Sharon Gannot, Jacob Goldberger

In this study we address models with latent variables in the context of neural networks. We analyze a neural network architecture, the mixture of deep experts (MoDE), that models latent variables using the mixture-of-experts paradigm. Learning the parameters of latent variable models is usually done by the expectation-maximization (EM) algorithm. However, it is well known that back-propagation gradient-based algorithms are the preferred strategy for training neural networks. We show that in the case of neural networks with latent variables, the back-propagation algorithm is actually a recursive variant of the EM that is more suitable for training neural networks. To demonstrate the viability of the proposed MoDE network, it is applied to the task of speech presence probability estimation, which is widely applicable to many speech processing problems, e.g. speaker diarization and separation, speech enhancement and noise reduction. Experimental results show the benefits of the proposed architecture over standard fully-connected networks with the same number of parameters.
Jointly Detecting and Separating Singing Voice: A Multi-Task Approach
Daniel Stoller, Sebastian Ewert, Simon Dixon

A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to exploit their correlation. While intuitive in principle, it can be challenging to identify related tasks and construct the model to optimally share information between tasks. In this paper, we explore vocal activity detection as an additional task to stabilise and improve the performance of vocal separation. Further, we identify problematic biases specific to each dataset that could limit the generalisation capability of separation and detection models, to which our proposed approach is robust. Experiments show improved performance in separation as well as vocal detection compared to single-task baselines. However, we find that the commonly used Signal-to-Distortion Ratio (SDR) metrics did not capture the improvement on non-vocal sections, indicating the need for improved evaluation methodologies.
An Approximate Message Passing Approach for DOA Estimation in Phase Noisy Environments
Guillaume Beaumont, Angélique Dremeau, Ronan Fablet

In underwater acoustics, wave propagation can be greatly disrupted by random fluctuations in the ocean environment. In particular, phase measurements of the complex pressure field can be highly noisy and can defeat conventional direction-of-arrival (DOA) estimation algorithms. In this paper, we propose a new Bayesian approach able to handle such phase noise as an informative prior on the measurements. In particular, phase-noise modeling is integrated into a message propagation algorithm that we name ''paSAMP'' (for Phase-Aware Swept Approximate Message Passing). This algorithm can be seen as an extension of the ''prSAMP'' algorithm, a recently proposed phase retrieval algorithm (namely, without any informative prior on the missing phase). In addition, the phase prior is combined with a sparsity assumption on the directions of arrival to achieve a high-resolution estimation. Tested on simulated data mimicking real environments, paSAMP turns out to successfully integrate the generative model with multiplicative noise and offers better performance in terms of DOA estimation than other conventional approaches (e.g. classical beamforming). In addition, the method proves to be more robust to additive noise than other variational methods (e.g. based on the Mean-Field approximation).
An Expectation-Maximization Approach to Tuning Generalized Vector Approximate Message Passing
Christopher Metzler, Philip Schniter, Richard Baraniuk

Generalized Vector Approximate Message Passing (GVAMP) is an efficient iterative algorithm for approximately minimum-mean-squared-error estimation of a random vector \(\mathbf{x}\sim p_{\mathbf{x}}(\mathbf{x})\) from generalized linear measurements, i.e., measurements of the form \(\mathbf{y}=Q(\mathbf{z})\) where \(\mathbf{z}=\mathbf{Ax}\) with known \(\mathbf{A}\), and \(Q(\cdot)\) is a noisy, potentially nonlinear, componentwise function. Problems of this form show up in numerous applications, including robust regression, binary classification, quantized compressive sensing, and phase retrieval. In some cases, the prior \(p_{\mathbf{x}}\) and/or channel \(Q(\cdot)\) depend on unknown deterministic parameters \(\boldsymbol{\theta}\), which prevents a direct application of GVAMP. In this paper we propose a way to combine expectation maximization (EM) with GVAMP to jointly estimate \(\mathbf{x}\) and \(\boldsymbol{\theta}\). We then demonstrate how EM-GVAMP can solve the phase retrieval problem with unknown measurement-noise variance.
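A toy instance of the generalized linear measurement model considered here, with \(Q(\cdot)\) the noisy modulus, i.e. the phase retrieval channel used in the paper's demonstration (the dimensions, the sparse prior and the noise variance, playing the role of the unknown \(\boldsymbol{\theta}\), are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 400
A = rng.standard_normal((m, n)) / np.sqrt(n)          # known measurement matrix
x = rng.standard_normal(n) * (rng.random(n) < 0.1)    # signal drawn from a (hypothetical) sparse prior
z = A @ x
noise_var = 1e-2                                      # unknown parameter that EM-GVAMP would estimate
y = np.abs(z) + np.sqrt(noise_var) * rng.standard_normal(m)   # y = Q(z): noisy modulus channel
```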
A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios
Martin Krawczyk-Becker, Timo Gerkmann

In recent years, there has been a renaissance of research on the role of the spectral phase in single-channel speech enhancement. One of the recent proposals is to not only estimate the clean speech phase but also use this phase estimate as an additional source of information to facilitate the estimation of the clean speech magnitude. To assess the potential benefit of such approaches, in this paper we systematically explore in which situations the additional information about the clean speech phase is most valuable. For this, we compare the performance of phase-aware and phase-blind clean speech estimators in different noise scenarios, i.e. at different signal-to-noise ratios (SNRs) and for noise sources with different degrees of stationarity. Interestingly, the results indicate that the greatest benefits can be achieved in situations where conventional magnitude-only speech enhancement is most challenging, namely in highly non-stationary noises at low SNRs. Finally, we discuss how these findings can be explained algorithmically.
Phase Reconstruction for Time-Frequency Inpainting
Ama Marina Kreme, Valentin Emiya, Caroline Chaux

We are interested here in missing data reconstruction in the time-frequency (TF) plane. We assume that we have a TF representation of an audio signal and that some, but not all, complex coefficients suffer from phase erasure. In other words, we aim at reconstructing the missing phases of some complex coefficients issued from a short-time Fourier transform (STFT), assuming that the phases of the other coefficients as well as the moduli of all coefficients are known over the whole TF plane. The mathematical formulation of the inverse problem is first described and then three methods are proposed: a first one based on the well-known Griffin and Lim algorithm, and two others based on semidefinite programming (SDP) optimization methods, namely PhaseLift and PhaseCut, that are extended to the case of partial phase knowledge. The three derived algorithms are tested on real audio signals in two situations: the case where the missing data are randomly distributed and the case where they are localized. The algorithms are compared in terms of accuracy as well as runtimes.
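A minimal sketch of the first of these three approaches, a Griffin-and-Lim-style iteration in which the known phases are re-imposed at every step (STFT parameters are illustrative, and the SDP-based variants are not reproduced):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim_partial(mag, known_phase, known_mask, n_iter=100, nperseg=256):
    """Reconstruct a signal from a full magnitude spectrogram `mag` and phases known
    only where `known_mask` is True. All three arrays share the grid shape produced
    by scipy.signal.stft with the same `nperseg`."""
    phase = np.where(known_mask, known_phase, 2 * np.pi * np.random.rand(*mag.shape))
    Z = mag * np.exp(1j * phase)
    for _ in range(n_iter):
        _, x = istft(Z, nperseg=nperseg)                 # back to the time domain
        _, _, Z = stft(x, nperseg=nperseg)               # re-analyze to get a consistent phase
        phase = np.angle(Z)
        phase[known_mask] = known_phase[known_mask]      # re-impose the known phases
        Z = mag * np.exp(1j * phase)                     # re-impose the known magnitudes
    _, x = istft(Z, nperseg=nperseg)
    return x
```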
Feature Selection in Weakly Coherent Matrices
Stephane Chretien, Zhen Wai Olivier Ho

A problem of paramount importance in both pure (Restricted Invertibility problem) and applied mathematics (feature extraction) is that of selecting a submatrix of a given matrix such that this submatrix has its smallest singular value above a specified level. Such problems can be addressed using perturbation analysis. In this paper, we propose a perturbation bound for the smallest singular value of a given matrix after appending a column, under the assumption that its initial coherence is not large, and we use this bound to derive a fast algorithm for feature extraction.
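As an illustration of the underlying selection problem, a brute-force greedy procedure that appends, at each step, the column keeping the smallest singular value of the selected submatrix as large as possible (this sketch recomputes the SVD naively; the paper's point is that a perturbation bound makes the update far cheaper for weakly coherent matrices):

```python
import numpy as np

def greedy_select(X, k):
    """Greedily grow a set of k column indices of X, maximizing at each step the
    smallest singular value of the selected submatrix."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def score(j):
            S = X[:, selected + [j]]
            return np.linalg.svd(S, compute_uv=False)[-1]   # smallest singular value
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```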
Variable Projection Applied to Block Term Decomposition of Higher-Order Tensors
Guillaume Olikier, Pierre-Antoine Absil, Lieven De Lathauwer

Higher-order tensors have become popular in many areas of applied mathematics such as statistics, scientific computing, signal processing or machine learning, notably thanks to the many possible ways of decomposing a tensor. In this paper, we focus on the best approximation in the least-squares sense of a higher-order tensor by a block term decomposition. Using variable projection, we express the tensor approximation problem as a minimization of a cost function on a Cartesian product of Stiefel manifolds. The effect of variable projection on the Riemannian gradient algorithm is studied through numerical experiments.
Independent Vector Analysis Exploiting Pre-Learned Banks of Relative Transfer Functions for Assumed Target's Positions
Jaroslav Cmejla, Tomáš Kounovský, Jiri Malek, Zbyněk Koldovský

On-line frequency-domain blind separation of audio sources performed through Independent Vector Analysis (IVA) suffers from the problem of determining the order of the separated outputs. In this work, we apply a supervised IVA based on pilot components obtained using a bank of Relative Transfer Functions (RTF). The bank is assumed to be available for potential positions of a target speaker within a confined area. In every frame, the most suitable RTF is selected from the bank based on a criterion. The pilot components are obtained as pre-separated target and interference, respectively, through the Minimum-Power Distortionless Beamforming and Null Beamforming. The supervised IVA is tested in a real-world scenario with various levels of up-to-dateness of the bank. We show that the global permutation problem is resolved even when the bank contains only pure delay filters. The Signal-to-Interference Ratio in separated signals is mostly better than that achieved by the pre-separation, unless the bank contains very precise RTFs.
Does k Matter? k-NN Hubness Analysis for Kernel Additive Modelling Vocal Separation
Delia Fano Yela, Dan Stowell, Mark Sandler

Kernel Additive Modelling (KAM) is a framework for source separation aiming to explicitly model inherent properties of sound sources to help with their identification and separation. KAM separates a given source by applying robust statistics on the selection of time-frequency bins obtained through a source-specific kernel, typically the k-NN function. Even though the parameter k appears to be key for a successful separation, little discussion on its influence or optimisation can be found in the literature. Here we propose a novel method, based on graph theory statistics, to automatically optimise k in a vocal separation task. We introduce the k-NN hubness as an indicator to find a tailored k at a low computational cost. Subsequently, we evaluate our method in comparison to the common approach to choose k. We further discuss the influence and importance of this parameter with illuminating results.
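A sketch of the k-NN hubness statistic in its usual form, the skewness of the k-occurrence distribution (the library calls and exact normalisation are illustrative; the paper's graph-theoretic indicator may be defined somewhat differently):

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def knn_hubness(X, k):
    """Hubness of a data set for a given k: skewness of the k-occurrence counts,
    i.e. how often each point appears in the k-NN lists of the other points."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own nearest neighbour
    _, idx = nn.kneighbors(X)
    counts = np.bincount(idx[:, 1:].ravel(), minlength=len(X))
    return skew(counts)
```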
Improving Reverberant Speech Separation with Binaural Cues Using Temporal Context and Convolutional Neural Networks
Alfredo Zermini, Qiuqiang Kong, Yong Xu, Mark Plumbley, Wenwu Wang

Given binaural features as input, such as interaural level difference and interaural phase difference, Deep Neural Networks (DNNs) have been recently used to localize sound sources in a mixture of speech signals and/or noise, and to create time-frequency masks for the estimation of the sound sources in reverberant rooms. Here, we explore a more advanced system, where feed-forward DNNs are replaced by Convolutional Neural Networks (CNNs). In addition, the adjacent frames of each time frame (occurring before and after this frame) are used to exploit contextual information, thus improving the localization and separation for each source. The quality of the separation results is evaluated in terms of Signal to Distortion Ratio (SDR).
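For concreteness, a small sketch of the binaural input features mentioned above, interaural level and phase differences computed per time-frequency bin from the two channel STFTs (parameters are illustrative, not those used in the paper):

```python
import numpy as np
from scipy.signal import stft

def binaural_features(left, right, fs, nperseg=1024):
    """Interaural level difference (dB) and interaural phase difference per T-F bin."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    eps = 1e-12
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))        # wrapped phase difference in (-pi, pi]
    return ild, ipd
```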
Generating Talking Face Landmarks from Speech
Sefik Emre Eskimez, Ross K Maddox, Chenliang Xu, Zhiyao Duan

The presence of a corresponding talking face has been shown to significantly improve speech intelligibility in noisy conditions and for the hearing-impaired population. In this paper, we present a system that can generate landmark points of a talking face from acoustic speech in real time. The system uses a long short-term memory (LSTM) network and is trained on frontal videos of 27 different speakers with automatically extracted face landmarks. After training, it can produce talking face landmarks from the acoustic speech of unseen speakers and utterances. The training phase contains three key steps. We first transform landmarks of the first video frame to pin the two eye points into two predefined locations and apply the same transformation on all of the following video frames. We then remove the identity information by transforming the landmarks into a mean face shape across the entire training dataset. Finally, we train an LSTM network that takes the first- and second-order temporal differences of the log-mel spectrogram as input to predict face landmarks in each frame. We evaluate our system using the mean-squared error (MSE) loss of lip landmarks between predicted and ground-truth landmarks as well as their first- and second-order temporal differences. We further evaluate our system by conducting subjective tests, where the subjects try to distinguish the real and fake videos of talking face landmarks. Both tests show promising results.
Using Hankel Structured Low-Rank Approximation for Sparse Signal Recovery
Ivan Markovsky, Pier Luigi Dragotti

Structured low-rank approximation is used in model reduction, system identification, and signal processing to find low-complexity models from data. The rank constraint imposes the condition that the approximation has bounded complexity, and the optimization criterion aims to find the best match between the data, a trajectory of the system, and the approximation. In some applications, however, the data is sub-sampled from a trajectory, which poses the problem of sparse approximation using the low-rank prior. This paper considers a modified Hankel structured low-rank approximation problem where the observed data is a linear transformation of a system's trajectory with reduced dimension. We reformulate this problem as a Hankel structured low-rank approximation with missing data and propose a solution method based on the variable projection principle. We compare the Hankel structured low-rank approximation approach with the classical sparsity-inducing method of \(\ell_1\)-norm regularization. The \(\ell_1\)-norm regularization method is effective for sum-of-exponentials modeling with a large number of samples; however, it is not suitable for damped system identification.
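A toy check of the low-rank prior being exploited: the Hankel matrix built from a sum of exponentials has rank equal to its number of (complex) modes. This is a noise-free, fully sampled example; the paper deals with the sub-sampled case:

```python
import numpy as np
from scipy.linalg import hankel

# One real damped exponential plus a damped cosine (two complex modes): three modes in total.
t = np.arange(50)
y = 0.9**t * np.cos(0.3 * t) + 0.7**t
H = hankel(y[:25], y[24:])                    # H[i, j] = y[i + j]
print(np.linalg.matrix_rank(H, tol=1e-8))     # rank 3 = number of modes
```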
Static and Dynamic Modeling of Absence Epileptic Seizures Using Depth Recordings
Saeed Akhavan, Ronald Phlypo, Hamid Soltanian-Zadeh, Mahmoud Kamarei, Christian Jutten

This research temporally explores absence epileptic seizures using depth cortical data recorded from different layers of the somatosensory cortex of Genetic Absence Epilepsy Rats from Strasbourg (GAERS). We characterize the recorded absence seizures by a linear combination of a few static and dynamic sources. Retrieving these sources from the recorded absence seizures is the main target of this study which helps us uncover the temporal evolution of absence seizures. The method used in this study provides an interesting and original solution to the classical data denoising consisting in removing the background activity and cleaning the data. The obtained results show the presence of a static source and a few specific dynamic sources during the recorded absence seizures. It is also shown that the sources have similar origins in different GAERS.
Multicriteria Decision Making Based on Independent Component Analysis: A Preliminary Investigation Considering the TOPSIS Approach
Guilherme Pelegrina, Leonardo Tomazeli Duarte, João Romano

This work proposes the application of independent component analysis to the problem of ranking different alternatives by considering criteria that are not necessarily statistically independent. In this case, the observed data (the criteria values for all alternatives) can be modeled as mixtures of latent variables. Therefore, in the proposed approach, we perform ranking by means of the TOPSIS approach and based on the independent components extracted from the collected decision data. Numerical experiments attest to the usefulness of the proposed approach, as they show that working with latent variables leads to better results compared to already existing methods.
A Latent Variable Model for Simultaneous Dimensionality Reduction and Connectivity Estimation
Ricardo Pio Monti and Aapo Hyvarinen

Connectivity estimation is a fundamental problem in many areas of science. However, in the context of high-dimensional data it may be neither feasible nor useful to model the connectivities between all observed variables. Grouping variables into clusters or communities is a useful preprocessing step, but it is not clear how to do so optimally in view of connectivity estimation. A further practical problem is that we may have data from different classes (e.g. multiple subjects in an experiment), and we need to incorporate useful constraints about the similarities between the classes. In this abstract, we present a latent variable model to simultaneously address both of the aforementioned challenges. The model is essentially a factor analysis model where the factors (i.e., latent variables) are allowed to have arbitrary correlations. The associated factor loading matrix is constrained to express a community structure via the introduction of non-negativity and orthonormality constraints. Such constraints also allow us to prove the identifiability of the model, providing a clear interpretation for latent factors. Experimental results demonstrate the capabilities of the proposed model.
Loss Function Weighting Based on Source Dominance for Monaural Source Separation using Recurrent Neural Networks
Seungtae Kang and Gil-Jin Jang

In this paper, we propose a weighted loss function for monaural source separation using recurrent neural networks where appropriate training data for the original sources are available. The weight varies for each time-frequency instance according to the mutual dominance of the binaural source signals. The mutual dominance is computed by multiplying the inverse source-to-mixture ratios of the ground truth signals of the two sources, and the weights are obtained by appropriate scaling of the mutual dominance. The squared error between the target and the estimate becomes more important as the difference becomes larger. The proposed weighting is applied to one of the conventional monaural source separation techniques that exploits recurrent neural networks, and it shows improved performance on the same dataset.
Revisiting Synthesis Model in Sparse Audio Declipper
Pavel Záviška, Pavel Rajmic, Zdeněk Průša, Vítězslav Veselý

The state of the art in audio declipping has currently been achieved by SPADE (SParse Audio DEclipper) algorithm by Kitić et al. Until now, the synthesis/sparse variant, S-SPADE, has been considered significantly slower than its analysis/cosparse counterpart, A-SPADE. It turns out that the opposite is true: by exploiting a recent projection lemma, individual iterations of both algorithms can be made equally computationally expensive, while S-SPADE tends to require considerably fewer iterations to converge. In this paper, the two algorithms are compared across a range of parameters such as the window length, window overlap and redundancy of the transform. The experiments show that although S-SPADE typically converges faster, the average performance in terms of restoration quality is not superior to A-SPADE.
Consistent Dictionary Learning for Signal Declipping
Lucas Rencker, Francis Bach, Wenwu Wang, Mark Plumbley

Clipping, or saturation, is a common nonlinear distortion in signal processing. Recently, declipping techniques have been proposed based on sparse decomposition of the clipped signals on a fixed dictionary, with additional constraints on the amplitude of the clipped samples. Here we propose a dictionary learning approach, where the dictionary is directly learned from the clipped measurements. We propose a soft-consistency metric that minimizes the distance to a convex feasibility set, and takes into account our knowledge about the clipping process. We then propose a gradient descent-based dictionary learning algorithm that minimizes the proposed metric, and is thus consistent with the clipping measurement. Experiments show that the proposed algorithm outperforms other dictionary learning algorithms applied to clipped signals. We also show that learning the dictionary directly from the clipped signals outperforms consistent sparse coding with a fixed dictionary.
Learning Fast Dictionaries for Sparse Representations using Low-Rank Tensor Decompositions
Cassio Dantas, Jérémy Cohen, Remi Gribonval

A new dictionary learning model is introduced where the dictionary matrix is constrained as a sum of R Kronecker products of K terms. It offers a more compact representation and requires fewer training data than the general dictionary learning model, while generalizing tensor dictionary learning. The proposed Higher Order Sum of Kroneckers model can be computed by merging dictionary learning approaches with the tensor Canonical Polyadic Decomposition. Experiments on image denoising illustrate the advantages of the proposed approach.
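A toy instance of this dictionary structure for K = 2 Kronecker factors, together with the fast matrix-vector product it enables (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R, (m1, n1), (m2, n2) = 3, (8, 10), (6, 12)
A = [rng.standard_normal((m1, n1)) for _ in range(R)]
B = [rng.standard_normal((m2, n2)) for _ in range(R)]
D = sum(np.kron(A[r], B[r]) for r in range(R))   # dense equivalent, shape (48, 120)

# Fast product using the identity (A kron B) vec(X) = vec(B X A^T), with column-major vec.
x = rng.standard_normal(n1 * n2)
X = x.reshape(n1, n2).T                           # X such that vec(X) (column-major) equals x
fast = sum((B[r] @ X @ A[r].T).T.ravel() for r in range(R))
assert np.allclose(D @ x, fast)
```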
Truncated Variational Sampling for “Black Box” Optimization of Generative Models
Jörg Lücke, Zhenwen Dai, Georgios Exarchakis

We investigate the optimization of two probabilistic generative models with binary latent variables using a novel variational EM approach. The approach distinguishes itself from previous variational approaches by using latent states as variational parameters. Here we use efficient and general purpose sampling procedures to vary the latent states, and investigate the "black box" applicability of the resulting optimization approach. For general purpose applicability, samples are drawn from approximate marginal distributions as well as from the prior distribution of the considered generative model. As such, sampling is defined in a generic form with no analytical derivations required. As a proof of concept, we then apply the novel procedure (A) to Binary Sparse Coding (a model with continuous observables), and (B) to basic Sigmoid Belief Networks (which are models with binary observables). Numerical experiments verify that the investigated approach efficiently as well as effectively increases a variational free energy objective without requiring any additional analytical steps.