Speaker and Noise Independent Voice Activity Detection

Voice activity detection (VAD) in the presence of heavy, non-stationary noise is a challenging problem that has attracted attention in recent years. Most modern VAD systems require training on highly specialized data: either labeled mixtures of speech and noise that are matched to the application, or, at the very least, noise data similar to that encountered in the application. Because obtaining labeled data can be a laborious task in practical applications, it is desirable for a voice activity detector to be able to perform well in the presence of any type of noise without the need for matched training data. In this paper, we propose a VAD method based on non-negative matrix factorization. We train a universal speech model from a corpus of clean speech but do not train a noise model. Rather, the universal speech model is sufficient to detect the presence of speech in noisy signals. Our experimental results show that our technique is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and significantly outperforms baseline algorithms.
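
The abstract gives only the outline of the approach, so the following is a minimal illustrative sketch of the general idea, not the paper's algorithm. It assumes magnitude spectrograms as input, KL-divergence NMF with multiplicative updates, a handful of noise bases fitted to the noisy signal itself, and a per-frame energy-ratio score with a fixed threshold. The function names (`kl_nmf`, `speech_score`, `detect_speech`), the number of bases, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def kl_nmf(V, n_bases, n_iter=100, W_fixed=None, seed=0):
    """Factor V ~ W @ H with KL-divergence multiplicative updates.

    V is a non-negative (n_freq, n_frames) array. If W_fixed is given,
    its columns are placed first in W and are never updated; n_bases
    additional free columns are learned alongside them.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + EPS
    n_fixed = 0
    if W_fixed is not None:
        n_fixed = W_fixed.shape[1]
        W = np.hstack([W_fixed, W])
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update activations for all bases (fixed and free).
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.T @ ones + EPS)
        # Update bases, then restore the fixed columns.
        R = V / (W @ H + EPS)
        W_new = W * ((R @ H.T) / (ones @ H.T + EPS))
        W_new[:, :n_fixed] = W[:, :n_fixed]
        W = W_new
    return W, H


def speech_score(V_noisy, W_speech, n_noise_bases=4, n_iter=100):
    """Per-frame fraction of modeled energy explained by the speech bases.

    The speech bases stay fixed; the noise bases are learned from the
    noisy signal itself, so no noise training data is involved.
    (Assumed scoring rule, not taken from the paper.)
    """
    W, H = kl_nmf(V_noisy, n_noise_bases, n_iter, W_fixed=W_speech)
    k = W_speech.shape[1]
    speech = W[:, :k] @ H[:k]
    total = W @ H + EPS
    return speech.sum(axis=0) / total.sum(axis=0)


def detect_speech(score, threshold=0.5):
    """Boolean speech/non-speech decision per frame (threshold is assumed)."""
    return score > threshold
```

In this sketch, a speech dictionary would first be learned once with `kl_nmf(V_clean, n_bases)` on clean-speech spectrograms, then passed as `W_speech` to `speech_score` for each noisy recording. Real systems would typically add temporal smoothing of the score, which is omitted here.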

Interspeech

Publication date: August 25, 2013

Francois Germain, Dennis Sun, Gautham Mysore

Best Student Paper Award

Research Areas: AI & Machine Learning, Audio