We present an algorithm for separating multiple speakers from a mixed single-channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract characteristic spectro-temporal basis functions from training data for each individual speaker and to decompose the mixed signal as a linear combination of these learned bases. Their model extracts a compact code: a small set of basis functions that can explain the space spanned by a speaker's spectral vectors. Our model instead generates a sparse-distributed code that is overcomplete, meaning it has more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in the data and hence leads to better separation.
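The train-then-decompose pipeline described above can be illustrated with a small sketch. It is not the authors' probabilistic framework. As an assumption, it substitutes non-negative matrix factorization with a generalized KL-divergence objective and an L1 penalty on the activations as a stand-in for the sparsity mechanism, and it splits the mixture with Wiener-style soft masks. The function names (`sparse_nmf`, `separate`), the penalty weight `sparsity`, and the synthetic "spectrograms" are all hypothetical. Each speaker gets 12 bases for 8 frequency bins, so the learned code is overcomplete.

```python
import numpy as np

EPS = 1e-12

def sparse_nmf(V, n_bases, n_iter=200, sparsity=0.0, W=None, seed=0):
    """Approximate non-negative V (freq x time) as W @ H.

    Stand-in technique (not the paper's method): multiplicative updates for
    generalized KL divergence plus an L1 penalty of weight `sparsity` on H.
    If W is supplied it is held fixed and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    learn_W = W is None
    if learn_W:
        W = rng.random((F, n_bases)) + EPS
        W /= W.sum(axis=0, keepdims=True)
    H = rng.random((W.shape[1], T)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Activation update; the L1 weight enters the denominator.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.T @ ones + sparsity + EPS)
        if learn_W:
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (ones @ H.T + EPS)
            # Normalize basis columns (W @ H is unchanged) so the L1
            # penalty cannot be evaded by shrinking H and growing W.
            scale = W.sum(axis=0, keepdims=True) + EPS
            W /= scale
            H *= scale.T
    return W, H

def separate(V_mix, W1, W2, n_iter=200, sparsity=0.0):
    """Explain the mixture with both speakers' fixed bases, then split it."""
    W = np.hstack([W1, W2])
    _, H = sparse_nmf(V_mix, W.shape[1], n_iter, sparsity, W=W)
    k = W1.shape[1]
    S1_hat, S2_hat = W1 @ H[:k], W2 @ H[k:]
    total = S1_hat + S2_hat + EPS
    # Soft masks divide the observed mixture between the two speakers.
    return V_mix * S1_hat / total, V_mix * S2_hat / total

# --- toy usage on hypothetical synthetic magnitude "spectrograms" ---
rng = np.random.default_rng(1)
F, T = 8, 100
low = np.array([1, 1, 1, 1, .01, .01, .01, .01])
A1 = rng.random((F, 3)) * low[:, None]     # "speaker 1": low-frequency heavy
A2 = rng.random((F, 3)) * low[::-1, None]  # "speaker 2": high-frequency heavy

# Training: 12 bases per speaker in an 8-dimensional space (overcomplete).
W1, _ = sparse_nmf(A1 @ rng.random((3, T)), n_bases=12, sparsity=0.1)
W2, _ = sparse_nmf(A2 @ rng.random((3, T)), n_bases=12, sparsity=0.1)

# Test mixture built from unseen frames of each speaker.
S1 = A1 @ rng.random((3, T))
S2 = A2 @ rng.random((3, T))
est1, est2 = separate(S1 + S2, W1, W2, sparsity=0.1)
print("relative error, speaker 1:", np.linalg.norm(est1 - S1) / np.linalg.norm(S1))
```

Without the L1 term an overcomplete factorization is badly underdetermined, because many activation patterns reproduce the same spectrum. The penalty favors explanations that use few active bases, which loosely mirrors the motivation for sparsity stated in the abstract.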