We propose an efficient technique to learn probabilistic hierarchical topic models that are designed to preserve the manifold structure of audio data. The consideration of the data manifold is important, as it has been shown to provide superior performance in certain audio applications such as source separation. However, the high computational cost of a sparse encoding step due to the requirement of a large dictionary prevents it from being used in real-world applications such as real-time speech enhancement and the analysis of big audio data. In order to achieve a substantial speed-up of this step, while still respecting the data manifold, we propose to harmonize a particular type of locality sensitive hashing with the hierarchical topic model. The proposed use of hashing can reduce the computational complexity of the sparse encoding by providing candidates of non-zero activations, where the candidate set is built based on Hamming distance. The hashing step is followed by comprehensive sparse coding that considers those candidates only, rather than the entire dictionary. Experimental results show that the proposed hashing technique can provide audio source separation results comparable to the similar system without hashing, but with significantly less and cheaper computation.
Learn More