With improved sensors, the amount of data available in many vision problems has increased dramatically, allowing the use of sophisticated learning algorithms to perform inference on the data. However, since the computational cost of these algorithms grows with data size, pruning the data is sometimes necessary. The pruning procedure must be statistically valid: a representative subset of the data must be selected without introducing selection bias. Information-theoretic measures have been used to sample the data while retaining its original information content. We propose an efficient subset selection algorithm based on Rényi entropy. The algorithm is first validated and then applied to two sample applications where machine learning and data pruning are used. In the first application, Gaussian process regression is used to learn object pose. Here it is shown that the regression, combined with the subset selection, is significantly more efficient. In the second application, our subset selection approach is used to replace vector quantization in a standard object recognition algorithm, and improvements are shown.
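As a rough illustration of entropy-based subset selection, the following sketch estimates Rényi quadratic entropy with a Gaussian Parzen window and greedily grows a subset that keeps that estimate high. It is not the algorithm proposed here, whose details the abstract does not give; the function names, the kernel width `sigma`, the greedy rule, and the choice of starting point are all assumptions made for illustration.

```python
import numpy as np


def _gaussian_gram(X, sigma):
    """Pairwise Gaussian kernel with variance 2*sigma^2 (two Parzen kernels convolved)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    var = 2.0 * sigma ** 2
    return np.exp(-sq_dist / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)


def renyi_quadratic_entropy(X, sigma=1.0):
    """Parzen-window estimate of H2 = -log(integral of p(x)^2)."""
    return -np.log(_gaussian_gram(X, sigma).mean())


def greedy_entropy_subset(X, k, sigma=1.0):
    """Hypothetical greedy rule: repeatedly add the point with the smallest
    summed kernel value to the points already chosen, which keeps the
    entropy estimate of the growing subset as high as possible."""
    K = _gaussian_gram(X, sigma)
    selected = [0]                      # arbitrary starting point (assumption)
    affinity = K[:, 0].copy()           # kernel mass of each point w.r.t. the subset
    affinity[0] = np.inf                # never re-select a chosen point
    while len(selected) < k:
        c = int(np.argmin(affinity))
        selected.append(c)
        affinity += K[:, c]
        affinity[selected] = np.inf
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: a tight cluster followed by widely spread points.
    X = np.concatenate([rng.normal(0.0, 0.05, (50, 2)),
                        rng.uniform(-5.0, 5.0, (50, 2))])
    idx = greedy_entropy_subset(X, k=10)
    print("naive first-10 entropy:", renyi_quadratic_entropy(X[:10]))
    print("greedy subset entropy: ", renyi_quadratic_entropy(X[idx]))
```

The sketch builds the full pairwise kernel matrix, so its cost is quadratic in the number of samples; it is meant only to make the idea concrete, not to reflect the efficiency claimed above.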