
Classification-based Approximate Policy Iteration

IEEE Transactions on Automatic Control (TAC), 60(11): 2989-2993, 2015.

Publication date: May 1, 2015

Amir-massoud Farahmand, Doina Precup, Andre Barreto, Mohammad Ghavamzadeh

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results.
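To make the loop described in the abstract concrete, here is a minimal sketch of one way a classification-based policy iteration could look. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy chain MDP (`step`), the one-parameter threshold policy class (`policy`, `theta`), exact iterative policy evaluation standing in for "any policy evaluation algorithm", the full state set standing in for a sample of states, and an action-gap-weighted misclassification loss as the classification objective.

```python
GAMMA = 0.8          # discount factor (assumed)
N = 10               # hypothetical chain with states 0..N-1
LEFT, RIGHT = 0, 1


def step(s, a):
    """Deterministic toy dynamics: small reward at the left end, large at the right."""
    s2 = max(0, s - 1) if a == LEFT else min(N - 1, s + 1)
    r = 0.3 if s2 == 0 else (1.0 if s2 == N - 1 else 0.0)
    return s2, r


def policy(theta, s):
    """Restricted policy class: go RIGHT iff the state index is at least theta."""
    return RIGHT if s >= theta else LEFT


def evaluate(theta, sweeps=200):
    """Policy evaluation step: compute Q for the threshold policy by value sweeps."""
    v = [0.0] * N
    for _ in range(sweeps):
        new_v = []
        for s in range(N):
            s2, r = step(s, policy(theta, s))
            new_v.append(r + GAMMA * v[s2])
        v = new_v
    q = []
    for s in range(N):
        row = []
        for a in (LEFT, RIGHT):
            s2, r = step(s, a)
            row.append(r + GAMMA * v[s2])
        q.append(row)
    return q


def improve(q):
    """Classification step: pick the threshold with the smallest gap-weighted loss."""
    def loss(theta):
        total = 0.0
        for s in range(N):
            greedy = RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT
            gap = abs(q[s][RIGHT] - q[s][LEFT])
            if policy(theta, s) != greedy:
                total += gap  # mistakes cost more where the actions differ more
        return total
    return min(range(N + 1), key=loss)


def capi(theta=N, max_iters=20):
    """Alternate evaluation and classification until the policy stops changing."""
    for _ in range(max_iters):
        new_theta = improve(evaluate(theta))
        if new_theta == theta:
            break
        theta = new_theta
    return theta


final_theta = capi()
```

On this toy chain the loop starts from the all-LEFT policy and settles on `theta = 2`, that is, LEFT in states 0 and 1 and RIGHT elsewhere. The point of the sketch is the division of labor: the evaluation step can exploit structure in the value function, while the classification step searches only over a structured policy class.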

Research Area: AI & Machine Learning