Publications

Finite-Sample Analysis of Proximal Gradient TD Algorithms

Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI-2015), pp. 504-513, Amsterdam, Netherlands, 2015.

Publication date: August 1, 2015

Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

Winner of the Facebook Best Student Paper Award

In this paper, we show for the first time how gradient TD (GTD) reinforcement learning methods can be formally derived as true stochastic gradient algorithms, not with respect to their original objective functions as previously attempted, but rather using derived primal-dual saddle-point objective functions. We then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and no finite-sample analysis had been attempted. Two novel GTD algorithms are also proposed, namely projected GTD2 and GTD2-MP, which use proximal "mirror maps" to yield improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to its linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
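To make the setting concrete, the standard GTD2 update that this line of work builds on can be sketched as below. This is a minimal illustration with assumed step sizes and synthetic features, not the paper's projected or mirror-prox (GTD2-MP) variants; `alpha`, `beta`, and the data-generating loop are all hypothetical choices for the example.

```python
import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 step on linear features (illustrative sketch).

    theta: primal value-function weights; w: auxiliary (dual-like) weights.
    """
    # TD error under the current linear value estimate
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
    # Auxiliary weights track the expected TD error given the features
    w = w + beta * (delta - phi @ w) * phi
    # Primal step along the corrected gradient direction
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta, w

# Tiny synthetic run with random features (stand-in for off-policy samples)
rng = np.random.default_rng(0)
d = 4
theta, w = np.zeros(d), np.zeros(d)
for _ in range(1000):
    phi = rng.standard_normal(d)
    phi_next = rng.standard_normal(d)
    reward = rng.standard_normal()
    theta, w = gtd2_update(theta, w, phi, phi_next, reward,
                           gamma=0.9, alpha=0.01, beta=0.05)
```

Each iteration costs O(d) in the feature dimension, which is the linear complexity the abstract contrasts with least-squares TD methods.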


Research Area: AI & Machine Learning