
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing

Innovative Applications of Artificial Intelligence (IAAI)

Publication date: February 4, 2017

Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, Emma Brunskill

In this paper we consider the problem of evaluating one digital marketing policy (or more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly for digital marketing applications. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.
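
To make the contrast between reactive and predictive evaluation concrete, the following is a minimal sketch and not the authors' implementation. It assumes ordinary importance sampling as the off-policy estimator and a least-squares linear trend as the time series model; the abstract names neither choice, and the function names `importance_sampling_estimate`, `reactive_estimate`, and `predictive_estimate` are hypothetical.

```python
import numpy as np


def importance_sampling_estimate(rewards, pi_e_probs, pi_b_probs):
    """Ordinary importance sampling estimate of one episode's return.

    rewards    -- rewards observed in the episode under the behavior policy
    pi_e_probs -- probabilities of the taken actions under the evaluation policy
    pi_b_probs -- probabilities of the same actions under the behavior policy
    """
    ratios = np.asarray(pi_e_probs, dtype=float) / np.asarray(pi_b_probs, dtype=float)
    return float(np.prod(ratios) * np.sum(rewards))


def reactive_estimate(estimates):
    """Stationary-style estimate: the plain average of past per-episode estimates."""
    return float(np.mean(estimates))


def predictive_estimate(estimates, horizon=1):
    """Assumed time series model: fit a line to the estimates and extrapolate it."""
    t = np.arange(len(estimates))
    slope, intercept = np.polyfit(t, estimates, deg=1)
    return float(slope * (len(estimates) - 1 + horizon) + intercept)


# Illustrative (made-up) per-episode estimates with an upward drift.
history = [1.0, 2.0, 3.0, 4.0]
print(reactive_estimate(history))    # average of the past values
print(predictive_estimate(history))  # extrapolation of the fitted trend
```

On a drifting sequence like this one, the average lags behind the trend while the extrapolation follows it, which is the intuition behind treating evaluation as a forecasting problem. Any forecasting model could stand in for the linear fit.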

Research Area: AI & Machine Learning