Introduction to Actor Critic Methods

Talk by Sean Meyn (University of Florida, USA) at the Kick-Off Workshop of the 2nd phase of the SFB, September 14-15, 2021.

Find the recording here.

Abstract:

The goal of actor-critic methods is to estimate the best policy within a parameterized family for a controlled Markov chain. Through the magic of Markov chain theory, it is possible to obtain unbiased estimates of the objective via the geometry of TD-learning. These algorithms were born in the dissertations of Van Roy and Konda in the 1990s, written under the supervision of Tsitsiklis at MIT.
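To make the objective concrete, here is a minimal sketch for a hypothetical two-state, two-action MDP with a softmax policy parameterization (the transition matrices, rewards, and function names are illustrative, not from the talk). It computes the long-run average reward J(θ) of the closed-loop chain, which is the quantity the actor seeks to maximize over θ.

```python
import numpy as np

# Hypothetical controlled Markov chain: P[a] is the transition matrix
# under action a, and r[s, a] is the one-step reward (toy numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])

def softmax_policy(theta):
    """pi(a|s) proportional to exp(theta[s, a])."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def average_reward(theta):
    """J(theta): long-run average reward of the chain under the policy."""
    pi = softmax_policy(theta)
    # Closed-loop transition matrix: P_pi[s, t] = sum_a pi(a|s) P[a, s, t].
    P_pi = np.einsum('sa,ast->st', pi, P)
    # Stationary distribution via many-step power iteration
    # (valid here because the toy chain is irreducible and aperiodic).
    mu = np.linalg.matrix_power(P_pi, 200)[0]
    return float(mu @ (pi * r).sum(axis=1))
```

In an actor-critic method, J(θ) is never computed in closed form like this; the actor ascends an estimate of its gradient obtained from simulated trajectories, with the critic supplying the value estimates that make that gradient estimate unbiased.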

The lecture will consist of two parts. Part 1 is an introduction to the TD(1) algorithm, which is one component of the actor-critic method. The elegant theory is accompanied by a significant warning: while the algorithm solves a projection problem, it is a Monte-Carlo method that can come with massive variance. Part 2 is an introduction to the actor-critic algorithm and the crucial role that TD(1) plays within it. It seems likely that the variance can be tamed in these algorithms, but this remains a research frontier.
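The TD(1) algorithm mentioned above can be sketched as the λ = 1 case of TD(λ) with an eligibility trace. Below is a minimal illustration on a hypothetical three-state uncontrolled chain with tabular features and discounted reward (all numbers are illustrative). Setting lmbda=1 gives TD(1), which behaves like a Monte-Carlo / least-squares estimator and exhibits the high variance flagged in the abstract; lmbda=0 gives the lower-variance TD(0) update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov chain, state-dependent reward (toy numbers).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.3, 0.0, 0.7]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

def td_lambda(lmbda, n_steps=300_000, alpha=0.005):
    """TD(lambda) with an eligibility trace, tabular (one-hot) features.

    lmbda=1 is TD(1): the trace accumulates the whole discounted past,
    so each update is effectively a Monte-Carlo regression step and the
    iterates are far noisier than with lmbda=0.
    """
    theta = np.zeros(3)   # linear weights = value-function estimates
    z = np.zeros(3)       # eligibility trace
    x = 0
    for _ in range(n_steps):
        x_next = rng.choice(3, p=P[x])
        phi = np.eye(3)[x]                              # one-hot feature
        z = gamma * lmbda * z + phi                     # trace update
        d = r[x] + gamma * theta[x_next] - theta[x]     # TD error
        theta = theta + alpha * d * z
        x = x_next
    return theta

# Exact discounted value for comparison: h = (I - gamma*P)^{-1} r.
h = np.linalg.solve(np.eye(3) - gamma * P, r)
```

Both choices of λ share the same fixed point here, but comparing the trajectories of `td_lambda(0.0)` and `td_lambda(1.0)` makes the variance gap, and hence the warning in Part 1, visible even on a three-state example.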