Reinforcement Learning (RL) builds learning systems from the ground up by trial and error: as the agent takes actions, it receives rewards and observations that allow it to build a more accurate model of the world and, in turn, to better optimize for rewards. Major successes of this approach range from the AlphaGo and AlphaZero algorithms, designed for two-player zero-sum games such as Go, to hardware optimization and data-center cooling. Most of these algorithms rely partly on theoretically grounded principles, but also on additional heuristics that are not yet fully understood. In particular, it is not yet clear how these agents would behave if their assigned task were to change, even slightly. As these algorithms are being implemented and deployed in crucial domains in industry and science, it is necessary to understand the mechanisms that lead to the learnt behaviours and to provide theoretical guarantees on their performance, including in changing environments.
In this talk, we will first review and explain the fundamental principles of RL and present the key concepts used to build RL agents. We will then highlight the challenges that arise when the environment changes, and explain why this assumption must be taken into account to build safe, lifelong-learning agents.
- invited by Alexandra Carpentier