Recently, reinforcement learning (RL) has achieved inspiring success in game-playing domains, including human-level control in Atari games and mastering the game of Go. Looking into the future, we expect to build machine learning systems that use RL to turn predictions into actions; applications include robotics, dialog systems, online education, and adaptive medical treatment, to name but a few.
In this talk, I show how theoretical insights from supervised learning can help us understand RL and better appreciate the unique challenges that arise from multi-stage decision making. The first part of the talk focuses on an interesting phenomenon: a short planning horizon can produce better policies when data is limited. I explain it by making a formal analogy to empirical risk minimization, and argue that a short planning horizon helps avoid overfitting. The second part of the talk concerns a core algorithmic challenge in state-of-the-art RL: sample-efficient exploration in large state spaces. I introduce a new complexity measure, the Bellman rank, which allows us to apply a unified algorithm to a number of important RL settings, in some cases obtaining polynomial sample complexity for the first time.
Nan Jiang is a PhD candidate in Computer Science and Engineering at the University of Michigan. He works with Satinder Singh on a variety of topics related to reinforcement learning. Specific research interests include provable use of function approximation, off-policy evaluation, state representation learning, spectral learning of dynamical systems, and inverse RL for AI safety. Nan received his bachelor's degree in Control and Automation from Tsinghua University in 2011. He received the Best Paper Award at AAMAS 2015 and a Rackham Predoctoral Fellowship in 2016.