R² in RL

Internship proposal.

Title: R² in RL
Supervisors: Yannis Flet-Berliac, Philippe Preux
Duration: 5 to 6 months
When: Spring-Summer 2020
Where: SequeL, Inria Lille, Villeneuve d'Ascq, France
Expected background: master MVA, or equivalent.
Keywords: reinforcement learning, deep RL, algorithms.
Context:
Reinforcement learning (RL) is a sub-field of machine learning in which we aim to design agents that learn to act. Acting usually involves performing a sequence of actions in order to achieve a goal. Examples are countless; games such as Pac-Man or chess are good ones, in which the player has to perform a series of actions either to reach a maximal score or to defeat an opponent. Applications of RL go well beyond games.
RL algorithms learn by exploring their environment and collecting information on transitions (state s_t, action a_t, reward r_t, next state s_{t+1}). Usually, an RL algorithm uses all these samples. We have recently proposed that this is not a good idea, as the informative quality of samples is not the same for all of them: some samples are informative, while others are not and may even be misleading. To assess the informative quality of a sample, we use the notion of R². Taking this measure into account has very significantly improved the experimental performance of state-of-the-art policy gradient algorithms on a suite of classical RL tasks.
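To make these two notions concrete, here is a minimal sketch in Python of a transition record and of the standard coefficient of determination R². The names Transition and r_squared, the integer state and action types, and the choice to compare value predictions against observed returns are assumptions made for illustration only; this is not the criterion from the unpublished paper.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    """One sample (s_t, a_t, r_t, s_{t+1}); field types are illustrative."""
    state: int
    action: int
    reward: float
    next_state: int


def r_squared(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("targets and predictions must be non-empty and of equal length")
    mean_target = sum(targets) / len(targets)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((y - mean_target) ** 2 for y in targets)
    if ss_tot == 0.0:
        # R² is undefined when all targets are identical.
        return float("nan")
    return 1.0 - ss_res / ss_tot


# Hypothetical data: observed returns and two sets of value predictions.
returns = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(returns, [1.0, 2.0, 3.0, 4.0])   # predictions match exactly
baseline = r_squared(returns, [2.5, 2.5, 2.5, 2.5])  # predicting the mean only
```

With this definition, predictions that match the targets exactly give an R² of 1, and predictions that are no better than the mean of the targets give an R² of 0; negative values indicate predictions worse than that mean.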
What:
The goal of this internship is:
- to study this idea further. We want to better understand how this criterion impacts the learning process, how to make the best use of it, and possibly identify other ways to take advantage of this notion. Because the algorithms we consider are quite difficult to study in a meaningful way from a theoretical point of view, the study will be experimental as well as theoretical.
Bibliography:
- Sutton and Barto, Reinforcement Learning, an Introduction
- Flet-Berliac, Preux, Unpublished paper.
Working environment: SequeL is a well-known research group in reinforcement learning and bandits. It is composed of 4 permanent researchers, 20+ PhD students, and a couple of post-docs and engineers. SequeL provides a very rich and stimulating environment for doing cutting-edge research in RL.
