Internship proposal.
- Title: R2 in RL
- Supervisors: Yannis Flet-Berliac, Philippe Preux
- Duration: 5 to 6 months
- When: Spring-Summer 2020
- Where: SequeL, Inria Lille, Villeneuve d'Ascq, France
- Expected background: master MVA, or equivalent.
- Keywords: reinforcement learning, deep RL, algorithms.
- Context:
Reinforcement learning (RL) is a sub-field of machine learning in which we aim to design agents that learn to act. Acting usually involves performing a sequence of actions in order to achieve a goal. Examples are countless; games are good ones, such as Pac-Man or chess, in which the player has to perform a series of actions either to reach a maximal score or to defeat an opponent. Applications of RL go way beyond games.
RL algorithms learn by exploring their environment and collecting information on transitions (state s_t, action a_t, reward r_t, next state s_{t+1}). Usually, an RL algorithm uses all of these samples. We have recently proposed that this is not a good idea, as the informative quality of samples is not the same for all of them: some samples are informative, while others are not and may even be misleading. To assess the informative quality of a sample, we use the notion of R2. Combining RL with this notion has very significantly improved the experimental performance of state-of-the-art policy gradient algorithms on a suite of classical RL tasks.
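To make the vocabulary above concrete, here is a minimal Python sketch of a transition record and of an R2 computation. It rests on assumptions not stated in this proposal: R2 is taken to be the usual coefficient of determination, computed between observed returns and the values predicted for the same samples. The names `Transition` and `r_squared` are hypothetical, and this is not the criterion actually used in our work.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    """One sample collected by the agent (hypothetical container)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple


def r_squared(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: `targets` are observed returns and `predictions` are
    the values estimated for the same samples.
    """
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("targets and predictions must be non-empty and of equal length")
    mean_target = sum(targets) / len(targets)
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


sample = Transition(state=(0, 0), action=1, reward=0.5, next_state=(0, 1))

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
print(r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```

Under this reading, a value close to 1 means the predictions explain the samples well, while a value at or below 0 means they do no better than a constant guess.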
- What:
The goal of this internship is:
- to study this idea further. We want to better understand how this criterion impacts the learning process, how to make the best use of it, and possibly identify other ways to take advantage of this notion and investigate them. The study will be both theoretical and experimental: as the algorithms we consider are quite difficult to analyze in a meaningful way from a theoretical point of view, experiments will play an important part.
Bibliography:
Working environment: SequeL is a well-known research group in reinforcement learning and bandits. It is composed of 4 permanent researchers, 20+ PhD students, and a couple of post-docs and engineers. SequeL provides a very rich and stimulating environment for doing cutting-edge research in RL.