RL for composite task

Internship proposal.

Title: RL for composite task
Supervisor: Philippe Preux
Duration: 5 to 6 months
When: Spring-Summer 2020
Where: Scool, Inria Lille, Villeneuve d'Ascq, France
Expected background: Master's in CS, specializing in machine learning.
Keywords: reinforcement learning, algorithms, experimental
Context:
Reinforcement learning (RL) is a sub-field of machine learning in which we aim to design agents that learn to act. Acting usually involves performing a sequence of actions in order to achieve a goal. Examples are countless; games such as Pac-Man or chess are good ones: the player has to perform a series of actions either to reach a maximal score or to defeat his opponent. Applications of RL go well beyond games.
Many tasks consist in dealing with a set of more or less independent agents among which some resources have to be shared in order to reach the goal. To illustrate this, let us take a very simple example, the game of small horses (petits chevaux in French). In this game, each player has to manage a set of pieces (horses); the pieces have to be moved from the pasture to the stable; in turn, each player rolls a die, and the result gives the number of cells by which the player has to move one of his pieces. The players compete to be the first to reach their goal (all their horses in their stable). So, each time a player rolls the die, he has to allocate the result to one of his pieces, with the aim of being the first to bring all his horses back to the stable. Hence, in this task, at each turn, the player has to decide which piece to move in order to reach a global objective. Obviously, a basic brute-force reinforcement learning algorithm can learn to solve this task through self-play over a large number of games. However, during this internship, we want to go beyond brute force: brute force wastes an enormous amount of data, and it is so stupid (some call that "artificial" intelligence).
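To make the allocation decision concrete, here is a minimal sketch in Python of a heavily simplified, single-player version of small horses. It is an illustration under stated assumptions, not the actual rules of the game: the track length, the number of pieces, the capping of a move at the stable, and all function names are hypothetical choices made for this example. It only shows that, at each turn, the die roll is a resource that must be allocated to one piece.

```python
import random

# Hypothetical simplifications, not the real rules of small horses:
TRACK_LENGTH = 10  # cells between the pasture (0) and the stable
NUM_PIECES = 2     # horses owned by the single player


def legal_moves(positions):
    """Indices of the pieces that have not reached the stable yet."""
    return [i for i, p in enumerate(positions) if p < TRACK_LENGTH]


def step(positions, piece, roll):
    """Move `piece` forward by `roll` cells, capped at the stable."""
    new_positions = list(positions)
    new_positions[piece] = min(new_positions[piece] + roll, TRACK_LENGTH)
    return tuple(new_positions)


def random_policy(positions, roll, rng):
    """Allocate the roll to a piece chosen uniformly at random."""
    return rng.choice(legal_moves(positions))


def least_waste_policy(positions, roll, rng):
    """Allocate the roll to the piece that wastes the fewest cells."""
    return min(
        legal_moves(positions),
        key=lambda i: max(0, positions[i] + roll - TRACK_LENGTH),
    )


def play(policy, rng):
    """Play one game and return the number of turns needed to finish."""
    positions = (0,) * NUM_PIECES
    turns = 0
    while legal_moves(positions):
        roll = rng.randint(1, 6)              # the resource to allocate
        piece = policy(positions, roll, rng)  # the decision at this turn
        positions = step(positions, piece, roll)
        turns += 1
    return turns
```

Calling `play(random_policy, random.Random(0))` and `play(least_waste_policy, random.Random(0))` over many seeds would give a rough comparison of the two hand-written policies; a learning agent would replace the `policy` argument.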
What:
The goal of this internship is to:
- study the literature on this problem,
- explore new ideas. This exploration can be theoretical or algorithmic.
- perform an experimental assessment of the ideas. For this, rather than small horses, we will consider the "barricade" game, which is essentially the same sort of game, though more complex (and more interesting to play for human beings). It might be useful to first implement a brute-force RL algorithm to solve it, to serve as a baseline solution.
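As an illustration of what a brute-force baseline could look like, here is a minimal sketch of tabular Q-learning in Python. It is only one possible reading of "brute-force RL", and it does not implement barricade, whose rules are not given here: it assumes a hypothetical single-player race in which a die roll must be allocated to one of two pieces, with a reward of -1 per turn. The constants, the function names and the hyperparameters are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy game, not barricade: two pieces on a short track.
TRACK_LENGTH = 6
NUM_PIECES = 2


def legal_moves(positions):
    """Indices of the pieces that have not reached the goal yet."""
    return [i for i, p in enumerate(positions) if p < TRACK_LENGTH]


def step(positions, piece, roll):
    """Move `piece` forward by `roll` cells, capped at the goal."""
    new_positions = list(positions)
    new_positions[piece] = min(new_positions[piece] + roll, TRACK_LENGTH)
    return tuple(new_positions)


def q_learning(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning; the state is (positions, roll), the action a piece."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (positions, roll, piece) -> estimated return
    for _ in range(episodes):
        positions = (0,) * NUM_PIECES
        roll = rng.randint(1, 6)
        while legal_moves(positions):
            moves = legal_moves(positions)
            # Epsilon-greedy choice of the piece that receives the roll.
            if rng.random() < epsilon:
                piece = rng.choice(moves)
            else:
                piece = max(moves, key=lambda i: q[(positions, roll, i)])
            next_positions = step(positions, piece, roll)
            next_roll = rng.randint(1, 6)
            # Value of the best action in the next state (0 if terminal).
            best_next = max(
                (q[(next_positions, next_roll, i)]
                 for i in legal_moves(next_positions)),
                default=0.0,
            )
            # Reward of -1 per turn: finishing faster yields a higher return.
            target = -1.0 + gamma * best_next
            q[(positions, roll, piece)] += alpha * (
                target - q[(positions, roll, piece)]
            )
            positions, roll = next_positions, next_roll
    return q
```

The table is indexed by every (positions, roll, piece) triple encountered, which is exactly what makes this approach data-hungry: its size grows quickly with the number of pieces and the length of the track.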
Bibliography:
- Sutton, Barto, Reinforcement Learning, an Introduction, 2nd edition, 2018
- Lapan, Deep Reinforcement Learning Hands-On, Packt, 2018
Working environment: Scool is a well-known research group in reinforcement learning and bandits. It is composed of 6 permanent researchers, 20+ PhD students, and a couple of post-docs and engineers. Scool provides a very rich and stimulating environment for doing cutting-edge research in RL.
