Internship proposal.
- Title: TD-Gammon revisited
- Supervisor: Philippe Preux
- Duration: 5 to 6 months
- When: Spring-Summer 2020
- Where: SequeL, Inria Lille, Villeneuve d'Ascq, France
- Expected background: Master's degree in computer science, with a specialization in machine learning.
- Keywords: reinforcement learning, deep RL, games, backgammon, algorithms, experimental
- Context:
Reinforcement learning (RL) is a sub-field of machine learning whose aim is to design agents that learn to act. Acting usually involves performing a sequence of actions in order to achieve a goal. Examples are countless. Games are good examples: in Pac-Man or chess, the player has to perform a series of actions either to reach a maximal score or to defeat the opponent. Applications of RL go far beyond games.
In the early 1990s, G. Tesauro created TD-Gammon, a program that learned by RL to play backgammon at expert level. Its key ingredients were temporal-difference (TD) learning and a neural network representing the value function.
Recently, this combination of RL and neural networks has been dubbed deep reinforcement learning, although the networks involved are often not particularly deep. TD-Gammon, in particular, uses a neural network with a single hidden layer. By combining somewhat deeper networks with many techniques that make RL more efficient, RL-based programs now play at world-champion level in chess, Go, Othello, shogi and Atari games, among others.
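The two ingredients named above, temporal-difference learning and a value network with a single hidden layer, can be illustrated with a short sketch of online TD(λ) with eligibility traces. It assumes a sigmoid output that estimates the probability of winning, an undiscounted game whose final reward is 1.0 for a win and 0.0 for a loss, and plain feature vectors as inputs. The names `TDNet` and `train_episode`, the initialization range, and the values of `alpha`, `lam` and the layer sizes are hypothetical choices for illustration; they are not Tesauro's input encoding, architecture or hyper-parameters.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class TDNet:
    """Hypothetical single-hidden-layer value network trained by TD(lambda).

    value(x) returns an estimate in (0, 1), e.g. the probability of winning
    from the position encoded by the feature vector x.
    """

    def __init__(self, n_in, n_hidden, alpha=0.1, lam=0.7, seed=0):
        rng = random.Random(seed)
        self.alpha, self.lam = alpha, lam  # illustrative values, not tuned
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.reset_traces()

    def reset_traces(self):
        # One eligibility trace per parameter, cleared at the start of a game.
        self.e_w1 = [[0.0] * len(row) for row in self.w1]
        self.e_b1 = [0.0] * len(self.b1)
        self.e_w2 = [0.0] * len(self.w2)
        self.e_b2 = 0.0

    def value(self, x):
        # Forward pass: returns the value estimate and the hidden activations.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        v = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return v, h

    def update(self, x, target):
        """One online TD(lambda) step moving V(x) toward target."""
        v, h = self.value(x)
        delta = target - v  # TD error (undiscounted)
        dv = v * (1.0 - v)  # dV / d(output pre-activation)
        # Decay the traces and add the gradient of V(x), computed with the
        # current weights before any of them change.
        for j, hj in enumerate(h):
            dh = dv * self.w2[j] * hj * (1.0 - hj)  # dV / d(hidden pre-activation j)
            self.e_w2[j] = self.lam * self.e_w2[j] + dv * hj
            self.e_b1[j] = self.lam * self.e_b1[j] + dh
            for i, xi in enumerate(x):
                self.e_w1[j][i] = self.lam * self.e_w1[j][i] + dh * xi
        self.e_b2 = self.lam * self.e_b2 + dv
        # Move every parameter along its trace, scaled by the TD error.
        step = self.alpha * delta
        for j in range(len(self.w2)):
            self.w2[j] += step * self.e_w2[j]
            self.b1[j] += step * self.e_b1[j]
            for i in range(len(self.w1[j])):
                self.w1[j][i] += step * self.e_w1[j][i]
        self.b2 += step * self.e_b2


def train_episode(net, states, reward):
    """Train on one finished game.

    states: feature vectors of the positions, in the order they occurred.
    reward: final outcome, e.g. 1.0 for a win and 0.0 for a loss.
    """
    net.reset_traces()
    for t, x in enumerate(states):
        if t + 1 < len(states):
            target, _ = net.value(states[t + 1])  # bootstrap on the next estimate
        else:
            target = reward  # last position: use the actual outcome
        net.update(x, target)
```

For example, with one-hot features, repeatedly calling `train_episode(net, [s0, s_win], 1.0)` and `train_episode(net, [s0, s_loss], 0.0)` should drive `net.value(s_win)[0]` above `net.value(s_loss)[0]`. In a backgammon agent trained by self-play, as TD-Gammon was, `states` would be the positions reached during one game, and each move would be chosen by evaluating the network on the positions reachable with the current dice roll.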
- What:
The goals of this internship are to:
- reproduce Tesauro's original work,
- perform a thorough experimental study of it,
- investigate how recent advances in RL could be used to improve TD-Gammon,
- carry out the corresponding experimental studies.
Regarding the first objective, one of my 2019 interns, Alessio Della Libera, re-implemented a backgammon simulator and TD-Gammon from scratch, and performed an initial experimental study of it. This internship will build on that work. The remaining objectives will be the intern's responsibility.
Working environment: SequeL is a well-known research group in reinforcement learning and bandits. It comprises 4 permanent researchers, more than 20 PhD students, and a few post-docs and engineers. SequeL provides a very rich and stimulating environment for doing cutting-edge research in RL.