TD-Gammon revisited

Internship proposal.

Title: TD-Gammon revisited
Supervisor: Philippe Preux
Duration: 5 to 6 months
When: Spring-Summer 2020
Where: SequeL, Inria Lille, Villeneuve d'Ascq, France
Expected background: Master's in computer science, with a specialization in machine learning.
Keywords: reinforcement learning, deep RL, games, backgammon, algorithms, experimental
Context:
Reinforcement learning (RL) is a subfield of machine learning that aims to design agents that learn to act. Acting usually means performing a sequence of actions to achieve a goal. Examples abound, and games are good ones: in Pac-Man or chess, the player performs a series of actions either to maximize a score or to defeat an opponent. Applications of RL go well beyond games.
In the early 1990s, G. Tesauro created TD-Gammon, a program that learned through RL to play backgammon at expert level. Its key ingredients are temporal-difference learning and a neural network that represents the value function.
This combination of RL and neural networks has recently been dubbed deep reinforcement learning, though the networks involved are often not particularly deep. TD-Gammon, in particular, uses a neural network with a single hidden layer. By combining somewhat deeper networks with many tricks that make RL more efficient, RL-based programs have since reached or exceeded the level of the best human players in chess, Go, Othello, shogi, ... and Atari games.
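To make the two ingredients concrete, here is a minimal sketch of TD(lambda) learning with a value function represented by a network with a single hidden layer. It is not Tesauro's implementation. As an assumption for illustration, backgammon is replaced by a toy 5-state random walk in which leaving on the right pays 1 and leaving on the left pays 0. The names ValueNet and train_random_walk, the one-hot state encoding, and all hyperparameters (alpha, lam, hidden size, weight initialization) are hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ValueNet:
    """Value function V(x): one sigmoid hidden layer, one sigmoid output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        v = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)))
        return v, h

    def gradients(self, x):
        """Return V(x) and its gradient w.r.t. both weight layers."""
        v, h = self.forward(x)
        dv = v * (1.0 - v)  # derivative of the output sigmoid
        g2 = [dv * hj for hj in h]
        g1 = [[dv * self.w2[j] * h[j] * (1.0 - h[j]) * xi for xi in x]
              for j in range(len(h))]
        return v, g1, g2


def one_hot(i, n):
    return [1.0 if k == i else 0.0 for k in range(n)]


def train_random_walk(episodes=2000, n_states=5, n_hidden=8,
                      alpha=0.3, lam=0.7, seed=0):
    """TD(lambda) on a toy random walk (assumed stand-in for backgammon)."""
    rng = random.Random(seed)
    net = ValueNet(n_states, n_hidden, rng)
    for _ in range(episodes):
        s = n_states // 2
        # Eligibility traces, one per weight, reset at the start of each episode.
        e1 = [[0.0] * n_states for _ in range(n_hidden)]
        e2 = [0.0] * n_hidden
        done = False
        while not done:
            v, g1, g2 = net.gradients(one_hot(s, n_states))
            # Decay old traces by lambda and add the current gradient.
            for j in range(n_hidden):
                e2[j] = lam * e2[j] + g2[j]
                for i in range(n_states):
                    e1[j][i] = lam * e1[j][i] + g1[j][i]
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target, done = 0.0, True   # left exit: final outcome 0
            elif s_next >= n_states:
                target, done = 1.0, True   # right exit: final outcome 1
            else:
                target = net.forward(one_hot(s_next, n_states))[0]
            delta = target - v  # temporal-difference error (no discounting)
            for j in range(n_hidden):
                net.w2[j] += alpha * delta * e2[j]
                for i in range(n_states):
                    net.w1[j][i] += alpha * delta * e1[j][i]
            s = s_next
    return net


if __name__ == "__main__":
    net = train_random_walk()
    for s in range(5):
        print(s, round(net.forward(one_hot(s, 5))[0], 3))
```

For this walk the exact state values are 1/6, 2/6, ..., 5/6, so the printed estimates can be compared against them. With a constant step size the estimates stay noisy. Moving to backgammon would mean replacing the one-hot state with a board encoding and choosing moves by evaluating the positions they lead to.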
What:
The goals of this internship are:
- to reproduce Tesauro's original work,
- to perform a thorough experimental study of it,
- to explore how recent advances in RL can be used to improve TD-Gammon,
- to perform the corresponding experimental studies.
Bibliography:
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press
- G. Tesauro, Temporal Difference Learning and TD-Gammon, Communications of the ACM, 1995
Working environment: SequeL is a well-known research group in reinforcement learning and bandits. It is composed of 4 permanent researchers, 20+ PhD students, and a couple of post-docs and engineers. SequeL provides a very rich and stimulating environment for doing cutting-edge research in RL.
