Basics on reinforcement learning


Reinforcement Learning (RL) has recently emerged as a powerful technique in modern machine learning, allowing a system to learn through trial and error using feedback. It has been successfully applied in many use cases, including systems such as AlphaZero, which learnt to master the games of chess, Go and Shogi.

The goal of this course is to introduce students to basic concepts of RL, such as Markov decision processes, dynamic programming, model-free methods, approximation methods via value function and policy evaluation, and many other useful tools. This is a theoretical course, but we will provide examples of real-world applications to demonstrate the usefulness of RL.

Teaching and Learning Methods

Lectures, homework, exercises. Each lecture starts by summarizing key concepts from the previous lecture. Part of each lecture is often dedicated to illustrative examples and exercises.

Course Policies

Attendance at lectures and exercise sessions is not mandatory but highly recommended.


[1] M. Puterman, "Markov Decision Processes: Discrete Stochastic Dynamic Programming", John Wiley & Sons, 2014

[2] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", Second Edition, MIT Press, 2019

[3] D. P. Bertsekas, "Reinforcement Learning and Optimal Control", Athena Scientific, 2019


Basic knowledge of linear algebra, matrix analysis, calculus, probability theory and random processes, as well as programming skills. Useful prerequisite course at EURECOM (for those who did not attend preparatory classes): MathEng


Throughout this course, we will cover basic tools that are widely used in RL, both in theory and in practice, such as:

  • Markov decision processes
  • Dynamic programming
  • Model-free prediction
  • Model-free control
  • Value function approximation
  • Policy-gradient methods
  • Integration of learning and planning
  • Exploration and exploitation

If time permits, we will also cover some well-known case studies of RL in games.

Learning outcomes:

After completing this course, students will:

  • Acquire basic knowledge of RL techniques
  • Identify and cast a problem using RL techniques when these apply
  • Formulate decision problems
  • Set up and run computational experiments

Number of hours: 21 hours


Homework Assignments (30% of the final grade), Final Exam (70% of the final grade)

Extra Credits*: Optional Course Project (30%)

*This is not mandatory, but it can significantly increase your grade.