Algorithmic Machine Learning


This course aims to provide a solid, practical algorithmic foundation for the design and use of scalable machine learning algorithms, with particular emphasis on the MapReduce programming model. Students will become familiar with a wide range of topics by applying theoretical ideas to problems of practical interest. This is a "reverse class": students are required to study (or revise) a particular topic at home and then apply what they have learned to solve real-world problems, including industrial applications, during numerous laboratory sessions. Laboratory sessions are based on modern technologies such as Jupyter Notebooks.

Teaching and Learning Methods: Laboratory sessions (groups of 2 students).

Course Policies: Attendance at the lab sessions is mandatory.

  • Book: JAMES G., WITTEN D., HASTIE T., TIBSHIRANI R. An Introduction to Statistical Learning. Springer, 2013, 440p.

  • Book: BISHOP C. Pattern Recognition and Machine Learning. Springer-Verlag, 2006, 768p.

  • Book: RYZA S., LASERSON U., OWENS S., WILLS J. Advanced Analytics with Spark. O’Reilly, 2017, 280p.

  • Book: SHALEV-SHWARTZ S., BEN-DAVID S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014, 410p.

  • Book: LESKOVEC J. (Stanford University), RAJARAMAN A. (Milliways Laboratories), ULLMAN J.D. (Stanford University). Mining of Massive Datasets. Cambridge University Press, 2014, 476p.


The prerequisites for this course are: “Machine Learning and Intelligent Systems” (MALIS), “Distributed Systems and Cloud Computing” (CLOUDS), and “Advanced Statistical Inference” (ASI).

  • Introduction and Apache Spark refresher
  • Notebook: Python and Apache Spark ramp-up exercises
  • Notebook: Recommender systems, applied to the music industry
  • Notebook: Regression using decision trees and random forests, applied to the airline industry
  • Notebook: Monte-Carlo methods, applied to financial risk analysis
  • Notebook: Anomaly detection, applied to the telecommunication industry
  • Notebook: Time-series analysis, applied to neuroimaging
  • Notebook: Industry-driven applications

Learning outcomes:

  • Understand a data science problem statement and identify the theoretical tools and algorithmic implementation needed to solve it
  • Design and implement end-to-end software methods to analyze and prepare data, use the data to learn a statistical model, and use the model to make inferences
  • Validate and assess the quality of an end-to-end software method to address a data science problem

Number of hours: 21.00

Evaluation: Lab. reports (100% of the final grade).