Graduate School and Research Center in Digital Sciences

Algorithmic Machine Learning

[AML]
Technical Teaching


Abstract

This course aims to provide a solid and practical algorithmic foundation for the design and use of scalable machine learning algorithms, with particular emphasis on the MapReduce programming model. Students will become familiar with a wide range of topics by applying theoretical ideas to problems of practical interest. This is a "reverse class" (flipped classroom), in which students are required to study (or revise) a particular topic at home and then apply what they have learned to solving real-world problems, including industrial applications, during numerous laboratory sessions. Laboratory sessions are based on modern technologies such as Jupyter Notebooks.

Teaching and Learning Methods: Laboratory sessions (groups of 2 students)

Course Policies: Attendance at lab sessions is mandatory.

Bibliography

  • An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
  • Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer
  • Advanced Analytics with Spark, by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills
  • Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David
  • Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman and Jeffrey David Ullman, Cambridge University Press

Requirements

The prerequisites for this course are the MALIS (Fall) and ASI (Spring) courses.

Description

  • Introduction and Apache Spark refresher
  • Notebook: Python and Apache Spark ramp-up exercises
  • Notebook: Recommender systems, applied to the music industry
  • Notebook: Regression using decision trees and random forests, applied to the airline industry
  • Notebook: Monte-Carlo methods, applied to financial risk analysis
  • Notebook: Anomaly detection, applied to the telecommunications industry
  • Notebook: Time-series analysis, applied to neuroimaging
  • Notebook: Industry-driven applications

Learning outcomes:

  • Understand a data science problem statement and identify the theoretical tools and algorithmic implementations needed to solve the problem
  • Design and implement end-to-end software methods to analyze and prepare data, use the data to learn a statistical model, and use the model to make inferences
  • Validate and assess the quality of an end-to-end software method to address a data science problem

Nb hours: 42.00, including at least 12 lab sessions (36 hours)

Grading Policy: Lab reports

Nb hours per week: 3.00