Deep Learning

Abstract

Deep Learning is a branch of Machine Learning that makes it possible to build models with superior performance for a wide range of applications, in particular Computer Vision and Natural Language Processing. Thanks to the joint availability of large amounts of data and affordable processing power, Deep Learning has gained popularity not only in academia but also in industry. Moreover, the availability of numerous software libraries to implement, train, and use Deep Learning models has dramatically accelerated the adoption of this modeling approach in a variety of application domains.

The objective of this course is to provide a detailed introduction to the main statistical modeling techniques used in Deep Learning, to fundamental aspects of stochastic optimization, to a geometric interpretation of loss landscapes, and to sequential data modeling, including deep recurrent neural networks and memory-based architectures, up to the latest trends in Deep Learning such as attention mechanisms and Transformer architectures. Finally, the course also provides a solid introduction to energy-based modeling, sampling methods, and their connections to the stochastic differential equations used in deep generative models.

Ultimately, the objective of the course is to help students develop critical thinking about Deep Learning, both to properly understand and apply any new development proposed by the scientific community, and to take new methods “with a grain of salt” when defining the modeling approach to be used in practical application scenarios.

The course is organized around classical, frontal lectures where the theory is presented and discussed. In addition, online tutorials serve as hands-on exercises, where students can experiment to gain a deeper understanding of the theoretical concepts discussed in class.

 

Teaching and Learning Methods: The course is composed of a combination of lectures and on-line tutorials.

 

Course Policies: Attendance at all sessions is mandatory.

Bibliography
  • Book: GOODFELLOW I., BENGIO Y., COURVILLE A. Deep Learning. MIT Press, 2016, 800p.
  • Book: BISHOP C. M., BISHOP H. Deep Learning: Foundations and Concepts. Springer, 2024.

Requirements
 
  • Knowledge of and familiarity with probability theory and linear algebra.
  • Knowledge of the Python programming language.

Description

The course is intended to expose students to the fundamentals and recent developments in Deep Learning. The content revolves around the following lectures:

  • Deep Neural Networks: these two lectures cover the basics of deep neural networks, their mathematical description and interpretation, the definition of various layers (including normalization layers), the need for regularization and the various techniques to address overfitting, the concepts of computational graphs and automatic differentiation, stochastic optimization algorithms, their properties and variants, and advanced topics such as deep model compression.
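
    To give a flavor of the stochastic optimization topic, the following is a minimal, purely illustrative sketch (not course material) of mini-batch stochastic gradient descent on a linear regression model, written in plain NumPy:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data: y = X @ w_true + small noise
    X = rng.normal(size=(256, 3))
    w_true = np.array([1.5, -2.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=256)

    w = np.zeros(3)          # model parameters, initialized at zero
    lr, batch = 0.1, 32      # learning rate and mini-batch size

    for epoch in range(50):
        idx = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch):
            b = idx[start:start + batch]
            err = X[b] @ w - y[b]              # residuals on the mini-batch
            grad = 2 * X[b].T @ err / len(b)   # gradient of the mean squared error
            w -= lr * grad                     # SGD update

    print(w)  # close to w_true
    ```

    The same loop structure underlies the training of deep networks; the only change is that the gradient is obtained by automatic differentiation through the computational graph rather than by a closed-form expression.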
  • Convolutional Neural Networks: these two lectures cover the details of the convolutional layers, their mathematical description and interpretation, and their interpretation as linear algebra operations that can be executed on modern parallel processing hardware. In addition, these lectures introduce the field of Deep Learning for computer vision, popular architectures with a mathematical explanation of their principles, and several example applications, including object detection and image segmentation.
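
    The reduction of convolution to a linear algebra operation mentioned above can be illustrated with the classic im2col trick; this NumPy sketch (illustrative only, single channel, stride 1, no padding) shows that the loop-based and matrix-product formulations agree:

    ```python
    import numpy as np

    def conv2d_direct(x, k):
        """Valid 2-D cross-correlation via explicit loops."""
        H, W = x.shape
        kh, kw = k.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        return out

    def conv2d_im2col(x, k):
        """Same operation expressed as a single matrix product (im2col)."""
        H, W = x.shape
        kh, kw = k.shape
        oh, ow = H - kh + 1, W - kw + 1
        # Each row of `cols` is one flattened receptive field.
        cols = np.stack([x[i:i + kh, j:j + kw].ravel()
                         for i in range(oh) for j in range(ow)])
        return (cols @ k.ravel()).reshape(oh, ow)

    x = np.arange(25.0).reshape(5, 5)
    k = np.array([[1.0, 0.0], [0.0, -1.0]])
    assert np.allclose(conv2d_direct(x, k), conv2d_im2col(x, k))
    ```

    Casting convolution as a matrix product is precisely what lets it run efficiently on parallel hardware such as GPUs.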

  • Sequence Modeling: these two lectures focus on modeling sequential data, with a particular focus on natural language. The traditional design of recurrent neural networks is explained in detail and with mathematical rigor, and it is expanded to cover memory-based architectures such as LSTMs, attention mechanisms, and Transformer networks. The most prominent examples of language models, such as BERT and GPT, are also discussed.
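
    The attention mechanism at the heart of Transformer networks can be sketched in a few lines; this is an illustrative NumPy implementation of scaled dot-product attention, softmax(QKᵀ/√d_k)V, without masking or multiple heads:

    ```python
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # pairwise query/key similarities
        weights = softmax(scores)        # each row is a distribution over keys
        return weights @ V, weights

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))  # 4 query positions, dimension 8
    K = rng.normal(size=(6, 8))  # 6 key positions
    V = rng.normal(size=(6, 8))  # one value vector per key
    out, w = scaled_dot_product_attention(Q, K, V)
    assert out.shape == (4, 8) and np.allclose(w.sum(axis=1), 1.0)
    ```

    Each output row is a convex combination of the value vectors, weighted by how strongly the corresponding query matches each key.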

  • Energy-based Modeling: this lecture presents the idea of using a surrogate, parametric function to model data densities, such that both density estimation and generative modeling can be reduced to a regression problem. This is an advanced topic that requires the introduction of statistical sampling techniques and the simulation of discretized versions of continuous stochastic differential equations. These tools are then used to define advanced models that use the score of the data distribution to learn their latent representations. As such, the applications enabled by the theory and discussed in the lecture range from density estimation for anomaly detection to the synthetic generation of realistic data, including images and other modalities.
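
    As a small illustration of sampling via a discretized stochastic differential equation (not course material), the sketch below runs unadjusted Langevin dynamics using the score of a one-dimensional Gaussian; in practice the score would be a learned neural network, but the closed-form Gaussian score makes the mechanism easy to verify:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 0.5  # target density: N(mu, sigma^2)

    def score(x):
        """Score (gradient of the log-density) of N(mu, sigma^2)."""
        return (mu - x) / sigma**2

    # Unadjusted Langevin dynamics: an Euler discretization of a stochastic
    # differential equation whose stationary distribution approximates the target.
    eps = 1e-3                  # discretization step size
    x = rng.normal(size=5000)   # 5000 chains, initialized from N(0, 1)
    for _ in range(4000):
        x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.normal(size=x.shape)

    print(x.mean(), x.std())  # close to mu and sigma
    ```

    After enough steps, the chains forget their initialization and the empirical mean and standard deviation match the target distribution, which is exactly the principle exploited by score-based generative models.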

 

Learning outcomes:

  • To be able to understand the key fundamentals associated with Deep Learning and Deep Network architectures for Machine Learning
  • To be able to understand new Deep Learning architectures proposed in the scientific literature
  • To develop critical thinking when dealing with modeling choices
  • To be able to define, train and use Deep Learning models

Evaluation: 

  • Quizzes during the course and a written exam at the end of the semester.