Introduction to statistics

IntroStat

Abstract

Statistics is a foundation of many areas of science and engineering, as it provides a systematic methodology for analyzing data. This course introduces fundamental concepts in statistics that one must understand to use statistical methods in practice. These concepts will also provide guidelines to practitioners who use machine learning, as statistics forms the basis of machine learning.

Teaching and Learning Methods: Lectures and homework.

Course Policies: The valid usage of statistical methods requires a mathematical understanding of the underlying mechanism. As such, the course covers both mathematical and algorithmic aspects of statistics.

Bibliography

Book: EFRON B., HASTIE T. Computer Age Statistical Inference: Algorithms, Evidence and Data Science. Cambridge University Press, 2016, 493p. (The textbook is freely available at the authors’ website: https://web.stanford.edu/~hastie/CASI/index.html)
Book: BERGER J. Statistical Decision Theory and Bayesian Analysis. Springer, 1985, 618p.

Requirements

The language of statistics is probability theory. The course therefore requires basic knowledge of probability over finite, discrete outcome spaces, of calculus (differentiation and integration), and of linear algebra (eigenvalues and eigenvectors of a matrix, and the solution of linear equations).

Description

The course focuses on fundamental concepts in statistics, using the simplest examples such as the estimation of the mean from a finite sample and linear regression. It covers both frequentist and Bayesian approaches. The former includes statistical hypothesis testing and maximum likelihood estimation. The latter includes the notion of prior and posterior distributions and Bayes' rule. I will explain how these two approaches differ in interpreting the "data" and the "model", and in defining "optimal" decisions. I will also teach Monte Carlo methods, a key ingredient in both approaches.
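As a rough illustration of the simplest example mentioned here, the sketch below estimates a mean from a finite sample in both ways. It assumes a Gaussian model with known noise variance and a conjugate normal prior; these modelling choices, the numerical values, and all function names are assumptions made for illustration, not material taken from the course.

```python
import random
import statistics


def mle_mean(sample):
    """Maximum likelihood estimate of the mean under a Gaussian model."""
    return statistics.fmean(sample)


def posterior_mean(sample, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of a Gaussian mean.

    Assumes a normal prior on the mean and a known noise variance
    (the conjugate case, where Bayes' rule has a closed form).
    """
    n = len(sample)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(sample) / noise_var) / precision
    return mean, 1.0 / precision


def monte_carlo_sampling_std(true_mean, noise_std, n, repeats, rng):
    """Monte Carlo estimate of the standard deviation of the sample mean."""
    estimates = [
        mle_mean([rng.gauss(true_mean, noise_std) for _ in range(n)])
        for _ in range(repeats)
    ]
    return statistics.stdev(estimates)


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(2.0, 1.0) for _ in range(20)]

freq = mle_mean(sample)
bayes, post_var = posterior_mean(sample, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
mc_std = monte_carlo_sampling_std(2.0, 1.0, n=20, repeats=2000, rng=rng)
```

With a prior centred at zero, the Bayesian estimate is the sample mean shrunk towards zero, while the Monte Carlo loop approximates the frequentist sampling variability of the estimator by repeating the experiment many times.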

Learning outcomes:

Students will learn fundamental notions of statistics and the way of statistical thinking. These include:

- Conclusions and decisions that can be made from data depend heavily on how the data are obtained. The key concepts in this regard are selection bias and consistency.
- There are various trade-offs in statistical methods, notably those between bias, variance, and computation. Understanding these trade-offs is necessary in practice for selecting a method and a model, and for tuning hyper-parameters. The key concepts here include generalization error, cross-validation, regularization, and the curse of dimensionality.
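As an illustrative sketch of how cross-validation can guide hyper-parameter tuning, the code below chooses a ridge regularization strength for a one-dimensional linear regression without intercept. The data-generating model, the grid of candidate values, and the function names are hypothetical choices made only for this example.

```python
import random


def ridge_slope(xs, ys, lam):
    """Ridge estimate of the slope w in y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared validation error, averaged over k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Every k-th point is held out for validation in this fold.
        val = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        w = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in val) / len(val)
    return total / k


rng = random.Random(1)  # fixed seed so the run is reproducible
xs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(xs, ys, lam) for lam in lambdas}
best_lam = min(errors, key=errors.get)
```

A larger `lam` shrinks the slope towards zero, which adds bias but reduces variance; the cross-validated error is one estimate of the generalization error used to balance the two.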

Number of hours: 21

Grading Policy:

Homework (25% of the final grade)
Final Exam (75% of the final grade)