Introduction to statistics

IntroStat

Abstract

Statistics is a methodology for drawing inferences about a population from a finite sample. It is a basis for many areas of science and engineering, including data science, machine learning and artificial intelligence. This course teaches fundamental statistical concepts in a mathematically rigorous but self-contained way, emphasising their connections with other fields such as causal inference, design of experiments, information theory and machine learning.

Teaching and Learning Methods: Lectures and homework.

Course Policies: Class attendance may be taken into account in the final grade.

Bibliography

Book: EFRON B., HASTIE T. Computer Age Statistical Inference: Algorithms, Evidence and Data Science. Cambridge University Press, 2016, 493p. (The textbook is freely available at the authors’ website: https://web.stanford.edu/~hastie/CASI/index.html)
Book: BERGER J. Statistical Decision Theory and Bayesian Analysis. Springer, 1985, 618p.

Requirements

Prerequisites

Familiarity with basic set-theoretic notation, such as inclusion, union, intersection and complement.

Description

The course starts with probability theory, defining random variables, probability distributions and other key concepts from a measure-theoretic viewpoint.

It then teaches the basics of statistical estimation, focusing on the simple but illustrative example of estimating the mean of a population from a finite sample. The emphasis will be on the bias and variance of a statistical estimator and its consistency in estimating the true population mean for increasing sample sizes. How these concepts play key roles in fields such as causal inference and machine learning will also be explained.
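
As a rough illustration of these concepts, the following sketch simulates the sample mean on Gaussian data and estimates its bias and variance for several sample sizes. The Gaussian population, the values of `mu` and `sigma`, and the function names are assumptions made for this example only; they are not taken from the course material.

```python
import random
import statistics


def sample_mean_estimates(n, trials, mu=2.0, sigma=3.0, seed=0):
    """Return `trials` sample means, each computed from n Gaussian draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


def bias_and_variance(n, trials=2000, mu=2.0, sigma=3.0, seed=0):
    """Monte Carlo estimates of the bias and variance of the sample mean."""
    estimates = sample_mean_estimates(n, trials, mu, sigma, seed)
    bias = statistics.fmean(estimates) - mu      # close to 0: the sample mean is unbiased
    variance = statistics.pvariance(estimates)   # close to sigma**2 / n
    return bias, variance


if __name__ == "__main__":
    for n in (10, 100, 400):
        bias, variance = bias_and_variance(n)
        print(f"n={n:4d}  bias={bias:+.4f}  variance={variance:.4f}  "
              f"sigma^2/n={3.0 ** 2 / n:.4f}")
```

In this setting the estimated bias stays near zero for every sample size, while the variance shrinks roughly like sigma^2/n. This shrinking variance, together with the absence of bias, is what underlies the consistency of the sample mean.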

The course then introduces parametric models and maximum likelihood estimation (MLE). The emphasis will be on understanding the conditions under which MLE consistently estimates the “true parameter” as the sample size increases, and on understanding what is meant by the “true parameter.” For the latter, it will be shown that MLE is equivalent to minimising the Kullback-Leibler divergence between the true and model probability distributions. This equivalence makes it possible to understand what MLE does when the true distribution is not realisable by the parametric model. These topics help one to understand more complex models in statistics and machine learning.
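
The equivalence can be sketched as follows. The notation is an assumption made for this illustration: $p^{*}$ denotes the true density, $p_{\theta}$ the model density, and $x_1, \dots, x_n$ an i.i.d. sample from $p^{*}$. For each fixed $\theta$, the law of large numbers says that the average log-likelihood converges to $\mathbb{E}_{x \sim p^{*}}[\log p_{\theta}(x)]$, and the first term of the divergence does not depend on $\theta$.

```latex
\begin{align*}
\hat{\theta}_n
  &= \arg\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log p_{\theta}(x_i), \\
\mathrm{KL}(p^{*} \,\|\, p_{\theta})
  &= \mathbb{E}_{x \sim p^{*}}[\log p^{*}(x)]
   - \mathbb{E}_{x \sim p^{*}}[\log p_{\theta}(x)], \\
\arg\min_{\theta} \mathrm{KL}(p^{*} \,\|\, p_{\theta})
  &= \arg\max_{\theta} \mathbb{E}_{x \sim p^{*}}[\log p_{\theta}(x)].
\end{align*}
```

Under suitable regularity conditions, the MLE therefore approaches the parameter whose model distribution is closest to $p^{*}$ in Kullback-Leibler divergence. When the model is well specified this is the true parameter; when it is misspecified it is the best approximation within the model.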

Lastly, the course teaches hypothesis testing, starting from the illustrative example of Fisher’s “tasting tea” experiment, where the hypothesis to be tested concerns a person’s ability to distinguish, by taste, the two ways of preparing tea with milk: tea first or milk first. Randomisation will be shown to be the key to testing the hypothesis. Similar hypotheses appear in various scientific and industrial contexts, such as when investigating the effectiveness of a new medical treatment in curing a disease or of a new advertising policy in increasing revenue. The course explains key concepts such as null and alternative hypotheses, significance levels, p-values, critical regions, type-1 and type-2 errors and the power of a test.
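
As a minimal sketch of how a p-value arises from randomisation, consider the classical version of the tea experiment. The design used here is an assumption, not stated in the course description: eight cups, four of each kind, presented in random order, with the taster told to pick the four milk-first cups. Under the null hypothesis of pure guessing, every choice of four cups is equally likely, so the number of correct picks follows a hypergeometric distribution. The function name is hypothetical.

```python
from math import comb


def tea_tasting_p_value(correct, cups_per_kind=4):
    """One-sided p-value: probability of identifying at least `correct`
    milk-first cups when the taster is purely guessing."""
    # Number of equally likely ways to choose which cups are called "milk first".
    total = comb(2 * cups_per_kind, cups_per_kind)
    # Count the choices with k correct picks, for every k >= correct.
    favourable = sum(
        comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k)
        for k in range(correct, cups_per_kind + 1)
    )
    return favourable / total


if __name__ == "__main__":
    print(tea_tasting_p_value(4))  # 1/70, about 0.0143
    print(tea_tasting_p_value(3))  # 17/70, about 0.2429
```

With this design, only a perfect score gives a p-value below the conventional 5% significance level. Three correct picks out of four would not lead to rejecting the null hypothesis.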

Learning outcomes:

Understanding

mathematical definition of probability;
key concepts in statistical estimation, such as consistency, bias and variance;
maximum likelihood estimation, parametric models under model well-specification and misspecification, connection to Kullback-Leibler divergence;
procedure of hypothesis testing, meaning of p-values and significance levels, randomised experiments.

Number of hours: 21

Grading Policy:

Homework (25% of the final grade)
Final Exam (75% of the final grade)