Speech and audio processing

Abstract

This course provides an introduction to the automatic processing of speech and audio signals. It starts with a treatment of the human speech production and perception mechanisms and looks at how our understanding of them has influenced attempts to process speech and audio signals automatically. The course then considers the analysis, coding and parameterisation of signals for different speech and audio processing tasks. After an introduction to essential pattern recognition techniques, the course considers specific applications including speech recognition, speaker recognition and speaker diarization. The course also includes a treatment of speech and audio coding, noise compensation and speech enhancement.

Teaching and Learning Methods: The course comprises lectures, exercises and laboratory sessions.

Course policies: Attendance at laboratory sessions is mandatory.

Bibliography

Book: HUANG X., ACERO A., HON H-W. Spoken language processing: a guide to theory, algorithms, and system development. Prentice Hall, 2001, 1008p.
Book: RABINER L., JUANG B-H. Fundamentals of speech recognition. Pearson College Div, 1993, 496p.
Book: SIMPSON P. La conception de systèmes avec FPGA. Dunod, 2014, 304p. (in French)

Requirements

Proficiency in engineering mathematics, fundamental signal processing, statistics and probability.

Description

Production, perception and analysis
Towards modelling, classification and recognition
Deterministic approaches to speech recognition
Stochastic approaches to speech recognition
Speaker recognition and diarization
Speech and audio coding
Noise compensation and speech enhancement

Learning outcomes:

to provide students with knowledge of the human speech production and perception mechanisms, the fundamentals of speech and audio signal processing and the essential, relevant techniques for pattern recognition;
to apply these techniques to the automatic treatment of speech and audio signals;
to implement these techniques, and to analyse and evaluate their performance, in various speech and audio processing tasks.

Number of hours: 21

Evaluation:

Lab reports (not counted towards the final grade)
Final exam (100% of the final grade)