Speech and audio processing


This course provides an introduction to the automatic processing of speech and audio signals. It starts with a treatment of the human speech production and perception mechanisms and looks at how our understanding of them has influenced attempts to process speech and audio signals automatically. The course then considers the analysis, coding and parameterisation of signals for different speech and audio processing tasks. After an introduction to essential pattern recognition techniques, the course considers specific applications including speech recognition, speaker recognition and speaker diarization. The course also includes a treatment of speech and audio coding, noise compensation and speech enhancement.

Teaching and Learning Methods: The course comprises lectures, exercises and laboratory sessions.

Course policies: Attendance at laboratory sessions is mandatory.

Bibliography:

  • Book: HUANG X., ACERO A., HON H-W. Spoken language processing: a guide to theory, algorithms, and system development. Prentice Hall, 2001, 1008p.
  • Book: RABINER L., JUANG B-H. Fundamentals of speech recognition. Pearson College Div, 1993, 496p.
  • Book: SIMPSON P. La conception de systèmes avec FPGA. Dunod, 2014, 304p. (in French)


Requirements: Proficiency in engineering mathematics, fundamental signal processing, statistics and probability.

Course content:

  • Production, perception and analysis
  • Towards modelling, classification and recognition
  • Deterministic approaches to speech recognition
  • Stochastic approaches to speech recognition
  • Speaker recognition and diarization
  • Speech and audio coding
  • Noise compensation and speech enhancement

Learning outcomes:

  • to acquire knowledge of the human speech production and perception mechanisms, the fundamentals of speech and audio signal processing and the essential, relevant techniques for pattern recognition;
  • to apply these techniques to the automatic treatment of speech and audio signals;
  • to implement these techniques and to analyse and evaluate their performance in various speech and audio processing tasks.

Number of hours: 21


Evaluation:

  • Lab reports (not counted towards the final grade)
  • Final exam (100% of the final grade)