Audio conferencing enhancement through 3D sound and high quality speech

Nagle, Arnault

This thesis deals with audio conferencing over IP and its improvement through high-quality speech and 3D sound. Our goal is to develop solutions that combine well-known architectures, such as the centralized and the fully distributed ones, with 3D sound and with techniques that are likely to affect quality. We have to define the controls needed to manage 3D audio conferencing for each architecture. Quality tests and spatialization tests must be performed to validate our solutions. The first axis of this thesis examines the existing architectures more closely in order to propose solutions that integrate 3D sound and quality-improvement techniques. The second axis of our research concerns the definition of the controls that enable the management of the audio conference. We define the extensions necessary to control the position of each participant in the audio conference according to the architecture. Our third axis deals with quality tests and spatialization tests, carried out in order to validate the dual-mono coding method and to select the most appropriate coders. First, we show that monaural hearing and diotic hearing are not equivalent. Second, the G.711 and G.722 coders are the most suitable for centralized audio conferencing with high audio quality, compared with the CELP coders. They have low complexity and are robust to packet losses, multi-talker conditions, 3D sound, and tandeming. For the wideband loosely coupled architecture, AMR-WB at 23.85 kbit/s, G.729.1 at 32 kbit/s, and G.722 at 64 kbit/s seem to be the best coders regardless of the packet loss conditions. In narrowband, G.711, AMR at 12.2 kbit/s, and G.729.1 at 12 kbit/s are the best ones. Coders have to be chosen according to the bitrate and complexity constraints.

Communication systems
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :