Video person recognition strategies using head motion and facial appearance

Matta, Federico

In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion and their potential as biometric identifiers. We also investigate how to exploit as much video information as possible for automatic recognition; more precisely, we examine the integration of head and mouth motion with facial appearance in a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition.

We first present a person recognition system that exploits unconstrained head motion, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is pre-processed by semi-automatically detecting the face and then automatically tracking the facial landmarks over time using a template matching strategy. We then describe the geometrical normalisation of the extracted signals, the computation of the feature vectors, and how these are subsequently used to estimate the client models through a Gaussian mixture model (GMM) approximation. Finally, we achieve person identification and verification by applying probability theory and the Bayesian decision rule (also called Bayesian inference).

Afterwards, we propose a multimodal extension of our person recognition system; more precisely, we integrate the head motion information with mouth motion and facial appearance within a unified probabilistic framework. We develop a new temporal subsystem whose feature space is enriched with additional mouth parameters; at the same time, we introduce a complementary spatial subsystem based on a probabilistic extension of the original eigenface approach.
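The GMM-plus-Bayesian-decision-rule step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the feature vectors are synthetic stand-ins for the normalised head-motion signals, and the client names, dimensions, and mixture sizes are assumptions.

```python
# Hypothetical sketch: one GMM per client approximates p(x | client),
# and identification applies the Bayesian (MAP) decision rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-client head-motion feature vectors
# (e.g. normalised landmark trajectories); shapes are illustrative.
train = {
    "client_A": rng.normal(0.0, 1.0, size=(200, 4)),
    "client_B": rng.normal(3.0, 1.0, size=(200, 4)),
}

# Estimate each client model as a Gaussian mixture approximation.
models = {
    name: GaussianMixture(n_components=2, random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify(features, priors=None):
    """MAP identification: argmax over clients of
    log p(X | client) + log P(client)."""
    names = list(models)
    if priors is None:  # equal priors reduce to maximum likelihood
        priors = {n: 1.0 / len(names) for n in names}
    scores = {
        n: models[n].score_samples(features).sum() + np.log(priors[n])
        for n in names
    }
    return max(scores, key=scores.get)

probe = rng.normal(3.0, 1.0, size=(50, 4))
print(identify(probe))
```

Verification against a claimed identity follows the same scoring idea, thresholding the client's log-likelihood (typically against a background model) instead of taking the argmax.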
We then implement an integration step that combines the similarity scores of the two parallel subsystems using a suitable opinion fusion (or score fusion) strategy.

Finally, we investigate a practical method for extracting novel spatio-temporal facial features from video sequences, which are used to discriminate identity and gender. For this purpose we develop a recognition system called tomofaces, which applies the temporal X-ray transformation of a video sequence to summarise the facial motion and appearance information of a person into a single X-ray image. We then detail the linear projection from the X-ray image space to a low-dimensional feature space, the estimation of the client models obtained by computing their cluster representatives, and the recognition of identity and gender through a nearest neighbour classifier based on distances in the feature space.
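The tomofaces pipeline can be sketched end to end under simplifying assumptions. In this sketch the temporal X-ray transformation is approximated as the temporal sum of per-frame gradient magnitudes, the linear projection is a PCA-style decomposition, and each client model is a single cluster representative; all data, names, and dimensions are illustrative rather than those of the thesis.

```python
# Hypothetical sketch of a tomofaces-style pipeline: a video is
# summarised into a single X-ray image, linearly projected to a
# low-dimensional space, and matched by nearest neighbour.
import numpy as np

def xray_image(video):
    """video: (T, H, W) array -> single (H, W) X-ray image.
    Approximated here as the temporal sum of gradient magnitudes."""
    gy, gx = np.gradient(video.astype(float), axis=(1, 2))
    return np.hypot(gx, gy).sum(axis=0)

def fit_projection(xrays, k):
    """PCA-style linear projection from image space to k dimensions,
    via SVD of the centred training X-ray images."""
    X = np.stack([x.ravel() for x in xrays])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    return basis @ (x.ravel() - mean)

# Synthetic videos standing in for two identities.
rng = np.random.default_rng(1)
videos = {"A": rng.normal(0, 1, (8, 16, 16)),
          "B": rng.normal(0, 4, (8, 16, 16))}
xrays = {n: xray_image(v) for n, v in videos.items()}
mean, basis = fit_projection(list(xrays.values()), k=2)

# Client model = cluster representative in the feature space
# (a single projected vector per client in this toy setting).
reps = {n: project(x, mean, basis) for n, x in xrays.items()}

def recognise(video):
    """Nearest neighbour classification on feature-space distances."""
    q = project(xray_image(video), mean, basis)
    return min(reps, key=lambda n: np.linalg.norm(q - reps[n]))
```

Gender recognition fits the same skeleton: the cluster representatives are computed per gender class instead of per client, and the nearest neighbour decision is taken over those two representatives.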

Digital Security
Eurecom Ref:
© Université de Nice. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at: