The human face is an attractive biometric identifier, and face recognition has improved considerably since its beginnings some three decades ago; nevertheless, its application in real-world settings has achieved only limited success. In this doctoral dissertation we focus on a local feature of the human face, namely the lips, and analyse its relevance to and influence on person recognition. An in-depth study is carried out of the various steps involved, such as the detection, evaluation, normalization and applications of human lip motion.
First, we present a lip detection algorithm based on the fusion of two independent methods: the first is based on edge detection and the second on region segmentation. Because the two methods have distinct characteristics, they exhibit different strengths and weaknesses, which we exploit by combining them through fusion. We then present results from extensive testing and evaluation of the detection algorithm on a realistic database. Next, we compare the visual features of lip motion for their relevance to person recognition. For this purpose, we extract various geometric and appearance-based lip features and compare them using three feature selection measures: minimum Redundancy Maximum Relevance (mRMR), the Bhattacharyya distance and mutual information.
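To make one of these relevance measures concrete, the following is a minimal sketch of ranking features by the Bhattacharyya distance. It assumes that each feature's values for two classes (for example, two different speakers) are approximated by univariate Gaussians, in which case the distance has the closed form D_B = (mu1 - mu2)^2 / (4 (s1^2 + s2^2)) + (1/2) ln((s1^2 + s2^2) / (2 s1 s2)). The function names and feature names are hypothetical illustrations, not the dissertation's implementation.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_gaussian(a, b):
    """Bhattacharyya distance between two samples, each modelled as a univariate Gaussian.

    Assumes both samples have non-zero variance.
    """
    mu1, mu2 = mean(a), mean(b)
    v1, v2 = pvariance(a), pvariance(b)
    # Term driven by the separation of the class means.
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
    # Term driven by the mismatch of the class variances (zero when v1 == v2).
    term_var = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return term_mean + term_var


def rank_features(class_a, class_b):
    """Rank features by decreasing Bhattacharyya distance between two classes.

    class_a, class_b: dicts mapping a (hypothetical) feature name to its list of values.
    """
    scores = {name: bhattacharyya_gaussian(class_a[name], class_b[name]) for name in class_a}
    return sorted(scores, key=scores.get, reverse=True)


# Usage with made-up values for two hypothetical lip features.
speaker_1 = {"mouth_width": [10.0, 11.0, 12.0], "mouth_height": [5.0, 6.0, 7.0]}
speaker_2 = {"mouth_width": [20.0, 21.0, 22.0], "mouth_height": [5.5, 6.5, 7.5]}
print(rank_features(speaker_1, speaker_2))
```

A larger distance means the feature separates the two classes better; mRMR additionally penalizes redundancy among the selected features, which a per-feature score like this one does not capture.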
Next, we extract features that model the behavioural aspect of lip motion during speech and exploit them for person recognition. The behavioural features include static features, such as the normalized lengths of the major and minor axes and the coordinates of the lip extrema points, and dynamic features based on optical flow. These features are used to build a client model for each person with a Gaussian Mixture Model (GMM), and classification is finally performed using a Bayesian decision rule. Recognition results are then presented on a text-independent database specifically designed for testing behavioural features, which require comparatively more data.
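The scoring stage of such a GMM-based system can be sketched as follows. This is a minimal illustration under several assumptions not stated above: each client GMM has already been trained (for example with EM, not shown), covariances are diagonal, per-frame feature vectors are treated as independent, and priors default to uniform. The client names and parameter values are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_frame_loglik(x, gmm):
    """log p(x | GMM) for a diagonal-covariance GMM.

    gmm: list of (weight, means, variances) tuples, one per mixture component.
    """
    component_logliks = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mi, vi in zip(x, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logliks.append(ll)
    return logsumexp(component_logliks)


def classify(frames, client_models, priors=None):
    """Bayesian decision rule: argmax over clients c of log P(c) + sum_t log p(x_t | c)."""
    clients = list(client_models)
    if priors is None:
        priors = {c: 1.0 / len(clients) for c in clients}

    def posterior_score(c):
        return math.log(priors[c]) + sum(gmm_frame_loglik(x, client_models[c]) for x in frames)

    return max(clients, key=posterior_score)


# Hypothetical single-component models over 2-D feature vectors.
models = {
    "client_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "client_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
print(classify([[4.8, 5.1], [5.2, 4.9]], models))
```

Summing log-likelihoods over frames is what lets a text-independent system accumulate evidence from longer utterances, which is consistent with behavioural features needing comparatively more data.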
Lastly, we propose a temporal normalization method for handling the variation caused by lip motion during speech. Given a group of videos of a person uttering the same sentence multiple times, we study the lip motion in one of the videos and select certain key frames as synchronization frames. We then synchronize these frames from the first video with the remaining videos of the same person. Finally, all the videos are normalized temporally by interpolation using lip morphing. To evaluate the normalization method, we devised a spatio-temporal person recognition algorithm that compares normalized and unnormalized videos.
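The synchronization and interpolation steps can be sketched as a piecewise-linear time warp. In this minimal sketch, matched key-frame indices (assumed strictly increasing and already selected) map every reference frame to a fractional position in another video, and that video's per-frame lip-shape vectors (for example, flattened contour points) are then linearly interpolated at those positions. Linear interpolation of shape vectors is a simple stand-in for lip morphing, which would also warp image texture; the function names are hypothetical.

```python
import math
from bisect import bisect_right


def warp_positions(ref_keys, other_keys):
    """Map each reference frame between the first and last key frame to a fractional
    frame position in another video, linearly between matched key frames."""
    assert len(ref_keys) == len(other_keys) >= 2
    positions = []
    for t in range(ref_keys[0], ref_keys[-1] + 1):
        # Index of the key frame that ends the segment containing t.
        i = min(bisect_right(ref_keys, t), len(ref_keys) - 1)
        r0, r1 = ref_keys[i - 1], ref_keys[i]
        o0, o1 = other_keys[i - 1], other_keys[i]
        alpha = (t - r0) / (r1 - r0)
        positions.append(o0 + alpha * (o1 - o0))
    return positions


def resample(frames, positions):
    """Linearly interpolate per-frame lip-shape vectors at fractional positions.

    Assumes len(frames) >= 2 and 0 <= p <= len(frames) - 1 for every position p.
    """
    last = len(frames) - 1
    out = []
    for p in positions:
        k = min(math.floor(p), last - 1)
        frac = p - k
        out.append([a + frac * (b - a) for a, b in zip(frames[k], frames[k + 1])])
    return out


# Reference key frames 0, 2, 4 matched to frames 0, 4, 6 of a slower utterance.
pos = warp_positions([0, 2, 4], [0, 4, 6])
other_video = [[10.0 * k] for k in range(7)]  # one made-up feature per frame
print(pos)
print(resample(other_video, pos))
```

After this step every video has the same number of frames as the reference, with the key frames aligned, so frame-by-frame comparison across repetitions becomes possible.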