Deep learning for semantic description of visual human traits

Antipov, Grigory

1er prix du Prix de Thèse TELECOM ParisTech 2018

The recent progress in artificial neural networks (rebranded as "deep learning") has significantly boosted the state of the art in numerous domains of computer vision, offering an opportunity to approach problems which were hardly solvable with conventional machine learning. Thus, in the framework of this PhD study, we explore how deep learning techniques can help in the analysis of two of the most basic and essential semantic traits revealed by a human face, namely, gender and age. In particular, two
complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. The Convolutional Neural Network (CNN) has currently become a standard model for image-based object recognition in general, and is therefore a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the effectiveness of CNNs for a particular task strongly depends on the problem itself and on the strategy used for training. Therefore, in this thesis, we conduct a comprehensive study which results in an empirical formulation of a set of principles for the optimal design and training of gender recognition and age estimation CNNs. For example, we demonstrate that training a CNN to directly recognize gender is less effective than first training the same neural network to recognize person identity, and then adapting it for gender prediction. We also show that an age estimation CNN benefits from a specific representation of age labels known as Label Distribution Age Encoding (LDAE). All in all, the conclusions of the performed study
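As an illustration, LDAE-style encoding replaces a hard one-hot age label with a discrete Gaussian distribution centred on the true age, so that neighbouring ages share probability mass. The sketch below is illustrative only; the number of age classes and the spread `sigma` are assumptions, not values taken from the thesis:

```python
import numpy as np

def ldae_encode(age, n_classes=101, sigma=2.0):
    """Encode a scalar age as a discrete Gaussian label distribution
    over the classes 0..n_classes-1 (an LDAE-style soft label)."""
    classes = np.arange(n_classes)
    weights = np.exp(-0.5 * ((classes - age) / sigma) ** 2)
    return weights / weights.sum()   # normalize to a probability distribution

# The soft label peaks at the true age and spreads to its neighbours;
# the expected age under the distribution recovers the original label.
p = ldae_encode(30)
```

Training against such soft targets (e.g. with a cross-entropy or KL-divergence loss) lets each sample also supervise the adjacent age classes, which is one intuition behind why this representation helps.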
allow us to design state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and to win an international competition on apparent age estimation. When evaluated on a very challenging internal dataset, our best models reach 98.7% classification accuracy for gender recognition and an average error of 4.26 years for age estimation. In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Although GANs are widely praised as one of the best models for image synthesis, applying them to face editing remains an open problem because of the poor preservation of the original face identity by existing approaches. In this thesis, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing
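In a conditional GAN of this kind, the required gender and age category are typically supplied by appending one-hot condition codes to the generator's latent noise vector. The bin boundaries and dimensions below are illustrative assumptions, not the thesis' actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical age bins for illustration (not the thesis' actual bins).
AGE_GROUPS = ["0-18", "19-29", "30-39", "40-49", "50-59", "60+"]

def make_generator_input(z_dim=100, gender="female", age_group="30-39"):
    """Build a cGAN generator input: random noise concatenated with
    one-hot gender and age-group condition codes."""
    z = rng.standard_normal(z_dim)              # latent noise vector
    g = np.array([1.0, 0.0]) if gender == "female" else np.array([0.0, 1.0])
    a = np.zeros(len(AGE_GROUPS))
    a[AGE_GROUPS.index(age_group)] = 1.0
    return np.concatenate([z, g, a])            # shape: (z_dim + 2 + 6,)

x = make_generator_input()
```

The discriminator receives the same condition codes alongside the image, so both networks learn the mapping between the codes and the visual attributes.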
the original identity in the synthetic faces. The key idea of our approach is the use of a separately trained face recognition CNN which helps to minimize the person identity difference between the original and the edited faces. In order to demonstrate the practical interest of the designed face editing method, we apply it to age normalization in a cross-age face verification scenario. On average, our method improves the accuracy of an off-the-shelf face verification software by about 8 points.
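The identity-preservation idea can be sketched as follows: a frozen embedding network supplies an identity distance, and the latent vector of the edited face is optimized to shrink it. Everything in this toy sketch (the linear stand-ins for the recognition network and the generator, the dimensions, the learning rate) is an assumption for illustration, not the thesis' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the frozen face recognition CNN ("embed")
# and the GAN generator ("generate"); both are hypothetical.
W_fr = rng.standard_normal((16, 64))    # embedding weights (frozen)
W_gen = rng.standard_normal((64, 32))   # generator weights (frozen)

def embed(face):
    return W_fr @ face                  # identity embedding of a "face"

def generate(z):
    return W_gen @ z                    # "face" synthesized from latent z

face_orig = rng.standard_normal(64)     # the original face
e_orig = embed(face_orig)

def identity_loss(z):
    """Squared distance between the identity embeddings of the original
    and the generated face; a small loss means identity is preserved."""
    diff = embed(generate(z)) - e_orig
    return float(diff @ diff)

# Gradient descent on z pulls the generated face toward the original
# identity: the core of identity-preserving latent optimization.
z = np.zeros(32)
lr = 5e-5
loss_start = identity_loss(z)
for _ in range(500):
    diff = embed(generate(z)) - e_orig
    z -= lr * 2 * (W_fr @ W_gen).T @ diff   # gradient of the squared distance
loss_end = identity_loss(z)
```

In the actual method the generator and recognition network are deep CNNs and the identity term is combined with the adversarial conditioning on the target gender/age, but the optimization principle is the same.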

Digital Security
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :