Revealing True Identity: Detecting Makeup Attacks in Face-based Biometric Systems
October 12, 2020
Introduction
Face recognition systems are vulnerable to different types of presentation attacks, such as printed face images and plastic masks. To mitigate this vulnerability, face recognition systems are augmented with presentation attack detection modules. One of the most challenging attack types for such modules to detect is makeup. Makeup can substantially change the facial characteristics of a person, including, for example, their perceived age and gender. Yet, unlike a flat printed image, makeup preserves the natural 3D geometry of the face, and unlike a plastic mask, it is a material that many people wear every day. In this blog post, we introduce our work, in collaboration with Simon Fraser University, on detecting makeup attacks by recovering the true identity of the person behind the makeup.
Approach
Our framework is illustrated in the figure below. The basic idea is simple and intuitive: if we remove the makeup from a face and find that the resulting no-makeup face is considerably different from the original face with makeup, then the makeup was meant to conceal the identity and the image is considered an attack. This is what the Makeup Removal and Matcher blocks in the figure do. The makeup removal module is trained to generate an image without makeup from an input image with makeup. However, it does not know how to handle images that contain no makeup in the first place. Therefore, to improve the model's performance, we filter out images without makeup before feeding them to the makeup removal module; this is the role of the Classifier block in the figure.

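To make the pipeline concrete, here is a minimal sketch of the decision logic, assuming hypothetical classifier, remover, and matcher objects and an illustrative similarity threshold; these names and interfaces are placeholders, not the exact modules of our system.

```python
def is_makeup_attack(image, classifier, remover, matcher, threshold=0.5):
    """Decide whether a face image is a makeup attack (illustrative sketch).

    classifier, remover, matcher, and threshold are hypothetical
    stand-ins for the Classifier, Makeup Removal, and Matcher blocks.
    """
    # Only images predicted to contain makeup are passed to makeup removal.
    if not classifier.has_makeup(image):
        return False  # no makeup detected; this branch treats it as bona fide

    # Generate the no-makeup version of the face.
    no_makeup = remover.remove_makeup(image)

    # Compare the identity before and after makeup removal.
    similarity = matcher.similarity(image, no_makeup)

    # Low similarity means the makeup concealed the identity -> attack.
    return similarity < threshold
```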
Makeup Removal
The most critical component of our solution is the makeup removal module. It is trained using a modified CycleGAN framework: two generators are trained along with two discriminators. One generator removes makeup and the other adds it back. One discriminator distinguishes real makeup images from generated ones, and the other distinguishes real no-makeup images from generated ones. At test time, only the generator that removes makeup is used. Several losses are applied during training to ensure that a round-trip transformation of an input image through the two generators preserves perceptual structure, identity, and feature representation.
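As a rough sketch of how such a CycleGAN-style objective can be assembled, the PyTorch snippet below combines standard least-squares adversarial, cycle-consistency, and identity terms. The loss weights and the exact perceptual, identity, and feature losses of our model are not reproduced here; everything below should be read as an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def generator_loss(G_remove, G_add, D_no_makeup, D_makeup,
                   x_makeup, x_no_makeup,
                   lambda_cyc=10.0, lambda_idt=5.0):
    """CycleGAN-style generator objective (illustrative weights and terms).

    G_remove maps makeup -> no-makeup (the generator kept at test time),
    G_add maps no-makeup -> makeup, and D_* are the two domain discriminators.
    """
    fake_no_makeup = G_remove(x_makeup)
    fake_makeup = G_add(x_no_makeup)

    # Least-squares adversarial terms: generated images should fool the
    # discriminator of their target domain.
    pred_fake_nm = D_no_makeup(fake_no_makeup)
    pred_fake_m = D_makeup(fake_makeup)
    adv = F.mse_loss(pred_fake_nm, torch.ones_like(pred_fake_nm)) + \
          F.mse_loss(pred_fake_m, torch.ones_like(pred_fake_m))

    # Cycle-consistency: a round trip through both generators should
    # reconstruct the input, pushing the model to preserve structure
    # and identity.
    cyc = F.l1_loss(G_add(fake_no_makeup), x_makeup) + \
          F.l1_loss(G_remove(fake_makeup), x_no_makeup)

    # Identity terms: an image already in the target domain should pass
    # through its generator (almost) unchanged.
    idt = F.l1_loss(G_remove(x_no_makeup), x_no_makeup) + \
          F.l1_loss(G_add(x_makeup), x_makeup)

    return adv + lambda_cyc * cyc + lambda_idt * idt
```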

Experimental Results
Data
Experiments are conducted on the PADISI-Face dataset. Specifically, our model uses the RGB images together with the short-wave infrared (SWIR) images at three wavelengths: 940 nm, 1050 nm, and 1200 nm. Examples of RGB and 1050 nm images are shown below.

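As an illustration of how the RGB and SWIR channels can be combined into a single model input, here is a small sketch; the actual registration, normalization, and fusion strategy of our pipeline is not shown and is an assumption here.

```python
import numpy as np

def build_input(rgb, swir_940, swir_1050, swir_1200):
    """Stack an RGB image with three SWIR wavelength images (illustrative).

    rgb is an HxWx3 array and each swir_* is an HxW array assumed to be
    spatially registered with the RGB image.
    """
    swir = np.stack([swir_940, swir_1050, swir_1200], axis=-1)
    # Resulting array has six channels: R, G, B, 940 nm, 1050 nm, 1200 nm.
    return np.concatenate([rgb, swir], axis=-1).astype(np.float32)
```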
Overall System Performance
The two plots below show two views of the ROC curves for two versions of our complete face presentation attack detection system. One version uses only RGB images and the other uses both RGB and SWIR images, as explained above. Adding the SWIR channels significantly enhances the performance of the system, boosting the true positive rate (TPR) at a false positive rate (FPR) of 0.1 from 71.65% to 96.85%.


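For reference, the operating point reported above can be read off an ROC curve as sketched below; the label convention (1 for attack, 0 for bona fide) and the use of scikit-learn are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(labels, scores, target_fpr=0.1):
    """TPR at a fixed FPR, interpolated from the ROC curve.

    labels: 1 for attack, 0 for bona fide (an assumed convention);
    scores: the system's attack scores.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    # roc_curve returns fpr in increasing order, so interpolation is valid.
    return float(np.interp(target_fpr, fpr, tpr))
```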
Makeup Removal Performance
We compare the performance of our makeup removal approach against three state-of-the-art approaches:
BTD: C. Cao, F. Lu, C. Li, S. Lin, and X. Shen. 2019. Makeup Removal via Bidirectional Tunable De-Makeup Network. IEEE Transactions on Multimedia 21, 11 (Nov 2019), 2750–2761.
BeautyGAN: Tingting Li, Ruihe Qian, Chao Dong, Si Liu, Qiong Yan, Wenwu Zhu, and Liang Lin. 2018. BeautyGAN: Instance-Level Facial Makeup Transfer with Deep Generative Adversarial Network. In Proceedings of the 26th ACM International Conference on Multimedia. 645–653.
RES: Wei Shen and Rujie Liu. 2017. Learning Residual Images for Face Attribute Manipulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4030–4038.
The first comparison measures the structural similarity between the generated no-makeup images and reference no-makeup images of the same subjects. The table below reports the results for two structural similarity metrics; our approach outperforms all others on both.

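As an illustration, the sketch below computes the mean SSIM, one common structural similarity metric, between generated and reference no-makeup images using scikit-image; whether SSIM is one of the two metrics in the table, and the exact evaluation protocol, are assumptions here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim(generated, references):
    """Mean SSIM between generated and reference no-makeup images.

    Both arguments are sequences of HxWx3 uint8 arrays of equal size,
    paired by subject. Requires scikit-image >= 0.19 for channel_axis.
    """
    scores = [structural_similarity(g, r, channel_axis=-1)
              for g, r in zip(generated, references)]
    return float(np.mean(scores))
```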
The second comparison is based on human judgement of the quality of the generated images. In this study, 57 participants were asked to rank the generated images from the different methods according to their quality. Our method was ranked first more than 77% of the time and among the top two more than 94% of the time.
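The reported percentages are simple aggregates over the collected rankings; a small sketch of that computation, assuming a hypothetical array of per-response ranks (1 = best), is shown below.

```python
import numpy as np

def rank_statistics(ranks):
    """Percentage of responses ranking our method first and in the top two.

    ranks is a hypothetical array of ranks (1 = best) assigned to our
    method, one entry per participant-image response.
    """
    ranks = np.asarray(ranks)
    pct_first = 100.0 * np.mean(ranks == 1)
    pct_top_two = 100.0 * np.mean(ranks <= 2)
    return pct_first, pct_top_two
```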
