Detection and Continual Learning of Novel Face Presentation Attacks
January 4, 2021
Advances in deep learning, combined with the availability of large datasets, have led to impressive improvements in face presentation attack detection research. However, state-of-the-art face anti-spoofing systems are still vulnerable to novel types of attacks that are never seen during training. Moreover, even if such attacks are correctly detected, these systems lack the ability to adapt to newly encountered attacks. The ability to continually detect new types of attacks after training, and to self-adapt to identify these attack types after the initial detection phase, is highly appealing. In this paper, we enable a deep neural network to detect anomalies in the observed input data points as potential new types of attacks by suppressing the network's confidence level outside the distribution of the training samples. We then use experience replay to update the model so that it incorporates knowledge about new types of attacks without forgetting previously learned attack types. Experimental results demonstrate the effectiveness of the proposed method on two benchmark datasets as well as on a newly introduced dataset that exhibits a large variety of attack types.
We develop an algorithm for the continual detection of emerging novel types of face Presentation Attacks (PAs). Our objective is to enable the system to identify novel PAs. The model is then updated to learn new attack types without forgetting previously learned ones. Our idea is based on enabling the network to identify new attack types as testing samples that lie outside the training distribution samples (OTDS) in an embedding space. The base model is then updated to classify these samples as new attack types in a continual learning (CL) setting, where catastrophic forgetting is addressed using experience replay. Despite being effective in CL settings, the idea of detecting OTDS has not been explored for PAD.
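As a sketch, one period of this detect-then-replay cycle could look as follows; `model.predict`, `model.fit`, `buffer`, and `label_oracle` are hypothetical placeholder interfaces for illustration, not the paper's actual implementation:

```python
def continual_pad_loop(model, stream, buffer, label_oracle, replay_size=32):
    """One task period: flag OTDS samples as candidate novel attacks,
    then update the model with experience replay to avoid forgetting."""
    # 1. Screen the input stream for outside-training-distribution samples.
    novel = [x for x in stream if model.predict(x) == "OTDS"]
    # 2. Obtain labels for the flagged samples (e.g., offline annotation).
    labeled_novel = [(x, label_oracle(x)) for x in novel]
    # 3. Mix new attack data with replayed past samples and update the model.
    replay = buffer.sample(replay_size)
    model.fit(labeled_novel + replay)
    # 4. Store the new samples for future replay.
    for item in labeled_novel:
        buffer.add(item)
    return model
```

The key point is step 3: replayed past samples are interleaved with the newly identified attacks in every update, which is what counteracts catastrophic forgetting.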
The main contributions of our work are as follows:
- A new formulation of face PAD as a continual learning problem to equip a PAD system with defense mechanisms that allow learning novel attack types continually.
- An algorithm to identify novel attack types as OTDS anomalies by continually screening the input data representations, and to enable the model to correctly classify them as attacks in the future via experience replay.
- Demonstrating that a model which has been pre-trained in an unsupervised fashion can achieve state-of-the-art (SOTA) detection performance on unknown attack detection protocols as a result of transferred knowledge.
- A new face anti-spoofing dataset with diverse attack types to evaluate our algorithm in CL settings.
The figure on the side presents the block-diagram architecture of the proposed continual PAD learning system: 1. The PAD module identifies novel attack samples in the input data stream during model execution; 2. Samples from bona-fide (BF) and PA types are stored to build a dataset; 3. The novel samples, together with samples stored in the replay buffer, are used to update the model through pseudo-rehearsal and the replay buffer.
To tackle the vulnerability of PAD systems in overconfident regions (shown on the right), we need to suppress the confidence of the model in those regions. To this end, we fit a parametric distribution to model the learned bimodal distribution in the embedding space. Our idea is based on expanding the base classifier subnetwork and classifying the data points into three classes, namely BFs, PAs, and OTDS. The intuition behind this idea is that novel PA instances are expected to differ from the training data in the embedding space. This means that we can identify them when the input lies outside the components of the bimodal distribution fitted on the embedding. Hence, if we can generate samples that lie outside this distribution, i.e., intuitively the gray region in the figure below, we can augment the training data with samples from this region and retrain the classifier subnetwork. As a result, the system will be capable of identifying OTDS data points during execution.
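As an illustrative numpy sketch of this idea (per-class diagonal Gaussians standing in for the parametric bimodal fit; the helper names and the log-density threshold are our assumptions, not the paper's implementation), OTDS detection and generation of samples from the low-density region could look like:

```python
import numpy as np

def fit_class_gaussians(emb_bf, emb_pa):
    """Fit one diagonal Gaussian per class (BF, PA) on embedding vectors."""
    return [(e.mean(axis=0), e.var(axis=0) + 1e-6) for e in (emb_bf, emb_pa)]

def log_density(x, mean, var):
    """Log-density of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def is_otds(x, modes, threshold):
    """Flag x as outside the training distribution if it is unlikely
    under both components of the bimodal embedding distribution."""
    return max(log_density(x, m, v) for m, v in modes) < threshold

def sample_otds_negatives(modes, low, high, n, threshold, rng):
    """Rejection-sample points from the low-density ('gray') region; these
    augment the training data so the classifier learns the OTDS class."""
    out = []
    while len(out) < n:
        x = rng.uniform(low, high, size=low.shape)
        if is_otds(x, modes, threshold):
            out.append(x)
    return np.array(out)
```

Points near either fitted mode are accepted as BF/PA candidates, while points unlikely under both modes are treated as OTDS and can be replayed as the third class during retraining.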
Since no prior method in the literature addresses the continual PAD setting explored in this work, we compare the proposed method against three baselines: static training (ST), joint training (JT), and full replay (FR).
In the ST setting, we report the performance of the base model after initial training, without further updates when new attack types in the dataset are encountered. This setting represents the performance of existing PAD algorithms when novel PAs are observed and serves as a lower bound; improvement over it demonstrates the relative effectiveness of our approach. In the JT setting, we train the model on the whole labeled training dataset, including all attack types, in the initial training. This setting serves as an upper bound which assumes all attack types are known a priori. FR is a variation of our algorithm in which we assume the memory budget is unlimited; as a result, we can store and replay all the data points in the buffer. We also report the performance of our algorithm (NACL) when random sampling (RS) is used to select the buffer samples. In the RS setting, we store randomly selected samples in the memory buffer. The comparison with RS investigates the effect of the proposed sample selection technique. For a fair comparison, we use the same buffer size for both the RS and NACL methods.
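A fixed-budget buffer of the kind the RS baseline uses can be sketched with reservoir sampling, which retains each observed sample with equal probability under a fixed memory budget (the class name and default capacity below are illustrative):

```python
import random

class ReservoirBuffer:
    """Fixed-budget replay memory; every item seen so far has an equal
    chance of being retained (classic reservoir sampling)."""
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k stored items for replay during a model update."""
        return random.sample(self.items, min(k, len(self.items)))
```

This keeps memory usage constant regardless of how many samples stream past, which is what makes the comparison against the unlimited-budget FR baseline meaningful.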
Similar to most works in the CL literature, there is a boundary between two subsequent tasks in our formulation. This boundary can be attributed to the instances at which the model is updated after a period of data collection. During each task, i.e., each period in which the model is not updated, the system may encounter more than one attack type. We consider two sets of experiments for a thorough validation.
First, we consider that the initial training task consists of training on bona-fide samples and only the first type of PA, according to the index used in the dataset. Each subsequent task is constructed by introducing one novel attack type. We report the performance of our algorithm and the baselines in part (a) of the figure below. At each time step, we report the model performance on the full testing split of the datasets. We visualize (1-APCER), (1-BPCER), and (1-ACER) because learning curves are usually perceived as increasing functions; since the testing split is fixed, successful learning corresponds to rising learning curves.
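For reference, these quantities derive from the standard PAD error metrics: APCER is the fraction of attack presentations misclassified as bona fide, BPCER the fraction of bona-fide presentations misclassified as attacks, and ACER their average. A minimal sketch:

```python
def pad_metrics(labels, preds):
    """APCER, BPCER, ACER from binary labels (1 = attack, 0 = bona fide)."""
    attacks = [p for l, p in zip(labels, preds) if l == 1]
    bona = [p for l, p in zip(labels, preds) if l == 0]
    apcer = sum(p == 0 for p in attacks) / len(attacks)  # attacks missed
    bpcer = sum(p == 1 for p in bona) / len(bona)        # bona fide rejected
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer
```

The plotted curves are then simply one minus each of these values at every time step.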
We observe, as expected, that ST is highly vulnerable to novel attack types, leading to high values of the APCER and ACER metrics. Note that maintaining a good BPCER is expected for this baseline but is not sufficient. This baseline demonstrates the vulnerability of current PAD systems when novel attacks are encountered and justifies the necessity of developing PAD algorithms for CL settings. When we use the designed novel attack detection mechanism, performance clearly improves towards the JT upper bound as more attacks are identified and learned. Performance degradation in terms of the BPCER metric is expected due to catastrophic forgetting, but the improvements in APCER outweigh this degradation (see the ACER plots). Note that RS, FR, and NACL are all equipped with the proposed mechanism; their major difference lies in the implementation of the experience replay procedure. We do not see a clear winner among these methods across all metrics, but NACL and RS store significantly less data in memory than FR (only 100 samples). We also note that NACL outperforms RS in the majority of the time steps. We conclude that experience replay is an effective approach to addressing catastrophic forgetting.
In the second set of experiments, we consider that the initial training task consists of training on bona-fide samples and only the first PA type, according to the index used in the datasets. Subsequent tasks are constructed by introducing two novel attack types at each time step. This setting is closer to a realistic situation. We visualize the learning curves for our algorithm and the baselines in part (b) of the figure below. Comparing these results with those of part (a), we see similar improvements in terms of the APCER metric. This observation suggests that our algorithm is robust even when multiple attacks are encountered at each time step. We also note that performance degradation in terms of the BPCER metric is smaller than in part (a). This is expected because the base model has been updated fewer times compared to the single-attack-per-task scenario; as a result, catastrophic forgetting has been less severe. We conclude that our approach is effective for automatically identifying novel attacks and retraining the model.