October 19, 2017
Image forgery is becoming a widespread problem as digital content proliferates, and sophisticated image editing tools keep pushing the limits of image composition to produce ever more natural-looking images. Meanwhile, news professionals, forensic experts, and legal prosecutors find it increasingly difficult to detect and localize image forgeries at scale. These challenges call for innovative and scalable image forensics technologies.
Figure 1 depicts some of the most common manipulations: splicing, copy-move, erasing, and retouching. Splicing is considered the most complicated of these because it involves content from external images.
Although classic splicing detection algorithms based on visual features may appear easy to adapt to the more general splicing detection with localization problem, also known as constrained image splicing detection (CISD), we note two major drawbacks of these classic algorithms: 1) they rely on handcrafted features, which are less robust to image transformations and certainly not optimal for CISD; 2) tuning each stage of a general forgery detection framework (GFDF) separately optimizes the stages in isolation rather than jointly.
We conceptually follow the GFDF and propose the Deep Matching and Verification Network (DMVN), a new deep neural network-based solution for the detection and localization of image splicing. As shown in Figure 2 below, the two problems are solved jointly by a multitask network trained in an end-to-end manner.
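The post does not spell out how the two task objectives are combined, but a standard way to train such a multitask network is to sum a per-image detection loss and a per-pixel localization loss. The sketch below is an assumption, not the paper's exact formulation; the weight `w_loc` and the use of binary cross-entropy for both branches are hypothetical choices for illustration.

```python
import math

def bce(p, y, eps=1e-7):
    # Binary cross-entropy for one probability p against a 0/1 label y.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def multitask_loss(det_prob, det_label, mask_probs, mask_labels, w_loc=1.0):
    # Detection branch: a single image-level BCE term.
    det_loss = bce(det_prob, det_label)
    # Localization branch: mean pixel-wise BCE over the predicted mask.
    loc_loss = sum(bce(p, y) for p, y in zip(mask_probs, mask_labels)) / len(mask_probs)
    # Joint objective: both branches contribute to every gradient step.
    return det_loss + w_loc * loc_loss
```

Training on this combined objective is what makes the optimization joint rather than stage-by-stage: gradients from both branches flow through the shared feature extractor.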
We design a Deep Dense Matching layer that finds candidate splicing regions from the features of two given images, and a Visual Consistency Validator module that makes the detection decision by cross-verifying image content in those candidate regions. Unlike classic approaches, the proposed method relies on no handcrafted features, heuristic rules, hand-tuned parameters, or additional post-processing, yet performs both splicing localization and detection.
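To make the dense-matching idea concrete, the sketch below computes all-pairs feature correlation between two feature maps and thresholds it into a candidate mask. This is only a rough NumPy analogue of the concept: in DMVN the matching layer operates on learned CNN features and is trained end to end, and the cosine normalization and `thresh` value here are assumptions for illustration.

```python
import numpy as np

def dense_match(feat_a, feat_b):
    """All-pairs cosine similarity between two feature maps.

    feat_a: (Ha, Wa, C), feat_b: (Hb, Wb, C). Returns an
    (Ha*Wa, Hb*Wb) correlation map whose peaks indicate
    candidate matched locations.
    """
    a = feat_a.reshape(-1, feat_a.shape[-1]).astype(float)
    b = feat_b.reshape(-1, feat_b.shape[-1]).astype(float)
    # L2-normalize so the dot product becomes cosine similarity.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def candidate_mask(corr, shape_a, thresh=0.9):
    # A location in image A is a splicing candidate if some location
    # in image B matches it strongly.
    return (corr.max(axis=1) >= thresh).reshape(shape_a)
```

In the full network, the candidate regions produced this way would then be handed to the consistency-verification stage rather than thresholded directly.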
Our experiments on two very large datasets, the paired CASIA dataset and the NIST-provided Nimble 2017 image splicing detection dataset, show that the new approach is much faster than classic approaches, achieves a substantially higher AUC score, and produces meaningful splicing masks that can assist further forensic analysis (illustrated in Figure 3 below).
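For readers unfamiliar with the evaluation metric, AUC can be read as the probability that a randomly chosen spliced image receives a higher detection score than a randomly chosen authentic one. The following is a minimal rank-based implementation; the example scores and labels are hypothetical, not results from the paper.

```python
def auc(scores, labels):
    # AUC as the probability that a positive (spliced) example
    # outscores a negative (authentic) one; ties count as half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect detector scores 1.0, random guessing about 0.5, so "higher AUC" means the detector separates spliced from authentic images more reliably across all score thresholds.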
Finally, although we train our DMVN model with respect to both the localization and detection branches, the feed-forward nature of DMVN means the network can be trained with respect to the detection branch alone while still learning to localize splicing masks. The proposed model can therefore be easily fine-tuned on a new CISD dataset using only image-level label annotations, and skipping splicing-mask annotation when collecting CISD training data can save a significant amount of time and cost.