March 15, 2018
With the rapid growth of social networks and advances in image editing software, it has become much easier to manipulate an image realistically. Although many user manipulations are not malicious, image forgery has become a more serious and widespread problem in recent years. Copy-move forgery, in which a region of an image is copied and pasted elsewhere in the same image, is one of the most common types because it is simple to accomplish with a variety of photo-editing software. Figure 1 depicts copy-move forgery samples used for evidence duplication and evidence removal.
Classic image copy-move forgery detection methods typically involve three major steps: unit feature representation, unit-level matching, and postprocessing. Although the general concept behind these classic solutions is simple, they rely on many heuristic, data-dependent parameters that are inconvenient to tune. Deep neural networks (DNNs) have recently been introduced to the image forgery community, but most works still use them only as a replacement for classic feature extractors.
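To make the three classic steps concrete, here is a minimal sketch of a block-matching detector in plain Python. It is a toy illustration rather than any specific published baseline: the function name `detect_copy_move`, the use of raw pixel blocks as features, and the parameters `block`, `min_shift`, and `min_votes` are all assumptions chosen for this example. Those parameters are exactly the kind of hand-tuned thresholds the classic pipeline depends on.

```python
import random
from collections import Counter, defaultdict

def detect_copy_move(img, block=8, min_shift=8, min_votes=10):
    """Toy block-matching detector; img is a list of rows of gray values."""
    h, w = len(img), len(img[0])
    # Step 1: unit feature representation (assumed here: raw pixels of each block).
    groups = defaultdict(list)
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            feat = tuple(v for row in img[y:y + block] for v in row[x:x + block])
            groups[feat].append((y, x))
    # Step 2: unit-level matching (identical blocks that are far enough apart).
    pairs = []
    for locs in groups.values():
        for i, (y1, x1) in enumerate(locs):
            for y2, x2 in locs[i + 1:]:
                if max(abs(y2 - y1), abs(x2 - x1)) >= min_shift:
                    pairs.append(((y1, x1), (y2, x2)))
    # Step 3: postprocessing (keep only shift vectors backed by many block pairs).
    votes = Counter((y2 - y1, x2 - x1) for (y1, x1), (y2, x2) in pairs)
    mask = [[False] * w for _ in range(h)]
    for (y1, x1), (y2, x2) in pairs:
        if votes[(y2 - y1, x2 - x1)] >= min_votes:
            for y, x in ((y1, x1), (y2, x2)):
                for dy in range(block):
                    for dx in range(block):
                        mask[y + dy][x + dx] = True
    return mask

# Usage: build a random 40x40 image, then copy a 12x12 patch to a new location.
random.seed(0)
img = [[random.randrange(256) for _ in range(40)] for _ in range(40)]
clean_mask = detect_copy_move(img)
for dy in range(12):
    for dx in range(12):
        img[24 + dy][22 + dx] = img[2 + dy][2 + dx]
forged_mask = detect_copy_move(img)
print("flagged pixels before/after forgery:",
      sum(map(sum, clean_mask)), sum(map(sum, forged_mask)))
```

Exact pixel matching like this breaks as soon as the pasted region is rotated, rescaled, or recompressed, which is why practical classic methods need more robust features and still more tuning.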
In contrast to classic approaches that involve multiple stages of parameter tuning and training, we present a fully trainable end-to-end DNN solution for image copy-move forgery detection. In particular, we conceptually follow the three major steps commonly used in classic solutions, but we implement them within a single end-to-end DNN. Figure 2 depicts the overall flowchart of the proposed DNN approach. Because each of our processing modules is a collection of standard or custom DNN layers, when we cascade them together, our model remains end-to-end trainable.
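One way to see how a matching step can live inside a trainable network is to write it as smooth arithmetic instead of hard comparisons. The sketch below is an assumption made for illustration, not a description of the layers in the proposed model: `self_correlation` computes cosine similarity between every pair of per-location feature vectors, and `soft_match_score` (with its hypothetical `temperature` parameter) replaces a hard "best match" decision with a softmax-weighted average, so the result varies smoothly with the features.

```python
import math

def self_correlation(features):
    """Cosine similarity between every pair of per-location feature vectors."""
    normed = []
    for f in features:
        norm = math.sqrt(sum(v * v for v in f)) or 1.0  # guard against zero vectors
        normed.append([v / norm for v in f])
    return [[sum(a * b for a, b in zip(p, q)) for q in normed] for p in normed]

def soft_match_score(features, temperature=0.1):
    """Per location: softmax-weighted similarity to all other locations."""
    sim = self_correlation(features)
    scores = []
    for i, row in enumerate(sim):
        others = [s for j, s in enumerate(row) if j != i]  # ignore self-match
        weights = [math.exp(s / temperature) for s in others]
        total = sum(weights)
        scores.append(sum(w * s for w, s in zip(weights, others)) / total)
    return scores

# Usage: locations 0 and 2 carry nearly identical features, 1 and 3 do not.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [-1.0, 0.0]]
match_scores = soft_match_score(features)
print([round(s, 3) for s in match_scores])
```

Because every operation here is differentiable, a loss computed on the final mask can in principle send gradients back through the matching step into the feature extractor, which is what makes joint training of all modules possible.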
In this manner, we not only avoid setting various heuristic parameters and thresholds, but also jointly train all modules with respect to the forgery mask reconstruction loss. To the best of our knowledge, this is the first end-to-end DNN solution that predicts forgery masks for the image copy-move forgery detection problem.
We evaluate copy-move forgery detection performance on two datasets, the synthesized 10K dataset and the CASIA TIDE v2.0 dataset, using the widely used recall and F-score metrics. In both image-level and pixel-level detection (as shown in Tables 1 and 2), the proposed DNN solution outperforms the baseline algorithms by a significant margin in both quality and speed.
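For readers unfamiliar with pixel-level scoring, the following sketch shows how recall and F-score are commonly computed from a predicted binary mask and a ground-truth mask. The function name `pixel_scores` and the tiny example masks are made up for illustration; they are not the evaluation code or data behind Tables 1 and 2.

```python
def pixel_scores(pred, truth):
    """Return (precision, recall, f_score) for two binary masks of equal size."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # forged pixel correctly flagged
            elif p and not t:
                fp += 1          # genuine pixel wrongly flagged
            elif t and not p:
                fn += 1          # forged pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

# Usage with hypothetical 2x4 masks (1 = forged pixel).
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 0, 0, 0],
        [1, 1, 1, 0]]
precision, recall, f_score = pixel_scores(pred, truth)
print(precision, recall, f_score)
```

Image-level scoring follows the same formulas, with each image counted once as forged or genuine instead of counting individual pixels.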
As shown in the figure above, the proposed solution gracefully handles a number of well-known challenges, including but not limited to 1) copy-move under affine transforms with rotation, scale, and perspective changes; and 2) multiple source and/or multiple destination regions. Unlike traditional approaches that rely on additional postprocessing to handle different cases, we handle all of these challenges implicitly in our end-to-end DNN solution. That is, we don't need to specify how many source regions to analyze, the largest distance at which to reject a match, or any other heuristic parameters; we simply let the network make the prediction. Furthermore, our detected forgery masks tend to capture the "object" concept, which is very useful in computer-aided image forensics analysis.
In terms of drawbacks, we found that the proposed DNN method 1) tends to predict "blob"-like regions; 2) makes errors on pure texture images; and 3) occasionally mislabels genuine but visually similar regions as forged. Nevertheless, this preliminary solution clearly demonstrates the promise of using DNNs in image forgery detection.