ManTra-Net: Manipulation Tracing Network For Detection And Localization of Image Forgeries With Anomalous Features

June 15, 2019


Image forgery has recently become an epidemic, affecting fake news, Internet rumors, insurance fraud, extortion, and even academic publications. Worse, the majority of image forgeries are never detected. In biomedical research publications alone, each article retracted for deliberate manipulation might account for a mean of $392,582 in direct costs, implying much larger indirect costs due to misguided research. New algorithms are therefore needed to aid the fight against image manipulation and forgery.

Image forgery can be done in a variety of ways. Splicing, copy-move, removal, and enhancement are the four techniques that have received the most attention. Real-life forgeries, however, are more complicated: malicious forgers frequently chain a series of manipulations to conceal the forgery. This calls for unified forgery detection systems that are not confined to one or a few known manipulation types but can handle more complex and/or unknown ones.

Localization of forged regions is another issue that is frequently neglected. Most available approaches are concerned only with image-level detection, that is, whether or not an image is forged. Moreover, the approaches that do support localization frequently require extensive, time-consuming pre- and post-processing. The mismatch between feature learning and forgery mask production also means that the resulting detection and localization pipeline is not fully optimized.

Over the past four years, notable image forgery localization/detection (IFLD) algorithms have relied on a variety of cues and features, ranging from handcrafted features such as DCT correlation to entirely implicit learned DNN features, as seen in the table below. Although DNN approaches are gaining popularity, no single network architecture dominates, and most of these methods concentrate on a single type of forgery.

[Table: notable recent IFLD methods and the cues/features they use]


We address these concerns and present ManTra-Net, a novel approach to generalized IFLD. It detects forged pixels by identifying local anomalous features, so it is not restricted to a single type of forgery or manipulation. It is also an end-to-end solution, which eliminates the need for pre- and/or post-processing. All of its modules are trainable, so they can be optimized jointly for the IFLD task.

To make the manipulation trace feature more sensitive and robust, we study the image manipulation classification (IMC) problem over more and finer manipulation types, breaking the seven manipulation families (hierarchy level 0) down until each class is an individual algorithm (hierarchy level 5, with 385 manipulation types that differ in finer details). This work is the first to consider such a large number of fine-grained manipulation types. For the manipulation trace feature extractor, we use the IMC-VGG W&D architecture (wider and deeper, with more filters in each convolutional layer and more convolutional blocks), excluding its decision block.

After a fair comparison between IMC-VGG, IMC-ResNet, and IMC-DnCNN, we chose the fully convolutional IMC-VGG architecture because it shows a smaller gap between training and validation performance and a much higher accuracy in KCMI testing. For the first convolutional layer, we chose a combination of SRMConv2D, BayarConv2D, and the classic Conv2D, which gave the best performance.
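To make the BayarConv2D idea more concrete, here is a minimal sketch of the constrained-convolution rule usually attributed to Bayar and Stamm: the center weight of each kernel is fixed at -1 and the remaining weights are rescaled to sum to 1, so the filter acts as a prediction-error filter that suppresses image content and keeps the residual. The function names `project_bayar` and `correlate_valid` are hypothetical, the kernel values are arbitrary, and this is not the layer implementation used in ManTra-Net.

```python
def project_bayar(kernel):
    """Project a square, odd-sized kernel onto the Bayar constraint.

    Assumed constraint: center weight == -1, all other weights sum to 1.
    """
    k = len(kernel)
    c = k // 2
    # Copy the kernel and zero the center so it does not affect normalization.
    out = [row[:] for row in kernel]
    out[c][c] = 0.0
    total = sum(sum(row) for row in out)
    if abs(total) < 1e-12:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    out = [[w / total for w in row] for row in out]
    out[c][c] = -1.0
    return out


def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding (illustration only)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j]
                for i in range(k) for j in range(k))
            for x in range(w - k + 1)
        ]
        for y in range(h - k + 1)
    ]


# Arbitrary starting weights, as if taken from a training step.
raw = [[0.2, 0.5, 0.1],
       [0.4, 0.9, 0.3],
       [0.6, 0.7, 0.2]]
bayar = project_bayar(raw)

# On a perfectly flat patch the prediction error is (numerically) zero.
flat = [[10.0] * 5 for _ in range(5)]
response = correlate_valid(flat, bayar)
```

Because the constrained weights sum to zero, smooth image content is cancelled and only deviations from the local prediction remain, which is why such filters are useful for exposing manipulation traces. In a trainable layer, a projection like this would typically be reapplied after each weight update.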

We also propose a novel deep anomaly detection network architecture composed of three stages: adaptation, anomalous feature extraction, and decision. To identify potentially forged regions, we first identify the dominant feature of an image; any feature sufficiently different from this dominant feature is then considered anomalous.
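The dominant-feature idea can be illustrated with a small sketch. It assumes the dominant feature is the per-channel mean over all pixels and measures deviation as a z-score; the fixed L2 norm used here is only a stand-in for the learned adaptation and decision stages. The names `zscore_features` and `anomaly_score` and the toy feature map are hypothetical.

```python
import math


def zscore_features(feat, eps=1e-6):
    """Per-channel z-score of an H x W x C feature map (nested lists).

    The per-channel mean plays the role of the image's dominant feature.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    n = h * w
    mean = [sum(feat[y][x][ch] for y in range(h) for x in range(w)) / n
            for ch in range(c)]
    std = [math.sqrt(sum((feat[y][x][ch] - mean[ch]) ** 2
                         for y in range(h) for x in range(w)) / n)
           for ch in range(c)]
    return [[[(feat[y][x][ch] - mean[ch]) / (std[ch] + eps)
              for ch in range(c)]
             for x in range(w)]
            for y in range(h)]


def anomaly_score(z):
    """Per-pixel L2 norm of the z-scored feature vector."""
    return [[math.sqrt(sum(v * v for v in px)) for px in row] for row in z]


# Toy 4 x 4 feature map with 2 channels: every pixel shares the same
# feature except one "manipulated" pixel at row 2, column 1.
feat = [[[1.0, 0.0] for _ in range(4)] for _ in range(4)]
feat[2][1] = [5.0, 3.0]

scores = anomaly_score(zscore_features(feat))
peak = max(((y, x) for y in range(4) for x in range(4)),
           key=lambda p: scores[p[0]][p[1]])
```

Here `peak` points at the pixel whose feature departs most from the dominant one. A single global mean works for this toy case, but it would be a poor reference when the forged region covers a large part of the image, so treat the sketch as an illustration of the principle only.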

Experimental Results

Our extensive experimental results, obtained using only pretrained models, show that the proposed ManTra-Net is sensitive to subtle manipulations and robust to post-processing that obfuscates them. It also generalizes well to unseen data and unknown manipulation types, even the most recent DNN-based manipulations such as face swapping and deep image inpainting, as shown in the figures below.

Even though we apply no model fine-tuning or post-processing, ManTra-Net clearly outperforms the classic unsupervised methods and is comparable to the SOTA DNN methods. It also achieves very consistent performance across all testing datasets, indicating that it generalizes well across different datasets.