Multimedia forensics (for images, videos, text, and speech, or combinations of more than one modality) is the science of using computer algorithms (especially machine learning and artificial intelligence) to automatically analyze multimedia, construct provenance trees, and identify and provide evidence of multimedia forgeries. Multimedia forgeries range from manual manipulation of multimedia assets to multimedia repurposing and deep-neural-network-generated content.
Image Manipulation and Image Repurposing Identification
Manipulating visual information (images and videos) has perhaps existed for as long as visual information itself. Take, for example, the iconic 1860 portrait of President Lincoln. As it turns out, the portrait was actually a composite of Lincoln's head and the body of Vice President John C. Calhoun, who had died in 1850.
Image manipulation forgeries include any method that attempts to change the original digital content of the image (i.e. its pixels). Examples include image splicing and copy-move forgeries, in which parts of the image are altered by copying and pasting regions of the same or other images (e.g. to introduce or hide evidence). Another infamous forgery technique instead repurposes an unaltered image to convey misleading information by changing one or more pieces of its accompanying metadata, such as the geolocation where the image was captured, or the events or names of the people in it. At VIMAL, we have been developing deep learning methods for detecting image splicing and copy-move forgeries, as well as deep learning methods for identifying repurposed multimedia packages (i.e. images accompanied by other types of metadata).
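The copy-move idea above can be illustrated with a deliberately naive detector (a sketch only; the deep learning methods mentioned are far more robust): hash every overlapping block of a grayscale image and flag identical blocks that recur at two sufficiently distant positions. The function name, block size, and synthetic test image below are all illustrative.

```python
import numpy as np

def detect_copy_move(img, block=8, min_shift=8):
    """Naive exact-match copy-move detector: hash each overlapping
    block and report pairs of identical blocks whose positions are
    at least min_shift apart (to skip trivially adjacent overlaps)."""
    h, w = img.shape
    seen = {}       # block bytes -> first position seen
    matches = []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            key = img[y:y + block, x:x + block].tobytes()
            if key in seen:
                py, px = seen[key]
                if abs(y - py) + abs(x - px) >= min_shift:
                    matches.append(((py, px), (y, x)))
            else:
                seen[key] = (y, x)
    return matches

# Synthetic demo: copy a textured patch to another location.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
img[40:48, 40:48] = img[8:16, 8:16]   # the copy-move forgery
hits = detect_copy_move(img)          # flags (8, 8) paired with (40, 40)
```

Real forgeries are rescaled, rotated, and recompressed, which defeats exact matching; practical detectors compare robust block features (or learned deep features) rather than raw bytes.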
Deepfake Video Detection

Deep neural networks have shown superior performance in virtually all visual intelligence tasks, including generating synthetic yet photo-realistic images and videos. One such task is using deep neural networks to generate fake videos of people saying and/or doing things they have never said or done, hence the name deepfakes. Generating a deepfake entails teaching a neural network the visual appearance of two people (the person in the original video and the targeted person) and then having the network either replace the entire face in the video with the face of the target person (face replacement) or alter only the facial expressions and lip and eye movements (face reenactment). At VIMAL, we have been developing state-of-the-art methods and systems for detecting deepfake videos.
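Face replacement is commonly built around a shared encoder with one decoder per identity. The following untrained numpy sketch shows only that routing idea (all layer sizes, weight initializations, and names are illustrative assumptions, not any specific deepfake system): in training, each decoder learns to reconstruct its own identity from the shared latent code; at inference, feeding the source's code to the target's decoder performs the swap.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(n_in, n_out):
    # Random weights stand in for trained parameters in this sketch.
    return rng.standard_normal((n_in, n_out)) * 0.1

# One shared encoder compresses any face into a latent code;
# each identity gets its own decoder.
W_enc = linear(64, 16)
W_dec_source = linear(16, 64)   # trained to reconstruct the source person
W_dec_target = linear(16, 64)   # trained to reconstruct the target person

def encode(face):
    return np.tanh(face @ W_enc)

def decode(code, W_dec):
    return code @ W_dec

face_source = rng.standard_normal(64)   # a (flattened) source-video frame
latent = encode(face_source)

# The "swap": route the source's latent code (its expression/pose)
# through the target's decoder (its appearance).
swapped = decode(latent, W_dec_target)
```

The same routing explains face reenactment: the latent code carries pose and expression, so whichever decoder consumes it determines whose appearance is rendered.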