MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation
Haiying Guan, Mark Kozak, Eric Robertson, Yooyoung Lee, Amy N. Yates, Andrew P. Delgado, Daniel F. Zhou, Timothée N. Kheyrkhah, Jeff Smith, Jonathan G. Fiscus
We provide a benchmark for digital media forensic challenge evaluations. A series of datasets is used to assess progress and to analyze in depth the performance of diverse systems on different media forensic tasks over the last two years. The benchmark data comprises four major parts: (1) 35 million images and 300,000 video clips of world data downloaded from the internet, with their characteristics and labels; (2) up to 176,000 pristine high-provenance (HP) images and 11,000 HP videos; (3) approximately 100,000 manipulated images and 4,000 manipulated videos drawn from approximately 5,000 image manipulation journals and over 500 video manipulation journals, with manipulation history graphs and annotation details; and (4) a series of evaluation datasets with reference ground truth to support six challenge tasks in media forensic challenge evaluations. In this paper, we first introduce the objectives, challenges, and approaches to building media forensic evaluation datasets. We then discuss our approaches to forensic dataset collection, annotation, and manipulation, and present the design and infrastructure used to effectively and efficiently build evaluation datasets supporting different evaluation tasks. Given a specified query, our infrastructure dynamically generates the evaluation comparison subsets for the specified evaluation analysis report. Finally, we present results from past evaluations.
IEEE Winter Conference on Applications of Computer Vision (WACV 2019)