MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation

Published: January 11, 2019


Haiying Guan, Mark Kozak, Eric Robertson, Yooyoung Lee, Amy N. Yates, Andrew P. Delgado, Daniel F. Zhou, Timothée N. Kheyrkhah, Jeff Smith, Jonathan G. Fiscus


We provide a benchmark for digital media forensic challenge evaluations. Over the last two years, a series of datasets has been used to assess progress and to analyze in depth the performance of diverse systems on different media forensic tasks. The benchmark data contains four major parts: (1) 35 million images and 300,000 video clips of world data downloaded from the Internet, with their characteristics and labels; (2) up to 176,000 pristine high-provenance (HP) images and 11,000 HP videos; (3) approximately 100,000 manipulated images and 4,000 manipulated videos from approximately 5,000 image manipulation journals and over 500 video manipulation journals, with manipulation history graphs and annotation details; and (4) a series of evaluation datasets with reference ground truth to support six challenge tasks in media forensic challenge evaluations. In this paper, we first introduce the objectives, challenges, and approaches to building media forensic evaluation datasets. We then discuss our approaches to forensic dataset collection, annotation, and manipulation, and present the design and infrastructure used to build evaluation datasets effectively and efficiently for different evaluation tasks. Given a specified query, this infrastructure dynamically generates the evaluation comparison subsets for the requested evaluation analysis report. Finally, we present results from past evaluations.
Proceedings Title: IEEE Winter Conference on Applications of Computer Vision (WACV 2019)
Conference Dates: January 8-11, 2019
Conference Location: Waikoloa, HI
Conference Title: WACV 2019
Pub Type: Conferences


Created January 11, 2019, Updated February 08, 2019