MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation
Haiying Guan, Mark Kozak, Eric Robertson, Yooyoung Lee, Amy N. Yates, Andrew P. Delgado, Daniel F. Zhou, Timothée N. Kheyrkhah, Jeff Smith, Jonathan G. Fiscus
We provide a benchmark for digital media forensic challenge evaluations. A series of datasets is used to assess progress and to analyze in depth the performance of diverse systems on different media forensic tasks over the last two years. The benchmark data comprises four major parts: (1) 35 million images and 300,000 video clips of world data downloaded from the internet, with their characteristics and labels; (2) up to 176,000 pristine high-provenance (HP) images and 11,000 HP videos; (3) approximately 100,000 manipulated images and 4,000 manipulated videos drawn from approximately 5,000 image manipulation journals and over 500 video manipulation journals, with manipulation history graphs and annotation details; and (4) a series of evaluation datasets with reference ground truth to support six challenge tasks in media forensic challenge evaluations. In this paper, we first introduce the objectives, challenges, and approaches to building media forensic evaluation datasets. We then discuss our approaches to forensic dataset collection, annotation, and manipulation, and present the design and infrastructure used to effectively and efficiently build evaluation datasets supporting different evaluation tasks. Given a specified query, our infrastructure dynamically generates the evaluation comparison subsets for the specified evaluation analysis report. Finally, we present results from past evaluations.
IEEE Winter Conference on Applications of Computer Vision (WACV 2019)