MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation
Published
Author(s)
Haiying Guan, Mark Kozak, Eric Robertson, Yooyoung Lee, Amy Yates, Andrew Delgado, Daniel F. Zhou, Timothée N. Kheyrkhah, Jeff Smith, Jonathan G. Fiscus
Abstract
We provide a benchmark for digital media forensic challenge evaluations. A series of datasets is used to assess progress and to analyze in depth the performance of diverse systems on different media forensic tasks over the last two years. The benchmark data contain four major parts: (1) 35 million images and 300,000 video clips of world data downloaded from the internet, with their characteristics and labels; (2) up to 176,000 pristine high-provenance (HP) images and 11,000 HP videos; (3) approximately 100,000 manipulated images and 4,000 manipulated videos drawn from approximately 5,000 image manipulation journals and over 500 video manipulation journals, with manipulation history graphs and annotation details; and (4) a series of evaluation datasets with reference ground truth supporting six challenge tasks in the media forensic challenge evaluations. In this paper, we first introduce the objectives, challenges, and approaches to building media forensic evaluation datasets. We then discuss our approaches to forensic dataset collection, annotation, and manipulation, and present the design and infrastructure used to build the evaluation datasets effectively and efficiently to support different evaluation tasks. Given a specified query, the infrastructure dynamically generates the evaluation comparison subsets for the specified evaluation analysis report. Finally, we present results from the past evaluations.
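To make the query-driven subset generation concrete, the following is a minimal sketch, not the NIST infrastructure: it assumes a hypothetical index of probe records with fields such as probe_id, operations, and is_manipulated (all names are illustrative assumptions) and selects the comparison subset matching a queried manipulation operation.

```python
# Minimal sketch (not the NIST implementation): given a hypothetical index of
# probe records, select the comparison subset for a queried manipulation type.
# Field names ("probe_id", "operations", "is_manipulated") are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeRecord:
    probe_id: str
    operations: List[str]   # manipulation operations applied, e.g. ["splice", "blur"]
    is_manipulated: bool


def comparison_subset(index: List[ProbeRecord], query_op: str) -> List[ProbeRecord]:
    """Return probes whose manipulation history includes query_op,
    together with the non-manipulated probes used as the contrast set."""
    positives = [r for r in index if r.is_manipulated and query_op in r.operations]
    negatives = [r for r in index if not r.is_manipulated]
    return positives + negatives


# Example: build the subset for a "splice"-focused analysis report.
index = [
    ProbeRecord("p001", ["splice"], True),
    ProbeRecord("p002", ["blur"], True),
    ProbeRecord("p003", [], False),
]
subset = comparison_subset(index, "splice")
print([r.probe_id for r in subset])   # ['p001', 'p003']
```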
Proceedings Title
IEEE Winter Conference on Applications of Computer Vision (WACV 2019)
Guan, H., Kozak, M., Robertson, E., Lee, Y., Yates, A., Delgado, A., Zhou, D., Kheyrkhah, T., Smith, J. and Fiscus, J. (2019), MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation, IEEE Winter Conference on Applications of Computer Vision (WACV 2019), Waikoloa, HI, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=927035 (Accessed October 9, 2025)