
Dataset construction challenges for digital forensics



James R. Lyle, Graeme Horsman


As the digital forensics field develops and takes steps towards ensuring the reliability of the processes its practitioners implement, emphasis on the need for effective testing has increased. Testing requires test datasets, but creating these is not a straightforward task. A poorly constructed and poorly documented test dataset undermines any testing carried out with it, eroding the reliability of the resulting test results. Given the time, effort and knowledge required to generate datasets, the field must guide those carrying out this task so that it is done right the first time, without wasting resources. Yet there are currently few standards or best practices defined for dataset creation in digital forensics. This work defines three categories of dataset that typically exist in digital forensics: tool/process evaluation datasets, actions datasets and scenario-based datasets. The minimum requirements for creating each are outlined and discussed to support those creating them and to help ensure that, where datasets are created, they offer maximum value to the field.
Forensic Science International: Digital Investigation


Digital Forensics, Datasets, Testing, Tool-testing


Lyle, J. and Horsman, G. (2021), Dataset construction challenges for digital forensics, Forensic Science International: Digital Investigation, [online] (Accessed April 23, 2024)
Created July 29, 2021, Updated October 14, 2021