There are several uses envisioned for the data sets, but we also expect that there will be unforeseen applications. The four most obvious applications are testing forensic tools, establishing that lab equipment is functioning properly, testing proficiency in specific skills and training laboratory staff. Each type of data set has slightly different requirements. Most data sets can be used for more than one function. For example, the Russian Tea Room can be used to evaluate the behavior of a tool to search UNICODE text or display UNICODE text. This set can also be used as a skill test for an examiner to demonstrate proficiency in working with UNICODE text or as a training exercise.
Data sets for tool testing need to be completely documented. The user of the data set needs to know exactly what is in the data set and where it is located. These data sets should also provide specification for a set of explicit tests. However, the user should have sufficient documentation to develop and execute other test cases if necessary or desirable. These data sets could be part of a realistic investigation scenario, but it is easier to control expected results if each data set is focused on a particular type of tool function. Examples of focused function areas are string searching, deleted file recovery and email extraction.
There will tend to be many small test images, each focused on a particular feature for the tool function being tested.
These data sets need to focus on issues in acquisition, access and restoration of data. These data sets might need to have a strong procedural component.
These data sets would be primarily investigation scenario based tests to give a real flavor to the data set. These would be similar to the data sets for proficiency testing, but generally available.
These data sets would be primarily investigation scenario based tests to give a real flavor to the data set. These would be similar to or the same as the data sets for staff training. There would be some small images to focus on specific skills. For example, a data set that would require the examiner to demonstrate some system skill such as loading a new font onto an analysis computer.