New biometric research data — ranging from fingerprints to facial photographs and iris scans — is now available from the National Institute of Standards and Technology (NIST).
Stripped of identifying information and created expressly for research purposes, the data is designed primarily for testing systems that verify a person’s identity before granting access, whether to another room or another country. Few resources exist to help developers evaluate the performance of the software algorithms at the heart of these systems, and the NIST data will help fill that gap.
“This all gets back to reproducible research,” said NIST computer scientist Greg Fiumara. “The data will help anyone who is interested in testing the error rates of biometric identification systems.”
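As a rough illustration of what testing error rates can involve, the sketch below computes two commonly reported figures, the false non-match rate and the false match rate, from lists of comparison scores at a fixed decision threshold. The function name, the sample scores and the threshold are all hypothetical, and the sketch assumes that a higher score means greater similarity; it is not drawn from NIST's evaluation tooling.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine: Sequence[float],
    impostor: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FNMR, FMR) at a threshold, assuming higher score = more similar.

    genuine:  scores from comparing two samples of the same person
    impostor: scores from comparing samples of different people
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # A genuine comparison scoring below the threshold is wrongly rejected.
    false_non_matches = sum(1 for s in genuine if s < threshold)
    # An impostor comparison scoring at or above the threshold is wrongly accepted.
    false_matches = sum(1 for s in impostor if s >= threshold)
    return false_non_matches / len(genuine), false_matches / len(impostor)


# Hypothetical scores, invented purely for illustration.
genuine_scores = [0.91, 0.85, 0.40, 0.77]
impostor_scores = [0.10, 0.55, 0.30, 0.20, 0.05]

fnmr, fmr = error_rates(genuine_scores, impostor_scores, threshold=0.5)
print(f"FNMR = {fnmr:.2f}, FMR = {fmr:.2f}")
```

Raising the threshold generally trades false matches for false non-matches, which is why results are usually reported across a range of thresholds rather than at a single operating point.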
The files, which are available on the NIST website, are organized into three Special Databases (SDs). Numbered SD 300, SD 301 and SD 302, they represent the first in what is intended to be an expanding collection of biometric resources.
While the three databases contain varied types of data collected at different times, two of them contain information gathered during the Nail to Nail Fingerprint Challenge, an IARPA-funded competition that NIST helped to design and carry out.
One of the new resources, SD 301, is significant as the first “multimodal” dataset NIST has ever released. Multimodal means that an individual’s different biometric markers (in this case face, fingerprints and iris scans) are all linked so that they can be used together by systems that combine identification approaches, such as a photograph of the individual’s face in addition to their fingerprints.
“This opens up possibilities for types of multimodal research that haven’t been done before,” Fiumara said. “We want to get more secure and more accurate identification, as multimodal systems are harder to spoof.”
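One simple way a system might combine linked modalities is score-level fusion, in which each modality's comparison score is rescaled to a common range and the results are averaged. The sketch below shows a weighted-sum version of that idea. The score ranges, weights and function names are assumptions made for illustration, and nothing here describes how any particular system or NIST evaluation performs fusion.

```python
from typing import Dict, Tuple


def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Rescale a raw score to [0, 1], clipping values outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(score, lo), hi)
    return (clipped - lo) / (hi - lo)


def fuse_scores(
    scores: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of normalized per-modality similarity scores."""
    total_weight = sum(weights[m] for m in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[m] * min_max_normalize(scores[m], *ranges[m]) for m in scores
    )
    return weighted / total_weight


# Hypothetical raw similarity scores and score ranges for one comparison.
scores = {"face": 0.5, "fingerprint": 150.0, "iris": 0.25}
ranges = {"face": (0.0, 1.0), "fingerprint": (0.0, 200.0), "iris": (0.0, 1.0)}
weights = {"face": 1.0, "fingerprint": 1.0, "iris": 1.0}

fused = fuse_scores(scores, ranges, weights)
print(f"fused score = {fused:.2f}")
```

With equal weights this reduces to a plain average of the normalized scores. A real system would choose the weights and normalization from measured performance of each modality.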
SD 302 contains fingerprint data from a few hundred people gathered by a mixture of eight commercially available and prototype devices.
Data collected during both portions of the Nail to Nail Challenge includes prints taken with contactless fingerprint devices, a technology that could simplify and speed up print gathering as it improves.
“It also includes latent fingerprint data, in which prints are left while handling everyday objects,” Fiumara said. “Realistically and expertly collected latent data is difficult to come by.”
All of the individuals represented in the two Nail to Nail sets have formally consented to the inclusion and distribution of their biometric and demographic data for use in advancing research, Fiumara said. The data has been scrubbed of identifying information such as names and places of residence.
Rounding out the datasets is SD 300, a collection of fingerprints taken from 900 old ink cards. All of the record cards have been stripped of identifying data and come from individuals who are now deceased. According to Fiumara, the data can help manufacturers evaluate how well their modern systems produce results that are interoperable with hard-copy ink records, which will remain important to the criminal justice system for some time.
As a whole, the three SDs contain data retained with archival-grade lossless compression. That is a step forward, Fiumara said, because research datasets in the past often did not retain this level of fidelity to the original image.
Each dataset in the series has an accompanying user’s guide offering background about collection methods and other details useful to researchers.