A new study of face recognition technology created after the onset of the COVID-19 pandemic shows that some software developers have made demonstrable progress at recognizing masked faces.
The findings, produced by the National Institute of Standards and Technology (NIST), are detailed in a new report called Ongoing Face Recognition Vendor Test (FRVT) Part 6B: Face Recognition Accuracy with Face Masks Using Post-COVID-19 Algorithms (NISTIR 8331). It is the agency’s first study that measures the performance of face recognition algorithms developed following the arrival of the pandemic. A previous report, published in July 2020, explored the effect of masked faces on algorithms submitted before March 2020 and indicated that software available before the pandemic often had more trouble with masked faces.
“Some newer algorithms from developers performed significantly better than their predecessors. In some cases, error rates decreased by as much as a factor of 10 between their pre- and post-COVID algorithms,” said NIST’s Mei Ngan, one of the study’s authors. “In the best cases, software algorithms are making errors between 2.4 and 5% of the time on masked faces, comparable to where the technology was in 2017 on nonmasked photos.”
The new study adds the performance of 65 newly submitted algorithms to those that were tested on masked faces in the previous round, offering cumulative results for 152 total algorithms. Developers submitted algorithms to the FRVT voluntarily, but their submissions do not indicate whether an algorithm is designed to handle face masks, or whether it is used in commercial products.
Using the same set of 6.2 million images as it had previously, the team again tested the algorithms’ ability to perform “one-to-one” matching, in which a photo is compared with a different photo of the same person — a function commonly used to unlock a smartphone. (The team did not test algorithms’ ability to perform “one-to-many” matching — often used to find matches in a large database — but plans to do so in a later round.) And as with the July report, the images had mask shapes digitally applied, rather than showing people wearing actual masks.
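To make the distinction between the two matching modes concrete, here is a minimal sketch in Python. It assumes each face photo has already been reduced to a numeric template (a nonzero vector), that templates are compared by cosine similarity, and that a fixed threshold of 0.6 decides a match. All of these are illustrative assumptions; the report does not describe the internals of the tested algorithms, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two nonzero template vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, enrolled, threshold=0.6):
    """One-to-one matching: compare a new photo's template with ONE stored template."""
    return cosine_similarity(probe, enrolled) >= threshold

def identify(probe, gallery, threshold=0.6):
    """One-to-many matching: search a whole gallery (dict of identity -> template).

    Returns the best-scoring identity at or above the threshold, or None.
    """
    best_id, best_score = None, threshold
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id
```

In this sketch, unlocking a phone corresponds to a single call to `verify`, while a database search corresponds to `identify`, which repeats the same comparison against every stored template.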
The report’s findings include:
When both the new image and the stored image are of masked faces, error rates run higher. With a couple of notable exceptions, when the face was occluded in both photos, false match rates ran 10 to 100 times higher than if the original saved image showed an uncovered face. Smartphones often use one-to-one matching for security, and a stranger would be far more likely to unlock a phone successfully if the saved image showed a masked person.
The more of a face a mask covers, the higher the algorithm’s error rate tends to be. Continuing a trend from the July 2020 report, round mask shapes — which cover only the mouth and nose — generated fewer errors than wide ones that stretch across the cheeks, and those covering the nose generated more errors than those that did not.
Mask colors affect the error rate. The new study explored the effects of two new mask colors — red and white — as well as the black and light blue masks the July study tested. While there were exceptions, the red and black masks tended to yield higher error rates than the other colors did. The research team did not investigate potential reasons for this effect.
A few algorithms perform well with any combination of masked or unmasked faces. Some developers have created “mask-agnostic” software that can handle images regardless of whether or not the faces are masked. The algorithms detect the difference automatically, without being told.
A final significant point that the NIST research team makes also carries over from previous studies: Individual algorithms differ. End users need to get to know how their chosen software performs in their own specific situations, ideally using real physical masks rather than the digital simulations the team used in the study.
“It is incumbent upon the system owners to know their algorithm and their data,” Ngan said. “It will usually be informative to specifically measure accuracy of the particular algorithm on the operational image data collected with actual masks.”
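A system owner following that advice might start with something like the sketch below. It assumes the owner has already run their algorithm on operational photos and collected two lists of similarity scores: one from pairs showing the same person ("genuine" pairs) and one from pairs showing different people ("impostor" pairs). The function name, the sample scores and the threshold are hypothetical and purely illustrative; they are not taken from the report.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false non-match rate, false match rate) at a given threshold.

    A genuine pair scoring below the threshold is a false non-match
    (the rightful owner is rejected); an impostor pair scoring at or
    above it is a false match (a stranger is accepted).
    """
    fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fnmr, fmr

# Made-up scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.2, 0.65, 0.3]
fnmr, fmr = error_rates(genuine, impostor, threshold=0.6)
print(f"FNMR = {fnmr:.2f}, FMR = {fmr:.2f}")
```

Running the same calculation separately on masked and unmasked photo pairs would show how much a given algorithm's error rates shift when real masks are involved.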