Face recognition algorithms produce different rates of accuracy based on sex, age, and race or country of birth, although the most accurate algorithms generally show smaller differences across demographic groups, according to a new report by the National Institute of Standards and Technology (NIST).
NIST says the study on demographic effects is the “first of its kind” and is the third report so far by NIST under its Face Recognition Vendor Test effort. It concludes: “We found empirical evidence for the existence of demographic differentials in the majority of contemporary face algorithms that we evaluated.”
The report draws broad conclusions about the two types of false results algorithms generate, finding larger demographic effects for false positives than for false negatives. A false positive is a match returned for samples from two different people, while a false negative is a failure to match two images of the same person.
NIST says false negatives “occur when the similarity between two photos is low, reflecting either some change in the person’s appearance or in the image properties,” while false positives “occur when the digitized faces of two people are similar.”
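As an illustration only, not drawn from the NIST report, the sketch below shows how these two error rates are commonly computed from pairwise similarity scores and a decision threshold; the scores and the threshold value are invented for the example.

```python
from typing import List, Tuple

def match_error_rates(
    impostor_scores: List[float],   # similarity scores for pairs of DIFFERENT people
    genuine_scores: List[float],    # similarity scores for pairs of the SAME person
    threshold: float,
) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a given threshold.

    A false positive occurs when an impostor pair scores at or above the
    threshold; a false negative occurs when a genuine pair scores below it.
    """
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    false_negatives = sum(1 for s in genuine_scores if s < threshold)
    return (
        false_positives / len(impostor_scores),
        false_negatives / len(genuine_scores),
    )

# Invented scores: raising the threshold trades false positives
# for false negatives, and vice versa.
fpr, fnr = match_error_rates(
    impostor_scores=[0.10, 0.35, 0.62, 0.20],
    genuine_scores=[0.91, 0.55, 0.88, 0.97],
    threshold=0.60,
)
print(f"FPR={fpr:.2f}, FNR={fnr:.2f}")  # FPR=0.25, FNR=0.25
```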
The report has been widely anticipated and is of interest to lawmakers, privacy and civil liberties advocates, algorithm developers, and users of face recognition technology. The technology is being rolled out in government identity programs such as Department of Homeland Security efforts to identify and verify people entering and exiting the U.S. and traveling through security checkpoints at airports.
The study points to recent reports and media stories about biases in face recognition technology but cautions that such reporting should specify which algorithm was evaluated.
For false positives, the report says rates were two to five times higher for women than for men, a result that held “across algorithms and datasets,” NIST says. For false positives related to race, the study finds rates are highest for people from West and East Africa and East Asia, somewhat lower for South Asians and Central Americans, and generally lowest for Eastern Europeans.
Using U.S. law enforcement images, the study says the highest false positive rates occur with American Indians, while rates are also elevated for African Americans and Asians, noting that “the relative ordering depends on sex and varies with algorithm.”
The report says that “a number of algorithms developed in China” produce lower false positive rates for East Asian faces.
When it comes to age, NIST says false positives are higher with the elderly and children and lowest with middle-aged adults.
With respect to false negative results, the report says the particular algorithm used can make a big difference, from error rates below a half-percent to greater than 10 percent.
“For the more accurate algorithms, false negative rates are usually low with average demographic differentials being, necessarily, smaller still,” NIST says. “This is an important result: use of inaccurate algorithms will increase the magnitude of false negative differentials.”
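One way to read a “demographic differential” is as the gap, or ratio, between per-group error rates. The sketch below, using invented numbers rather than figures from the report, illustrates why a more accurate algorithm necessarily leaves less room for large absolute differentials even when the relative gap between groups is the same.

```python
# Hypothetical per-group false negative rates for two algorithms.
# All numbers are invented for illustration, not taken from the report.
false_negative_rates = {
    "accurate_algorithm":   {"group_a": 0.003, "group_b": 0.006},
    "inaccurate_algorithm": {"group_a": 0.040, "group_b": 0.080},
}

for name, rates in false_negative_rates.items():
    worst, best = max(rates.values()), min(rates.values())
    print(
        f"{name}: absolute differential = {worst - best:.3f}, "
        f"ratio = {worst / best:.1f}x"
    )

# Both algorithms show a 2.0x ratio between groups, but the accurate
# algorithm's absolute gap (0.003) is an order of magnitude smaller
# than the inaccurate algorithm's (0.040).
```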
The report also adds that in real-time applications, where a person is knowingly allowing their image to be captured, a second capture can rectify an initial false negative.
Customs and Border Protection, a DHS component, and its airport and airline partners are rolling out face recognition matching for travelers departing the U.S. on international flights and arriving in the country from abroad. The Transportation Security Administration, another DHS component, is also evaluating the technology at some aviation security checkpoints.
CBP maintains that its accuracy rates for matching travelers exiting the country are around 98 percent.
False negative rates found with images captured for border crossings are “generally higher in individuals born in Africa and the Caribbean, the effect being strong in older individuals,” the report says.
Using U.S. mugshots, false negative rates are higher for Asians and American Indians than for people with white or black faces, the report says, adding that the lowest false negative rates occur with black faces.
Error rates are also usually higher in women and younger children, especially with mugshots, but the report says “there are many exceptions to this, so universal statements pertaining to algorithms false negative rates across sex and age are not supported.”
NIST says the quality of a photo is important in whether false negative rates are higher or lower. Photos acquired when someone applies for a credential or benefit, compared against other “application” photos, produce “very low” error rates, making it challenging to measure demographic differences, it says.
“This implies that better image quality reduces false negative rates and differentials,” NIST says.
NIST also says that when higher-quality application photos are compared with lower-quality border crossing photos, false negative rates are higher, particularly for women, although “the differentials are smaller and not consistent.”
For the report, NIST evaluated 189 algorithms from 99 developers and used nearly 18.3 million images of 8.5 million people. NIST is a part of the Department of Commerce.