How Effective are Neural Networks for Object Recognition?

A few weeks ago, The New Yorker published a story titled “Total Recall,” in which correspondent Patrick Radden Keefe traveled to England to interview a team of “super-recognizers.” While most people occasionally struggle to put names to faces, super-recognizers have an uncanny ability to recognize human faces. London is known for its relatively high number of security cameras, and the city’s Metropolitan Police Service has begun employing these super-recognizers to comb through footage of unsolved crimes. The approach has proven successful, and other police departments around the world are now considering similar tactics. When asked whether a computer program could aid in facial recognition, several of these super-recognizers dismissed the idea entirely.

Although technology is not replacing super-recognizers yet, researchers from the Center for Data Science have been working to bridge the gap between human and machine visual recognition for quite some time, and with promising results.

In 2014, three researchers from NYU (Avi Ziskind, a former postdoctoral researcher in NYU’s Psychology department; Yann LeCun, from the Center for Data Science; and Denis Pelli, from NYU’s Computer Science Department) gave a presentation titled “Two Machine-Learning Models of Object Recognition Exhibit Key Feature of Human Performance” at the Moore-Sloan Data Science Initiative Launch Event. They presented their research on two machine-learning models that had been trained to exhibit human-like levels of object recognition.

The first model was a convolutional neural network, a type of model that is loosely based on the ways in which the human brain functions. The second was a texture statistics model, which measures the probability that a given image matches a previously known image. When given pieces of text, the models displayed two hallmarks of human recognition: an understanding of both spatial frequency and font complexity.
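
To make the idea of a convolutional layer more concrete, here is a minimal sketch in plain Python. It is not the researchers' model: the image, the kernel values, and the function names `convolve2d` and `relu` are all hypothetical, chosen only to illustrate the sliding-filter operation that gives these networks their name. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch it currently covers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges.
# In a trained network these values would be learned from data.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(image, kernel))
for row in feature_map:
    print(row)
```

In this toy example the response is nonzero only in the middle column, where the dark-to-bright edge sits. A real network stacks many such filters and learns their values during training.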

In The New Yorker article, the possibility of computer-based facial recognition was partially dismissed because super-recognizers often deal with grainy footage or poorly lit images. But the models developed by Ziskind, LeCun, and Pelli were well equipped to deal with visual noise, at least in the case of text.

The two graphs below measure the neural network’s performance against a human observer. The network was trained to accommodate two types of visual noise, both of which are also present in human vision: white noise and 1/f noise. When trained on both types of noise, the threshold curve for the neural network was remarkably similar to the threshold curve for human vision, and in some cases, the neural network exhibited a higher threshold for recognizing text.

[Figure: threshold curves for the neural network and a human observer]
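
For readers unfamiliar with the two noise types, the following sketch shows one way to generate them using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the research: it works on a one-dimensional signal rather than an image for brevity, it assumes the convention in which the amplitude spectrum falls off as 1/f, and the function names and the signal length of 64 are hypothetical.

```python
import cmath
import math
import random


def white_noise(n, seed=0):
    """Return n independent Gaussian samples (flat spectrum on average)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform; fine for short signals."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse transform matching dft() above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def one_over_f_noise(n, seed=0):
    """Shape white noise so that its amplitude spectrum falls off as 1/f."""
    spectrum = dft(white_noise(n, seed))
    shaped = []
    for k, coefficient in enumerate(spectrum):
        frequency = min(k, n - k)  # treat bins k and n - k as the same frequency
        # Drop the zero-frequency (mean) term and divide the rest by frequency.
        shaped.append(0 if frequency == 0 else coefficient / frequency)
    # The scaling is symmetric, so the result is real up to rounding error.
    return [value.real for value in inverse_dft(shaped)]


white = white_noise(64)
pink = one_over_f_noise(64)
```

White noise spreads its energy evenly across frequencies, which looks like fine-grained static. The 1/f variant concentrates energy at low frequencies, producing slower, blotchier fluctuations.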

The texture statistics model did not track human performance as closely, but it still performed well overall.

[Figure: threshold curves for the texture statistics model and a human observer]

While text analysis is a different beast from facial recognition, the core concepts are not fundamentally different. The work of Ziskind, LeCun, and Pelli shows that computer facial recognition may be much closer than we think.