Today, it recognizes handwriting; tomorrow, it may vastly improve the military’s surveillance and targeting efforts.
A computer program, funded in large part by the U.S. military, has displayed the ability to learn and generate new ideas as quickly and accurately as can a human. While the scope of the research was limited to understanding handwritten characters, the breakthrough could have big consequences for the military’s ability to collect, analyze and act on image data, according to the researchers and military scientists. That, in turn, could lead to far more capable drones, far faster intelligence collection, and far swifter targeting through artificial intelligence.
You could be forgiven for being surprised that computers are only now catching up to humans in their ability to learn. Every day, we are reminded that computers can process information of enormous volume at the speed of light, while we are reliant on slow, chemical synaptic connections. But take the simple task of recognizing an object: a face. Facebook’s DeepFace program can recognize faces about as well as a human, but in order to do that, it had to learn from a dataset of more than 4 million images of 4,000 faces. Humans, generally speaking, have the ability to remember a face after just one encounter. We learn after “one shot,” so to speak.
In their paper, “Human-level Concept Learning Through Probabilistic Program Induction,” published today in the journal Science, Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum present a model that they call the Bayesian Program Learning framework. BPL, they write, can classify objects and generate concepts about them using a tiny amount of data: a single instance.
To test it, they showed several people, and BPL, 20 handwritten letters from 10 different alphabets, then asked them to match the letter to the same character written by someone else. BPL scored 97%, about as well as the humans and far better than other algorithms. For comparison, a deep (convolutional) learning model scored about 77%, while a model designed for “one-shot” learning reached 92%, still around twice the error rate of humans and BPL.
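To make the matching task concrete, here is a minimal sketch of one-shot classification, in which each character is known from a single example and a new drawing must be matched to one of them. This is not BPL: it swaps in a simple nearest-neighbor comparison on pixels, and the tiny 3x3 "characters", their labels, and the function names are all hypothetical, invented only for illustration.

```python
# Toy one-shot classification: one stored example per character.
# Assumption: images are flattened 3x3 bitmaps (tuples of 0/1).
# This is a nearest-neighbor baseline, NOT the BPL model from the paper.

def hamming(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))

def classify_one_shot(test_image, examples):
    """Return the label whose single stored example is closest to test_image."""
    return min(examples, key=lambda label: hamming(test_image, examples[label]))

# Hypothetical characters, each seen exactly once.
examples = {
    "bar":    (0, 1, 0,  0, 1, 0,  0, 1, 0),
    "dash":   (0, 0, 0,  1, 1, 1,  0, 0, 0),
    "corner": (1, 1, 1,  1, 0, 0,  1, 0, 0),
}

# The "bar" as drawn by a different writer: one pixel differs.
test_image = (0, 1, 0,  0, 1, 0,  0, 1, 1)

print(classify_one_shot(test_image, examples))  # prints "bar"
```

A pixel comparison like this breaks down quickly when handwriting varies in size, slant, or stroke order, which is the kind of variation the researchers' test was designed to probe.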
BPL also passed a visual form of the Turing Test by drawing letters that most humans couldn’t distinguish from a human’s handwriting. (Named after British mathematician Alan Turing, a Turing Test challenges a program’s ability to produce an intellectual product, teletype communication in the most traditional sense, that is indistinguishable from what a human could produce.)
“I think for the more creative tasks — where you ask somebody to draw something, or imagine something that they haven’t seen before, make something up — I don’t think that we have a better test,” Tenenbaum told reporters on a conference call. “That’s partly why Turing proposed this. He wanted to test the more flexible, creative abilities of the human mind. Why people have long been drawn to some kind of Turing test.”
Achieving humanesque learning rates in computer systems is a decades-old goal. One early demonstration of the concept appeared in Gail A. Carpenter and Stephen Grossberg’s 1997 (submitted) paper on what was then called adaptive resonance theory, or ART. Their model, based on a neural network, learned quickly but was limited to simple pattern recognition.
By contrast, BPL can infer a causal reason why the data is the way that it is. It displays a more human-like ability to learn quickly and also to generate concepts, two abilities that are intimately linked. “It addresses a long-standing machine learning problem. How do you develop a learning …and an intelligence system that can both extract meaningful representations from complex data as well as successfully learn to transfer those representations to learning new concepts?” said Salakhutdinov.
It’s a question in which the military is keenly interested. The relevance of the work was “not lost on multiple defense funding agencies who contributed to funding ... either directly or indirectly in some of the prior foundations at the lab,” said Tenenbaum.
Tenenbaum and his partners weren’t working on direct military applications, yet they received funding from the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, the Defense Advanced Research Projects Agency, or DARPA, and the Intelligence Advanced Research Projects Activity, or IARPA. The work also received funding from the National Science Foundation and from private companies.
Why so much interest? “The key to the ‘Third Offset Strategy’ as presented by Deputy Secretary of Defense Bob Work is ‘human machine collaboration’ – so absolutely any breakthroughs in machine learning that advance the combined ability of our human machine collaborative solutions are extremely important. This specifically enhances the development of robust decision aids,” wrote Steven K. Rogers, the senior scientist for automatic target recognition and sensor fusion at the Air Force Research Laboratory, in an email to Defense One.
Consider the plight of the sensor operator on a drone team flying combat air patrols over, say, Afghanistan. Today, such a team might fly for 6,000 hours before striking a specific target. That’s time spent watching, waiting, collecting intelligence, and making determinations about what people on the ground are doing before finally launching a Hellfire missile. A machine that can recognize objects and, more importantly, behaviors could help with that.
If you ask Pentagon leaders, they’ll say that they have no interest in leaving the ultimate kill decision to drones. But a computer program that could do some of the watching, waiting, categorizing, and tagging could reduce strain on operators and perhaps enable even more patrols, intelligence-gathering, and targeted strikes.
It takes a lot of people to make meaning of “streaming multi-domain sensory data,” Rogers said. “Any breakthroughs in those areas will help us address our challenges – certainly automated processes to recognize entities and relationships in the streaming data is a key capability.”
“Meaning making” in the context of sensor operators for drones equals understanding what the subject, say a fellow on the roadside outside of Mosul, is doing. Is he digging a hole for an IED or planting vegetables? If he meets another man and embraces him, is he transferring weapons or welcoming home his son? It’s the sort of categorization job that involves some ability to place yourself in the man’s shoes, and ask obvious questions. “If I were an insurgent, would I bury an IED here? If I were a farmer, would I be here at all?”
Today, the most sophisticated military object recognition software, such as the Visual Profiler program from Israel-based Video Inform, can classify a variety of objects from satellite images. But it can’t make sense of what people on the ground are doing.
Object recognition in the commercial sector isn’t much better. “In the public literature it is clear that under well constrained environmental conditions machines can perform close to human levels in some tasks – but the variability and extended operating conditions we face in the military make them not suitable for consideration in replacing Airmen in general,” said Rogers.
The BPL, too, is not a program that will replace airmen tomorrow. Its more immediate applications lie in improving the interface with smartphones. Tenenbaum conceded that it is “a huge leap” from classifying letters to divining human actions via satellite and drone imagery. “A lot of things will have to be done that involve expertise that none of us have” on the research team, he said.
But a huge leap is not an impossible one. To classify styles and types of writing, the BPL program had to learn something about the intent of the writer in order to arrive at the best possible route for reproducing the written character. It’s an ability that could—with time, research, and the input of experts—lead to an AI system that could function like a human intelligence analyst, a system capable of sharing mindspace with a would-be insurgent.
“What’s distinctive about the way our system looks at handwritten characters or the way a similar type of system looks at speech recognition or speech analysis is that it does see it, in a sense, as a sort of intentional action ... When you see a handwritten character on the page what your mind does, we think, [and] what our program does, is imagine the plan of action that led to it, in a sense, see the movements, that led to the final result,” said Tenenbaum. “The idea that we could make some sort of computational model of theory of mind that could look at people’s actions and work backwards to figure out what were the most likely goals and plans that led to what [the subject] did, that’s actually an idea that is common with some of the applications that you’ve pointed to. … We’ve been able to study [that] in a much more simple and, therefore, more immediately practical and actionable way with the paper that we have here.”