For Federal Healthcare Agencies, a Case for Better AI Outcomes

As officials increasingly turn to algorithms for assistance, they should be wary of tools that make harnessing artificial intelligence look easy.

On Dec. 31, a Toronto-based company called BlueDot—which created and runs a global health monitoring platform driven by artificial intelligence—alerted its government and commercial clients that an unusual form of coronavirus was spreading rapidly in the Chinese port city of Wuhan. 

This was almost a full week before the Centers for Disease Control and Prevention reported the outbreak of what is now known as COVID-19, and nine days before the World Health Organization reported it. 

This anecdote reminds us that artificial intelligence is assuming an ever-larger and more critical role in public health. Private healthcare organizations have been employing artificial intelligence capabilities for years for a wide variety of use cases, and now we see federal civilian and defense healthcare agencies following suit. 

The Department of Veterans Affairs, for example, is employing AI to scan medical records for veterans at high risk of suicide and to help VA doctors interpret cancer lab results and suggest medications. It has even stood up a new National Artificial Intelligence Institute to carry out AI research and development projects. Likewise, the National Institutes of Health, the Centers for Disease Control and Prevention, the Defense Health Agency, the Food and Drug Administration, and other healthcare-related agencies are exploring and fielding AI tools in support of their many missions.

But as agencies turn increasingly to algorithms for assistance, there is a risk: the temptation to believe that AI is easy. Many algorithms focus on tasks such as object detection, image classification, and text classification, which are easy to program with common deep learning frameworks and are supported by an abundance of tutorials showing how to build and train these models quickly. Off-the-shelf algorithms are typically trained to process a single type, or “mode,” of data, such as images or text. And the ease and accessibility of these algorithms tempt inexperienced managers to approach every use case with a “unimodal” mindset.

It is critical to understand that not every AI use case, especially in healthcare, can be whittled down to a single type of data. Healthcare is an extraordinarily complex domain that requires information from many sources. Unimodal algorithms that produce impressive results in a computer lab often flounder when exposed to real-world health data and use cases. This is because unimodal AI is typically limited in its ability to “generalize,” that is, to remain effective across a broad range of inputs and applications. Humans, with our innate intelligence, generalize with great ease. We can recognize a cat, for example, regardless of how it is portrayed in a picture or which breed of cat we are looking at. By contrast, AI algorithms struggle to generalize because they are typically designed and trained to classify only certain types of data.

As health agencies adopt AI for applications such as precision medicine, population health, and outcomes evaluation, they should consider aggregating data from multiple sources, such as time series data from Fitbits, textual data from social media, image data from MRIs and X-rays, and columnar data from lab results. Triangulating multiple modes of data produces better results in the form of improved accuracy, more generalizable algorithms, and better insights.

Multi-modal machine learning (MMML) is an area of AI that combines multiple data types to perform real-world tasks. There are many data modalities, including natural-language text, infrared images, video data, MRI images, IoT streaming data, and acoustic signals, to name a few. And within each type, there are numerous sub-modalities.
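
To make the idea of combining modalities concrete, here is a minimal sketch in Python of one common approach, often called late fusion: a separate model scores each data type, and the scores are then blended into a single estimate. This is an illustration only, not a method described by any of the agencies or companies mentioned here. The function name, the modality names, the scores, and the weights are all hypothetical.

```python
from typing import Dict


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Blend per-modality risk scores into one weighted average.

    Modalities without a weight are ignored, and modalities that are
    missing for a given patient simply do not contribute.
    """
    usable = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in usable)
    if total_weight <= 0:
        raise ValueError("no usable modality with a positive weight")
    weighted_sum = sum(weights[name] * scores[name] for name in usable)
    return weighted_sum / total_weight


# Hypothetical outputs of three separate unimodal models for one patient,
# each a risk score between 0 and 1.
scores = {
    "wearable_time_series": 0.5,
    "clinical_notes_text": 0.75,
    "lab_results_table": 0.25,
}

# Hypothetical weights reflecting how much each modality is trusted.
weights = {
    "wearable_time_series": 1.0,
    "clinical_notes_text": 2.0,
    "lab_results_table": 1.0,
}

fused = late_fusion(scores, weights)
print(f"Fused risk score: {fused}")
```

In practice the weights would be learned from data rather than set by hand, and other fusion strategies exist, such as combining raw features from every modality before training a single model.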

MMML solutions have shown improved outcomes in a wide array of use cases, including detecting lung lesions, predicting wellness, diagnosing skin cancers, predicting mild cognitive impairment, predicting pain, and more. Moreover, combining multiple data types helps mitigate unintended biases that can be learned from a single data type.

This is common sense. We know that trained healthcare providers consider multiple perspectives and sources of information every day to make expert decisions. If the AI we develop is intended to perform comparably to these experts, shouldn’t we also build multi-modal AI?

Consider again the BlueDot example. BlueDot’s health monitoring platform uses natural-language processing and machine learning to analyze billions of data points across more than 100 datasets, including about 100,000 articles in 65 languages, animal and plant disease networks, and official proclamations. And because BlueDot also analyzes global airline ticketing data, it correctly predicted the virus’ spread from Wuhan to Bangkok, Seoul, Taipei, and Tokyo shortly thereafter. The BlueDot platform also used these many data types to correctly predict the Zika outbreak in Florida six months before the first case appeared in July 2016.

In summary, federal and defense healthcare agencies should think carefully before dedicating resources to unimodal AI research or proof-of-concept efforts. So how might a manager choose which mode of data to use for a given use case? The best answer may sometimes be “all of the above.”

Catherine Ordun is a senior data scientist at Booz Allen Hamilton.