As one of the agencies leading the Obama Administration’s Big Data Research and Development Initiative, the National Institutes of Health (NIH) is investing in big data and cloud computing to advance biomedical research. Specifically, the NIH has provided significant funding to accelerate human genome research with improved big data analysis technologies. This includes a recent $24M annual commitment to establish centers of excellence that increase researcher access to big data technologies.
The immense technical and data demands of genome research make it a compelling case study for big data initiatives. Genomicists have shown increasing interest in combing through, analyzing, and managing massive sets of unstructured biological data using Hadoop, the same open-source framework for distributed storage and processing used by tech giants Facebook, Twitter, Amazon, and LinkedIn. But rather than using the platform to manage and analyze terabytes of social media data, genome researchers can use it to observe patterns in the 3 billion base pairs that make up a person’s DNA. These patterns could eventually help identify mutations or specific markers for cancerous tumors or genetic diseases, especially as researchers apply this analysis across large numbers of human samples.
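To make the idea of pattern-finding on a Hadoop-style platform concrete, here is a minimal sketch of the map and reduce steps such a job might use, written in plain Python rather than against Hadoop itself. It counts short DNA substrings (k-mers) across a set of sequence reads. The function names, the toy reads, and the choice of k-mer counting as the example task are all assumptions made for illustration; none of them come from the NIH programs described here.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count every length-k substring (k-mer) in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables into one."""
    return left + right


def count_kmers(reads, k):
    """Run the map step on each read, then fold the partial results together."""
    partials = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partials, Counter())


# Toy input (hypothetical): real reads would number in the millions.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, 3)
print(counts.most_common())
```

On a real cluster the map step would run in parallel on many machines, each holding a slice of the reads, and the framework would route the partial counts to the reduce step. The single-process version above only illustrates the division of work.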
Genome sequencing has evolved rapidly since the completion of the Human Genome Project: the cost to sequence one person’s DNA fell from $1 million in 2007 to $1,000 in 2012. The NIH centers of excellence hope to push this cost lower still by encouraging greater computing power and better information sharing among researchers. Their recent efforts include exploring other platforms, similar to or built on top of Hadoop, that may be better tailored to the needs of biomedical researchers. If these technologies are successfully implemented and shared, the long-term outlook is promising: the McKinsey Global Institute estimates that the health sector could save $300 billion annually by using big data effectively in both healthcare and research.
Going forward, NIH big data genetics programs could revolutionize scientific collaboration, especially as they develop national and international infrastructure for sharing information. For instance, the Frederick National Laboratory, funded by the National Cancer Institute and winner of the 2012 Government Big Data Solutions Award, was recognized for developing prototype infrastructure to help geneticists analyze the relationships among thousands of genes, cancer types, and millions of unique patients. Public sector research programs like these can help establish the technologies needed to make next-generation medical breakthroughs.