Genome project to require Google-like computing power

An ambitious project with the goal of producing a more detailed understanding of the link between genetic variations and susceptibility to disease will require an unprecedented amount of computing power and terabytes of data storage, according to the leaders of the project.

The 1,000 Genomes Project, announced earlier this week by an international consortium that includes the National Human Genome Research Institute, part of the National Institutes of Health, plans to spend three years examining the human genome at a level of detail never before achieved.

The project "will greatly expand and further accelerate efforts to find more of the genetic factors involved in human health and disease," said Richard Durbin, deputy director of the Wellcome Trust Sanger Institute in Cambridge, England.

Francis Collins, director of the National Human Genome Research Institute, said the project will lead to a fivefold increase in the sensitivity of disease discovery efforts across the human genome.

Any two humans are more than 99 percent similar at the genetic level, but the fractional differences can help determine susceptibility to disease and how the body will respond to drugs. The goal of the project is to produce a catalog of variants that are present at 1 percent or greater frequency in the human population across most of the genome. That requires the project to sequence the genomes of at least 1,000 people.

The project plans to sequence 8.2 billion DNA base pairs a day -- or the equivalent of more than two human genomes every 24 hours -- during its two-year production phase, for a total of 6 trillion DNA bases, said Gil McVean, co-chair of the analysis committee and professor of mathematical genetics at the University of Oxford.
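
The quoted figures can be checked with a few lines of arithmetic. The sketch below is only a rough consistency check: the daily rate and the two-year production phase come from the article, while the 365-day year and the genome size of roughly 3.1 billion base pairs are assumptions added here for illustration.

```python
# Figures quoted in the article.
BASES_PER_DAY = 8_200_000_000      # DNA base pairs sequenced per day
PRODUCTION_YEARS = 2               # length of the production phase

# Assumptions that are NOT stated in the article.
DAYS_PER_YEAR = 365                # leap days ignored
HUMAN_GENOME_BASES = 3_100_000_000 # approximate size of one human genome


def total_bases(bases_per_day: int, years: int, days_per_year: int) -> int:
    """Total bases sequenced if the daily rate holds for the whole phase."""
    return bases_per_day * years * days_per_year


def genomes_per_day(bases_per_day: int, genome_bases: int) -> float:
    """Daily output expressed as a number of human-genome equivalents."""
    return bases_per_day / genome_bases


if __name__ == "__main__":
    total = total_bases(BASES_PER_DAY, PRODUCTION_YEARS, DAYS_PER_YEAR)
    rate = genomes_per_day(BASES_PER_DAY, HUMAN_GENOME_BASES)
    print(f"Total over production phase: {total / 1e12:.2f} trillion bases")
    print(f"Genome equivalents per day: {rate:.2f}")
```

Under these assumptions the total comes to just under 6 trillion bases and the daily rate to between two and three genome equivalents, which matches the figures McVean gave.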

Managing this much data will require novel computational methods. Gonçalo Abecasis, a geneticist and professor of applied statistics at the University of Michigan's Center for Statistical Genetics, said the project will produce so much data that the only operation he can think of that is similar in scope is the search engine Google, which handles billions of Web searches daily.

If the project had to start crunching all the sequence data today, Abecasis estimated it would take a supercomputer with 10,000 massively parallel processors. But, he said, the project is working to develop algorithms and mathematical and computational models that should reduce the computing requirements.

Because most people's genomes are largely identical, Abecasis said he is working on models and algorithms that concentrate on the fractional differences, much as video compression algorithms spend processing power on objects that move rather than on the static background.
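
The general idea of working only with differences can be illustrated with a toy example. The sketch below is not the project's method, which the article says is still being developed; it is a minimal illustration that assumes a shared reference sequence and a sample of the same length that differs only by single-base substitutions. The function names and the short sequences are hypothetical.

```python
from typing import List, Tuple

# A difference is stored as (position in the reference, base seen in the sample).
Difference = Tuple[int, str]


def encode_differences(reference: str, sample: str) -> List[Difference]:
    """Record only the positions where the sample differs from the reference.

    Assumes equal-length sequences, so insertions and deletions are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (position, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def decode_differences(reference: str, differences: List[Difference]) -> str:
    """Rebuild the full sample from the reference plus the stored differences."""
    bases = list(reference)
    for position, base in differences:
        bases[position] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTAC"   # hypothetical reference fragment
    sample = "ACGTTCGTAA"      # hypothetical individual, two substitutions
    differences = encode_differences(reference, sample)
    print(differences)
    print(decode_differences(reference, differences) == sample)
```

When two sequences are more than 99 percent identical, the list of differences is far shorter than the sequence itself, which is the same trade a video codec makes when it encodes only the parts of a frame that changed.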

The models are still under development. Abecasis said the project will need supercomputers to manipulate the data, but far fewer than 10,000 processors.

The Beijing Genomics Institute in Shenzhen, China, is the other key research organization participating in the project and will perform sequencing along with the Wellcome Trust Sanger Institute and the National Human Genome Research Institute's large-scale sequencing network. That network includes the Broad Institute of MIT and Harvard, the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis, and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston.
