Big (Scientific) Data and the $1,000 human genome
by Michael Shanler | January 17, 2014
This week brought a milestone announcement in the world of scientific instrumentation: the CEO of Illumina, a manufacturer of sequencing systems, announced the launch of the new “HiSeq X” sequencing system at a JP Morgan conference. Each HiSeq X is designed to process up to 20,000 genomes a year at a cost of $1,000 per test. To put that cost in perspective, only a decade ago a single sequence cost nearly $100M, and processing all the data took months of work by dedicated infrastructures and data scientists. Illumina also announced the availability of a lower-cost benchtop sequencing system, the “NextSeq 500.” (Note: Ion Torrent, acquired by Life Technologies, prematurely teased the market with a $1,000-per-sequence achievement back in 2012, but it experienced commercialization delays; its $1,000 test is now expected to be commercially available in 2014.)
These new, higher-throughput systems (comprising hardware, software, and services) will begin to find their way into research centers over the next few years. With wider instrument availability and the new lower price point, scientists and doctors will leverage genomics data to further research programs, identify better medicines, develop a deeper understanding of extremely complex diseases, and explore genetic variations.
Many scientific IT infrastructures are already bursting at the seams under genomics data. Supporting sequencing processes requires a variety of technologies: storage, orchestration, HPC, data pipelines, knowledge management (KM) links, scientific software, collaboration tools, and advanced data visualization and analytics, to name a few. All of these will increasingly be pushed into the hands of more scientists.
It also means that making sense of all the data from human genome sequencing is turning into a “big (scientific) data” problem:
- Large amounts of compressed and uncompressed data (high volume)
- Demanding pipelines for moving, analyzing, and crunching the data (high velocity)
- A wide array of data formats, compression standards, and reporting methods (high variety)
- Analysis programs with “extreme” multi-variable relationships (high complexity)
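To make the volume dimension concrete, a rough capacity sketch helps. The per-genome sizes below are assumptions based on commonly cited 2014-era figures for a ~30x whole-genome run (roughly 100 GB of compressed raw reads plus a similar amount of aligned data), not vendor specifications; the replication factor is likewise an illustrative planning assumption.

```python
# Back-of-envelope storage sizing for a HiSeq X-class sequencing program.
# All per-genome figures are rough assumptions for a ~30x whole genome,
# not Illumina specifications.

GB_PER_GENOME_FASTQ = 100   # compressed raw reads (assumed)
GB_PER_GENOME_BAM = 120     # aligned reads (assumed)
GB_PER_GENOME_VCF = 1       # called variants (assumed)

def yearly_storage_tb(genomes_per_year, replication=2):
    """Estimate yearly storage in TB: raw + aligned + variant data,
    multiplied by a simple replication factor for redundancy."""
    per_genome_gb = GB_PER_GENOME_FASTQ + GB_PER_GENOME_BAM + GB_PER_GENOME_VCF
    return genomes_per_year * per_genome_gb * replication / 1000.0  # GB -> TB

# One instrument running at the announced 20,000 genomes per year:
print(f"{yearly_storage_tb(20000):,.0f} TB/year")  # 8,840 TB, i.e. ~8.8 PB
```

Even under these conservative assumptions, a single instrument at full utilization implies petabytes of new data per year, which is why the storage, pipeline, and HPC questions land on IT long before the science does.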
Making sure the right IT infrastructure is in place for handling this “big (scientific) data” will require IT groups to revisit their cloud, security, regulatory, and purchasing policies. It also means building partnerships with scientists and collaborators to help drive innovation and shared, collaborative infrastructures.
The $1,000-per-test scientific “milestone” is increasingly viewed as a potential “hurdle” within IT groups: how can data-processing capabilities, and the processes behind the science, scale? Although the unfolding data scenario presents a challenge, IT groups should resist defensive posturing and be proactive about learning what it means for the organization. Strong R&D IT support for the science will be mandatory, and effectively scaling insight from human sequencing data requires a roadmap that fits both business and scientific needs.
Your future health depends on it.