Friday, May 21, 2010

Genomic Data Sets and the Microbiome

Source: Discover Magazine
The human genome consists of roughly 3 billion base pairs, or ~3 GB at one byte per base pair. Estimates of the number of cells in the adult human body vary widely, but a conservative figure looks to be ten trillion. That means each cell in the body stores roughly 3 GB of genetic information, and the body as a whole stores ~30 million PB. And thanks to the miracle of mitosis, the data are highly redundant and accurately replicated, while the whole body consumes only ~100 watts of power. But that's just our cells.
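The arithmetic above can be checked with a quick sketch. It assumes one byte per base pair (which is how 3 billion base pairs comes to ~3 GB) and decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes). The constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumptions (not measurements): one byte per base pair, decimal units.

BASE_PAIRS_PER_GENOME = 3_000_000_000   # ~3 billion base pairs
BYTES_PER_BASE_PAIR = 1                 # assumption: one byte per base pair
CELLS_IN_BODY = 10_000_000_000_000      # ten trillion cells (conservative figure)

BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def genome_size_gb() -> int:
    """Size of one copy of the genome, in gigabytes."""
    return BASE_PAIRS_PER_GENOME * BYTES_PER_BASE_PAIR // BYTES_PER_GB


def body_total_pb() -> int:
    """Total genetic data across all cells, in petabytes."""
    total_bytes = BASE_PAIRS_PER_GENOME * BYTES_PER_BASE_PAIR * CELLS_IN_BODY
    return total_bytes // BYTES_PER_PB


if __name__ == "__main__":
    print(f"Per cell:   ~{genome_size_gb()} GB")
    print(f"Whole body: ~{body_total_pb():,} PB")
```

A denser encoding (say, two bits per base) would shrink both numbers by a factor of four, so treat these as order-of-magnitude figures.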

The human microbiome is estimated to include about ten times as many cells as the body itself... your cells, and your genetic data, are the minority of your whole. Understanding how the colony of diverse organisms that is the human body works together is considered critical to understanding health, and in particular the immune system.

Sequencing the microbiome will be critical to this larger picture, and so at some point you may not only carry your own genetic sequence around on a thumb drive, but also the sequences of a thousand or so of the more important bacteria you carry in your gut and elsewhere. From the models I've been looking at for the genomic data sets that will be gathered in the next five years, the microbiome will likely play an important role.

The various microbiome sequencing efforts look to have sequenced about 1,000 strains so far. I don't yet know how large these data sets are.

