A genome's got to know its limitations

Written by Rob J | Jan 29, 2013 12:01:59 AM

The UK Government has become the latest believer in the genome. In December 2012, the prime minister of the United Kingdom announced an ambitious plan to fully sequence the genomes of 100,000 Britons with cancer and rare diseases. The PM says¹:

“hundreds of thousands of businesses…are built on top of the Apple App Store, and we want to see the emergence of genomic platforms in the UK that similarly support the emergence of new companies and innovations”

Although we are also believers in genomics, we suspect that it will take the conventional 20 years or so between investment in research and commercial exploitation. We also feel that application of genomics to healthcare delivery will take place by a large number of small steps and not as a big bang. In short, a genome’s got to know its limitations².

The current state of the art reminds me of neuroscience in the 1970s, when we were using microelectrodes to read the electrical activity of single neurons. Unfortunately, both the technology of the time and our less-than-perfect technique often yielded poor results – noisy signals that were a composite of many cells’ asynchronous activity, aptly dubbed “Combined Recording of Action Potentials”. Indeed, efforts of the day to understand brain function through microelectrode studies were considered analogous to investigating the workings of a computer using a set of crocodile clips and a voltmeter. The question is, are we doing something very similar in the headlong rush to sequence whole genomes from a myriad (or in this case, ten myriads) of patients? Consider the following complications:

Cancer is dynamic. Taking a sample of tumour and sequencing it gives you a snapshot, and expecting this to provide the basis for understanding disease progression is like cutting a frame out of a movie and trying to predict what happens next. The value of the snapshot is in hypothesis-building; it is not a panacea in itself.
Cancer is highly heterogeneous, even between cells in a single tumour deposit³⁴. Will we be sequencing DNA extracted from more than one cell (think neuroscience C.R.A.P., above) or will we sequence a single cell (in which case, how do we make sure it will be representative of the overall disease)? This heterogeneity may delay or limit the “Molecular Pharmacy” concept, which envisions cancer cures achieved by combining several very specific inhibitors to match the genomic profile of a tumour.
Pathways are complex, with multiple branches, loops and a high level of redundancy. It’s very tempting to overestimate the potential of gene mutation discoveries, e.g. the 2005 finding of the recurrent unique acquired clonal mutation JAK2V617F in myeloproliferative disease. As a candidate cancer-driving mutation it was logical to create JAK2 inhibitors in the hope of a cure, only to find, at best, symptom relief in the clinical setting. Complications lie in mutations upstream and downstream of JAK2, many of which are ‘private’ or unique to an individual tumour. That’s not to underplay the importance of understanding the JAK2 pathway; it’s just that it may take 20 years for that discovery to come to genuine fruition whilst the surrounding biology is unravelled (NB this is a fairly robust rule of thumb: 20 years from scientific discovery to therapeutic application. Think mAbs, kinase inhibitors, gene therapy, etc.).
The larger the database, the greater the bioinformatics challenge. A dataset of 100,000 whole genomes will be so complex, and that complexity will be magnified manyfold when other public-domain databases are bundled in, that there are bound to be a vast number of correlations with clinical observations, including many spurious ones. Such data dredging, bias and confounding are well known in epidemiological studies⁵. According to Douglas Merrill, former CIO/VP of Engineering at Google, “With too little data, you won’t be able to make any conclusions that you trust. With loads of data you will find relationships that aren’t real...” Moreover, the quality of the database and its curation are crucial, and the detail of the clinical information associated with each patient is pivotal. Smaller, higher quality datasets will be much more valuable and this is where private investment is sensibly focussed.
Lastly, genotype is currently the key to clinical management and/or drug-hunting in only a very small number of situations. The poster child is Kalydeco (launched 23 years after the discovery of the underlying mutation), which was molecularly designed to treat patients with the G551D mutation (~4% of cystic fibrosis patients). But most diseases are multi-gene and driven by a complex interplay with environmental and behavioural factors. We have taken steps towards an age where diseases aren’t diagnosed by phenotype (“breast cancer”) but by their molecular pathology (HER2/neu amplification), but these are still baby steps. In his 10th annual NHGRI Trent Lectureship address, Bert Vogelstein said “the study of genetic alterations in tumors will eventually be able to add quite a bit to standard, conventional histopathological analysis” [emphasis added]. Major progress toward molecular pathology will require not only genomics but also epigenetics (heritable changes in the genome caused by factors other than DNA sequence changes), transcriptomics, proteomics, kinomics and metabolomics, conducted in longitudinal studies in parallel with meticulous clinical characterisation (and banking of samples for all the as yet undiscovered ‘omics studies that future generations will hail as “the key”).
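
The point about spurious correlations in large datasets can be made concrete with a toy simulation. The sketch below is purely illustrative and is not drawn from any real genomic dataset: the patient count, variant count, allele frequency and the rough significance threshold are all assumptions chosen for the example, and the function names are hypothetical. It generates random genotypes and a clinical outcome that is pure noise, then counts how many variants nonetheless appear to correlate with that outcome.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_patients=200, n_variants=2000, allele_freq=0.3, seed=0):
    """Count variants that look 'associated' with an outcome that is pure noise.

    All parameter values are illustrative assumptions, not real study sizes.
    """
    rng = random.Random(seed)
    # The outcome is random: by construction, no variant influences it.
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
    # |r| > 2 / sqrt(n) is a rough stand-in for a two-sided p < 0.05 test.
    threshold = 2.0 / math.sqrt(n_patients)
    hits = 0
    for _ in range(n_variants):
        # Genotype = number of minor alleles (0, 1 or 2) per patient.
        genotype = [
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_patients)
        ]
        if abs(pearson(genotype, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 2000 random variants 'correlate' with a random outcome")
```

Under these assumptions, something in the region of one variant in twenty should clear the threshold by chance alone, so the number of false leads grows in step with the number of variants tested. Real analyses correct for multiple testing, but the sketch shows why a bigger haystack does not by itself yield more needles.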

So whilst we should not pack up our tents and move away from genomics, we should integrate it into an intelligent framework. There is already a vast amount of cancer genome sequencing in progress⁶ and we need to make sure that newly funded work is genuinely additive. Let’s not just build ever more spectacular haystacks of data and then send an army of bioinformaticians off to find the needles. Fishing expeditions do not have a great track record in drug discovery (witness the spectacular failure of combinatorial chemistry and high throughput screening). Genomics is no more the single answer than it was when some enthusiasts hailed the 1999 deal between Human Genome Sciences (“they have all the targets”) and Cambridge Antibody Technology (“they have all the drugs”).

The Apple App Store was built within a five-year period; genomics “apps” will take a lot longer, maybe 20+ years – after all, a genome’s got to know its limitations.

¹ Foreword to Strategy for UK Life Sciences – One Year On
² With apologies to Magnum Force (1973)
³ Gerlinger et al (2012) NEJM; 366:883-892
⁴ Owens (2012) Nature; 491:27-29
⁵ Davey-Smith (2002) BMJ; 325(7378): 1437–1438.
⁶ http://cancergenome.nih.gov
