Genome analysis reveals how the DNA sequences that make up individual genes differ between individuals, which may contribute to differences in disease risk. However, it remains extremely challenging to accurately predict how any particular DNA change might affect biological function, limiting the clinical interpretation of genetic findings. In a paper published on 8 May in the journal Science, Manuel Rivas and colleagues at the Wellcome Trust Centre for Human Genetics, together with Daniel MacArthur (Broad Institute, US), Tuuli Lappalainen (New York Genome Center, US), and collaborators from the Genotype-Tissue Expression (GTEx) Consortium, have measured the cellular effects of genetic variants in unprecedented depth.
They documented the impact on gene expression levels of variants that had a high probability of causing proteins to be missing or incomplete. Protein-truncating variants (PTVs) interrupt the sequence of a gene in one of four ways: by introducing a premature ‘stop here’ signal; by adding or deleting a short sequence (indel) so that the genetic code becomes scrambled; by deleting a large section of DNA; or by disrupting critical sites where the RNA message is spliced together once the DNA has been transcribed. Such variants are often associated with disease, but others have no obvious effect. Some can even be beneficial: for example, a splice-disrupting variant in the CARD9 gene provides strong protection against Crohn’s disease and ulcerative colitis.
Cells have a protective mechanism against such truncated proteins called ‘nonsense-mediated decay’ (NMD), which can degrade abnormal RNA messages before they are translated into proteins. This phenomenon, and changes in the content of the RNA message, can be detected in the RNA sequencing data that the team analysed.
Rivas, MacArthur, Lappalainen and their colleagues took advantage of two large-scale cohort studies that between them had sequenced both DNA and RNA from 635 individuals in order to uncover the functional effects of PTVs. The Geuvadis study combined genome sequencing data with RNA sequencing from cultured white blood cells, while the GTEx project analysed DNA sequences and expression data from multiple tissues from the same individuals.
In total the researchers identified 16,286 candidate PTVs. They were able to use the expression data to characterise the likelihood that a particular PTV might disrupt gene expression through NMD and developed a model to predict the outcome from DNA sequence alone. ‘For the first time we’ve been able to quantify how efficient this process is across multiple tissues’, says Rivas. ‘We find evidence that 30 per cent of the time predicted PTVs may escape NMD.’ He and his colleagues were also able to document that in almost 20 per cent of cases the effect of a PTV was specific to one or a subset of tissues.
They also established that where a PTV knocked out expression in one of a pair of alleles, the complementary undamaged allele did not increase its expression to compensate – suggesting that in many cases, the ability of humans to tolerate the loss of one copy of a gene is driven by complex buffering processes rather than simply increasing expression of the remaining copy.
Finally, they took a close look at PTVs at the critical sites on either side of a splice junction, as well as those nearby. In addition to confirming that variants at ‘essential splice sites’ had a high probability of disrupting splicing, they were able to quantify the impact of variants further from the junction.
‘We are able to go beyond DNA sequence alone and begin to understand the cellular consequences of this important class of genetic variation’, says Lappalainen. ‘This work shows how “personal transcriptomics” – measuring gene expression in individuals – could become an important complement to genome analysis in the clinic, improving our diagnosis of a wide range of rare diseases,’ concludes MacArthur.