Out of sequence: insertions and deletions in the human genome drive human diversity

An international consortium led by researchers at the Wellcome Trust Centre for Human Genetics (WTCHG) and their colleagues at the Stanford University School of Medicine in California has made a detailed analysis of a form of genetic variation called short insertions and deletions, or indels. Their findings, just published in the journal Genome Research, may help us to understand how some of these missing or extra pieces of genetic sequence affect disease, and how they may have shaped key events in human evolution.

Since the completion of the reference human genome sequence in 2003, geneticists have been hard at work compiling databases of variation: places in the genome where not everyone has exactly the same spelling of the genetic code. Such variants may be associated with increased or decreased risk of disease, and they also help us to understand the history of human diversity. The most common form of variation is a single-letter change (from an A to a T, for example), known as a single nucleotide polymorphism, or SNP. The second most common form of variation is the indel, a short sequence of additional or missing letters. Unlike SNPs, indels had not until now been systematically catalogued, even though they are implicated in many hereditary conditions.

Taking advantage of data from the international 1000 Genomes Project, which has sequenced the complete genomes of 1092 people from four continents, a team led by Stephen Montgomery, an assistant professor of pathology and of genetics at Stanford, and Gerton Lunter, a group leader in statistical genetics at WTCHG, has identified 1.6 million indels in 179 individuals representing three diverse human populations. Their most striking finding is that almost half of the indels occur in ‘hotspots’ that make up only 4 per cent of the genome. In the rest of the genome, indels are relatively uncommon, occurring at only around 6 per cent of the typical rate for SNPs.

Understanding indels will be essential to physicians and others as they seek to interpret individual genome sequences. Indels are already known to cause inherited disease. A prominent example is the neurological condition Huntington’s disease, which is caused when the number of CAG repeats in the HTT gene expands from fewer than 27 to 40 or more.

‘With a rich catalogue of indels, we are now able to identify frequently mutated genes and implicate these variants as causal agents that influence gene expression and complex disorders’, says Montgomery. Analysing where indels occur also helped to clarify how they arise when DNA strands are not copied faithfully during replication. ‘Having a comprehensive and uniform collection of indels has enabled us to better understand the evolutionary pressures that shape this important class of human variation’, says Lunter, ‘and to refine our understanding of the evolution and structure of the human genome.’

(ends)

Montgomery, S.B.*, Goode, D.*, Kvikstad, E.*, Albers, C.A., Zhang, Z., Mu, X.J., Ananda, G., Howie, B., Karczewski, K.J., Smith, K.S., Anaya, V., Richardson, R., Davis, J., The 1000 Genomes Pilot Project Consortium, MacArthur, D.G., Sidow, A., Duret, L., Gerstein, M., Makova, K.D., Marchini, J., McVean, G., Lunter, G. The origin, evolution and impact of short insertion-deletion variants identified in 179 human genomes. Genome Research (epub 11 March 2013). *Co-first authors.