Professor Peter Donnelly FRS
|Research Area:||Genetics and Genomics|
|Technology Exchange:||Bioinformatics, Computational biology, Medical statistics, SNP typing and Statistical genetics|
|Scientific Themes:||Genetics & Genomics|
|Keywords:||Genetics of common human diseases, human recombination, genome-wide association studies, statistical genetics, population genetics and population structure|
The major focus of the Donnelly group is the development and application of statistical methods for understanding genetic variation and its association with phenotypic variation and disease susceptibility. These methods typically combine modern, computationally intensive statistical approaches with insights from population genetics models, and aim to extract as much information as possible from the large datasets currently being generated by high-throughput experimental techniques.
Much current work involves genome-wide association studies, with Donnelly leading the Wellcome Trust Case Control Consortium (WTCCC), and a subsequent consortium, WTCCC2. These involve collaborations of several hundred scientists studying a range of common diseases. WTCCC was the largest study of its kind. It was responsible for the discovery of many novel genetic associations, and won several major awards and prizes. WTCCC2 will examine DNA samples from about 60,000 individuals with the goal of understanding the genetic basis of susceptibility to 15 human diseases and conditions.
Another research focus concerns human recombination. It had long been known from pedigree studies that recombination rates vary over large scales across chromosomes. More recently, experimental studies and patterns of human genetic variation suggested that most recombination occurs in small (~2 kb) sequence regions called recombination hotspots. In collaboration with the McVean and Myers groups, we developed computational statistical methods and applied them to large surveys of human genetic variation to characterise over 30,000 human recombination hotspots and to identify DNA sequence motifs associated with hotspot activity.
Experimental work in the group currently focuses mainly on natural variation in several bacterial species, and on mechanisms for horizontal gene exchange and vaccine escape.
|Professor Gil McVean||Wellcome Trust Centre for Human Genetics||University of Oxford||United Kingdom|
|Professor Simon Myers||Wellcome Trust Centre for Human Genetics||University of Oxford||United Kingdom|
|Professor Chris Holmes||Wellcome Trust Centre for Human Genetics||University of Oxford||United Kingdom|
|Professor Jonathan Marchini||Wellcome Trust Centre for Human Genetics||University of Oxford||United Kingdom|
Fine-scale genetic variation between human populations is interesting as a signature of historical demographic events and because of its potential for confounding disease studies. We use haplotype-based statistical methods to analyse genome-wide single nucleotide polymorphism (SNP) data from a carefully chosen geographically diverse sample of 2,039 individuals from the United Kingdom. This reveals a rich and detailed pattern of genetic differentiation with remarkable concordance between genetic clusters and geography. The regional genetic differentiation and differing patterns of shared ancestry with 6,209 individuals from across Europe carry clear signals of historical demographic events. We estimate the genetic contribution to southeastern England from Anglo-Saxon migrations to be under half, and identify the regions not carrying genetic material from these migrations. We suggest significant pre-Roman but post-Mesolithic movement into southeastern England from continental Europe, and show that in non-Saxon parts of the United Kingdom, there exist genetically differentiated subgroups rather than a general 'Celtic' population.
Streptococcus pneumoniae ('pneumococcus') causes an estimated 14.5 million cases of serious disease and 826,000 deaths annually in children under 5 years of age(1). The highly effective introduction of the PCV7 pneumococcal vaccine in 2000 in the United States(2,3) provided an unprecedented opportunity to investigate the response of an important pathogen to widespread, vaccine-induced selective pressure. Here, we use array-based sequencing of 62 isolates from a US national monitoring program to study five independent instances of vaccine escape recombination(4), showing the simultaneous transfer of multiple and often large (up to at least 44 kb) DNA fragments. We show that one such new strain quickly became established, spreading from east to west across the United States. These observations clarify the roles of recombination and selection in the population genomics of pneumococcus and provide proof of principle of the considerable value of combining genomic and epidemiological information in the surveillance and enhanced understanding of infectious diseases.
Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.
Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.
Although present in both humans and chimpanzees, recombination hotspots, at which meiotic crossover events cluster, differ markedly in their genomic location between the species. We report that a 13-base pair sequence motif previously associated with the activity of 40% of human hotspots does not function in chimpanzees and is being removed by self-destructive drive in the human lineage. Multiple lines of evidence suggest that the rapidly evolving zinc-finger protein PRDM9 binds to this motif and that sequence changes in the protein may be responsible for hotspot differences between species. The involvement of PRDM9, which causes histone H3 lysine 4 trimethylation, implies that there is a common mechanism for recombination hotspots in eukaryotes but raises questions about what forces have driven such rapid change.
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and has shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
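As a rough illustration of the kind of single-SNP case-control comparison described in the abstract above, the sketch below runs a Pearson chi-square test (1 degree of freedom) on a 2×2 table of allele counts and compares the result with the P < 5 × 10⁻⁷ threshold quoted there. This is a simplified stand-in and not the consortium's actual analysis: the function name `allelic_chi2`, the choice of an allelic test, and the allele counts are all assumptions made for illustration. Only the threshold and the approximate sample sizes (about 2,000 cases and 3,000 shared controls) come from the text.

```python
import math

# Significance threshold quoted in the abstract; everything else is illustrative.
GENOME_WIDE_P = 5e-7


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles at one SNP.
    Returns (statistic, p_value).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    row = [case_a + case_b, ctrl_a + ctrl_b]
    col = [case_a + ctrl_a, case_b + ctrl_b]
    n = row[0] + row[1]
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be non-zero")

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical SNP: 2,000 cases (4,000 alleles) with allele frequency 0.30,
# 3,000 controls (6,000 alleles) with allele frequency 0.25.
stat, p_value = allelic_chi2(1200, 2800, 1500, 4500)
passes_threshold = p_value < GENOME_WIDE_P
```

With several hundred thousand SNPs tested per disease, a stringent threshold of this kind limits the number of false-positive signals; the modest frequency difference in the hypothetical example only clears it because the sample is large, which matches the abstract's point about sample size.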