Dr Chris Spencer

Research Area: Genetics and Genomics
Technology Exchange: Bioinformatics, Medical statistics, SNP typing, Statistical genetics and Transcript profiling
Scientific Themes: Genetics & Genomics and Immunology & Infectious Disease
Keywords: Genome-wide association studies, Population genetics and Bayesian inference

As a statistical geneticist I'm interested in a diverse range of problems, from population genetics to association analysis of clinical phenotypes. My work currently focuses on African genetics and susceptibility to malaria, and is conducted within the MalariaGEN consortium. I'm also interested in using genetics to improve health and healthcare via stratified medicine, currently in the context of hepatitis C infection as part of the STOP-HCV consortium.

To facilitate analysis across phenotypes and populations, we are developing and applying new methodology. We hope to quantify the role that infectious diseases have played in shaping our immune system and physiology through natural selection. By inferring the genetic determinants of host-parasite interactions, we aim to better understand the underlying biology and to use that information to target the prevention and treatment of disease.

Name | Department | Institution | Country
Professor Dominic Kwiatkowski | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Professor Ellie (Eleanor) Barnes | Experimental Medicine Division | Oxford University, Peter Medawar Building | United Kingdom
Professor Peter Donnelly FRS | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Professor Stephen Sawcer | Department of Clinical Neurosciences | University of Cambridge | United Kingdom
Professor Gil McVean FRS FMedSci | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom

Busby GBJ, Band G, Le QS, Jallow M, Bougama E, Mangano VD, Amenga-Etego LN, Enimil A, Apinjoh T, Ndila CM et al. 2016. Admixture into and within sub-Saharan Africa. eLife, 5.

Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based, analysis to characterise the structure of genetic diversity and gene-flow in a collection of 48 sub-Saharan African groups. We show that coastal populations experienced an influx of Eurasian haplotypes over the last 7000 years, and that Eastern and Southern Niger-Congo speaking groups share ancestry with Central West Africans as a result of recent population expansions. In fact, most sub-Saharan populations share ancestry with groups from outside of their current geographic region as a result of gene-flow within the last 4000 years. Our in-depth analysis provides insight into haplotype sharing across different ethno-linguistic groups and the recent movement of alleles into new environments, both of which are relevant to studies of genetic epidemiology.

Kenyan Bacteraemia Study Group, Wellcome Trust Case Control Consortium 2 (WTCCC2), Rautanen A, Pirinen M, Mills TC, Rockett KA, Strange A, Ndungu AW, Naranbhai V, Gilchrist JJ et al. 2016. Polymorphism in a lincRNA Associates with a Doubled Risk of Pneumococcal Bacteremia in Kenyan Children. Am J Hum Genet, 98 (6), pp. 1092-1100.

Bacteremia (bacterial bloodstream infection) is a major cause of illness and death in sub-Saharan Africa but little is known about the role of human genetics in susceptibility. We conducted a genome-wide association study of bacteremia susceptibility in more than 5,000 Kenyan children as part of the Wellcome Trust Case Control Consortium 2 (WTCCC2). Both the blood-culture-proven bacteremia case subjects and healthy infants as controls were recruited from Kilifi, on the east coast of Kenya. Streptococcus pneumoniae is the most common cause of bacteremia in Kilifi and was thus the focus of this study. We identified an association between polymorphisms in a long intergenic non-coding RNA (lincRNA) gene (AC011288.2) and pneumococcal bacteremia and replicated the results in the same population (combined p = 1.69 × 10^-9; OR = 2.47, 95% CI = 1.84-3.31). The susceptibility allele is African specific, derived rather than ancestral, and occurs at low frequency (2.7% in control subjects and 6.4% in case subjects). Our further studies showed AC011288.2 expression only in neutrophils, a cell type that is known to play a major role in pneumococcal clearance. Identification of this novel association will further focus research on the role of lincRNAs in human infectious disease.

Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. 2016. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics, 32 (10), pp. 1493-1501.

Motivation: The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive. Results: We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects. Availability and implementation: FINEMAP v1.0 is freely available for Mac OS X and Linux at http://www.christianbenner.com.

Pedergnana V, Smith D, STOP-HCV Consortium, Klenerman P, Barnes E, Spencer CC, Ansari MA. 2016. Interferon lambda 4 variant rs12979860 is not associated with RAV NS5A Y93H in hepatitis C virus genotype 3a. Hepatology, 64 (4), pp. 1377-1378.

Band G, Rockett KA, Spencer CCA, Kwiatkowski DP, Band G, Le QS, Clarke GM, Kivinen K, Leffler EM, Rockett KA et al. 2015. A novel locus of resistance to severe malaria in a region of ancient balancing selection. Nature, 526 (7572), pp. 253-+.

The high prevalence of sickle haemoglobin in Africa shows that malaria has been a major force for human evolutionary selection, but surprisingly few other polymorphisms have been proven to confer resistance to malaria in large epidemiological studies. To address this problem, we conducted a multi-centre genome-wide association study (GWAS) of life-threatening Plasmodium falciparum infection (severe malaria) in over 11,000 African children, with replication data in a further 14,000 individuals. Here we report a novel malaria resistance locus close to a cluster of genes encoding glycophorins that are receptors for erythrocyte invasion by P. falciparum. We identify a haplotype at this locus that provides 33% protection against severe malaria (odds ratio = 0.67, 95% confidence interval = 0.60-0.76, P value = 9.5 × 10^-11) and is linked to polymorphisms that have previously been shown to have features of ancient balancing selection, on the basis of haplotype sharing between humans and chimpanzees. Taken together with previous observations on the malaria-protective role of blood group O, these data reveal that two of the strongest GWAS signals for severe malaria lie in or close to genes encoding the glycosylated surface coat of the erythrocyte cell membrane, both within regions of the genome where it appears that evolution has maintained diversity for millions of years. These findings provide new insights into the host-parasite interactions that are critical in determining the outcome of malaria infection.

Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, Amaratunga C, Lim P, Mead D, Oyola SO, Dhorda M et al. 2015. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat Genet, 47 (3), pp. 226-234.

We report a large multicenter genome-wide association study of Plasmodium falciparum resistance to artemisinin, the frontline antimalarial drug. Across 15 locations in Southeast Asia, we identified at least 20 mutations in kelch13 (PF3D7_1343700) affecting the encoded propeller and BTB/POZ domains, which were associated with a slow parasite clearance rate after treatment with artemisinin derivatives. Nonsynonymous polymorphisms in fd (ferredoxin), arps10 (apicoplast ribosomal protein S10), mdr2 (multidrug resistance protein 2) and crt (chloroquine resistance transporter) also showed strong associations with artemisinin resistance. Analysis of the fine structure of the parasite population showed that the fd, arps10, mdr2 and crt polymorphisms are markers of a genetic background on which kelch13 mutations are particularly likely to arise and that they correlate with the contemporary geographical boundaries and population frequencies of artemisinin resistance. These findings indicate that the risk of new resistance-causing mutations emerging is determined by specific predisposing genetic factors in the underlying parasite population.

Moutsianas L, Jostins L, Beecham AH, Dilthey AT, Xifara DK, Ban M, Shah TS, Patsopoulos NA, Alfredsson L, Anderson CA et al. 2015. Class II HLA interactions modulate genetic risk for multiple sclerosis. Nat Genet, 47 (10), pp. 1107-1113.

Association studies have greatly refined the understanding of how variation within the human leukocyte antigen (HLA) genes influences risk of multiple sclerosis. However, the extent to which major effects are modulated by interactions is poorly characterized. We analyzed high-density SNP data on 17,465 cases and 30,385 controls from 11 cohorts of European ancestry, in combination with imputation of classical HLA alleles, to build a high-resolution map of HLA genetic risk and assess the evidence for interactions involving classical HLA alleles. Among new and previously identified class II risk alleles (HLA-DRB1*15:01, HLA-DRB1*13:03, HLA-DRB1*03:01, HLA-DRB1*08:01 and HLA-DQB1*03:02) and class I protective alleles (HLA-A*02:01, HLA-B*44:02, HLA-B*38:01 and HLA-B*55:01), we find evidence for two interactions involving pairs of class II alleles: HLA-DQA1*01:01-HLA-DRB1*15:01 and HLA-DQB1*03:01-HLA-DQB1*03:02. We find no evidence for interactions between classical HLA alleles and non-HLA risk-associated variants and estimate a minimal effect of polygenic epistasis in modulating major risk alleles.

Shelton JM, Corran P, Risley P, Silva N, Hubbart C, Jeffreys A, Rowlands K, Craik R, Cornelius V, Hensmann M et al. 2015. Genetic determinants of anti-malarial acquired immunity in a large multi-centre study. Malar J, 14 (1), pp. 333.

BACKGROUND: Many studies report associations between human genetic factors and immunity to malaria but few have been reliably replicated. These studies are usually country-specific, use small sample sizes and are not directly comparable due to differences in methodologies. This study brings together samples and data collected from multiple sites across Africa and Asia to use standardized methods to look for consistent genetic effects on anti-malarial antibody levels. METHODS: Sera, DNA samples and clinical data were collected from 13,299 individuals from ten sites in Senegal, Mali, Burkina Faso, Sudan, Kenya, Tanzania, and Sri Lanka using standardized methods. DNA was extracted and typed for 202 Single Nucleotide Polymorphisms with known associations to malaria or antibody production, and antibody levels to four clinical grade malarial antigens [AMA1, MSP1, MSP2, and (NANP)4] plus total IgE were measured by ELISA techniques. Regression models were used to investigate the associations of clinical and genetic factors with antibody levels. RESULTS: Malaria infection increased levels of antibodies to malaria antigens and, as expected, stable predictors of anti-malarial antibody levels included age, seasonality, location, and ethnicity. Correlations between antibodies to blood-stage antigens AMA1, MSP1 and MSP2 were higher between themselves than with antibodies to the (NANP)4 epitope of the pre-erythrocytic circumsporozoite protein, while there was little or no correlation with total IgE levels. Individuals with sickle cell trait had significantly lower antibody levels to all blood-stage antigens, and recessive homozygotes for CD36 (rs321198) had significantly lower anti-malarial antibody levels to MSP2. CONCLUSION: Although the most significant finding with a consistent effect across sites was for sickle cell trait, its effect is likely to be via reducing a microscopically positive parasitaemia rather than directly on antibody levels. However, this study does demonstrate a framework for the feasibility of combining data from sites with heterogeneous malaria transmission levels across Africa and Asia with which to explore genetic effects on anti-malarial immunity.

Malaria Genomic Epidemiology Network. 2014. Reappraisal of known malaria resistance loci in a large multicenter study. Nat Genet, 46 (11), pp. 1197-1204.

Many human genetic associations with resistance to malaria have been reported, but few have been reliably replicated. We collected data on 11,890 cases of severe malaria due to Plasmodium falciparum and 17,441 controls from 12 locations in Africa, Asia and Oceania. We tested 55 SNPs in 27 loci previously reported to associate with severe malaria. There was evidence of association at P < 1 × 10^-4 with the HBB, ABO, ATP2B4, G6PD and CD40LG loci, but previously reported associations at 22 other loci did not replicate in the multicenter analysis. The large sample size made it possible to identify authentic genetic effects that are heterogeneous across populations or phenotypes, with a striking example being the main African form of G6PD deficiency, which reduced the risk of cerebral malaria but increased the risk of severe malarial anemia. The finding that G6PD deficiency has opposing effects on different fatal complications of P. falciparum infection indicates that the evolutionary origins of this common human genetic disorder are more complex than previously supposed.

Zhou K, Donnelly L, Yang J, Li M, Deshmukh H, Van Zuydam N, Ahlqvist E, Spencer CC, Groop L, Morris AD et al. 2014. Heritability of variation in glycaemic response to metformin: a genome-wide complex trait analysis. The Lancet Diabetes & Endocrinology, 2 (6), pp. 481-487.

Background: Metformin is a first-line oral agent used in the treatment of type 2 diabetes, but glycaemic response to this drug is highly variable. Understanding the genetic contribution to metformin response might increase the possibility of personalising metformin treatment. We aimed to establish the heritability of glycaemic response to metformin using the genome-wide complex trait analysis (GCTA) method. Methods: In this GCTA study, we obtained data about HbA1c concentrations before and during metformin treatment from patients in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) study, which includes a cohort of patients with type 2 diabetes and is linked to comprehensive clinical databases and genome-wide association study data. We applied the GCTA method to estimate heritability for four definitions of glycaemic response to metformin: absolute reduction in HbA1c; proportional reduction in HbA1c; adjusted reduction in HbA1c; and whether or not the target on-treatment HbA1c of less than 7% (53 mmol/mol) was achieved, with adjustment for baseline HbA1c and known clinical covariates. Chromosome-wise heritability estimation was used to obtain further information about the genetic architecture. Findings: 5386 individuals were included in the final dataset, of whom 2085 had enough clinical data to define glycaemic response to metformin. The heritability of glycaemic response to metformin varied by response phenotype, with a heritability of 34% (95% CI 1-68; p=0·022) for the absolute reduction in HbA1c, adjusted for pretreatment HbA1c. Chromosome-wise heritability estimates suggest that the genetic contribution is probably from individual variants scattered across the genome, which each have a small to moderate effect, rather than from a few loci that each have a large effect. Interpretation: Glycaemic response to metformin is heritable, thus glycaemic response to metformin is, in part, intrinsic to individual biological variation. Further genetic analysis might enable us to make better predictions for stratified medicine and to unravel new mechanisms of metformin action. Funding: Wellcome Trust.

Bramon E, Pirinen M, Strange A, Lin K, Freeman C, Bellenguez C, Su Z, Band G, Pearson R, Vukcevic D et al. 2014. A genome-wide association analysis of a broad psychosis phenotype identifies three loci for further investigation. Biological Psychiatry, 75 (5), pp. 386-397.

Morris DW, Pearson RD, Cormican P, Kenny EM, O'Dushlaine CT, Perreault LP, Giannoulatou E, Tropea D, Maher BS, Wormley B et al. 2014. An inherited duplication at the gene p21 Protein-Activated Kinase 7 (PAK7) is a risk factor for psychosis. Hum Mol Genet, 23 (12), pp. 3316-3326.

Identifying rare, highly penetrant risk mutations may be an important step in dissecting the molecular etiology of schizophrenia. We conducted a gene-based analysis of large (>100 kb), rare copy-number variants (CNVs) in the Wellcome Trust Case Control Consortium 2 (WTCCC2) schizophrenia sample of 1564 cases and 1748 controls all from Ireland, and further extended the analysis to include an additional 5196 UK controls. We found association with duplications at chr20p12.2 (P = 0.007) and evidence of replication in large independent European schizophrenia (P = 0.052) and UK bipolar disorder case-control cohorts (P = 0.047). A combined analysis of Irish/UK subjects including additional psychosis cases (schizophrenia and bipolar disorder) identified 22 carriers in 11,707 cases and 10 carriers in 21,204 controls [meta-analysis Cochran-Mantel-Haenszel P-value = 2 × 10^-4; odds ratio (OR) = 11.3, 95% CI = 3.7, ∞]. Nineteen of the 22 cases and 8 of the 10 controls carried duplications starting at 9.68 Mb with similar breakpoints across samples. By haplotype analysis and sequencing, we identified a tandem ~149 kb duplication overlapping the gene p21 Protein-Activated Kinase 7 (PAK7, also called PAK5) which was in linkage disequilibrium with local haplotypes (P = 2.5 × 10^-21), indicative of a single ancestral duplication event. We confirmed the breakpoints in 8/8 carriers tested and found co-segregation of the duplication with illness in two additional family members of one of the affected probands. We demonstrate that PAK7 is developmentally co-expressed with another known psychosis risk gene (DISC1) suggesting a potential molecular mechanism involving aberrant synapse development and plasticity.

Postmus I, Trompet S, Deshmukh HA, Barnes MR, Li X, Warren HR, Chasman DI, Zhou K, Arsenault BJ, Donnelly LA et al. 2014. Pharmacogenetic meta-analysis of genome-wide association studies of LDL cholesterol response to statins. Nat Commun, 5 pp. 5068.

Statins effectively lower LDL cholesterol levels in large studies and the observed interindividual response variability may be partially explained by genetic variation. Here we perform a pharmacogenetic meta-analysis of genome-wide association studies (GWAS) in studies addressing the LDL cholesterol response to statins, including up to 18,596 statin-treated subjects. We validate the most promising signals in a further 22,318 statin recipients and identify two loci, SORT1/CELSR2/PSRC1 and SLCO1B1, not previously identified in GWAS. Moreover, we confirm the previously described associations with APOE and LPA. Our findings advance the understanding of the pharmacogenetic architecture of statin response.

Schizophrenia Working Group of the Psychiatric Genomics Consortium. 2014. Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511 (7510), pp. 421-427.

Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely new insights into aetiology, but associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that have important roles in immunity, providing support for the speculated link between the immune system and schizophrenia.

Davis OS, Band G, Pirinen M, Haworth CM, Meaburn EL, Kovas Y, Harlaar N, Docherty SJ, Hanscombe KB, Trzaskowski M et al. 2014. The correlation between reading and mathematics ability at age twelve has a substantial genetic component. Nat Commun, 5 pp. 4204.

Dissecting how genetic and environmental influences impact on learning is helpful for maximizing numeracy and literacy. Here we show, using twin and genome-wide analysis, that there is a substantial genetic component to children's ability in reading and mathematics, and estimate that around one half of the observed correlation in these traits is due to shared genetic effects (so-called Generalist Genes). Thus, our results highlight the potential role of the learning environment in contributing to differences in a child's cognitive abilities at age twelve.

Blue Mountains Eye Study (BMES), Wellcome Trust Case Control Consortium 2 (WTCCC2). 2013. Genome-wide association study of intraocular pressure identifies the GLCCI1/ICA1 region as a glaucoma susceptibility locus. Hum Mol Genet, 22 (22), pp. 4653-4660.

To discover quantitative trait loci for intraocular pressure, a major risk factor for glaucoma and the only modifiable one, we performed a genome-wide association study on a discovery cohort of 2175 individuals from Sydney, Australia. We found a novel association between intraocular pressure and a common variant at 7p21 near to GLCCI1 and ICA1. The findings in this region were confirmed through two UK replication cohorts totalling 4866 individuals (rs59072263, P(combined) = 1.10 × 10^-8). A copy of the G allele at this SNP is associated with an increase in mean IOP of 0.45 mmHg (95% CI = 0.30-0.61 mmHg). These results lend support to the implication of vesicle trafficking and glucocorticoid inducibility pathways in the determination of intraocular pressure and in the pathogenesis of primary open-angle glaucoma.

International Multiple Sclerosis Genetics Consortium (IMSGC), Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C, Shah TS, Spencer C, Booth D et al. 2013. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet, 45 (11), pp. 1353-1360.

Using the ImmunoChip custom genotyping array, we analyzed 14,498 subjects with multiple sclerosis and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (P < 1.0 × 10^-4). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 subjects with multiple sclerosis and 26,703 healthy controls. In these 80,094 individuals of European ancestry, we identified 48 new susceptibility variants (P < 5.0 × 10^-8), 3 of which we found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants at 103 discrete loci outside of the major histocompatibility complex. With high-resolution Bayesian fine mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalog of multiple sclerosis risk variants and illustrates the value of fine mapping in the resolution of GWAS signals.
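
The "posterior probability of association" mentioned in the abstract above can be illustrated with a minimal sketch. It is not the consortium's pipeline: it assumes exactly one causal variant per region, equal prior probability for every variant, and that a log Bayes factor is already available for each variant. The function name posterior_probabilities and the input values are hypothetical.

```python
import math


def posterior_probabilities(log_bayes_factors):
    """Per-variant posterior probability of being the causal variant.

    Assumptions (illustrative only): exactly one causal variant in the
    region and equal priors, so each posterior is BF_i / sum_j BF_j.
    """
    if not log_bayes_factors:
        raise ValueError("need at least one variant")
    # Subtract the maximum before exponentiating to avoid overflow.
    largest = max(log_bayes_factors)
    weights = [math.exp(value - largest) for value in log_bayes_factors]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical region with three variants whose Bayes factors are 6, 3 and 1.
example = posterior_probabilities([math.log(6.0), math.log(3.0), math.log(1.0)])
lead_variant_dominates = max(example) > 0.5
```

Under these assumptions, a region counts as one where "one variant accounted for more than 50% of the posterior probability" when the largest returned value exceeds 0.5.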

Ripke S, O'Dushlaine C, Chambert K, Moran JL, Kähler AK, Akterin S, Bergen SE, Collins AL, Crowley JJ, Fromer M et al. 2013. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet, 45 (10), pp. 1150-1159.

Schizophrenia is an idiopathic mental disorder with a heritable component and a substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases and 6,243 controls) followed by meta-analysis with previous schizophrenia GWAS (8,832 cases and 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls and 581 parent-offspring trios). We identified 22 loci associated at genome-wide significance; 13 of these are new, and 1 was previously implicated in bipolar disorder. Examination of candidate genes at these loci suggests the involvement of neuronal calcium signaling. We estimate that 8,300 independent, mostly common SNPs (95% credible interval of 6,300-10,200 SNPs) contribute to risk for schizophrenia and that these collectively account for at least 32% of the variance in liability. Common genetic variation has an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this disorder.

Band G, Le QS, Jostins L, Pirinen M, Kivinen K, Jallow M, Sisay-Joof F, Bojang K, Pinder M, Sirugo G et al. 2013. Imputation-based meta-analysis of severe malaria in three African populations. PLoS Genet, 9 (5), pp. e1003509.

Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.
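
For readers unfamiliar with the "fixed-effects meta-analysis" that the abstract above takes as its starting point, here is a minimal sketch of the standard inverse-variance-weighted estimator. It is not the paper's code: the function name fixed_effect_meta and the per-study numbers are hypothetical, and the p-value uses a normal approximation.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect combination.

    betas: per-study effect estimates (for example, log odds ratios).
    standard_errors: their standard errors, in the same order.
    Returns (combined effect, its standard error, z score, two-sided p).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Hypothetical log odds ratios for one SNP at three study sites.
combined = fixed_effect_meta([-0.35, -0.42, -0.30], [0.10, 0.12, 0.15])
```

The estimator assumes a single shared effect across studies, which is the assumption the abstract proposes to relax when effects or causal alleles differ between populations.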

Miotto O, Almagro-Garcia J, Manske M, Macinnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S et al. 2013. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet, 45 (6), pp. 648-655.

We describe an analysis of genome variation in 825 P. falciparum samples from Asia and Africa that identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia. Within this relatively small geographic area, we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in transporter proteins and DNA mismatch repair proteins. These data provide a population-level genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist in its elimination.

LeishGEN Consortium, Wellcome Trust Case Control Consortium 2, Fakiola M, Strange A, Cordell HJ, Miller EN, Pirinen M, Su Z, Mishra A, Mehrotra S et al. 2013. Common variants in the HLA-DRB1-HLA-DQA1 HLA class II region are associated with susceptibility to visceral leishmaniasis. Nat Genet, 45 (2), pp. 208-213.

To identify susceptibility loci for visceral leishmaniasis, we undertook genome-wide association studies in two populations: 989 cases and 1,089 controls from India and 357 cases in 308 Brazilian families (1,970 individuals). The HLA-DRB1-HLA-DQA1 locus was the only region to show strong evidence of association in both populations. Replication at this region was undertaken in a second Indian population comprising 941 cases and 990 controls, and combined analysis across the three cohorts for rs9271858 at this locus showed P(combined) = 2.76 × 10^-17 and odds ratio (OR) = 1.41, 95% confidence interval (CI) = 1.30-1.52. A conditional analysis provided evidence for multiple associations within the HLA-DRB1-HLA-DQA1 region, and a model in which risk differed between three groups of haplotypes better explained the signal and was significant in the Indian discovery and replication cohorts. In conclusion, the HLA-DRB1-HLA-DQA1 HLA class II region contributes to visceral leishmaniasis susceptibility in India and Brazil, suggesting shared genetic risk factors for visceral leishmaniasis that cross the epidemiological divides of geography and parasite species.

Pirinen M, Donnelly P, Spencer CCA. 2013. Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Annals of Applied Statistics, 7 (1), pp. 369-390.

Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants. © 2013 Institute of Mathematical Statistics.

Tsoi LC, Spain SL, Knight J, Ellinghaus E, Stuart PE, Capon F, Ding J, Li Y, Tejasvi T, Gudjonsson JE et al. 2012. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat Genet, 44 (12), pp. 1341-1348.

To gain further insight into the genetic architecture of psoriasis, we conducted a meta-analysis of 3 genome-wide association studies (GWAS) and 2 independent data sets genotyped on the Immunochip, including 10,588 cases and 22,806 controls. We identified 15 new susceptibility loci, increasing to 36 the number associated with psoriasis in European individuals. We also identified, using conditional analyses, five independent signals within previously known loci. The newly identified loci shared with other autoimmune diseases include candidate genes with roles in regulating T-cell function (such as RUNX3, TAGAP and STAT3). Notably, they included candidate genes whose products are involved in innate host defense, including interferon-mediated antiviral responses (DDX58), macrophage activation (ZC3H12C) and nuclear factor (NF)-κB signaling (CARD14 and CARM1). These results portend a better understanding of shared and distinctive genetic determinants of immune-mediated inflammatory disorders and emphasize the importance of the skin in innate and acquired host defense.

Strange A, Riley BP, Spencer CCA, Morris DW, Pirinen M, O'Dushlaine CT, Su Z, Maher BS, Freeman C, Cormican P et al. 2012. Genome-Wide Association Study Implicates HLA-C*01:02 as a Risk Factor at the Major Histocompatibility Complex Locus in Schizophrenia. Biological Psychiatry, 72 (8), pp. 620-628.

Su Z, Gay LJ, Strange A, Palles C, Band G, Whiteman DC, Lescai F, Langford C, Nanji M, Edkins S et al. 2012. Common variants at the MHC locus and at chromosome 16q24.1 predispose to Barrett's esophagus. Nat Genet, 44 (10), pp. 1131-1136.

Barrett's esophagus is an increasingly common disease that is strongly associated with reflux of stomach acid and usually a hiatus hernia, and it strongly predisposes to esophageal adenocarcinoma (EAC), a tumor with a very poor prognosis. We report the first genome-wide association study on Barrett's esophagus, comprising 1,852 UK cases and 5,172 UK controls in the discovery stage and 5,986 cases and 12,825 controls in the replication stage. Variants at two loci were associated with disease risk: chromosome 6p21, rs9257809 (Pcombined = 4.09 × 10⁻⁹; odds ratio (OR) = 1.21, 95% confidence interval (CI) = 1.13-1.28), within the major histocompatibility complex locus, and chromosome 16q24, rs9936833 (Pcombined = 2.74 × 10⁻¹⁰; OR = 1.14, 95% CI = 1.10-1.19), for which the closest protein-coding gene is FOXF1, which is implicated in esophageal development and structure. We found evidence that many common variants of small effect contribute to genetic susceptibility to Barrett's esophagus and that SNP alleles predisposing to obesity also increase risk for Barrett's esophagus.

Golubchik T, Brueggemann AB, Street T, Gertz RE, Spencer CC, Ho T, Giannoulatou E, Link-Gelles R, Harding RM, Beall B et al. 2012. Pneumococcal genome sequencing tracks a vaccine escape variant formed through a multi-fragment recombination event. Nat Genet, 44 (3), pp. 352-355.

Streptococcus pneumoniae ('pneumococcus') causes an estimated 14.5 million cases of serious disease and 826,000 deaths annually in children under 5 years of age(1). The highly effective introduction of the PCV7 pneumococcal vaccine in 2000 in the United States(2,3) provided an unprecedented opportunity to investigate the response of an important pathogen to widespread, vaccine-induced selective pressure. Here, we use array-based sequencing of 62 isolates from a US national monitoring program to study five independent instances of vaccine escape recombination(4), showing the simultaneous transfer of multiple and often large (up to at least 44 kb) DNA fragments. We show that one such new strain quickly became established, spreading from east to west across the United States. These observations clarify the roles of recombination and selection in the population genomics of pneumococcus and provide proof of principle of the considerable value of combining genomic and epidemiological information in the surveillance and enhanced understanding of infectious diseases.

Bellenguez C, Strange A, Freeman C, Wellcome Trust Case Control Consortium, Donnelly P, Spencer CC. 2012. A robust clustering algorithm for identifying problematic samples in genome-wide association studies. Bioinformatics, 28 (1), pp. 134-135.

SUMMARY: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. AVAILABILITY: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer CONTACT: chris.spencer@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
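
As a rough illustration of the kind of sample-level quality control described above, the sketch below flags samples whose genome-wide summary statistic lies far from the bulk of the sample. It is not the published clustering algorithm: it swaps in a much simpler median/MAD robust z-score rule, and the function name `flag_atypical_samples`, the threshold of 3.5 and the heterozygosity values are all hypothetical.

```python
from statistics import median


def flag_atypical_samples(values, threshold=3.5):
    """Return indices of samples whose robust z-score exceeds `threshold`.

    Simple median/MAD rule used as a stand-in for illustration only;
    it is NOT the clustering algorithm described in the paper.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread in the bulk of the data, so nothing can be ranked.
        return []
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]


# Hypothetical per-sample heterozygosity; the last sample is clearly atypical.
heterozygosity = [0.301, 0.299, 0.300, 0.302, 0.298, 0.300, 0.340]
print(flag_atypical_samples(heterozygosity))
```

In practice one such rule would be applied per summary statistic (for example heterozygosity or missingness), with flagged samples reviewed rather than dropped automatically.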

Bellenguez C, Bevan S, Gschwendtner A, Spencer CCA, Burgess AI, Pirinen M, Jackson CA, Traylor M, Strange A, Su Z et al. 2012. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke. Nature Genetics, 44 (3), pp. 328-333.

Genetic factors have been implicated in stroke risk, but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) for ischemic stroke and its subtypes in 3,548 affected individuals and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 affected individuals and 6,281 controls. We replicated previous associations for cardioembolic stroke near PITX2 and ZFHX3 and for large vessel stroke at a 9p21 locus. We identified a new association for large vessel stroke within HDAC9 (encoding histone deacetylase 9) on chromosome 7p21.1 (including further replication in an additional 735 affected individuals and 28,583 controls) (rs11984041; combined P = 1.87 × 10⁻¹¹; odds ratio (OR) = 1.42, 95% confidence interval (CI) = 1.28-1.57). All four loci exhibited evidence for heterogeneity of effect across the stroke subtypes, with some and possibly all affecting risk for only one subtype. This suggests distinct genetic architectures for different stroke subtypes. © 2012 Nature America, Inc. All rights reserved.

Pirinen M, Donnelly P, Spencer CC. 2012. Including known covariates can reduce power to detect genetic effects in case-control studies. Nat Genet, 44 (8), pp. 848-851.

Genome-wide association studies (GWAS) search for associations between genetic variants and disease status, typically via logistic regression. Often there are covariates, such as sex or well-established major genetic factors, that are known to affect disease susceptibility and are independent of tested genotypes at the population level. We show theoretically and with data from recent GWAS on multiple sclerosis, psoriasis and ankylosing spondylitis that inclusion of known covariates can substantially reduce power for the identification of associated variants when the disease prevalence is lower than a few percent. Whether the inclusion of such covariates reduces or increases power to detect genetic effects depends on various factors, including the prevalence of the disease studied. When the disease is common (prevalence of >20%), the inclusion of covariates typically increases power, whereas, for rarer diseases, it can often decrease power to detect new genetic associations.

Bhatia G, Patterson N, Pasaniuc B, Zaitlen N, Genovese G, Pollack S, Mallick S, Myers S, Tandon A, Spencer C et al. 2011. Genome-wide Comparison of African-Ancestry Populations from CARe and Other Cohorts Reveals Signals of Natural Selection. American Journal of Human Genetics, 89 (3), pp. 368-381.

The study of recent natural selection in human populations has important applications to human history and medicine. Positive natural selection drives the increase in beneficial alleles and plays a role in explaining diversity across human populations. By discovering traits subject to positive selection, we can better understand the population level response to environmental pressures including infectious disease. Our study examines unusual population differentiation between three large data sets to detect natural selection. The populations examined, African Americans, Nigerians, and Gambians, are genetically close to one another (F_ST < 0.01 for all pairs), allowing us to detect selection even with moderate changes in allele frequency. We also develop a tree-based method to pinpoint the population in which selection occurred, incorporating information across populations. Our genome-wide significant results corroborate loci previously reported to be under selection in Africans including HBB and CD36. At the HLA locus on chromosome 6, results suggest the existence of multiple, independent targets of population-specific selective pressure. In addition, we report a genome-wide significant (p = 1.36 × 10⁻¹¹) signal of selection in the prostate stem cell antigen (PSCA) gene. The most significantly differentiated marker in our analysis, rs2920283, is highly differentiated in both Africa and East Asia and has prior genome-wide significant associations to bladder and gastric cancers. © 2011 The American Society of Human Genetics.

Didelot X, Bowden R, Street T, Golubchik T, Spencer C, McVean G, Sangal V, Anjum MF, Achtman M, Falush D, Donnelly P. 2011. Recombination and population structure in Salmonella enterica. PLoS Genet, 7 (7), pp. e1002191.

Salmonella enterica is a bacterial pathogen that causes enteric fever and gastroenteritis in humans and animals. Although its population structure was long described as clonal, based on high linkage disequilibrium between loci typed by enzyme electrophoresis, recent examination of gene sequences has revealed that recombination plays an important evolutionary role. We sequenced around 10% of the core genome of 114 isolates of enterica using a resequencing microarray. Application of two different analysis methods (Structure and ClonalFrame) to our genomic data allowed us to define five clear lineages within S. enterica subspecies enterica, one of which is five times older than the other four and two thirds of the age of the whole subspecies. We show that some of these lineages display more evidence of recombination than others. We also demonstrate that some level of sexual isolation exists between the lineages, so that recombination has occurred predominantly between members of the same lineage. This pattern of recombination is compatible with expectations from the previously described ecological structuring of the enterica population as well as mechanistic barriers to recombination observed in laboratory experiments. In spite of their relatively low level of genetic differentiation, these lineages might therefore represent incipient species.

Plagnol V, Nalls MA, Bras JM, Hernandez DG, Sharma M, Sheerin U-M, Saad M, Simon-Sanchez J, Schulte C, Lesage S et al. 2011. A Two-Stage Meta-Analysis Identifies Several New Loci for Parkinson's Disease. PLoS Genet, 7 (6), pp. e1002142.

Vukcevic D, Hechter E, Spencer C, Donnelly P. 2011. Disease model distortion in association studies. Genet Epidemiol, 35 (4), pp. 278-290.

Most findings from genome-wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r⁴, where r² is the usual correlation between the causal and marker loci, in contrast to the well-known result that power to detect a multiplicative effect decays as a function of r². We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely.
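
To make the contrast between the two decay rates concrete, the sketch below compares power at a marker when the noncentrality parameter of a 1-df test is scaled by r² (multiplicative effect) or by r⁴ (departure from the multiplicative model). It is a minimal sketch under stated assumptions, not the analysis in the paper: it assumes both tests are two-sided 1-df tests, that "decays as a function of" means the noncentrality parameter is multiplied by that factor, and the function names and the example value of 40 for the noncentrality at the causal locus are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_1df(ncp, alpha=5e-8):
    """Power of a two-sided 1-df test whose chi-square statistic has noncentrality `ncp`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5  # mean of the underlying z-statistic
    # Probability that |Z + shift| exceeds the critical value.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def power_at_marker(ncp_causal, r2, exponent, alpha=5e-8):
    """Power at a marker in LD (squared correlation `r2`) with the causal locus.

    Assumption: exponent=1 scales the noncentrality by r^2 (multiplicative effect),
    exponent=2 scales it by r^4 (departure from the multiplicative model).
    """
    return power_1df(ncp_causal * r2 ** exponent, alpha)


# Hypothetical noncentrality of 40 at the causal locus for both kinds of test.
for r2 in (1.0, 0.8, 0.5):
    multiplicative = power_at_marker(40.0, r2, exponent=1)
    departure = power_at_marker(40.0, r2, exponent=2)
    print(f"r2={r2:.1f}  multiplicative: {multiplicative:.3f}  departure: {departure:.3f}")
```

Under these assumptions the two powers coincide when the marker is the causal variant (r² = 1) and the power to detect a departure falls away faster as r² decreases.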

Spencer C, Hechter E, Vukcevic D, Donnelly P. 2011. Quantifying the underestimation of relative risks from genome-wide association studies. PLoS Genet, 7 (3), pp. e1001337.

Genome-wide association studies (GWAS) have identified hundreds of associated loci across many common diseases. Most risk variants identified by GWAS will merely be tags for as-yet-unknown causal variants. It is therefore possible that identification of the causal variant, by fine mapping, will identify alleles with larger effects on genetic risk than those currently estimated from GWAS replication studies. We show that under plausible assumptions, whilst the majority of the per-allele relative risks (RR) estimated from GWAS data will be close to the true risk at the causal variant, some could be considerable underestimates. For example, for an estimated RR in the range 1.2-1.3, there is approximately a 38% chance that it exceeds 1.4 and a 10% chance that it is over 2. We show how these probabilities can vary depending on the true effects associated with low-frequency variants and on the minor allele frequency (MAF) of the most associated SNP. We investigate the consequences of the underestimation of effect sizes for predictions of an individual's disease risk and interpret our results for the design of fine mapping experiments. Although these effects mean that the amount of heritability explained by known GWAS loci is expected to be larger than current projections, this increase is likely to explain a relatively small amount of the so-called "missing" heritability.

Nalls MA, Plagnol V, Hernandez DG, Sharma M, Sheerin U-M, Saad M, Simon-Sanchez J, Schulte C, Lesage S, Sveinbjornsdottir S et al. 2011. Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies. Lancet, 377 (9766), pp. 641-649.

GoDARTS and UKPDS Diabetes Pharmacogenetics Study Group, Wellcome Trust Case Control Consortium 2, Zhou K, Bellenguez C, Spencer CC, Bennett AJ, Coleman RL, Tavendale R, Hawley SA, Donnelly LA et al. 2011. Common variants near ATM are associated with glycemic response to metformin in type 2 diabetes. Nat Genet, 43 (2), pp. 117-120.

Metformin is the most commonly used pharmacological therapy for type 2 diabetes. We report a genome-wide association study for glycemic response to metformin in 1,024 Scottish individuals with type 2 diabetes with replication in two cohorts including 1,783 Scottish individuals and 1,113 individuals from the UK Prospective Diabetes Study. In a combined meta-analysis, we identified a SNP, rs11212617, associated with treatment success (n = 3,920, P = 2.9 × 10⁻⁹, odds ratio = 1.35, 95% CI 1.22-1.49) at a locus containing ATM, the ataxia telangiectasia mutated gene. In a rat hepatoma cell line, inhibition of ATM with KU-55933 attenuated the phosphorylation and activation of AMP-activated protein kinase in response to metformin. We conclude that ATM, a gene known to be involved in DNA repair and cell cycle control, plays a role in the effect of metformin upstream of AMP-activated protein kinase, and variation in this gene alters glycemic response to metformin.

UK Parkinson's Disease Consortium, Wellcome Trust Case Control Consortium 2, Spencer CC, Plagnol V, Strange A, Gardner M, Paisan-Ruiz C, Band G, Barker RA, Bellenguez C et al. 2011. Dissection of the genetics of Parkinson's disease identifies an additional association 5' of SNCA and multiple associated haplotypes at 17q21. Hum Mol Genet, 20 (2), pp. 345-353.

We performed a genome-wide association study (GWAS) in 1705 Parkinson's disease (PD) UK patients and 5175 UK controls, the largest sample size so far for a PD GWAS. Replication was attempted in an additional cohort of 1039 French PD cases and 1984 controls for the 27 regions showing the strongest evidence of association (P < 10⁻⁴). We replicated published associations in the 4q22/SNCA and 17q21/MAPT chromosome regions (P < 10⁻¹⁰) and found evidence for an additional independent association in 4q22/SNCA. A detailed analysis of the haplotype structure at 17q21 showed that there are three separate risk groups within this region. We found weak but consistent evidence of association for common variants located in three previously published associated regions (4p15/BST1, 4p16/GAK and 1q32/PARK16). We found no support for the previously reported SNP association in 12q12/LRRK2. We also found an association of the two SNPs in 4q22/SNCA with the age of onset of the disease.

International Multiple Sclerosis Genetics Consortium, Wellcome Trust Case Control Consortium 2, Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L, Dilthey A, Su Z et al. 2011. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature, 476 (7359), pp. 214-219.

Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.

Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. 2011. Genome-wide association study identifies five new schizophrenia loci. Nat Genet, 43 (10), pp. 969-976.

We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 × 10⁻¹¹) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 × 10⁻⁹), ANK3 (rs10994359, P = 2.5 × 10⁻⁸) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 × 10⁻⁹).

Genetic Analysis of Psoriasis Consortium & the Wellcome Trust Case Control Consortium 2, Strange A, Capon F, Spencer CC, Knight J, Weale ME, Allen MH, Barton A, Band G, Bellenguez C et al. 2010. A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nat Genet, 42 (11), pp. 985-990.

To identify new susceptibility loci for psoriasis, we undertook a genome-wide association study of 594,224 SNPs in 2,622 individuals with psoriasis and 5,667 controls. We identified associations at eight previously unreported genomic loci. Seven loci harbored genes with recognized immune functions (IL28RA, REL, IFIH1, ERAP1, TRAF3IP2, NFKBIA and TYK2). These associations were replicated in 9,079 European samples (six loci with a combined P < 5 × 10⁻⁸ and two loci with a combined P < 5 × 10⁻⁷). We also report compelling evidence for an interaction between the HLA-C and ERAP1 loci (combined P = 6.95 × 10⁻⁶). ERAP1 plays an important role in MHC class I peptide processing. ERAP1 variants only influenced psoriasis susceptibility in individuals carrying the HLA-C risk allele. Our findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis.

UK IBD Genetics Consortium, Barrett JC, Lee JC, Lees CW, Prescott NJ, Anderson CA, Phillips A, Wesley E, Parnell K, Zhang H et al. 2009. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet, 41 (12), pp. 1330-1334.

Ulcerative colitis is a common form of inflammatory bowel disease with a complex etiology. As part of the Wellcome Trust Case Control Consortium 2, we performed a genome-wide association scan for ulcerative colitis in 2,361 cases and 5,417 controls. Loci showing evidence of association at P < 1 × 10⁻⁵ were followed up by genotyping in an independent set of 2,321 cases and 4,818 controls. We find genome-wide significant evidence of association at three new loci, each containing at least one biologically relevant candidate gene, on chromosomes 20q13 (HNF4A; P = 3.2 × 10⁻¹⁷), 16q22 (CDH1 and CDH3; P = 2.8 × 10⁻⁸) and 7q31 (LAMB1; P = 3.0 × 10⁻⁸). Of note, CDH1 has recently been associated with susceptibility to colorectal cancer, an established complication of longstanding ulcerative colitis. The new associations suggest that changes in the integrity of the intestinal epithelial barrier may contribute to the pathogenesis of ulcerative colitis.

Spencer CC, Su Z, Donnelly P, Marchini J. 2009. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet, 5 (5), pp. e1000477.

Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.

Wellcome Trust Case Control Consortium, Australo-Anglo-American Spondylitis Consortium (TASC), Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI et al. 2007. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet, 39 (11), pp. 1329-1337.

We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.

Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature, 449 (7164), pp. 851-861.

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. © 2007 Nature Publishing Group.

Wellcome Trust Case Control Consortium. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447 (7145), pp. 661-678.

There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders.
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R et al. 2007. Genome-wide detection and characterization of positive selection in human populations. Nature, 449 (7164), pp. 913-918.

With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

McVean G, Spencer CC. 2006. Scanning the human genome for signals of selection. Curr Opin Genet Dev, 16 (6), pp. 624-629.

The search for adaptive evolution in the human genome has reached a new era with the advent of genome-wide surveys of genetic variation. However, making sense, let alone use, of such experiments is far from straightforward. Key problems include the way in which the data have been collected, the need to control for factors such as population history and variable recombination rates, which influence the discovery rates for both true and false positives, and the inherent difficulty of falsification. Nevertheless, recent work has shown that genome scans can be used to identify both functional polymorphisms underlying selected traits and entire classes of genes enriched for signals of adaptation.

Spencer CC, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G. 2006. The influence of recombination on human genetic diversity. PLoS Genet, 2 (9), pp. e148.

In humans, the rate of recombination, as measured on the megabase scale, is positively associated with the level of genetic variation, as measured at the genic scale. Despite considerable debate, it is not clear whether these factors are causally linked or, if they are, whether this is driven by the repeated action of adaptive evolution or molecular processes such as double-strand break formation and mismatch repair. We introduce three innovations to the analysis of recombination and diversity: fine-scale genetic maps estimated from genotype experiments that identify recombination hotspots at the kilobase scale, analysis of an entire human chromosome, and the use of wavelet techniques to identify correlations acting at different scales. We show that recombination influences genetic diversity only at the level of recombination hotspots. Hotspots are also associated with local increases in GC content and the relative frequency of GC-increasing mutations but have no effect on substitution rates. Broad-scale association between recombination and diversity is explained through covariance of both factors with base composition. To our knowledge, these results are the first evidence of a direct and local influence of recombination hotspots on genetic variation and the fate of individual mutations. However, that hotspots have no influence on substitution rates suggests that they are too ephemeral on an evolutionary time scale to have a strong influence on broader scale patterns of base composition and long-term molecular evolution.

Myers S, Spencer CC, Auton A, Bottolo L, Freeman C, Donnelly P, McVean G. 2006. The distribution and causes of meiotic recombination in the human genome. Biochem Soc Trans, 34 (Pt 4), pp. 526-530.

Using the statistical analysis of genetic variation, we have developed a high-resolution genetic map of recombination hotspots and recombination rate variation across the human genome. This map, which has a resolution several orders of magnitude greater than previous studies, identifies over 25,000 recombination hotspots and gives new insights into the distribution and determination of recombination. Wavelet-based analysis demonstrates scale-specific influences of base composition, coding context and DNA repeats on recombination rates, though, in contrast with other species, no association with DNase I hypersensitivity. We have also identified specific DNA motifs that are strongly associated with recombination hotspots and whose activity is influenced by local context. Comparative analysis of recombination rates in humans and chimpanzees demonstrates very high rates of evolution of the fine-scale structure of the recombination landscape. In the light of these observations, we suggest possible resolutions of the hotspot paradox.

Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC et al. 2006. The DNA sequence and biological annotation of human chromosome 1. Nature, 441 (7091), pp. 315-321.

The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.

Spencer CC. 2006. Human polymorphism around recombination hotspots. Biochem Soc Trans, 34 (Pt 4), pp. 535-536.

Meiotic recombination in humans is thought to occur as part of the resolution of DSBs (double-strand breaks). The repair of DSBs potentially leads to biases in DNA repair that can distort the population frequency of the alleles at single-nucleotide polymorphisms. Genome-wide variation data provide evidence for a weak fixation bias in favour of G and C alleles that is strongest at the centre of inferred recombination hotspots.

Hanchard NA, Rockett KA, Spencer C, Coop G, Pinder M, Jallow M, Kimber M, McVean G, Mott R, Kwiatkowski DP. 2006. Screening for recently selected alleles by analysis of human haplotype similarity. Am J Hum Genet, 78 (1), pp. 153-159.

There is growing interest in the use of haplotype-based methods for detecting recent selection. Here, we describe a method that uses a sliding window to estimate similarity among the haplotypes associated with any given single-nucleotide polymorphism (SNP) allele. We used simulations of natural selection to provide estimates of the empirical power of the method to detect recently selected alleles and found it to be comparable in power to the popular long-range haplotype test and more powerful than methods based on nucleotide diversity. We then applied the method to a recently selected allele--the sickle mutation at the HBB locus--and found it to have a signal of selection that was significantly stronger than that of simulated models both with and without strong selection. Using this method, we also evaluated >4,000 SNPs on chromosome 20, indicating the applicability of the method to regional data sets.
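The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the statistic defined in the paper: it assumes phased haplotypes encoded as equal-length strings of '0' and '1', and it scores each window by the fraction of carrier pairs that are identical across that window. The function name `window_similarity` and its parameters are hypothetical.

```python
from itertools import combinations


def window_similarity(haplotypes, core, allele, width):
    """Score haplotype similarity among carriers of `allele` at site `core`.

    For every window of `width` consecutive sites, return the fraction of
    carrier pairs whose haplotypes are identical across that window.
    This pairwise-identity score is an illustrative assumption, not the
    published statistic.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    pairs = list(combinations(carriers, 2))
    n_sites = len(haplotypes[0])
    scores = []
    for start in range(n_sites - width + 1):
        if not pairs:
            # Fewer than two carriers: similarity is undefined.
            scores.append(float("nan"))
            continue
        end = start + width
        identical = sum(a[start:end] == b[start:end] for a, b in pairs)
        scores.append(identical / len(pairs))
    return scores


# Toy data: four haplotypes over four sites, core SNP at index 1.
toy = ["0110", "0110", "1010", "1001"]
derived = window_similarity(toy, core=1, allele="1", width=2)
ancestral = window_similarity(toy, core=1, allele="0", width=2)
```

In this toy data the two carriers of allele '1' are identical in every window, whereas the carriers of allele '0' agree only in the first window, which is the kind of contrast a scan for a recently selected allele would look for.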

International HapMap Consortium. 2005. A haplotype map of the human genome. Nature, 437 (7063), pp. 1299-1320.

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

McVean G, Spencer CC, Chaix R. 2005. Perspectives on human genetic variation from the HapMap Project. PLoS Genet, 1 (4), pp. e54.

The completion of the International HapMap Project marks the start of a new phase in human genetics. The aim of the project was to provide a resource that facilitates the design of efficient genome-wide association studies, through characterising patterns of genetic variation and linkage disequilibrium in a sample of 270 individuals across four geographical populations. In total, over one million SNPs have been typed across these genomes, providing an unprecedented view of human genetic diversity. In this review we focus on what the HapMap Project has taught us about the structure of human genetic variation and the fundamental molecular and evolutionary processes that shape it.

Spencer CC, Coop G. 2004. SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics, 20 (18), pp. 3673-3675.

SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection. SelSim allows simulation from either a fully stochastic model of, or deterministic approximations to, natural selection within a coalescent framework. A number of different mutation models are available for simulating surrounding neutral variation. The package enables a detailed exploration of the effects of different models and strengths of selection on patterns of diversity. This provides a tool for the statistical analysis of both empirical data and methods designed to detect natural selection. AVAILABILITY: http://www.stats.ox.ac.uk/mathgen/software.html. SUPPLEMENTARY INFORMATION: http://www.stats.ox.ac.uk/mathgen/software.html.

LeishGEN Consortium, Wellcome Trust Case Control Consortium 2, Fakiola M, Strange A, Cordell HJ, Miller EN, Pirinen M, Su Z, Mishra A, Mehrotra S et al. 2013. Common variants in the HLA-DRB1-HLA-DQA1 HLA class II region are associated with susceptibility to visceral leishmaniasis. Nat Genet, 45 (2), pp. 208-213.

To identify susceptibility loci for visceral leishmaniasis, we undertook genome-wide association studies in two populations: 989 cases and 1,089 controls from India and 357 cases in 308 Brazilian families (1,970 individuals). The HLA-DRB1-HLA-DQA1 locus was the only region to show strong evidence of association in both populations. Replication at this region was undertaken in a second Indian population comprising 941 cases and 990 controls, and combined analysis across the three cohorts for rs9271858 at this locus showed P_combined = 2.76 × 10^-17 and odds ratio (OR) = 1.41, 95% confidence interval (CI) = 1.30-1.52. A conditional analysis provided evidence for multiple associations within the HLA-DRB1-HLA-DQA1 region, and a model in which risk differed between three groups of haplotypes better explained the signal and was significant in the Indian discovery and replication cohorts. In conclusion, the HLA-DRB1-HLA-DQA1 HLA class II region contributes to visceral leishmaniasis susceptibility in India and Brazil, suggesting shared genetic risk factors for visceral leishmaniasis that cross the epidemiological divides of geography and parasite species.

Pirinen M, Donnelly P, Spencer CCA. 2013. Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Annals of Applied Statistics, 7 (1), pp. 369-390.

Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants.

Su Z, Gay LJ, Strange A, Palles C, Band G, Whiteman DC, Lescai F, Langford C, Nanji M, Edkins S et al. 2012. Common variants at the MHC locus and at chromosome 16q24.1 predispose to Barrett's esophagus. Nat Genet, 44 (10), pp. 1131-1136.

Barrett's esophagus is an increasingly common disease that is strongly associated with reflux of stomach acid and usually a hiatus hernia, and it strongly predisposes to esophageal adenocarcinoma (EAC), a tumor with a very poor prognosis. We report the first genome-wide association study on Barrett's esophagus, comprising 1,852 UK cases and 5,172 UK controls in the discovery stage and 5,986 cases and 12,825 controls in the replication stage. Variants at two loci were associated with disease risk: chromosome 6p21, rs9257809 (P_combined = 4.09 × 10^-9; odds ratio (OR) = 1.21, 95% confidence interval (CI) = 1.13-1.28), within the major histocompatibility complex locus, and chromosome 16q24, rs9936833 (P_combined = 2.74 × 10^-10; OR = 1.14, 95% CI = 1.10-1.19), for which the closest protein-coding gene is FOXF1, which is implicated in esophageal development and structure. We found evidence that many common variants of small effect contribute to genetic susceptibility to Barrett's esophagus and that SNP alleles predisposing to obesity also increase risk for Barrett's esophagus.

Bellenguez C, Strange A, Freeman C, Wellcome Trust Case Control Consortium, Donnelly P, Spencer CC. 2012. A robust clustering algorithm for identifying problematic samples in genome-wide association studies. Bioinformatics, 28 (1), pp. 134-135.

SUMMARY: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. AVAILABILITY: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. CONTACT: chris.spencer@well.ox.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
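To make the idea of flagging samples with atypical genome-wide summaries concrete, here is a deliberately simple stand-in. It is not the clustering algorithm from the paper (which is distributed in R); it swaps in a median and median-absolute-deviation rule on a single summary statistic. The function name `flag_atypical`, the threshold of 5 and the toy heterozygosity values are all assumptions made for illustration.

```python
from statistics import median


def flag_atypical(values, threshold=5.0):
    """Flag samples whose summary statistic lies far from the bulk.

    Uses a robust z-score based on the median and the median absolute
    deviation (MAD). This is a swapped-in, simpler technique than the
    published clustering method; `threshold` is an arbitrary choice.
    """
    centre = median(values)
    deviations = [abs(v - centre) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread in the bulk of the data: nothing can be scored.
        return [False] * len(values)
    # 1.4826 rescales the MAD to match a normal standard deviation.
    scale = 1.4826 * mad
    return [d / scale > threshold for d in deviations]


# Hypothetical per-sample heterozygosity; the last sample is atypical.
heterozygosity = [0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.45]
flags = flag_atypical(heterozygosity)
```

A median-based rule is used here because, unlike a mean and standard deviation, it is not dragged towards the problematic samples it is trying to detect.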

Bellenguez C, Bevan S, Gschwendtner A, Spencer CCA, Burgess AI, Pirinen M, Jackson CA, Traylor M, Strange A, Su Z et al. 2012. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke. Nature Genetics, 44 (3), pp. 328-333.

Genetic factors have been implicated in stroke risk, but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) for ischemic stroke and its subtypes in 3,548 affected individuals and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 affected individuals and 6,281 controls. We replicated previous associations for cardioembolic stroke near PITX2 and ZFHX3 and for large vessel stroke at a 9p21 locus. We identified a new association for large vessel stroke within HDAC9 (encoding histone deacetylase 9) on chromosome 7p21.1 (including further replication in an additional 735 affected individuals and 28,583 controls) (rs11984041; combined P = 1.87 × 10^-11; odds ratio (OR) = 1.42, 95% confidence interval (CI) = 1.28-1.57). All four loci exhibited evidence for heterogeneity of effect across the stroke subtypes, with some and possibly all affecting risk for only one subtype. This suggests distinct genetic architectures for different stroke subtypes.

Pirinen M, Donnelly P, Spencer CC. 2012. Including known covariates can reduce power to detect genetic effects in case-control studies. Nat Genet, 44 (8), pp. 848-851.

Genome-wide association studies (GWAS) search for associations between genetic variants and disease status, typically via logistic regression. Often there are covariates, such as sex or well-established major genetic factors, that are known to affect disease susceptibility and are independent of tested genotypes at the population level. We show theoretically and with data from recent GWAS on multiple sclerosis, psoriasis and ankylosing spondylitis that inclusion of known covariates can substantially reduce power for the identification of associated variants when the disease prevalence is lower than a few percent. Whether the inclusion of such covariates reduces or increases power to detect genetic effects depends on various factors, including the prevalence of the disease studied. When the disease is common (prevalence of >20%), the inclusion of covariates typically increases power, whereas, for rarer diseases, it can often decrease power to detect new genetic associations.

Vukcevic D, Hechter E, Spencer C, Donnelly P. 2011. Disease model distortion in association studies. Genet Epidemiol, 35 (4), pp. 278-290.

Most findings from genome-wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r^4, where r^2 is the usual correlation between the causal and marker loci, in contrast to the well-known result that power to detect a multiplicative effect decays as a function of r^2. We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely.
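As a rough numerical illustration of the two decay rates in this abstract, the sketch below assumes that the information available at the marker (an effective sample size) scales linearly with r^2 for a multiplicative effect and with r^4 for a departure from the multiplicative model. Under that assumption, the sample size needed at the marker to match a study of the causal variant grows by 1/r^2 and 1/r^4 respectively. The function name and this framing are illustrative and are not taken from the paper.

```python
def sample_size_inflation(r2):
    """Return the factor by which sample size must grow at a marker SNP.

    `r2` is the squared correlation between causal and marker loci.
    Assumes effective sample size scales with r^2 for a multiplicative
    effect and with r^4 (that is, r2 squared) for a model departure.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in the interval (0, 1]")
    return {
        "multiplicative": 1.0 / r2,
        "departure": 1.0 / (r2 ** 2),
    }


# With r^2 = 0.5, detecting the main effect needs twice the samples,
# while detecting a dominance-type departure needs four times as many.
inflation = sample_size_inflation(0.5)
```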

Spencer C, Hechter E, Vukcevic D, Donnelly P. 2011. Quantifying the underestimation of relative risks from genome-wide association studies. PLoS Genet, 7 (3), pp. e1001337.

Genome-wide association studies (GWAS) have identified hundreds of associated loci across many common diseases. Most risk variants identified by GWAS will merely be tags for as-yet-unknown causal variants. It is therefore possible that identification of the causal variant, by fine mapping, will identify alleles with larger effects on genetic risk than those currently estimated from GWAS replication studies. We show that under plausible assumptions, whilst the majority of the per-allele relative risks (RR) estimated from GWAS data will be close to the true risk at the causal variant, some could be considerable underestimates. For example, for an estimated RR in the range 1.2-1.3, there is approximately a 38% chance that it exceeds 1.4 and a 10% chance that it is over 2. We show how these probabilities can vary depending on the true effects associated with low-frequency variants and on the minor allele frequency (MAF) of the most associated SNP. We investigate the consequences of the underestimation of effect sizes for predictions of an individual's disease risk and interpret our results for the design of fine mapping experiments. Although these effects mean that the amount of heritability explained by known GWAS loci is expected to be larger than current projections, this increase is likely to explain a relatively small amount of the so-called "missing" heritability.

GoDARTS and UKPDS Diabetes Pharmacogenetics Study Group, Wellcome Trust Case Control Consortium 2, Zhou K, Bellenguez C, Spencer CC, Bennett AJ, Coleman RL, Tavendale R, Hawley SA, Donnelly LA et al. 2011. Common variants near ATM are associated with glycemic response to metformin in type 2 diabetes. Nat Genet, 43 (2), pp. 117-120.

Metformin is the most commonly used pharmacological therapy for type 2 diabetes. We report a genome-wide association study for glycemic response to metformin in 1,024 Scottish individuals with type 2 diabetes with replication in two cohorts including 1,783 Scottish individuals and 1,113 individuals from the UK Prospective Diabetes Study. In a combined meta-analysis, we identified a SNP, rs11212617, associated with treatment success (n = 3,920, P = 2.9 × 10^-9, odds ratio = 1.35, 95% CI 1.22-1.49) at a locus containing ATM, the ataxia telangiectasia mutated gene. In a rat hepatoma cell line, inhibition of ATM with KU-55933 attenuated the phosphorylation and activation of AMP-activated protein kinase in response to metformin. We conclude that ATM, a gene known to be involved in DNA repair and cell cycle control, plays a role in the effect of metformin upstream of AMP-activated protein kinase, and variation in this gene alters glycemic response to metformin.

UK Parkinson's Disease Consortium, Wellcome Trust Case Control Consortium 2, Spencer CC, Plagnol V, Strange A, Gardner M, Paisan-Ruiz C, Band G, Barker RA, Bellenguez C et al. 2011. Dissection of the genetics of Parkinson's disease identifies an additional association 5' of SNCA and multiple associated haplotypes at 17q21. Hum Mol Genet, 20 (2), pp. 345-353.

We performed a genome-wide association study (GWAS) in 1705 Parkinson's disease (PD) UK patients and 5175 UK controls, the largest sample size so far for a PD GWAS. Replication was attempted in an additional cohort of 1039 French PD cases and 1984 controls for the 27 regions showing the strongest evidence of association (P < 10^-4). We replicated published associations in the 4q22/SNCA and 17q21/MAPT chromosome regions (P < 10^-10) and found evidence for an additional independent association in 4q22/SNCA. A detailed analysis of the haplotype structure at 17q21 showed that there are three separate risk groups within this region. We found weak but consistent evidence of association for common variants located in three previously published associated regions (4p15/BST1, 4p16/GAK and 1q32/PARK16). We found no support for the previously reported SNP association in 12q12/LRRK2. We also found an association of the two SNPs in 4q22/SNCA with the age of onset of the disease.

International Multiple Sclerosis Genetics Consortium, Wellcome Trust Case Control Consortium 2, Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L, Dilthey A, Su Z et al. 2011. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature, 476 (7359), pp. 214-219.

Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.

Spencer CC, Su Z, Donnelly P, Marchini J. 2009. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet, 5 (5), pp. e1000477.

Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
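The principle of assessing power by simulation can be shown with a much smaller sketch than the method described above. It ignores LD entirely: it simulates allele counts at a single directly typed causal SNP under a multiplicative odds ratio and applies a one-degree-of-freedom allelic test. The function names, the default threshold of 5 × 10^-7 and all parameter values are assumptions chosen for illustration, and this is unrelated to the R package mentioned in the abstract.

```python
import math
import random


def allelic_p_value(case_alt, case_n, ctrl_alt, ctrl_n):
    """Two-sided p-value for a difference in allele frequency (1 df)."""
    pooled = (case_alt + ctrl_alt) / (case_n + ctrl_n)
    if pooled in (0.0, 1.0):
        return 1.0  # Monomorphic sample: no evidence either way.
    diff = case_alt / case_n - ctrl_alt / ctrl_n
    variance = pooled * (1.0 - pooled) * (1.0 / case_n + 1.0 / ctrl_n)
    z = diff / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_power(n_cases, n_controls, control_freq, odds_ratio,
                   alpha=5e-7, replicates=100, seed=1):
    """Estimate power as the fraction of simulated studies with p < alpha."""
    rng = random.Random(seed)
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    case_freq = (odds_ratio * control_freq
                 / (1.0 - control_freq + odds_ratio * control_freq))
    case_alleles, ctrl_alleles = 2 * n_cases, 2 * n_controls
    hits = 0
    for _ in range(replicates):
        case_alt = sum(rng.random() < case_freq for _ in range(case_alleles))
        ctrl_alt = sum(rng.random() < control_freq for _ in range(ctrl_alleles))
        p = allelic_p_value(case_alt, case_alleles, ctrl_alt, ctrl_alleles)
        hits += p < alpha
    return hits / replicates


power_strong = estimate_power(500, 500, control_freq=0.3, odds_ratio=2.0)
power_null = estimate_power(500, 500, control_freq=0.3, odds_ratio=1.0)
```

Extending a sketch like this towards the design questions in the abstract would mean replacing the single-SNP sampler with haplotypes drawn to match observed LD, so that power at a tag SNP rather than at the causal variant is what gets measured.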
