In recent years, large-scale, well-designed genome-wide association (GWA) studies of hundreds of thousands of single nucleotide polymorphisms (SNPs) typed on thousands of individuals have proved to be extremely successful in identifying novel common loci contributing moderate effects to complex human phenotypes. Furthermore, meta-analyses of the results of these studies, combining tens of thousands of samples from the same or closely related populations, have mapped loci with ever more modest effects. However, many of these studies have been analysed using simple statistical techniques, focussing on single-locus tests of association between SNPs and phenotype. We would expect to gain greater power to detect associations using more complicated analysis methods, particularly if the effect of a gene on the phenotype is not mediated through a single, common causal variant, a situation that is likely to be the rule rather than the exception for complex traits.
Multi-locus methods, combining genotypes at several SNPs simultaneously in joint analyses, provide one approach to increasing power to detect association with multiple common causal variants within the same region (allelic heterogeneity) or to identify rare causal variants. Our group has focussed extensively on the development of haplotype-based methods for GWA studies, attempting to deal with the following issues: (i) estimation of haplotypes from SNP genotype data and incorporation of the uncertainty in these estimates in the subsequent association analysis; and (ii) reducing the number of haplotypes (particularly those that are rare) by clustering using cladistic methods or Bayesian partition modelling. Haplotype-based methods are the most powerful techniques for identifying rare causal variants with SNPs genotyped on GWA genotyping arrays. However, with the increasing availability of deep re-sequencing data, such as that from the 1000 Genomes Project, we have also been developing methods for identifying rare causal variants by testing for an accumulation of minor alleles at these loci within genes.
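To make the idea of an accumulation of minor alleles concrete, the following is a minimal sketch of a generic burden-style comparison; it is not our group's published methodology. It assumes genotypes are coded as minor-allele counts (0, 1 or 2), and the function names, the frequency threshold of 0.01 and the permutation scheme are all illustrative choices.

```python
import random
from statistics import mean


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Count the minor alleles each individual carries at rare variants.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2).
    mafs: minor allele frequency of each variant, in the same column order.
    maf_threshold: illustrative cut-off below which a variant counts as rare.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_p_value(scores, is_case, n_perm=2000, seed=0):
    """One-sided permutation p-value for a higher mean burden in cases."""

    def mean_difference(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return mean(cases) - mean(controls)

    observed = mean_difference(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels keeps the number of cases and controls fixed.
        rng.shuffle(labels)
        if mean_difference(labels) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)
```

Collapsing rare variants into a single score per gene trades resolution for power: no individual variant needs to reach significance, but the score assumes that minor alleles act in the same direction.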
Our group has also been focussing on the development of methodology for gene-environment and gene-gene interaction analyses in GWA studies, where the effects of a causal SNP are modified by a non-genetic risk factor (such as gender, age, or exposure to a specific environment), or by genotypes at another locus. We have also been considering appropriate methods for sub-phenotype analysis that can be used to partition a complex phenotype according to severity or age of onset, for example. These methods can be used to identify causal variants that are shared by sub-phenotypes, but also those that may be specific to one sub-phenotype, or may have effects that vary between sub-phenotypes. Our group is also working on methodology for multiple phenotype analyses, where SNPs appear to be associated with several correlated traits. The goal of these analyses is to attempt to disentangle the correlations between phenotypes, and to determine for which phenotypes the associated SNP is truly causal.
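As a simple illustration of what it means for an exposure to modify the effect of a SNP, the sketch below compares the carrier odds ratio in exposed and unexposed individuals. It assumes a binary carrier status and a binary exposure, and the function names and counts are hypothetical; it is a textbook stratified calculation, not the methodology under development in our group. With binary genotype and exposure, the ratio of the two stratum-specific odds ratios equals the exponentiated interaction coefficient of a saturated logistic regression model.

```python
def odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Odds ratio for disease in carriers relative to non-carriers."""
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


def interaction_odds_ratio(exposed_counts, unexposed_counts):
    """Ratio of carrier odds ratios between exposed and unexposed strata.

    Each argument is a tuple of counts in the order
    (case_carrier, case_noncarrier, control_carrier, control_noncarrier).
    A value of 1 means no interaction on the multiplicative scale.
    """
    return odds_ratio(*exposed_counts) / odds_ratio(*unexposed_counts)


# Hypothetical counts: the carrier odds ratio is 4 in exposed individuals
# and 2 in unexposed individuals, so the interaction odds ratio is 2.
exposed = (40, 10, 20, 20)
unexposed = (20, 20, 10, 20)
print(interaction_odds_ratio(exposed, unexposed))
```

In practice a regression model is preferable, because it accommodates covariates, additive genotype coding and continuous exposures, and it provides standard errors for the interaction term.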
Members of our group have been involved in the analysis of data generated as part of the Wellcome Trust Case Control Consortium (WTCCC), and are currently involved in the analysis of GWA studies for a wide range of complex human phenotypes including type 2 diabetes, coronary artery disease, asthma, tuberculosis, anthropometric traits and dyslexia. Our group contributes to a number of international consortia including DIAGRAM (type 2 diabetes), GIANT (anthropometric traits), ENGAGE and MAGIC (metabolic traits), and NEURODYS (dyslexia), and to the fine-mapping and re-sequencing sub-committee of WTCCC+.