Prof Richard F Mott
| Research Area: | Bioinformatics & Stats (inc. Modelling and Computational Biology) |
|---|---|
| Technology Exchange: | Bioinformatics, Computational biology, SNP typing, Statistical genetics and Transcript profiling |
| Scientific Themes: | Genetics & Genomics |
| Keywords: | QTL, bioinformatics, disease mapping and comparative genomics |
My group has current research foci in the areas of comparative genomics, ancestral haplotype construction, database development, QTL linkage and association methods, variability in patterns of recombination, transcriptome analysis and multivariate modelling of quantitative traits.
My own research focusses on the analysis of complex traits in mouse models of human disease, and in other model organisms.
| Name | Department | Institution | Country |
|---|---|---|---|
| Prof Jonathan Flint | Wellcome Trust Centre for Human Genetics | Oxford University | UK |
| Prof Chris Holmes | Wellcome Trust Centre for Human Genetics | Oxford University | UK |
| Nicholas Harberd | | University of Oxford | UK |
| Paula Kover | | University of Manchester | UK |
| Fuad Iraqi | | Tel Aviv University | Israel |
| Irina Udalova | | Imperial College, London | UK |
| Janet Thornton | | European Bioinformatics Institute | UK |
2011. Sequence-based characterization of structural variation in the mouse genome. Nature, 477 (7364), pp. 326-329.
Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.
2011. Collaborative Cross mice and their power to map host susceptibility to Aspergillus fumigatus infection. Genome Research, 21 (8), pp. 1239-1248.
The Collaborative Cross (CC) is a genetic reference panel of recombinant inbred lines of mice, designed for the dissection of complex traits and gene networks. Each line is independently descended from eight genetically diverse founder strains such that the genomes of the CC lines, once fully inbred, are fine-grained homozygous mosaics of the founder haplotypes. We present an analysis of 120 CC lines, from a cohort of the CC bred at Tel Aviv University in collaboration with the University of Oxford, which at the time of this study were between the sixth and 12th generations of inbreeding and substantially homozygous at 170,000 SNPs. We show how CC genomes decompose into mosaics, and we identify loci that carry a deficiency or excess of a founder, many being deficient for the wild-derived strains WSB/EiJ and PWK/PhJ. We phenotyped 371 mice from 66 CC lines for a susceptibility to Aspergillus fumigatus infection. The survival time after infection varied significantly between CC lines. Quantitative trait locus (QTL) mapping identified genome-wide significant QTLs on chromosomes 2, 3, 8, 10 (two QTLs), 15, and 18. Simulations show that QTL mapping resolution (the median distance between the QTL peak and true location) varied between 0.47 and 1.18 Mb. Most of the QTLs involved contrasts between wild-derived founder strains and therefore would not segregate between classical inbred strains. Use of variation data from the genomes of the CC founder strains refined these QTLs further and suggested several candidate genes. These results support the use of the CC for dissecting complex traits. © 2011 by Cold Spring Harbor Laboratory Press.
2011. Multiple reference genomes and transcriptomes for Arabidopsis thaliana Nature,
2010. Bayesian quantitative trait locus mapping using inferred haplotypes. Genetics, 184 (3), pp. 839-852.
We describe a fast hierarchical Bayesian method for mapping quantitative trait loci by haplotype-based association, applicable when haplotypes are not observed directly but are inferred from multiple marker genotypes. The method avoids the use of a Monte Carlo Markov chain by employing priors for which the likelihood factorizes completely. It is parameterized by a single hyperparameter, the fraction of variance explained by the quantitative trait locus, compared to the frequentist fixed-effects model, which requires a parameter for the phenotypic effect of each combination of haplotypes; nevertheless it still provides estimates of haplotype effects. We use simulation to show that the method matches the power of the frequentist regression model and, when the haplotypes are inferred, exceeds it for small QTL effect sizes. The Bayesian estimates of the haplotype effects are more accurate than the frequentist estimates, for both known and inferred haplotypes, which indicates that this advantage is independent of the effect of uncertainty in haplotype inference and will hold in comparison with frequentist methods in general. We apply the method to data from a panel of recombinant inbred lines of Arabidopsis thaliana, descended from 19 inbred founders.
2010. Elusive copy number variation in the mouse genome. PLoS One, 5 (9), pp. e12839.
Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question.
2009. Failed gene conversion leads to extensive end processing and chromosomal rearrangements in fission yeast. EMBO J, 28 (21), pp. 3400-3412.
Loss of heterozygosity (LOH), a causal event in cancer and human genetic diseases, frequently encompasses multiple genetic loci and whole chromosome arms. However, the mechanisms by which such extensive LOH arises, and how it is suppressed in normal cells is poorly understood. We have developed a genetic system to investigate the mechanisms of DNA double-strand break (DSB)-induced extensive LOH, and its suppression, using a non-essential minichromosome, Ch(16), in fission yeast. We find extensive LOH to arise from a new break-induced mechanism of isochromosome formation. Our data support a model in which Rqh1 and Exo1-dependent end processing from an unrepaired DSB leads to removal of the broken chromosome arm and to break-induced replication of the intact arm from the centromere, a considerable distance from the initial lesion. This process also promotes genome-wide copy number variation. A genetic screen revealed Rhp51, Rhp55, Rhp57 and the MRN complex to suppress both isochromosome formation and chromosome loss, in accordance with these events resulting from extensive end processing associated with failed homologous recombination repair.
2009. Mapping in Structured Populations by Resample Model Averaging. Genetics, 182 (4), pp. 1263-1277.
Highly recombinant populations derived from inbred lines, such as advanced intercross lines and heterogeneous stocks, can be used to map loci far more accurately than is possible with standard intercrosses. However, the varying degrees of relatedness that exist between individuals complicate analysis, potentially leading to many false positive signals. We describe a method to deal with these problems that does not require pedigree information and accounts for model uncertainty through model averaging. In our method, we select multiple quantitative trait loci (QTL) models using forward selection applied to resampled data sets obtained by nonparametric bootstrapping and subsampling. We provide model-averaged statistics about the probability of loci or of multilocus regions being included in model selection, and this leads to more accurate identification of QTL than by single-locus mapping. The generality of our approach means it can potentially be applied to any population of unknown structure. Copyright © 2009 by the Genetics Society of America.
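The resampling idea in this abstract can be illustrated with a minimal sketch. It is not the published implementation: it uses only bootstrap resampling (no subsampling), a plain additive genotype coding, and an invented stopping rule, and the names `forward_select`, `inclusion_probabilities`, `max_loci` and `min_gain` are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, x):
    """Remove from v its projection onto x (x must be non-zero)."""
    b = dot(v, x) / dot(x, x)
    return [a - b * c for a, c in zip(v, x)]

def forward_select(y, markers, max_loci=3, min_gain=0.1):
    """Greedy forward selection of marker columns for phenotype y.

    min_gain is an assumed stopping rule: a marker is added only if it
    explains at least this fraction of the remaining residual sum of squares.
    """
    r = center(y)
    cols = {j: center(col) for j, col in enumerate(markers)}
    chosen = []
    while cols and len(chosen) < max_loci:
        rss = dot(r, r)
        if rss == 0:
            break
        best_j, best_gain = None, 0.0
        for j, x in cols.items():
            xx = dot(x, x)
            if xx < 1e-9:  # skip columns with no remaining variation
                continue
            gain = dot(r, x) ** 2 / xx  # drop in RSS if marker j is added
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None or best_gain / rss < min_gain:
            break
        x = cols.pop(best_j)
        # Orthogonalize the residual and the remaining markers against x,
        # so later gains are conditional on the loci already chosen.
        r = residualize(r, x)
        cols = {j: residualize(c, x) for j, c in cols.items()}
        chosen.append(best_j)
    return chosen

def inclusion_probabilities(y, markers, n_resamples=200, seed=0):
    """Proportion of bootstrap resamples in which each marker is selected."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(markers)
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yb = [y[i] for i in idx]
        mb = [[col[i] for i in idx] for col in markers]
        for j in forward_select(yb, mb):
            counts[j] += 1
    return [c / n_resamples for c in counts]
```

Here `markers` is a list of per-marker genotype vectors (for example 0/1/2 allele counts) and `y` is the phenotype vector. A marker with a real effect on the trait should be selected in most resamples, while markers selected only occasionally are more likely to be artefacts of a particular sample.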
2009. A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet, 5 (7), pp. e1000551.
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations, therefore alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
2009. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res, 19 (6), pp. 1133-1140.
A proportion of the genetic variants underlying complex phenotypes do so through their effects on gene expression, so an important challenge in complex trait analysis is to discover the genetic basis for the variation in transcript abundance. So far, the potential of mapping both quantitative trait loci (QTLs) and expression quantitative trait loci (eQTLs) in rodents has been limited by the low mapping resolution inherent in crosses between inbred strains. We provide a megabase resolution map of thousands of eQTLs in hippocampus, lung, and liver samples from heterogeneous stock (HS) mice in which 843 QTLs have also been mapped at megabase resolution. We exploit dense mouse SNP data to show that artifacts due to allele-specific hybridization occur in approximately 30% of the cis-acting eQTLs and, by comparison with exon expression data, we show that alternative splicing of the 3' end of the genes accounts for <1% of cis-acting eQTLs. Approximately one third of cis-acting eQTLs and one half of trans-acting eQTLs are tissue specific. We have created an important systems biology resource for the genetic analysis of complex traits in a key model organism.
2008. SNP and haplotype mapping for genetic analysis in the rat. Nat Genet, 40 (5), pp. 560-566.
The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.
2007. Management, presentation and interpretation of genome scans using GSCANDB. Bioinformatics, 23 (12), pp. 1545-1549.
Advances in high-throughput genotyping have made it possible to carry out genome-wide association studies using very high densities of genetic markers. This has led to the problem of the storage, management, quality control, presentation and interpretation of results. In order to achieve a successful outcome, it may be necessary to analyse the data in different ways and compare the results with genome annotations and other genome scans.
2006. A high-resolution single nucleotide polymorphism genetic map of the mouse genome. PLoS Biol, 4 (12), pp. e395.
High-resolution genetic maps are required for mapping complex traits and for the study of recombination. We report the highest density genetic map yet created for any organism, except humans. Using more than 10,000 single nucleotide polymorphisms evenly spaced across the mouse genome, we have constructed genetic maps for both outbred and inbred mice, and separately for males and females. Recombination rates are highly correlated in outbred and inbred mice, but show relatively low correlation between males and females. Differences between male and female recombination maps and the sequence features associated with recombination are strikingly similar to those observed in humans. Genetic maps are available from http://gscan.well.ox.ac.uk/#genetic_map and as supporting information to this publication.
2006. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet, 38 (8), pp. 879-887.
Difficulties in fine-mapping quantitative trait loci (QTLs) are a major impediment to progress in the molecular dissection of complex traits in mice. Here we show that genome-wide high-resolution mapping of multiple phenotypes can be achieved using a stock of genetically heterogeneous mice. We developed a conservative and robust bootstrap analysis to map 843 QTLs with an average 95% confidence interval of 2.8 Mb. The QTLs contribute to variation in 97 traits, including models of human disease (asthma, type 2 diabetes mellitus, obesity and anxiety) as well as immunological, biochemical and hematological phenotypes. The genetic architecture of almost all phenotypes was complex, with many loci each contributing a small proportion to the total variance. Our data set, freely available at http://gscan.well.ox.ac.uk, provides an entry point to the functional characterization of genes involved in many complex traits.
2005. Using progenitor strain information to identify quantitative trait nucleotides in outbred mice. Genetics, 171 (2), pp. 673-681.
We have developed a fast and economical strategy for dissecting the genetic architecture of quantitative trait loci at a molecular level. The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis. We apply the method to a 4.8-Mb region on mouse chromosome 1 that contains a QTL influencing anxiety segregating in a heterogeneous stock and show that, under the assumption that a single QTN is present and lies in a region conserved between the human and mouse genomes, it is possible to reduce the number of variants likely to be the quantitative trait nucleotide from many thousands to <20.
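The strain distribution pattern idea can be illustrated with a deliberately simplified sketch. The published method assigns probabilities from a statistical test on the mapping data; the version below only scores how much of the variation among assumed progenitor-strain effect estimates is retained when strains sharing an allele are merged. The function names, strain labels and numbers are all hypothetical.

```python
def sdp_fit(strain_effects, alleles):
    """Fraction of between-strain effect variation kept after merging
    strains that carry the same allele at a variant (0.0 to 1.0)."""
    strains = list(strain_effects)
    grand = sum(strain_effects.values()) / len(strains)
    total = sum((strain_effects[s] - grand) ** 2 for s in strains)
    if total == 0:
        return 0.0
    groups = {}
    for s in strains:
        groups.setdefault(alleles[s], []).append(strain_effects[s])
    between = sum(
        len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values()
    )
    return between / total

def rank_variants(strain_effects, variants):
    """Sort variants from most to least consistent with the QTL effect."""
    scores = {name: sdp_fit(strain_effects, a) for name, a in variants.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical effect estimates for four progenitor strains at a QTL.
effects = {"A": 1.0, "B": 1.1, "C": -1.0, "D": -0.9}
variants = {
    "v1": {"A": "G", "B": "G", "C": "T", "D": "T"},  # splits high from low
    "v2": {"A": "G", "B": "T", "C": "G", "D": "T"},  # mixes high and low
}
ranking = rank_variants(effects, variants)
```

In this toy example `v1` separates the high-effect strains from the low-effect strains and so scores close to 1, whereas `v2` scores close to 0 and would be a poor candidate.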
2004. Genetic dissection of a behavioral quantitative trait locus shows that Rgs2 modulates anxiety in mice. Nat Genet, 36 (11), pp. 1197-1202.
Here we present a strategy to determine the genetic basis of variance in complex phenotypes that arise from natural, as opposed to induced, genetic variation in mice. We show that a commercially available strain of outbred mice, MF1, can be treated as an ultrafine mosaic of standard inbred strains and accordingly used to dissect a known quantitative trait locus influencing anxiety. We also show that this locus can be subdivided into three regions, one of which contains Rgs2, which encodes a regulator of G protein signaling. We then use quantitative complementation to show that Rgs2 is a quantitative trait gene. This combined genetic and functional approach should be applicable to the analysis of any quantitative trait.
2004. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci U S A, 101 (26), pp. 9734-9739.
Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.
2004. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res, 32 (4), pp. e44.
We present a general high-throughput approach to accurately quantify DNA-protein interactions, which can facilitate the identification of functional genetic polymorphisms. The method tested here on two structurally distinct transcription factors (TFs), NF-kappaB and OCT-1, comprises three steps: (i) optimized selection of DNA variants to be tested experimentally, which we show is superior to selecting variants at random; (ii) a quantitative protein-DNA binding assay using microarray and surface plasmon resonance technologies; (iii) prediction of binding affinity for all DNA variants in the consensus space using a statistical model based on principal coordinates analysis. For the protein-DNA binding assay, we identified a polyacrylamide/ester glass activation chemistry which formed exclusive covalent bonds with 5'-amino-modified DNA duplexes and hindered non-specific electrostatic attachment of DNA. Full accessibility of the DNA duplexes attached to polyacrylamide-modified slides was confirmed by the high degree of data correlation with the electromobility shift assay (correlation coefficient 93%). This approach offers the potential for high-throughput determination of TF binding profiles and predicting the effects of single nucleotide polymorphisms on TF binding affinity. New DNA binding data for OCT-1 are presented.
1999. Approximate statistics of gapped alignments. J Comput Biol, 6 (1), pp. 91-112.
A heuristic approximation to the score distribution of gapped alignments in the logarithmic domain is presented. The method applies to comparisons between random, unrelated protein sequences, using standard score matrices and arbitrary gap penalties. It is shown that gapped alignment behavior is essentially governed by a single parameter, alpha, depending on the penalty scheme and sequence composition. This treatment also predicts the position of the transition point between logarithmic and linear behavior. The approximation is tested by simulation and shown to be accurate over a range of commonly used substitution matrices and gap-penalties.
1997. EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput Appl Biosci, 13 (4), pp. 477-478.
1997. Instability of highly expanded CAG repeats in mice transgenic for the Huntington's disease mutation. Nat Genet, 15 (2), pp. 197-200.
Six inherited neurodegenerative diseases are caused by a CAG/polyglutamine expansion, including spinal and bulbar muscular atrophy (SBMA), Huntington's disease (HD), spinocerebellar ataxia type 1 (SCA1), dentatorubral pallidoluysian atrophy (DRPLA), Machado-Joseph disease (MJD or SCA3) and SCA2. Normal and expanded HD allele sizes of 6-39 and 35-121 repeats have been reported, and the allele distributions for the other diseases are comparable. Intergenerational instability has been described in all cases, and repeats tend to be more unstable on paternal transmission. This may present as larger increases on paternal inheritance as in HD, or as a tendency to increase on male and decrease on female transmission as in SCA1 (ref. 15). Somatic repeat instability is also apparent and appears most pronounced in the CNS. The major exception is the cerebellum, which in HD, DRPLA, SCA1 and MJD has a smaller repeat relative to the other brain regions tested. Of non-CNS tissues, instability was observed in blood, liver, kidney and colon. A mouse model of CAG repeat instability would be helpful in unravelling its molecular basis although an absence of CAG repeat instability in transgenic mice has so far been reported. These studies include (CAG) in the androgen receptor cDNA, (CAG) in the HD cDNA, (CAG) in the SCA1 cDNA, (CAG) in the SCA3 cDNA and as an isolated (CAG) tract.



