Mott group publications

Groves JO, Leslie I, Huang GJ, McHugh SB, Taylor A, Mott R, Munafò M, Bannerman DM, Flint J. 2013. Ablating adult neurogenesis in the rat has no effect on spatial processing: evidence from a novel pharmacogenetic model. PLoS Genet, 9 (9).

The function of adult neurogenesis in the rodent brain remains unclear. Ablation of adult born neurons has yielded conflicting results about emotional and cognitive impairments. One hypothesis is that adult neurogenesis in the hippocampus enables spatial pattern separation, allowing animals to distinguish between similar stimuli. We investigated whether spatial pattern separation and other putative hippocampal functions of adult neurogenesis were altered in a novel genetic model of neurogenesis ablation in the rat. In rats engineered to express thymidine kinase (TK) from a promoter of the rat glial fibrillary acidic protein (GFAP), ganciclovir treatment reduced new neurons by 98%. GFAP-TK rats showed no significant difference from controls in spatial pattern separation on the radial maze, spatial learning in the water maze, contextual or cued fear conditioning. Meta-analysis of all published studies found no significant effects for ablation of adult neurogenesis on spatial memory, cue conditioning or ethological measures of anxiety. An effect on contextual freezing was significant at a threshold of 5% (P = 0.04), but not at a threshold corrected for multiple testing. The meta-analysis revealed remarkably high levels of heterogeneity among studies of hippocampal function. The source of this heterogeneity remains unclear and poses a challenge for studies of the function of adult neurogenesis.

Shusterman A, Salyma Y, Nashef A, Soller M, Wilensky A, Mott R, Weiss EI, Houri-Haddad Y, Iraqi FA. 2013. Genotype is an important determinant factor of host susceptibility to periodontitis in the Collaborative Cross and inbred mouse populations. BMC Genetics, 14 (1).

Background: Periodontal infection (Periodontitis) is a chronic inflammatory disease, which results in the breakdown of the supporting tissues of the teeth. Previous epidemiological studies have suggested that resistance to chronic periodontitis is controlled to some extent by genetic factors of the host. The aim of this study was to determine the phenotypic response of inbred and Collaborative Cross (CC) mouse populations to periodontal bacterial challenge, using an experimental periodontitis model. In this model, mice are co-infected with Porphyromonas gingivalis and Fusobacterium nucleatum, bacterial strains associated with human periodontal disease. Six weeks following the infection, the maxillary jaws were harvested and analyzed for alveolar bone loss relative to uninfected controls, using computerized microtomography (microCT). Initially, four commercial inbred mouse strains were examined to calibrate the procedure and test for gender effects. Subsequently, we applied the same protocol to 23 lines (at inbreeding generations 10-18) from the newly developed mouse genetic reference population, the Collaborative Cross (CC) to determine heritability and genetic variation of control bone volume prior to infection (CBV, naïve bone volume around the teeth of uninfected mice), and residual bone volume (RBV, bone volume after infection) and loss of bone volume (LBV, the difference between CBV and RBV) following infection. Results: BALB/cJ mice were highly susceptible (P<0.05) whereas DBA/2J, C57BL/6J and A/J mice were resistant. Six lines of the tested CC population were susceptible, whereas the remaining lines were resistant to alveolar bone loss. Gender effects on bone volume were tested across the four inbred and 23 CC lines, and found not to be significant. Based on ANOVA analyses, broad-sense heritabilities were statistically significant and equal to 0.4 for CBV and 0.2 for LBV. Conclusions: The moderate heritability values indicate that the variation in host susceptibility to the disease is controlled to an appreciable extent by genetic factors. These results strongly support the possibility of using the Collaborative Cross, as well as developing dedicated F2 (resistant x susceptible inbred strains) resource populations, for future dissection of genetic factors in periodontitis. © 2013 Shusterman et al.; licensee BioMed Central Ltd.

Mott R, Flint J. 2013. Dissecting Quantitative Traits in Mice. Annu Rev Genomics Hum Genet, 14 (1).

Progress in complex trait mapping in mice has been accelerated by the development of new populations suited to high-resolution mapping and by statistical methodologies that control for population structure. When combined with newly acquired catalogs of sequence variation in inbred strains, the genetic architecture of these new populations makes it possible to dissect complex traits down to the level of single variants. These analyses have shown not only that complex traits are caused by multiple contributing loci but also that each locus is likely due to the combined effects of multiple causal DNA variants. In combination with new rapid methods for producing transgenic mice that make it efficient to test candidate genes and variants, these advances significantly enhance the mouse genetics toolbox for dissecting quantitative traits. Expected final online publication date for the Annual Review of Genomics and Human Genetics Volume 14 is August 31, 2013. Please see http://www.annualreviews.org/catalog/pubdates.aspx for revised estimates.

Baud A, Calderari S, Mott R, Flint J, Gauguier D. 2013. [Genome sequencing and genetic mapping to dissect the genetic basis of complex traits]. Med Sci (Paris), 29 (6-7).

Hosseini M, Goodstadt L, Hughes JR, Kowalczyk MS, de Gobbi M, Otto GW, Copley RR, Mott R, Higgs DR, Flint J. 2013. Causes and Consequences of Chromatin Variation between Inbred Mice. PLoS Genet, 9 (6).

Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.

Rat Genome Sequencing and Mapping Consortium, Baud A, Hermsen R, Guryev V, Stridh P, Graham D, McBride MW, Foroud T et al. 2013. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat Genet, 45 (7).

Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.

Shusterman A, Durrant C, Mott R, Polak D, Schaefer A, Weiss EI, Iraqi FA, Houri-Haddad Y. 2013. Host Susceptibility to Periodontitis: Mapping Murine Genomic Regions. Journal of Dental Research, 92 (5).

Host susceptibility to periodontal infection is controlled by genetic factors. As a step toward identifying and cloning these factors, we generated an A/J × BALB/cJ F2 mouse resource population. A genome-wide search for Quantitative Trait Loci (QTL) associated with periodontitis was performed. We aimed to quantify the phenotypic response of the progenies to periodontitis by microCT analysis, to perform a genome-wide search for QTL associated with periodontitis, and, finally, to suggest candidate genes for periodontitis. We were able to produce 408 F2 mice. All mice were co-infected with Porphyromonas gingivalis and Fusobacterium nucleatum bacteria. Six weeks following infection, alveolar bone loss was quantified by computerized tomography (microCT) technology. We found normal distribution of the phenotype, with 2 highly significant QTL on chromosomes 5 and 3. A third significant QTL was found on chromosome 1. Candidate genes were suggested, such as Toll-like receptors (TLR) 1 and 6, chemokines, and bone-remodeling genes (enamelin, ameloblastin, and amelotin). This report shows that periodontitis in mice is a polygenic trait with highly significant mapped QTL. © 2013 International & American Associations for Dental Research.

Karp NA, Melvin D, Mott RF, Sanger Mouse Genetics Project. 2012. Robust and Sensitive Analysis of Mouse Knockout Phenotypes. PLoS One, 7 (12).

Jiang C, Belfield EJ, Mithani A, Visscher A, Ragoussis J, Mott R, Smith JA, Harberd NP. 2012. ROS-mediated vascular homeostatic control of root-to-shoot soil Na delivery in Arabidopsis. EMBO J, 31 (22).

Sodium (Na) is ubiquitous in soils, and is transported to plant shoots via transpiration through xylem elements in the vascular tissue. However, excess Na is damaging. Accordingly, control of xylem-sap Na concentration is important for maintenance of shoot Na homeostasis, especially under Na stress conditions. Here we report that shoot Na homeostasis of Arabidopsis thaliana plants grown in saline soils is conferred by reactive oxygen species (ROS) regulation of xylem-sap Na concentrations. We show that lack of A. thaliana respiratory burst oxidase protein F (AtrbohF; an NADPH oxidase catalysing ROS production) causes hypersensitivity of shoots to soil salinity. Lack of AtrbohF-dependent salinity-induced vascular ROS accumulation leads to increased Na concentrations in root vasculature cells and in xylem sap, thus causing delivery of damaging amounts of Na to the shoot. We also show that the excess shoot Na delivery caused by lack of AtrbohF is dependent upon transpiration. We conclude that AtrbohF increases ROS levels in wild-type root vasculature in response to raised soil salinity, thereby limiting Na concentrations in xylem sap, and in turn protecting shoot cells from transpiration-dependent delivery of excess Na.

Goodson M, Rust MB, Witke W, Bannerman D, Mott R, Ponting CP, Flint J. 2012. Cofilin-1: a modulator of anxiety in mice. PLoS Genet, 8 (10).

The genes involved in conferring susceptibility to anxiety remain obscure. We developed a new method to identify genes at quantitative trait loci (QTLs) in a population of heterogeneous stock mice descended from known progenitor strains. QTLs were partitioned into intervals that can be summarized by a single phylogenetic tree among progenitors and intervals tested for consistency with alleles influencing anxiety at each QTL. By searching for common Gene Ontology functions in candidate genes positioned within those intervals, we identified actin depolymerizing factors (ADFs), including cofilin-1 (Cfl1), as genes involved in regulating anxiety in mice. There was no enrichment for function in the totality of genes under each QTL, indicating the importance of phylogenetic filtering. We confirmed experimentally that forebrain-specific inactivation of Cfl1 decreased anxiety in knockout mice. Our results indicate that similarity of function of mammalian genes can be used to recognize key genetic regulators of anxiety and potentially of other emotional behaviours.

Belfield EJ, Gan X, Mithani A, Brown C, Jiang C, Franklin K, Alvey E, Wibowo A et al. 2012. Genome-wide analysis of mutations in mutant lineages selected following fast-neutron irradiation mutagenesis of Arabidopsis thaliana. Genome Res, 22 (7).

Ionizing radiation has long been known to induce heritable mutagenic change in DNA sequence. However, the genome-wide effect of radiation is not well understood. Here we report the molecular properties and frequency of mutations in phenotypically selected mutant lines isolated following exposure of the genetic model flowering plant Arabidopsis thaliana to fast neutrons (FNs). Previous studies suggested that FNs predominantly induce deletions longer than a kilobase in A. thaliana. However, we found a higher frequency of single base substitution than deletion mutations. While the overall frequency and molecular spectrum of fast-neutron (FN)-induced single base substitutions differed substantially from those of "background" mutations arising spontaneously in laboratory-grown plants, G:C>A:T transitions were favored in both. We found that FN-induced G:C>A:T transitions were concentrated at pyrimidine dinucleotide sites, suggesting that FNs promote the formation of mutational covalent linkages between adjacent pyrimidine residues. In addition, we found that FNs induced more single base than large deletions, and that these single base deletions were possibly caused by replication slippage. Our observations provide an initial picture of the genome-wide molecular profile of mutations induced in A. thaliana by FN irradiation and are particularly informative of the nature and extent of genome-wide mutation in lines selected on the basis of mutant phenotypes from FN-mutagenized A. thaliana populations.

Diawara M, Gan X, Cazier JB, Mott R, Calderari S, Gauguier D. 2012. Regulation of microRNA expression determined by systematic sequencing in obesity induced by a diet rich in fat. Diabetes & Metabolism, 38.

Welsh CE, Miller DR, Manly KF, Wang J, McMillan L, Morahan G, Mott R, Iraqi FA, Threadgill DW, de Villena FP-M. 2012. Status and access to the Collaborative Cross population. Mammalian Genome, 23 (9-10).

The Collaborative Cross (CC) is a panel of recombinant inbred lines derived from eight genetically diverse laboratory inbred strains. Recently, the genetic architecture of the CC population was reported based on the genotype of a single male per line, and other publications reported incompletely inbred CC mice that have been used to map a variety of traits. The three breeding sites, in the US, Israel, and Australia, are actively collaborating to accelerate the inbreeding process through marker-assisted inbreeding and to expedite community access of CC lines deemed to have reached defined thresholds of inbreeding. Plans are now being developed to provide access to this novel genetic reference population through distribution centers. Here we provide a description of the distribution efforts by the University of North Carolina Systems Genetics Core, Tel Aviv University, Israel and the University of Western Australia. © The Author(s) 2012.

Kover PX, Mott R. 2012. Mapping the genetic basis of ecologically and evolutionarily relevant traits in Arabidopsis thaliana. Current Opinion in Plant Biology, 15 (2).

Iraqi FA, Mahajne M, Salaymah Y, Sandovski H, Tayem H, Vered K, Balmer L, Hall M et al. 2012. The genome architecture of the Collaborative Cross mouse genetic reference population. Genetics, 190 (2).

The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multi parental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains. Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals. © 2012 by the Genetics Society of America.

Durrant C, Swertz MA, Alberts R, Arends D, Möller S, Mott R, Prins P, van der Velde KJ, Jansen RC, Schughart K. 2012. Bioinformatics tools and database resources for systems genetics analysis in mice--a short review and an evaluation of future needs. Brief Bioinform, 13 (2).

During a meeting of the SYSGENET working group 'Bioinformatics', currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a 'cloud' should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.

Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A et al. 2011. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature, 477 (7364).

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

Yalcin B, Wong K, Agam A, Goodson M, Keane TM, Gan X, Nellåker C, Goodstadt L et al. 2011. Sequence-based characterization of structural variation in the mouse genome. Nature, 477 (7364).

Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.

Aylor DL, Valdar W, Foulds-Mathes W, Buus RJ, Verdugo RA, Baric RS, Ferris MT, Frelinger JA et al. 2011. Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Research, 21 (8).

Salome PA, Bomblies K, Laitinen RAE, Yant L, Mott R, Weigel D. 2011. Genetic Architecture of Flowering-Time Variation in Arabidopsis thaliana. Genetics, 188 (2).

The onset of flowering is an important adaptive trait in plants. The small ephemeral species Arabidopsis thaliana grows under a wide range of temperature and day-length conditions across much of the Northern hemisphere, and a number of flowering-time loci that vary between different accessions have been identified before. However, only few studies have addressed the species-wide genetic architecture of flowering-time control. We have taken advantage of a set of 18 distinct accessions that present much of the common genetic diversity of A. thaliana and mapped quantitative trait loci (QTL) for flowering time in 17 F2 populations derived from these parents. We found that the majority of flowering-time QTL cluster in as few as five genomic regions, which include the locations of the entire FLC/MAF clade of transcription factor genes. By comparing effects across shared parents, we conclude that in several cases there might be an allelic series caused by rare alleles. While this finding parallels results obtained for maize, in contrast to maize much of the variation in flowering time in A. thaliana appears to be due to large-effect alleles. © 2011 by the Genetics Society of America.

Jiang C, Mithani A, Gan X, Belfield EJ, Harberd NP, Klingler JP, Zhu J-K, Ragoussis J, Mott R. 2011. Regenerant Arabidopsis lineages display a distinct genome-wide spectrum of mutations conferring variant phenotypes. Current Biology, 21 (16).

Multicellular organisms can be regenerated from totipotent differentiated somatic cell or nuclear founders [1-3]. Organisms regenerated from clonally related isogenic founders might a priori have been expected to be phenotypically invariant. However, clonal regenerant animals display variant phenotypes caused by defective epigenetic reprogramming of gene expression [2], and clonal regenerant plants exhibit poorly understood heritable phenotypic ("somaclonal") variation [4-7]. Here we show that somaclonal variation in regenerant Arabidopsis lineages is associated with genome-wide elevation in DNA sequence mutation rate. We also show that regenerant mutations comprise a distinctive molecular spectrum of base substitutions, insertions, and deletions that probably results from decreased DNA repair fidelity. Finally, we show that while regenerant base substitutions are a likely major genetic cause of the somaclonal variation of regenerant Arabidopsis lineages, transposon movement is unlikely to contribute substantially to that variation. We conclude that the phenotypic variation of regenerant plants, unlike that of regenerant animals, is substantially due to DNA sequence mutation. © 2011 Elsevier Ltd. All rights reserved.

Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ et al. 2011. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature.

Durrant C, Yalcin B, Cleak J, Goodstadt L, Mott R, Tayem H, Iraqi FA, Pardo-Manuel De Villena F. 2011. Collaborative Cross mice and their power to map host susceptibility to Aspergillus fumigatus infection. Genome Research, 21 (8).

The Collaborative Cross (CC) is a genetic reference panel of recombinant inbred lines of mice, designed for the dissection of complex traits and gene networks. Each line is independently descended from eight genetically diverse founder strains such that the genomes of the CC lines, once fully inbred, are fine-grained homozygous mosaics of the founder haplotypes. We present an analysis of 120 CC lines, from a cohort of the CC bred at Tel Aviv University in collaboration with the University of Oxford, which at the time of this study were between the sixth and 12th generations of inbreeding and substantially homozygous at 170,000 SNPs. We show how CC genomes decompose into mosaics, and we identify loci that carry a deficiency or excess of a founder, many being deficient for the wild-derived strains WSB/EiJ and PWK/PhJ. We phenotyped 371 mice from 66 CC lines for a susceptibility to Aspergillus fumigatus infection. The survival time after infection varied significantly between CC lines. Quantitative trait locus (QTL) mapping identified genome-wide significant QTLs on chromosomes 2, 3, 8, 10 (two QTLs), 15, and 18. Simulations show that QTL mapping resolution (the median distance between the QTL peak and true location) varied between 0.47 and 1.18 Mb. Most of the QTLs involved contrasts between wild-derived founder strains and therefore would not segregate between classical inbred strains. Use of variation data from the genomes of the CC founder strains refined these QTLs further and suggested several candidate genes. These results support the use of the CC for dissecting complex traits. © 2011 by Cold Spring Harbor Laboratory Press.

Durrant C, Mott R. 2010. Bayesian quantitative trait locus mapping using inferred haplotypes. Genetics, 184 (3).

We describe a fast hierarchical Bayesian method for mapping quantitative trait loci by haplotype-based association, applicable when haplotypes are not observed directly but are inferred from multiple marker genotypes. The method avoids the use of a Monte Carlo Markov chain by employing priors for which the likelihood factorizes completely. It is parameterized by a single hyperparameter, the fraction of variance explained by the quantitative trait locus, compared to the frequentist fixed-effects model, which requires a parameter for the phenotypic effect of each combination of haplotypes; nevertheless it still provides estimates of haplotype effects. We use simulation to show that the method matches the power of the frequentist regression model and, when the haplotypes are inferred, exceeds it for small QTL effect sizes. The Bayesian estimates of the haplotype effects are more accurate than the frequentist estimates, for both known and inferred haplotypes, which indicates that this advantage is independent of the effect of uncertainty in haplotype inference and will hold in comparison with frequentist methods in general. We apply the method to data from a panel of recombinant inbred lines of Arabidopsis thaliana, descended from 19 inbred founders.

Agam A, Yalcin B, Bhomra A, Cubin M, Webber C, Holmes C, Flint J, Mott R. 2010. Elusive copy number variation in the mouse genome. PLoS One, 5 (9).

Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question.

Yalcin B, Nicod J, Bhomra A, Davidson S, Cleak J, Farinelli L, Østerås M, Whitley A et al. 2010. Commercially available outbred mice for genome-wide association studies. PLoS Genet, 6 (9).

Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 Kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes.

Huang GJ, Smith AL, Gray DH, Cosgrove C, Singer BH, Edwards A, Sims S, Parent JM et al. 2010. A genetic and functional relationship between T cells and cellular proliferation in the adult hippocampus. PLoS Biol, 8 (12).

Neurogenesis continues through the adult life of mice in the subgranular zone of the dentate gyrus in the hippocampus, but its function remains unclear. Measuring cellular proliferation in the hippocampus of 719 outbred heterogeneous stock mice revealed a highly significant correlation with the proportions of CD8+ versus CD4+ T lymphocyte subsets. This correlation reflected shared genetic loci, with the exception of the H-2Ea locus that had a dominant influence on T cell subsets but no impact on neurogenesis. Analysis of knockouts and repopulation of TCRα-deficient mice by subsets of T cells confirmed the influence of T cells on adult neurogenesis, indicating that CD4+ T cells or subpopulations thereof mediate the effect. Our results reveal an organismal impact, broader than hitherto suspected, of the natural genetic variation that controls T cell development and homeostasis.

Tinline-Purvis H, Savory AP, Cullen JK, Davé A, Moss J, Bridge WL, Marguerat S, Bähler J et al. 2009. Failed gene conversion leads to extensive end processing and chromosomal rearrangements in fission yeast. EMBO J, 28 (21).

Loss of heterozygosity (LOH), a causal event in cancer and human genetic diseases, frequently encompasses multiple genetic loci and whole chromosome arms. However, the mechanisms by which such extensive LOH arises, and how it is suppressed in normal cells, are poorly understood. We have developed a genetic system to investigate the mechanisms of DNA double-strand break (DSB)-induced extensive LOH, and its suppression, using a non-essential minichromosome, Ch(16), in fission yeast. We find extensive LOH to arise from a new break-induced mechanism of isochromosome formation. Our data support a model in which Rqh1 and Exo1-dependent end processing from an unrepaired DSB leads to removal of the broken chromosome arm and to break-induced replication of the intact arm from the centromere, a considerable distance from the initial lesion. This process also promotes genome-wide copy number variation. A genetic screen revealed Rhp51, Rhp55, Rhp57 and the MRN complex to suppress both isochromosome formation and chromosome loss, in accordance with these events resulting from extensive end processing associated with failed homologous recombination repair.

Valdar W, Holmes CC, Mott R, Flint J. 2009. Mapping in Structured Populations by Resample Model Averaging. Genetics, 182 (4).

Highly recombinant populations derived from inbred lines, such as advanced intercross lines and heterogeneous stocks, can be used to map loci far more accurately than is possible with standard intercrosses. However, the varying degrees of relatedness that exist between individuals complicate analysis, potentially leading to many false positive signals. We describe a method to deal with these problems that does not require pedigree information and accounts for model uncertainty through model averaging. In our method, we select multiple quantitative trait loci (QTL) models using forward selection applied to resampled data sets obtained by nonparametric bootstrapping and subsampling. We provide model-averaged statistics about the probability of loci or of multilocus regions being included in model selection, and this leads to more accurate identification of QTL than by single-locus mapping. The generality of our approach means it can potentially be applied to any population of unknown structure. Copyright © 2009 by the Genetics Society of America.

Kover PX, Valdar W, Trakalo J, Scarcelli N, Ehrenreich IM, Purugganan MD, Durrant C, Mott R. 2009. A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet, 5 (7).

Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.

Huang GJ, Shifman S, Valdar W, Johannesson M, Yalcin B, Taylor MS, Taylor JM, Mott R, Flint J. 2009. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res, 19 (6).

A proportion of the genetic variants underlying complex phenotypes do so through their effects on gene expression, so an important challenge in complex trait analysis is to discover the genetic basis for the variation in transcript abundance. So far, the potential of mapping both quantitative trait loci (QTLs) and expression quantitative trait loci (eQTLs) in rodents has been limited by the low mapping resolution inherent in crosses between inbred strains. We provide a megabase resolution map of thousands of eQTLs in hippocampus, lung, and liver samples from heterogeneous stock (HS) mice in which 843 QTLs have also been mapped at megabase resolution. We exploit dense mouse SNP data to show that artifacts due to allele-specific hybridization occur in approximately 30% of the cis-acting eQTLs and, by comparison with exon expression data, we show that alternative splicing of the 3' end of the genes accounts for <1% of cis-acting eQTLs. Approximately one third of cis-acting eQTLs and one half of trans-acting eQTLs are tissue specific. We have created an important systems biology resource for the genetic analysis of complex traits in a key model organism.

Weigel D, Mott R. 2009. The 1001 genomes project for Arabidopsis thaliana. Genome Biol, 10 (5).

We advocate here a 1001 Genomes project for Arabidopsis thaliana, the workhorse of plant genetics, which will provide an enormous boost for plant research with a modest financial investment.

Lawrence R, Day-Williams AG, Mott R, Broxholme J, Cardon LR, Zeggini E. 2009. GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs. BMC Bioinformatics, 10 (1).

A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.

Johannesson M, Lopez-Aumatell R, Stridh P, Diez M, Tuncel J, Blázquez G, Martinez-Membrives E, Cañete T et al. 2009. A resource for the simultaneous high-resolution mapping of multiple quantitative trait loci in rats: the NIH heterogeneous stock. Genome Res, 19 (1).

The laboratory rat (Rattus norvegicus) is a key tool for the study of medicine and pharmacology for human health. A large database of phenotypes for integrated fields such as cardiovascular, neuroscience, and exercise physiology exists in the literature. However, the molecular characterization of the genetic loci that give rise to variation in these traits has proven to be difficult. Here we show how one obstacle to progress, the fine-mapping of quantitative trait loci (QTL), can be overcome by using an outbred population of rats. By use of a genetically heterogeneous stock of rats, we map a locus contributing to variation in a fear-related measure (two-way active avoidance in the shuttle box) to a region on chromosome 5 containing nine genes. By establishing a protocol measuring multiple phenotypes including immunology, neuroinflammation, and hematology, as well as cardiovascular, metabolic, and behavioral traits, we establish the rat HS as a new resource for the fine-mapping of QTLs contributing to variation in complex traits of biomedical relevance.

Taylor JM, Street TL, Hao L, Copley R, Taylor MS, Hayden PJ, Stolper G, Mott R, Hein J, Moffatt MF, Cookson WO. 2009. Dynamic and physical clustering of gene expression during epidermal barrier formation in differentiating keratinocytes. PLoS One, 4 (10).

The mammalian epidermis is a continually renewing structure that provides the interface between the organism and an innately hostile environment. The keratinocyte is its principal cell. Keratinocyte proteins form a physical epithelial barrier, protect against microbial damage, and prepare immune responses to danger. Epithelial immunity is disordered in many common diseases and disordered epithelial differentiation underlies many cancers. In order to identify the genes that mediate epithelial development we used a tissue model of the skin derived from primary human keratinocytes. We measured global gene expression in triplicate at five times over the ten days that the keratinocytes took to fully differentiate. We identified 1282 gene transcripts that significantly changed during differentiation (false discovery rate <0.01%). We robustly grouped these transcripts by K-means clustering into modules with distinct temporal expression patterns, shared regulatory motifs, and biological functions. We found a striking cluster of late expressed genes that form the structural and innate immune defences of the epithelial barrier. Gene Ontology analyses showed that undifferentiated keratinocytes were characterised by genes for motility and the adaptive immune response. We systematically identified calcium-binding genes, which may operate with the epidermal calcium gradient to control keratinocyte division during skin repair. The results provide multiple novel insights into keratinocyte biology, in particular providing a comprehensive list of known and previously unrecognised major components of the epidermal barrier. The findings provide a reference for subsequent understanding of how the barrier functions in health and disease.

Flint J, Mott R. 2008. Applying mouse complex-trait resources to behavioural genetics. Nature, 456 (7223).

Studies of the genetic basis of behaviour in mice are at a turning point. Soon, new resources will enable the behavioural function of all genes to be tested and the networks of genes, messenger RNAs and proteins involved in a particular behaviour to be identified. Using these resources, scientists will be able to analyse mouse behaviour at an unprecedented level of detail. Interpreting the new data, however, will require a shift in focus from gene-based approaches to network-based approaches.

Iraqi FA, Churchill G, Mott R. 2008. The Collaborative Cross, developing a resource for mammalian systems genetics: a status report of the Wellcome Trust cohort. Mamm Genome, 19 (6).

We report on the progress of a project funded by the Wellcome Trust to produce over 100 recombinant inbred mouse lines as part of the Collaborative Cross (CC) genetic reference panel. These new strains of mice are being derived from a set of eight genetically diverse founders. The genomes of the finished strains will be mosaics of the founder strains' genomes with a high density of independent recombination breakpoints. The CC mice will be available for distribution free of any intellectual property constraints to serve as a community resource for systems genetics studies.

van der Spoel AC, Mott R, Platt FM. 2008. Differential sensitivity of mouse strains to an N-alkylated imino sugar: glycosphingolipid metabolism and acrosome formation. Pharmacogenomics, 9 (6).

This review deals with the pharmacological properties of an alkylated monosaccharide mimetic, N-butyldeoxynojirimycin (NB-DNJ). This compound is of pharmacogenetic interest because one of its biological effects in mice - impairment of spermatogenesis, leading to male infertility - depends greatly on the genetic background of the animal. In susceptible mice, administration of NB-DNJ perturbs the formation of an organelle, the acrosome, in early post-meiotic male germ cells. In all recipient mice, irrespective of reproductive phenotype, NB-DNJ has a similar biochemical effect: inhibition of the glucosylceramidase beta-glucosidase 2 and subsequent elevation of glucosylceramide, a glycosphingolipid. The questions that we now need to address are: how can glucosylceramide specifically affect early acrosome formation, and why is this contingent on genetic factors? Here we discuss relevant aspects of reproductive biology, the metabolism and cell biology of sphingolipids, and complex trait analysis; we also present a speculative model that takes our observations into account.

STAR Consortium, Saar K, Beck A, Bihoreau MT, Birney E, Brocklebank D, Chen Y, Cuppen E et al. 2008. SNP and haplotype mapping for genetic analysis in the rat. Nat Genet, 40 (5).

The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.

Fullerton JM, Willis-Owen SA, Yalcin B, Shifman S, Copley RR, Miller SR, Bhomra A, Davidson S, Oliver PL, Mott R, Flint J. 2008. Human-mouse quantitative trait locus concordance and the dissection of a human neuroticism locus. Biol Psychiatry, 63 (9).

Exploiting synteny between mouse and human disease loci has been proposed as a cost-effective method for the identification of human susceptibility genes. Here we explore its utility in an analysis of a human personality trait, neuroticism, which can be modeled in mice by tests of emotionality. We investigated a mouse emotionality locus on chromosome 1 that contains no annotated genes but abuts four regulators of G protein signaling, one of which (rgs2) has been previously identified as a quantitative trait gene for emotionality. This locus is syntenic with a human region that has been consistently implicated in the genetic aetiology of neuroticism.

Mott R, Flint J. 2008. Prospects for complex trait analysis in the mouse. Mamm Genome, 19 (5).

Flint J, Shifman S, Munafo M, Mott R. 2008. Genetic variants in major depression. Novartis Found Symp, 289.

Major depression is one of the most common and most debilitating disorders in the world. A wealth of data indicate that additive genetic effects contribute to at least 30% of the variance in liability to major depression, yet attempts to identify the molecular basis of susceptibility using standard family based linkage and genetic association methodologies have had limited success. Alternative approaches have recently been advocated, such as the inclusion of gene by environment interactions and the use of endophenotypes. Our own data indicate that the genetic architecture of affective illness is more complex than expected. A whole genome association study of neuroticism, a personality trait that shares many of the same susceptibility loci as depression, reveals that the individual effect sizes are less than 1%. Larger sample sizes and more sophisticated analytical approaches will be needed than have hitherto been applied.

Mott R. 2007. A haplotype map for the laboratory mouse. Nat Genet, 39 (9).

Two reports present detailed analyses of the haplotype structure of widely used laboratory mice based on resequencing data from 15 inbred strains. The studies provide the deepest view thus far of the patterns of genetic variation segregating in the inbred lines, and have implications for the design of complex trait mapping studies in mice. ©2007 Nature Publishing Group.

Taylor M, Valdar W, Kumar A, Flint J, Mott R. 2007. Management, presentation and interpretation of genome scans using GSCANDB. Bioinformatics, 23 (12).

Advances in high-throughput genotyping have made it possible to carry out genome-wide association studies using very high densities of genetic markers. This has led to the problem of the storage, management, quality control, presentation and interpretation of results. In order to achieve a successful outcome, it may be necessary to analyse the data in different ways and compare the results with genome annotations and other genome scans.

Shifman S, Bell JT, Copley RR, Taylor MS, Williams RW, Mott R, Flint J. 2007. Evidence of a large-scale functional organization of mammalian chromosomes: Authors' reply. PLoS Biol, 5 (5).

Wilson ND, Ross LJ, Close J, Mott R, Crow TJ, Volpi EV. 2007. Replication profile of PCDH11X and PCDH11Y, a gene pair located in the non-pseudoautosomal homologous region Xq21.3/Yp11.2. Chromosome Res, 15 (4).

In order to investigate the replication timing properties of PCDH11X and PCDH11Y, a pair of protocadherin genes located in the hominid-specific non-pseudoautosomal homologous region Xq21.3/Yp11.2, we conducted a FISH-based comparative study in different human and non-human primate (Gorilla gorilla) cell types. The replication profiles of three genes from different regions of chromosome X (ZFX, XIST and ATRX) were used as terms of reference. Particular emphasis was given to the evaluation of allelic replication asynchrony in relation to the inactivation status of each gene. The human cell types analysed include neuronal cells and ICF syndrome cells, considered to be a model system for the study of X inactivation. PCDH11 appeared to be generally characterized by replication asynchrony in both male and female cells, and no significant differences were observed between human and gorilla, in which this gene lacks X-Y homologous status. However, in differentiated human neuroblastoma and cerebral cortical cells the PCDH11X replication profile showed a significant shift towards allelic synchrony. Our data are relevant to the complex relationship between X-inactivation, as a chromosome-wide phenomenon, and asynchrony of replication and expression status of single genes on chromosome X.

Shifman S, Bell JT, Copley RR, Taylor MS, Williams RW, Mott R, Flint J. 2006. A high-resolution single nucleotide polymorphism genetic map of the mouse genome. PLoS Biol, 4 (12).

High-resolution genetic maps are required for mapping complex traits and for the study of recombination. We report the highest density genetic map yet created for any organism, except humans. Using more than 10,000 single nucleotide polymorphisms evenly spaced across the mouse genome, we have constructed genetic maps for both outbred and inbred mice, and separately for males and females. Recombination rates are highly correlated in outbred and inbred mice, but show relatively low correlation between males and females. Differences between male and female recombination maps and the sequence features associated with recombination are strikingly similar to those observed in humans. Genetic maps are available from http://gscan.well.ox.ac.uk/#genetic_map and as supporting information to this publication.

Valdar W, Solberg LC, Gauguier D, Cookson WO, Rawlins JN, Mott R, Flint J. 2006. Genetic and environmental effects on complex traits in mice. Genetics, 174 (2).

The interaction between genotype and environment is recognized as an important source of experimental variation when complex traits are measured in the mouse, but the magnitude of that interaction has not often been measured. From a study of 2448 genetically heterogeneous mice, we report the heritability of 88 complex traits that include models of human disease (asthma, type 2 diabetes mellitus, obesity, and anxiety) as well as immunological, biochemical, and hematological phenotypes. We show that environmental and physiological covariates are involved in an unexpectedly large number of significant interactions with genetic background. The 15 covariates we examined have a significant effect on behavioral and physiological tests, although they rarely explain >10% of the variation. We found that interaction effects are more frequent and larger than the main effects: half of the interactions explained >20% of the variance and in nine cases exceeded 50%. Our results indicate that assays of gene function using mouse models should take into account interactions between gene and environment.

O'Rourke D, Baban D, Demidova M, Mott R, Hodgkin J. 2006. Genomic clusters, putative pathogen recognition molecules, and antimicrobial genes are induced by infection of C. elegans with M. nematophilum. Genome Res, 16 (8).

The interaction between the nematode Caenorhabditis elegans and a Gram-positive bacterial pathogen, Microbacterium nematophilum, provides a model for an innate immune response in nematodes. This pathogen adheres to the rectal and post-anal cuticle of the worm, causing slowed growth, constipation, and a defensive swelling response of rectal hypodermal cells. To explore the genomic responses that the worm activates after pathogenic attack we used microarray analysis of transcriptional changes induced after 6-h infection, comparing virulent with avirulent infection. We defined 89 genes with statistically significant expression changes of at least twofold, of which 68 were up-regulated and 21 were down-regulated. Among the former, those encoding C-type lectin domains were the most abundant class. Many of the 89 genes exhibit genomic clustering, and we identified one large cluster of 62 genes, of which most were induced in response to infection. We tested 41 of the induced genes for involvement in immunity using mutants or RNAi, finding that six of these are required for the swelling response and five are required more generally for defense. Our results indicate that C-type lectins and other putative pathogen-recognition molecules are important for innate immune defense in C. elegans. We also found significant induction of genes encoding lysozymes, proteases, and defense-related proteins, as well as various domains of unknown function. The genes induced during infection by M. nematophilum appear largely distinct from genes induced by other pathogens, suggesting that C. elegans mounts pathogen-specific responses to infection.

Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JN, Mott R, Flint J. 2006. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet, 38 (8).

Difficulties in fine-mapping quantitative trait loci (QTLs) are a major impediment to progress in the molecular dissection of complex traits in mice. Here we show that genome-wide high-resolution mapping of multiple phenotypes can be achieved using a stock of genetically heterogeneous mice. We developed a conservative and robust bootstrap analysis to map 843 QTLs with an average 95% confidence interval of 2.8 Mb. The QTLs contribute to variation in 97 traits, including models of human disease (asthma, type 2 diabetes mellitus, obesity and anxiety) as well as immunological, biochemical and hematological phenotypes. The genetic architecture of almost all phenotypes was complex, with many loci each contributing a small proportion to the total variance. Our data set, freely available at http://gscan.well.ox.ac.uk, provides an entry point to the functional characterization of genes involved in many complex traits.

Mott R. 2006. Finding the molecular basis of complex genetic variation in humans and mice. Philos Trans R Soc Lond B Biol Sci, 361 (1467).

I survey the state of the art in complex trait analysis, including the use of new experimental and computational technologies and resources becoming available, and the challenges facing us. I also discuss how the prospects of rodent model systems compare with association mapping in humans.

Valdar W, Flint J, Mott R. 2006. Simulating the collaborative cross: power of quantitative trait loci detection and mapping resolution in large sets of recombinant inbred strains of mice. Genetics, 172 (3).

It has been suggested that the collaborative cross, a large set of recombinant inbred strains derived from eight inbred mouse strains, would be a powerful resource for the dissection of complex phenotypes. Here we use simulation to investigate the power of the collaborative cross to detect and map small genetic effects. We show that for a fixed population of 1000 individuals, 500 RI lines bred using a modified version of the collaborative cross design are adequate to map a single additive locus that accounts for 5% of the phenotypic variation to within 0.96 cM. In the presence of strong epistasis more strains can improve detection, but 500 lines still provide sufficient resolution to meet most goals of the collaborative cross. However, even with a very large panel of RILs, mapping resolution may not be sufficient to identify single genes unambiguously. Our results are generally applicable to the design of RILs in other species.

Solberg LC, Valdar W, Gauguier D, Nunez G, Taylor A, Burnett S, Arboledas-Hita C, Hernandez-Pliego P et al. 2006. A protocol for high-throughput phenotyping, suitable for quantitative trait analysis in mice. Mamm Genome, 17 (2).

Whole-genome genetic association studies in outbred mouse populations represent a novel approach to identifying the molecular basis of naturally occurring genetic variants, the major source of quantitative variation between inbred strains of mice. Measuring multiple phenotypes in parallel on each mouse would make the approach cost effective, but protocols for phenotyping on a large enough scale have not been developed. In this article we describe the development and deployment of a protocol to collect measures on three models of human disease (anxiety, type II diabetes, and asthma) as well as measures of mouse blood biochemistry, immunology, and hematology. We report that the protocol delivers highly significant differences among the eight inbred strains (A/J, AKR/J, BALB/cJ, CBA/J, C3H/HeJ, C57BL/6J, DBA/2J, and LP/J), the progenitors of a genetically heterogeneous stock (HS) of mice. We report the successful collection of multiple phenotypes from 2000 outbred HS animals. The phenotypes measured in the protocol form the basis of a large-scale investigation into the genetic basis of complex traits in mice designed to examine interactions between genes and between genes and environment, as well as the main effects of genetic variants on phenotypes.

Fiddy S, Cattermole D, Xie D, Duan XY, Mott R. 2006. An integrated system for genetic analysis. BMC Bioinformatics, 7.

Large-scale genetic mapping projects require data management systems that can handle complex phenotypes and detect and correct high-throughput genotyping errors, yet are easy to use.

Hanchard NA, Rockett KA, Spencer C, Coop G, Pinder M, Jallow M, Kimber M, McVean G, Mott R, Kwiatkowski DP. 2006. Screening for recently selected alleles by analysis of human haplotype similarity. Am J Hum Genet, 78 (1).

There is growing interest in the use of haplotype-based methods for detecting recent selection. Here, we describe a method that uses a sliding window to estimate similarity among the haplotypes associated with any given single-nucleotide polymorphism (SNP) allele. We used simulations of natural selection to provide estimates of the empirical power of the method to detect recently selected alleles and found it to be comparable in power to the popular long-range haplotype test and more powerful than methods based on nucleotide diversity. We then applied the method to a recently selected allele--the sickle mutation at the HBB locus--and found it to have a signal of selection that was significantly stronger than that of simulated models both with and without strong selection. Using this method, we also evaluated >4,000 SNPs on chromosome 20, indicating the applicability of the method to regional data sets.

Luoni G, Forton J, Jallow M, Sadighi Akha E, Sisay-Joof F, Pinder M, Hanchard N, Herbert M et al. 2005. Population-specific patterns of linkage disequilibrium in the human 5q31 region. Genes Immun, 6 (8).

Linkage disequilibrium across the human genome is generally lower in West Africans than in Europeans. However, in the 5q31 region, which is rich in immune genes, we find significantly more examples of apparent nonrecombination between distant marker pairs in West Africans. Much of this effect is due to SNPs that are absent in Europeans, possibly reflecting recent positive selection in the West African population.

Yalcin B, Flint J, Mott R. 2005. Using progenitor strain information to identify quantitative trait nucleotides in outbred mice. Genetics, 171 (2).

We have developed a fast and economical strategy for dissecting the genetic architecture of quantitative trait loci at a molecular level. The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis. We apply the method to a 4.8-Mb region on mouse chromosome 1 that contains a QTL influencing anxiety segregating in a heterogeneous stock and show that, under the assumption that a single QTN is present and lies in a region conserved between the human and mouse genomes, it is possible to reduce the number of variants likely to be the quantitative trait nucleotide from many thousands to <20.

Barnby G, Abbott A, Sykes N, Morris A, Weeks DE, Mott R, Lamb J, Bailey AJ, Monaco AP, International Molecular Genetics Study of Autism Consortium. 2005. Candidate-gene screening and association analysis at the autism-susceptibility locus on chromosome 16p: evidence of association at GRIN2A and ABAT. Am J Hum Genet, 76 (6).

Autism is a highly heritable neurodevelopmental disorder whose underlying genetic causes have yet to be identified. To date, there have been eight genome screens for autism, two of which identified a putative susceptibility locus on chromosome 16p. In the present study, 10 positional candidate genes that map to 16p11-13 were examined for coding variants: A2BP1, ABAT, BFAR, CREBBP, EMP2, GRIN2A, MRTF-B, SSTR5, TBX6, and UBN1. Screening of all coding and regulatory regions by denaturing high-performance liquid chromatography identified seven nonsynonymous changes. Five of these mutations were found to cosegregate with autism, but the mutations are not predicted to have deleterious effects on protein structure and are unlikely to represent significant etiological variants. Selected variants from candidate genes were genotyped in the entire International Molecular Genetics Study of Autism Consortium collection of 239 multiplex families and were tested for association with autism by use of the pedigree disequilibrium test. Additionally, genotype frequencies were compared between 239 unrelated affected individuals and 192 controls. Patterns of linkage disequilibrium were investigated, and the transmission of haplotypes across candidate genes was tested for association. Evidence of single-marker association was found for variants in ABAT, CREBBP, and GRIN2A. Within these genes, 12 single-nucleotide polymorphisms (SNPs) were subsequently genotyped in 91 autism trios (one affected individual and two unaffected parents), and the association was replicated within GRIN2A (Fisher's exact test, P<.0001). Logistic regression analysis of SNP data across GRIN2A and ABAT showed a trend toward haplotypic differences between cases and controls.

Griffiths MJ, Shafi MJ, Popper SJ, Hemingway CA, Kortok MM, Wathen A, Rockett KA, Mott R et al. 2005. Genomewide analysis of the host response to malaria in Kenyan children. J Infect Dis, 191 (10).

Malaria is a global problem, and there is a critical need for further understanding of the disease process. When malarial parasites invade and develop within the bloodstream, they stimulate a profound host response whose main clinical sign is fever. To explore this response, we measured host gene expression in whole blood from Kenyan children hospitalized with either acute malaria or other febrile illnesses. Genomewide analysis of expression identified 2 principal gene-expression profiles related to neutrophil and erythroid activity. In addition to these general acute responses, a third gene-expression profile was associated with host parasitemia; mediators of erythrophagocytosis and cellular stress were notable components of this response. The delineation of subjects on the basis of patterns of gene expression provides a molecular perspective of the host response to malaria and further functional insight into the underlying processes of pathogenesis.

Flint J, Valdar W, Shifman S, Mott R. 2005. Strategies for mapping and cloning quantitative trait genes in rodents. Nat Rev Genet, 6 (4).

Over the past 15 years, more than 2,000 quantitative trait loci (QTLs) have been identified in crosses between inbred strains of mice and rats, but less than 1% have been characterized at a molecular level. However, new resources, such as chromosome substitution strains and the proposed Collaborative Cross, together with new analytical tools, including probabilistic ancestral haplotype reconstruction in outbred mice, Yin-Yang crosses and in silico analysis of sequence variants in many inbred strains, could make QTL cloning tractable. We review the potential of these strategies to identify genes that underlie QTLs in rodents.

Price TS, Regan R, Mott R, Hedman A, Honey B, Daniels RJ, Smith L, Greenfield A et al. 2005. SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. Nucleic Acids Res, 33 (11).

Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important concerns regarding the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith-Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267-1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well-suited to high resolution whole genome array CGH studies that use array probes derived from large insert clones as well as PCR products and oligonucleotides.
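
As a rough illustration of the idea in the SW-ARRAY abstract, a Smith-Waterman-style recurrence applied to a one-dimensional series of array CGH log ratios reduces to finding the highest-scoring contiguous run of clones. The Python sketch below is not the SW-ARRAY implementation: the function name, the threshold value and the example log2 ratios are assumptions made for illustration, and the nonparametric significance test mentioned in the abstract is omitted.

```python
from typing import List, Tuple


def best_segment(scores: List[float]) -> Tuple[float, int, int]:
    """Return (score, start, end) of the highest-scoring contiguous run.

    Uses the local-alignment recurrence S[i] = max(0, S[i-1]) + x[i];
    `end` is exclusive. Returns (0.0, 0, 0) if no score is positive.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, x in enumerate(scores):
        if running <= 0.0:  # restart the segment, as in local alignment
            running, start = 0.0, i
        running += x
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


# Hypothetical log2(test/reference) ratios for seven consecutive clones.
log_ratios = [0.05, -0.02, -0.90, -1.10, -0.80, 0.03, 0.01]
threshold = 0.3  # assumed offset so that unchanged clones score negative

# To look for deletions, negate the ratios and subtract the threshold.
deletion_scores = [-r - threshold for r in log_ratios]
score, start, end = best_segment(deletion_scores)
print(f"candidate deletion: clones {start}..{end - 1}, score {score:.2f}")
```

Subtracting a threshold before scoring is what keeps the best segment from growing to cover the whole chromosome arm; in practice that offset would have to be chosen from the noise level of the array, which this sketch does not attempt.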

Yalcin B, Willis-Owen SA, Fullerton J, Meesaq A, Deacon RM, Rawlins JN, Copley RR, Morris AP, Flint J, Mott R. 2004. Genetic dissection of a behavioral quantitative trait locus shows that Rgs2 modulates anxiety in mice. Nat Genet, 36 (11).

Here we present a strategy to determine the genetic basis of variance in complex phenotypes that arise from natural, as opposed to induced, genetic variation in mice. We show that a commercially available strain of outbred mice, MF1, can be treated as an ultrafine mosaic of standard inbred strains and accordingly used to dissect a known quantitative trait locus influencing anxiety. We also show that this locus can be subdivided into three regions, one of which contains Rgs2, which encodes a regulator of G protein signaling. We then use quantitative complementation to show that Rgs2 is a quantitative trait gene. This combined genetic and functional approach should be applicable to the analysis of any quantitative trait.

Churchill GA, Airey DC, Allayee H, Angel JM, Attie AD, Beatty J, Beavis WD, Belknap JK et al. 2004. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet, 36 (11).

The goal of the Complex Trait Consortium is to promote the development of resources that can be used to understand, treat and ultimately prevent pervasive human diseases. Existing and proposed mouse resources that are optimized to study the actions of isolated genetic loci on a fixed background are less effective for studying intact polygenic networks and interactions among genes, environments, pathogens and other factors. The Collaborative Cross will provide a common reference panel specifically designed for the integrative analysis of complex systems and will change the way we approach human health and disease.

Yalcin B, Fullerton J, Miller S, Keays DA, Brady S, Bhomra A, Jefferson A, Volpi E, Copley RR, Flint J, Mott R. 2004. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci U S A, 101 (26).

Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.

Woodfine K, Fiegler H, Beare DM, Collins JE, McCann OT, Young BD, Debernardi S, Mott R, Dunham I, Carter NP. 2004. Replication timing of the human genome. Hum Mol Genet, 13 (2).

We have developed a directly quantitative method utilizing genomic clone DNA microarrays to assess the replication timing of sequences during the S phase of the cell cycle. The genomic resolution of the replication timing measurements is limited only by the genomic clone size and density. We demonstrate the power of this approach by constructing a genome-wide map of replication timing in human lymphoblastoid cells using an array with clones spaced at 1 Mb intervals and a high-resolution replication timing map of 22q with an array utilizing overlapping sequencing tile path clones. We show a positive correlation, both genome-wide and at a high resolution, between replication timing and a range of genome parameters including GC content, gene density and transcriptional activity.

Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J, Udalova IA. 2004. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res, 32 (4).

We present a general high-throughput approach to accurately quantify DNA-protein interactions, which can facilitate the identification of functional genetic polymorphisms. The method tested here on two structurally distinct transcription factors (TFs), NF-kappaB and OCT-1, comprises three steps: (i) optimized selection of DNA variants to be tested experimentally, which we show is superior to selecting variants at random; (ii) a quantitative protein-DNA binding assay using microarray and surface plasmon resonance technologies; (iii) prediction of binding affinity for all DNA variants in the consensus space using a statistical model based on principal coordinates analysis. For the protein-DNA binding assay, we identified a polyacrylamide/ester glass activation chemistry which formed exclusive covalent bonds with 5'-amino-modified DNA duplexes and hindered non-specific electrostatic attachment of DNA. Full accessibility of the DNA duplexes attached to polyacrylamide-modified slides was confirmed by the high degree of data correlation with the electromobility shift assay (correlation coefficient 93%). This approach offers the potential for high-throughput determination of TF binding profiles and predicting the effects of single nucleotide polymorphisms on TF binding affinity. New DNA binding data for OCT-1 are presented.

Valdar WS, Flint J, Mott R. 2003. QTL fine-mapping with recombinant-inbred heterogeneous stocks and in vitro heterogeneous stocks. Mamm Genome, 14 (12).

We compared strategies to fine-map Quantitative Trait Loci (QTL) in mice with heterogeneous stocks (HS). We showed that a panel of about 100 Recombinant Inbred Lines (RIL) derived from an HS, and which we called an RIHS, was ideally suited to fine-map QTL to very high resolution, without the cost of additional genotyping. We also investigated a strategy based on in vitro fertilization of large numbers of F(1) offspring of HS males crossed with an inbred line (IVHS). This method required some additional genotyping but avoided the breeding delays and costs associated with the construction of an RI panel. We showed that QTL detection was higher by using RIHS than with IVHS and that it was independent of the number of RI lines, provided the total number of animals phenotyped was constant. However, fine-mapping accuracy was slightly better with IVHS. We also investigated the effects of varying the number of HS generations and using multiallelic microsatellites instead of SNPs. We found that quite modest generation times of 10-20 generations were optimal. Microsatellites were superior to SNPs only when the generation time was 30 or more and when the markers were widely spaced.

Traherne JA, Hill MR, Hysi P, D'Amato M, Broxholme J, Mott R, Moffatt MF, Cookson WO. 2003. LD mapping of maternally and non-maternally derived alleles and atopy in FcepsilonRI-beta. Hum Mol Genet, 12 (20).

Polymorphisms in the beta chain of the high affinity receptor for IgE (Fc epsilon RI-beta, MS4A2) are consistently associated with traits underlying asthma and atopy (immunoglobulin E-mediated allergy). However, the causal variants and haplotypes underlying disease have not yet been identified. Maternal effects, with association confined to maternally derived alleles, have been shown in some studies but not in others. We have therefore extended the known sequence and systematically detected polymorphisms across an 18.1 Kb genomic region that includes Fc epsilon RI-beta. Association testing in two panels of subjects showed the presence of single-nucleotide polymorphisms (SNPs) affecting prick skin tests and specific IgE responses in several clusters. Stepwise analyses indicated that the clusters represent independent effects. Interferon regulatory factor 2 (IRF-2) sites were altered by significantly associated SNPs in two regions. Strong association to maternally derived alleles was seen in one panel of subjects and not in the other. Maternal and non-maternally derived associations tended to share the same SNP clusters, but associations were stronger in the presence of maternal effects. Two regions of increased CpG concentration were identified in Fc epsilon RI-beta. One of these approximated a SNP cluster that showed strong association and maternal effects, providing a potential substrate for epigenetic effects.

Ackerman HC, Ribas G, Jallow M, Mott R, Neville M, Sisay-Joof F, Pinder M, Campbell RD, Kwiatkowski DP. 2003. Complex haplotypic structure of the central MHC region flanking TNF in a West African population. Genes Immun, 4 (7).

TNF polymorphisms have been associated with susceptibility to malaria and other infectious and inflammatory conditions. We investigated a sample of 150 West African chromosomes to determine linkage disequilibrium (LD) between 25 SNP markers located in an 80 kb segment of the MHC Class III region encompassing TNF and eight neighbouring genes. We observed 45 haplotypes, and 22 of them comprise 80% of the sample. The pattern of LD is remarkably patchy, such that many markers show no LD with adjacent markers but high LD with markers that are much further away. We introduce a method of examining the implications of LD data for disease association studies based on sample size considerations: this shows that certain TNF polymorphisms would be likely to yield positive associations if the true disease allele resided in LTA or BAT1. We conclude that detailed marker maps are needed to resolve the causal origin of disease associations observed at the TNF locus.
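
For readers unfamiliar with how pairwise LD between two SNP markers is usually quantified, the sketch below computes the standard measures D, D' and r² from counts of the four two-locus haplotypes. It is a generic textbook calculation, not the analysis used in the paper; the function name and the haplotype counts are hypothetical (only the total of 150 chromosomes echoes the abstract).

```python
from typing import Tuple


def ld_stats(n11: int, n10: int, n01: int, n00: int) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    n11 counts haplotypes carrying allele 1 at both SNPs, n10 allele 1 at
    the first SNP only, n01 allele 1 at the second only, n00 at neither.
    """
    n = n11 + n10 + n01 + n00
    p = (n11 + n10) / n   # frequency of allele 1 at the first SNP
    q = (n11 + n01) / n   # frequency of allele 1 at the second SNP
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = n11 / n - p * q
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = abs(d) / d_max
    r2 = d * d / (p * (1 - p) * q * (1 - q))
    return d, d_prime, r2


# Hypothetical counts over 150 chromosomes.
d, d_prime, r2 = ld_stats(n11=60, n10=15, n01=15, n00=60)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

D' and r² can disagree sharply when allele frequencies differ between the two markers, which is one reason patterns of LD can look patchy depending on the measure used.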

Harrison A, Pearl F, Sillitoe I, Slidel T, Mott R, Thornton J, Orengo C. 2003. Recognizing the fold of a protein structure. Bioinformatics, 19 (14).

This paper reports a graph-theoretic program, GRATH, that rapidly, and accurately, matches a novel structure against a library of domain structures to find the most similar ones. GRATH generates distributions of scores by comparing the novel domain against the different types of folds that have been classified previously in the CATH database of structural domains. GRATH uses a measure of similarity that details the geometric information, number of secondary structures and number of residues within secondary structures, that any two protein structures share. Although GRATH builds on well established approaches for secondary structure comparison, a novel scoring scheme has been introduced to allow ranking of any matches identified by the algorithm. More importantly, we have benchmarked the algorithm using a large dataset of 1702 non-redundant structures from the CATH database which have already been classified into fold groups, with manual validation. This has facilitated introduction of further constraints, optimization of parameters and identification of reliable thresholds for fold identification. Following these benchmarking trials, the correct fold can be identified with the top score with a frequency of 90%. It is identified within the ten most likely assignments with a frequency of 98%. GRATH has been implemented to use via a server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/Grath.pl). GRATH's speed and accuracy means that it can be used as a reliable front-end filter for the more accurate, but computationally expensive, residue based structure comparison algorithm SSAP, currently used to classify domain structures in the CATH database. With an increasing number of structures being solved by the structural genomics initiatives, the GRATH server also provides an essential resource for determining whether newly determined structures are related to any known structures from which functional properties may be inferred.

Zhang Y, Leaves NI, Anderson GG, Ponting CP, Broxholme J, Holt R, Edser P, Bhattacharyya S et al. 2003. Positional cloning of a quantitative trait locus on chromosome 13q14 that influences immunoglobulin E levels and asthma. Nat Genet, 34 (2).

Atopic or immunoglobulin E (IgE)-mediated diseases include the common disorders of asthma, atopic dermatitis and allergic rhinitis. Chromosome 13q14 shows consistent linkage to atopy and the total serum IgE concentration. We previously identified association between total serum IgE levels and a novel 13q14 microsatellite (USAT24G1; ref. 7) and have now localized the underlying quantitative-trait locus (QTL) in a comprehensive single-nucleotide polymorphism (SNP) map. We found replicated association to IgE levels that was attributed to several alleles in a single gene, PHF11. We also found association with these variants to severe clinical asthma. The gene product (PHF11) contains two PHD zinc fingers and probably regulates transcription. Distinctive splice variants were expressed in immune tissues and cells.

Fullerton J, Cubin M, Tiwari H, Wang C, Bomhra A, Davidson S, Miller S, Fairburn C et al. 2003. Linkage analysis of extremely discordant and concordant sibling pairs identifies quantitative-trait loci that influence variation in the human personality trait neuroticism. Am J Hum Genet, 72 (4).

Several theoretical studies have suggested that large samples of randomly ascertained siblings can be used to ascertain phenotypically extreme individuals and thereby increase power to detect genetic linkage in complex traits. Here, we report a genetic linkage scan using extremely discordant and concordant sibling pairs, selected from 34,580 sibling pairs in the southwest of England who completed a personality questionnaire. We performed a genomewide scan for quantitative-trait loci (QTLs) that influence variation in the personality trait of neuroticism, or emotional stability, and we established genomewide empirical significance thresholds by simulation. The maximum pointwise P values, expressed as the negative logarithm (base 10), were found on 1q (3.95), 4q (3.84), 7p (3.90), 12q (4.74), and 13q (3.81). These five loci met or exceeded the 5% genomewide significance threshold of 3.8 (negative logarithm of the P value). QTLs on chromosomes 1, 12, and 13 are likely to be female specific. One locus, on chromosome 1, is syntenic with that reported from QTL mapping of rodent emotionality, an animal model of neuroticism, suggesting that some animal and human QTLs influencing emotional stability may be homologous.
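
The scores in the neuroticism linkage abstract are given as negative base-10 logarithms of pointwise P values, so a score s corresponds to P = 10^(-s); the genomewide threshold of 3.8 is therefore a pointwise P of about 1.6 × 10^-4. The short Python sketch below simply converts the five reported scores back to P values and checks them against the threshold. The variable and function names are illustrative only.

```python
# Scores reported in the abstract, as -log10(P), keyed by chromosome arm.
scores = {"1q": 3.95, "4q": 3.84, "7p": 3.90, "12q": 4.74, "13q": 3.81}
threshold = 3.8  # 5% genomewide threshold, also on the -log10(P) scale


def to_p(neg_log10_p: float) -> float:
    """Convert a -log10(P) score back to a P value."""
    return 10.0 ** (-neg_log10_p)


for locus, s in scores.items():
    meets = s >= threshold
    print(f"{locus}: -log10(P) = {s:.2f}, P = {to_p(s):.2e}, meets threshold: {meets}")
```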

Nijnik A, Mott R, Kwiatkowski DP, Udalova IA. 2003. Comparing the fine specificity of DNA binding by NF-kappaB p50 and p52 using principal coordinates analysis. Nucleic Acids Res, 31 (5).

Principal coordinates analysis has been proposed as an efficient way of predicting the binding affinity of a transcription factor to different DNA motifs, as it can model complex interactions that are difficult to represent with standard position-weight matrices. Here we evaluate its ability to distinguish the DNA binding properties of two closely related proteins, the homodimeric forms of NF-kappaB p50 and p52. When tested experimentally against 50 different variants of the generalised NF-kappaB motif GGRRNNYYCC, the binding specificities of p50 and p52 were similar but not identical (correlation rho = 0.86). These experimental data can be modelled accurately with six principal coordinates that are similar for p50 and p52, plus one principal coordinate that is significantly stronger for p52 than for p50, relating to the inner positions of the binding site. These findings are compatible with crystallographic data showing that p52 has greater ability than p50 to form water molecule-mediated hydrogen bonds with inner nucleotide positions of the binding site.

Mott R, Schultz J, Bork P, Ponting CP. 2003. Predicting protein cellular localization using a domain projection method. Genome Research, 13 (1).

We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.

Ackerman H, Usen S, Mott R, Richardson A, Sisay-Joof F, Katundu P, Taylor T, Ward R, Molyneux M, Pinder M, Kwiatkowski DP. 2003. Haplotypic analysis of the TNF locus by association efficiency and entropy. Genome Biol, 4 (4).

To understand the causal basis of TNF associations with disease, it is necessary to understand the haplotypic structure of this locus. We genotyped 12 single-nucleotide polymorphisms (SNPs) distributed over 4.3 kilobases in 296 healthy, unrelated Gambian and Malawian adults. We generated 592 high-quality haplotypes by integrating family- and population-based reconstruction methods.

Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature, 420 (6915).

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Harrison A, Pearl F, Mott R, Thornton J, Orengo C. 2002. Quantifying the similarities within fold space. J Mol Biol, 323 (5).

We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of the fold similarities has allowed us to assess the significance for any similarity. Therefore we have examined whether it is best to represent folds as discrete entities or whether, in fact, a more accurate model would be a continuum wherein folds overlap via common motifs. To do this we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, greek keys, alpha-beta plait motifs or alpha-hairpins, which are matching similar motifs in other folds. Apart from one example, all the most gregarious folds matching 20% or more of the other folds in the database, are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit a low gregariousness, are those that have very distinctive folds, with few common motifs or motifs that are packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, these common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered as continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space much more discrete topologies are observed with little similarity between folds.

Flint J, Fernandez-Teruel A, Escorihuela RM, Gray JA, Aguilar R, Gil L, Gimenez-Llort L, Tobena A et al. 2002. A quantitative trait locus influencing anxiety in the laboratory rat. Behavior Genetics, 32 (6).

Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T et al. 2002. A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418 (6897).

DNA sequence variants in specific genes or regions of the human genome are responsible for a variety of phenotypes such as disease risk or variable drug response. These variants can be investigated directly, or through their non-random associations with neighbouring markers (called linkage disequilibrium (LD)). Here we report measurement of LD along the complete sequence of human chromosome 22. Duplicate genotyping and analysis of 1,504 markers in Centre d'Etude du Polymorphisme Humain (CEPH) reference families at a median spacing of 15 kilobases (kb) reveals a highly variable pattern of LD along the chromosome, in which extensive regions of nearly complete LD up to 804 kb in length are interspersed with regions of little or no detectable LD. The LD patterns are replicated in a panel of unrelated UK Caucasians. There is a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar. This study demonstrates the feasibility of developing genome-wide maps of LD.

Mott R, Schultz J, Bork P, Ponting CP. 2002. Predicting protein cellular localization using a domain projection method. Genome Res, 12 (8).

We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.

Udalova IA, Mott R, Field D, Kwiatkowski D. 2002. Quantitative prediction of NF-kappa B DNA-protein interactions. Proc Natl Acad Sci U S A, 99 (12).

We describe a general method based on principal coordinates analysis to predict the effects of single-nucleotide polymorphisms within regulatory sequences on DNA-protein interactions. We use binding data for the transcription factor NF-kappaB as a test system. The method incorporates the effects of interactions between base pair positions in the binding site, and we demonstrate that such interactions are present for NF-kappaB. Prediction accuracy is higher than with profile models, confirmed by crossvalidation and by the experimental verification of our predictions for additional sequences. The binding affinities of all potential NF-kappaB sites on human chromosome 22, together with the effects of known single-nucleotide polymorphisms, are calculated to determine likely functional variants. We propose that this approach may be valuable, either on its own or in combination with other methods, when standard profile models are disadvantaged by complex internucleotide interactions.

Mott R, Flint J. 2002. Simultaneous detection and fine mapping of quantitative trait loci in mice using heterogeneous stocks. Genetics, 160 (4).

We describe a method to simultaneously detect and fine map quantitative trait loci (QTL) that is especially suited to the mapping of modifier loci in mouse mutant models. The method exploits the high level of historical recombination present in a heterogeneous stock (HS), an outbred population of mice derived from known founder strains. The experimental design is an F(2) cross between the HS and a genetically distinct line, such as one carrying a knockout or transgene. QTL detection is performed by a standard genome scan with approximately 100 markers and fine mapping by typing the same animals using densely spaced markers over those candidate regions detected by the scan. The analysis uses an extension of the dynamic-programming technique employed previously to fine map QTL in HS mice. We show by simulation that a QTL accounting for 5% of the total variance can be detected and fine mapped with >50% probability to within 3 cM by genotyping approximately 1500 animals.

Fernández-Teruel A, Escorihuela RM, Gray JA, Aguilar R, Gil L, Giménez-Llort L, Tobeña A, Bhomra A et al. 2002. A quantitative trait locus influencing anxiety in the laboratory rat. Genome Res, 12 (4).

A critical test for a gene that influences susceptibility to fear in animals is that it should have a consistent pattern of effects across a broad range of conditioned and unconditioned models of anxiety. Despite many years of research, definitive evidence that genetic effects operate in this way is lacking. The limited behavioral test regimes so far used in genetic mapping experiments and the lack of suitable multivariate methodologies have made it impossible to determine whether the quantitative trait loci (QTL) detected to date specifically influence fear-related traits. Here we report the first multivariate analysis to explore the genetic architecture of rodent behavior in a battery of animal models of anxiety. We have mapped QTLs in an F2 intercross of two rat strains, the Roman high and low avoidance rats, that have been selectively bred for differential response to fear. Multivariate analyses show that one locus, on rat chromosome 5, influences behavior in different models of anxiety. The QTL influences two-way active avoidance, conditioned fear, elevated plus maze, and open field activity but not acoustic startle response or defecation in a novel environment. The direction of effects of the QTL alleles and a coincidence between the behavioral profiles of anxiolytic drug and genetic action are consistent with the QTL containing at least one gene with a pleiotropic action on fear responses. As the neural basis of fear is conserved across species, we suggest that the QTL may have relevance to trait anxiety in humans.

Letunic I, Goodstadt L, Dickens NJ, Doerks T, Schultz J, Mott R, Ciccarelli F, Copley RR, Ponting CP, Bork P. 2002. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res, 30 (1).

SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.

Ponting CP, Mott R, Bork P, Copley RR. 2001. Novel protein domains and repeats in Drosophila melanogaster: insights into structure, function, and evolution. Genome Res, 11 (12).

Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.

Adiga PS, Bhomra A, Turri MG, Nicod A, Datta SR, Jeavons P, Mott R, Flint J. 2001. Automatic analysis of agarose gel images. Bioinformatics, 17 (11).

Automatic tools to speed up routine biological processes are very much sought after in biomedical research. Much repetitive work in molecular biology, such as allele calling in genetic analysis, can be made semi-automatic or task-specific automatic by using existing techniques from computer science and signal processing. Computerized analysis is reproducible and avoids various forms of human error. Semi-automatic techniques with an interactive check on the results speed up the analysis and reduce the error.

Mott R, Abecasis GR, Cardon LR. 2001. Identifying extreme regions of linkage disequilibrium with dense maps. Am J Hum Genet, 69 (4).

Flint J, Mott R. 2001. Finding the molecular basis of quantitative traits: successes and pitfalls. Nat Rev Genet, 2 (6).

Understanding the molecular basis of quantitative genetic variation is a principal goal for biomedicine. Although the complex genetic architecture of quantitative traits has so far largely frustrated attempts to identify genes in humans by standard linkage methodologies, quantitative trait loci (QTL) have been mapped in plants, insects and rodents. However, identifying the molecular bases of QTL remains a challenge. Here, we discuss why this is and how new experimental strategies and analytical techniques, combined with the fruits of the genome projects, are beginning to identify candidate genes for QTL studies in several model organisms.

Mott R, Talbot CJ, Turri MG, Collins AC, Flint J. 2000. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci U S A, 97 (23).

High-resolution mapping of quantitative trait loci (QTL) in animals has proved to be difficult because the large effect sizes detected in crosses between inbred strains are often caused by numerous linked QTLs, each of small effect. In a study of fearfulness in mice, we have shown it is possible to fine map small-effect QTLs in a genetically heterogeneous stock (HS). This strategy is a powerful general method of fine mapping QTLs, provided QTLs detected in crosses between inbred strains that formed the HS can be reliably detected in the HS. We show here that single-marker association analysis identifies only two of five QTLs expected to be segregating in the HS and apparently limits the strategy's usefulness for fine mapping. We solve this problem with a multipoint analysis that assigns the probability that an allele descends from each progenitor in the HS. The analysis does not use pedigrees but instead requires information about the HS founder haplotypes. With this method we mapped all three previously undetected loci [chromosome (Chr.) 1 logP 4.9, Chr. 10 logP 6.0, Chr. 15 logP 4.0]. We show that the reason for the failure of single-marker association to detect QTLs is its inability to distinguish opposing phenotypic effects when they occur on the same marker allele. We have developed a robust method of fine mapping QTLs in genetically heterogeneous animals and suggest it is now cost effective to undertake genomewide high-resolution analysis of complex traits in parallel on the same set of mice.

Mott R. 2000. Accurate formula for P-values of gapped local sequence and profile alignments. J Mol Biol, 300 (3).

A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing significance of comparisons between two protein sequences or a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced by R. Mott and R. Tribe in 1999. Extensive simulation studies show that score-thresholds produced by the method are accurate to within ±5% for 95% of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently, it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood.
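
For orientation, the extreme-value distribution that the abstract mentions is conventionally written in the Gumbel form P(S >= s) = 1 - exp(-K m n exp(-lambda s)) for sequences of lengths m and n. The sketch below implements only that textbook form and its inverse; it is not the approximation derived in the paper, and the function names and parameter values are illustrative assumptions.

```python
import math


def evd_pvalue(score, m, n, lam, K):
    """P(S >= score) for sequence lengths m and n under the Gumbel form."""
    # Expected number of chance alignments scoring at least `score`.
    expected = K * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), computed stably when `expected` is tiny.
    return -math.expm1(-expected)


def evd_threshold(p, m, n, lam, K):
    """Score whose P-value equals p (algebraic inverse of evd_pvalue)."""
    return math.log(K * m * n / -math.log1p(-p)) / lam
```

With hypothetical parameters such as `lam=0.27` and `K=0.04`, `evd_threshold(0.01, 300, 300, 0.27, 0.04)` returns the score at which a chance alignment between two 300-residue sequences would be expected 1% of the time under this model.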

Mott R. 1999. Local sequence alignments with monotonic gap penalties. Bioinformatics, 15 (6).

Sequence alignments obtained using affine gap penalties are not always biologically correct, because the insertion of long gaps is over-penalised. There is a need for an efficient algorithm which can find local alignments using non-linear gap penalties.
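
The over-penalisation of long gaps can be seen in a small sketch. The code below is the textbook cubic-time local alignment recurrence that accepts any gap-penalty function, not the efficient algorithm the abstract calls for; the match and mismatch scores, the two penalty functions and all names are assumptions chosen for illustration.

```python
import math


def local_align_score(a, b, gap, match=2, mismatch=-1):
    """Best local alignment score of a and b; gap(k) is the cost of a length-k gap."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0.0, H[i - 1][j - 1] + s)
            # Try every gap length ending at this cell, in either sequence.
            for k in range(1, i + 1):
                score = max(score, H[i - k][j] - gap(k))
            for k in range(1, j + 1):
                score = max(score, H[i][j - k] - gap(k))
            H[i][j] = score
            best = max(best, score)
    return best


def affine_gap(k, gap_open=4.0, extend=1.0):
    """Affine penalty: cost grows linearly with gap length."""
    return gap_open + extend * (k - 1)


def log_gap(k, gap_open=4.0, scale=1.0):
    """Monotonic, non-linear penalty: long gaps cost little more than short ones."""
    return gap_open + scale * math.log(k)
```

For `a = "AAAATTTT"` and `b = "AAAACCCCCCTTTT"`, the affine penalty makes the six-residue gap too expensive, so the best local alignment covers only one four-residue block, whereas the logarithmic penalty lets the alignment bridge the gap and join both blocks.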

Mott R, Tribe R. 1999. Approximate statistics of gapped alignments. J Comput Biol, 6 (1).

A heuristic approximation to the score distribution of gapped alignments in the logarithmic domain is presented. The method applies to comparisons between random, unrelated protein sequences, using standard score matrices and arbitrary gap penalties. It is shown that gapped alignment behavior is essentially governed by a single parameter, alpha, depending on the penalty scheme and sequence composition. This treatment also predicts the position of the transition point between logarithmic and linear behavior. The approximation is tested by simulation and shown to be accurate over a range of commonly used substitution matrices and gap-penalties.

Meier-Ewert S, Lange J, Gerst H, Herwig R, Schmitt A, Freund J, Elge T, Mott R, Herrmann B, Lehrach H. 1998. Comparative gene expression profiling by oligonucleotide fingerprinting. Nucleic Acids Res, 26 (9).

The use of hybridisation of synthetic oligonucleotides to cDNAs under high stringency to characterise gene sequences has been demonstrated by a number of groups. We have used two cDNA libraries of 9 and 12 day mouse embryos (24 133 and 34 783 clones respectively) in a pilot study to characterise expressed genes by hybridisation with 110 hybridisation probes. We have identified 33 369 clusters of cDNA clones that ranged in representation from 1 to 487 copies (0.7%). Of these, 737 were assigned to known rodent genes, and a further 13 845 showed significant homologies. A total of 404 clusters were identified as significantly differentially represented (P < 0.01) between the two cDNA libraries. This study demonstrates the utility of the fingerprinting approach for the generation of comparative gene expression profiles through the analysis of cDNAs derived from different biological materials.

Dear S, Durbin R, Hillier L, Marth G, Thierry-Mieg J, Mott R. 1998. Sequence assembly with CAFTOOLS. Genome Res, 8 (3).

Large-scale genomic sequencing requires a software infrastructure to support and integrate applications that are not directly compatible. We describe a suite of software tools built around the Common Assembly Format (CAF), a comprehensive representation of a sequence assembly as a text file. These tools form the backbone of sequencing informatics at the Sanger Centre and the Genome Sequencing Center. The CAF format is intentionally flexible, and our Perl and C libraries, which parse and manipulate it, provide powerful tools for creating new applications as well as wrappers to incorporate other software. The tools are available free by anonymous FTP from ftp://ftp.sanger.ac.uk/pub/badger/.

Mott R. 1998. Trace alignment and some of its applications. Bioinformatics, 14 (1).

A DNA chromatogram trace contains useful information beyond that in the base-called DNA sequence. Many sequencing applications can benefit from examination of these traces.

Soderlund C, Longden I, Mott R. 1997. FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci, 13 (5).

To meet the demands of large-scale sequencing, thousands of clones must be fingerprinted and assembled into contigs. To determine the order of clones, a typical experiment is to digest the clones with one or more restriction enzymes and measure the resulting fragments. The probability of two clones overlapping is based on the similarity of their fragments. A contig contains two or more overlapping clones and a minimal tiling path of clones is selected to be sequenced. Interactive software with algorithmic support is necessary to assemble the clones into contigs quickly.
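
The idea of scoring overlap from fragment similarity can be sketched as follows: count the fragments two clones share within a measurement tolerance, then ask how likely at least that many matches would be by chance. This is a minimal illustration only; the binomial chance model, the function names and the per-fragment chance-match probability `p_chance` are assumptions and are not taken from FPC.

```python
import math


def shared_fragments(frags_a, frags_b, tolerance):
    """Greedily pair fragments of two clones whose sizes differ by <= tolerance."""
    a, b = sorted(frags_a), sorted(frags_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1  # each fragment is used in at most one pair
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def chance_match_probability(n_fragments, n_shared, p_chance):
    """Binomial tail: P(at least n_shared of n_fragments match by chance)."""
    return sum(
        math.comb(n_fragments, k) * p_chance**k * (1 - p_chance) ** (n_fragments - k)
        for k in range(n_shared, n_fragments + 1)
    )
```

A small tail probability suggests that the shared fragments are unlikely to be coincidental, which is the kind of evidence used to place two clones in the same contig.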

Mott R. 1997. EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput Appl Biosci, 13 (4).

Mangiarini L, Sathasivam K, Mahal A, Mott R, Seller M, Bates GP. 1997. Instability of highly expanded CAG repeats in mice transgenic for the Huntington's disease mutation. Nat Genet, 15 (2).

Six inherited neurodegenerative diseases are caused by a CAG/polyglutamine expansion, including spinal and bulbar muscular atrophy (SBMA), Huntington's disease (HD), spinocerebellar ataxia type 1 (SCA1), dentatorubral pallidoluysian atrophy (DRPLA), Machado-Joseph disease (MJD or SCA3) and SCA2. Normal and expanded HD allele sizes of 6-39 and 35-121 repeats have been reported, and the allele distributions for the other diseases are comparable. Intergenerational instability has been described in all cases, and repeats tend to be more unstable on paternal transmission. This may present as larger increases on paternal inheritance as in HD, or as a tendency to increase on male and decrease on female transmission as in SCA1 (ref. 15). Somatic repeat instability is also apparent and appears most pronounced in the CNS. The major exception is the cerebellum, which in HD, DRPLA, SCA1 and MJD has a smaller repeat relative to the other brain regions tested. Of non-CNS tissues, instability was observed in blood, liver, kidney and colon. A mouse model of CAG repeat instability would be helpful in unravelling its molecular basis, although an absence of CAG repeat instability in transgenic mice has so far been reported. These studies include (CAG) in the androgen receptor cDNA, (CAG) in the HD cDNA, (CAG) in the SCA1 cDNA, (CAG) in the SCA3 cDNA and as an isolated (CAG) tract.