Wellcome Trust Centre for Human Genetics

Commercially available outbred mice for genome-wide association studies

Updated March 2010

The Background

Mouse geneticists have reason to envy the success of human genome-wide association studies (GWAS), but not necessarily to adopt their practice, for example by using wild mice. Doing so entails the same drawbacks that afflict human GWAS: tens of thousands of subjects are needed for robust detection of common causal variants, and the majority of the genetic variance remains unexplained even with these large sample sizes. What are the alternatives? Commercial mouse breeders, such as Harlan and Charles River Laboratories, maintain large colonies of outbred mice that may have the necessary genetic structure. Linkage disequilibrium (LD) in some outbred stocks has been shown to allow high-resolution mapping, sufficient to identify genes. Importantly, most outbred stocks are known to derive from animals from a single population, such as the Swiss stocks, which descend from two male and seven female mice imported from Lausanne, Switzerland.

Origins of commercially available mice

The groundwork for the successful application of human GWAS required both the development of sufficient markers and the genetic characterization of different populations. Similar work is needed in mouse genetics. Dense marker sets and tools for genotyping them are now available, but they have not been applied to analyse the genetic structure of outbred populations. Here we provide data from approximately half a million markers obtained with the mouse diversity array and from next generation sequencing of one colony.

Colonies used

We use the term colony to mean a population of mice maintained as a mating population at a single location, and stock to mean a collection of colonies that are given the same stock designation by the breeders. For example, HsdWin:CFW-1 and Crl:CFW(SW) are two colonies from the same stock (CFW). We follow the international standardized nomenclature for outbred stocks, but add two further pieces of information: a two-letter code for the country of origin and, when several cohorts are available from the same site, a code for the production room, e.g. Crl:CFW(SW)-US_P08.

We wrote to commercial suppliers across the world and asked them to provide us with samples for an analysis of the genetic architecture of outbred colonies. We obtained samples from 72 colonies, whose locations are shown below.

Location of the suppliers

Inbreeding, genetic relatedness and genetic drift

High rates of inbreeding make colonies less suitable for mapping because they contain fewer (if any) segregating QTLs. Colonies that consist of a mixture of relatives (such as siblings, half siblings, cousins, second degree and third degree relatives) will be difficult to use for mapping because the differing degrees of genetic relatedness introduce population structure.

We evaluated genetic relationships between and within colonies with a number of measures.

Principal components analysis and multidimensional scaling revealed population differentiation, but no single feature (stock, colony, producer or country of origin) satisfactorily accounts for the distribution.

Principal components analysis - grouping by stock, supplier and country of origin

Multidimensional scaling of IBS pairwise distance matrices

Multidimensional scaling of IBS pairwise distance matrices - grouping by stock, supplier and country of origin
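To make the input to the multidimensional scaling concrete, the following is a minimal sketch of how an identity-by-state (IBS) pairwise distance matrix can be computed. It is an illustration only, not the pipeline used for the figures above. It assumes genotypes coded as allele counts (0, 1, 2) with None for missing calls; the function names and the example animal IDs are hypothetical.

```python
from itertools import combinations


def ibs_distance(g1, g2):
    """Return 1 minus the mean IBS sharing between two genotype vectors.

    Genotypes are assumed to be allele counts (0, 1 or 2) at the same
    markers in the same order. None marks a missing call; markers missing
    in either animal are skipped.
    """
    shared = 0.0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        # Two matching alleles score 1, one scores 0.5, none scores 0.
        shared += 1.0 - abs(a - b) / 2.0
        compared += 1
    if compared == 0:
        raise ValueError("no markers genotyped in both animals")
    return 1.0 - shared / compared


def ibs_distance_matrix(genotypes):
    """Build a symmetric distance lookup keyed by (id_1, id_2)."""
    ids = sorted(genotypes)
    dist = {(i, i): 0.0 for i in ids}
    for i, j in combinations(ids, 2):
        d = ibs_distance(genotypes[i], genotypes[j])
        dist[(i, j)] = d
        dist[(j, i)] = d
    return dist


# Hypothetical toy data: three animals, four markers.
example = {
    "mouse_a": [0, 1, 2, None],
    "mouse_b": [0, 1, 2, 2],
    "mouse_c": [2, 1, 0, 0],
}
example_distances = ibs_distance_matrix(example)
```

A matrix of this kind can then be passed to any classical multidimensional scaling routine to obtain the low-dimensional coordinates that are plotted.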

Here we show a genetic genealogy for colonies clustered according to Fst distances.

Clustering colonies and stocks by Fst
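For readers unfamiliar with Fst as a distance between colonies, the sketch below computes a simple pairwise Fst from allele frequencies as (HT - HS) / HT, summing heterozygosities over SNPs before taking the ratio. This is one of several common estimators and is not necessarily the one used for the clustering above; it weights the two colonies equally and applies no sample-size correction. The function name is hypothetical.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Pairwise Fst between two colonies from per-SNP allele frequencies.

    freqs_a and freqs_b hold the frequency of the same allele at the same
    SNPs in colony A and colony B. Fst is computed as (HT - HS) / HT, where
    HS is the mean within-colony expected heterozygosity and HT is the
    expected heterozygosity of the pooled (equally weighted) population.
    """
    h_within = 0.0
    h_total = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        h_within += (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        h_total += 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Every SNP is fixed for the same allele in both colonies.
        return 0.0
    return (h_total - h_within) / h_total
```

Two colonies fixed for opposite alleles give Fst = 1, and two colonies with identical allele frequencies give Fst = 0. A matrix of such pairwise values is the kind of input a hierarchical clustering would take.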

We attempted to determine genetic ancestry regardless of stock identity. We considered each colony as originating from K unknown ancestral populations and looked at values of K from 2 to 12 using a maximum likelihood method in the program FRAPPE.

We looked at allele frequency fluctuation over time, which is expected to occur due to unintended directional selection and random genetic drift. The figure below compares LD measured at two time points in six colonies. With one exception (the MF1 colony), the results are stable.

LD decay and mean minor allele frequencies

We assess mapping resolution by the extent of the LD decay radius, defined as the average physical separation in base pairs (bp) between SNPs beyond which the squared correlation coefficient (R2) drops below 0.5. However, it is also important to take into account the level of genetic variation in the colonies. Fewer quantitative trait loci (QTLs) will be segregating in colonies with less variation, and if the alleles are rare then the QTLs will be relatively difficult to detect. We therefore also show the mean minor allele frequency for each colony.

Linkage disequilibrium decay radius and mean minor allele frequencies in 72 commercially available colonies
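The two summary statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to produce the figure. It assumes the input is a list of (distance in bp, R2) pairs for SNP pairs on the same chromosome, averages R2 in fixed-width distance bins, and reports the start of the first bin whose mean falls below the threshold. The bin width of 10 kb and the function names are hypothetical choices.

```python
from collections import defaultdict


def ld_decay_radius(pairs, bin_width=10_000, threshold=0.5):
    """Estimate the distance (bp) at which mean R2 first drops below threshold.

    pairs is an iterable of (distance_bp, r2) tuples. R2 values are averaged
    within distance bins of bin_width bp. Returns the lower edge of the first
    bin whose mean R2 is below threshold, or None if no bin falls below it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_bp, r2 in pairs:
        b = int(distance_bp // bin_width)
        sums[b] += r2
        counts[b] += 1
    for b in sorted(counts):
        if sums[b] / counts[b] < threshold:
            return b * bin_width
    return None


def mean_minor_allele_frequency(allele_freqs):
    """Mean of min(p, 1 - p) over a non-empty list of allele frequencies."""
    mafs = [min(p, 1.0 - p) for p in allele_freqs]
    return sum(mafs) / len(mafs)
```

The result depends on the bin width: narrower bins give finer resolution but noisier means, so the value is best read as approximate.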

LD maps

The data made available on this web page are from three colonies:




Linkage disequilibrium data are provided as pairwise R2 and D prime measures, as output by the HAPLOVIEW programme. We have added the marker positions to the file.
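As a reminder of what the two measures mean, the sketch below computes R2 and D prime for a pair of biallelic SNPs from the standard textbook definitions. It assumes the haplotype frequency is already known; HAPLOVIEW itself works from genotype data and estimates haplotype frequencies, which this sketch does not attempt. The function name and argument names are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at the first
    SNP and allele B at the second; p_a and p_b are the frequencies of
    those alleles. d_prime is returned as an absolute value in [0, 1].
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # The largest |D| attainable given the allele frequencies depends on
    # the sign of D.
    if d >= 0.0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0.0 else 0.0
    return r2, d_prime
```

With p_a = 0.5, p_b = 0.2 and p_ab = 0.2, allele B occurs only on haplotypes carrying A, so D prime is 1 while R2 is only 0.25. This is why the two measures are reported side by side.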

Recombination maps

The recombination rate, estimated using LDHat, is given across the genome.


Haplotype blocks were estimated using PLINK, which implements the block finding algorithm found in HAPLOVIEW.


Sequence reads aligned to the reference strain (C57BL/6J) for four animals from the Crl:CFW(SW)-US_P08 colony are available through LookSeq.

QTL Mapping - phenotypes and genotypes

Raw genotypes and phenotypes are available in this directory for the four phenotypes mapped in three colonies (Crl:CFW(SW)-US_P08, HsdWin:CFW-NL and HsdWin:NMRI-NL): concentration of high-density lipoproteins (HDL), mean red cell volume (MCV), serum alkaline phosphatase (ALP) and the ratio of blood CD4+ to CD8+ T-lymphocytes. Each file has the following structure:

id               phen  marker            chr  bp         gen
NL_Hsd_NMRIp001  53.3  mm37-1-131648336  1    131648336  G G
NL_Hsd_NMRIp001  53.3  mm33-1-129773886  1    131821425  A G

Note that the phenotype value is repeated on every marker row for the same animal.
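Because the phenotype is repeated on every row, it is usually convenient to split a file into one phenotype per animal and one genotype per animal and marker. The sketch below does this in Python. It assumes the files are whitespace-delimited, that the header line reads exactly as shown above, that the gen column always holds two alleles, and that phenotype values are numeric; none of this has been checked against the full files. The function name is hypothetical.

```python
import io


def read_long_format(handle):
    """Parse a long-format genotype/phenotype file.

    Returns (phenotypes, genotypes), where phenotypes maps animal id to a
    float and genotypes maps animal id to
    {marker: (chromosome, position_bp, two_letter_genotype)}.
    """
    expected_header = ["id", "phen", "marker", "chr", "bp", "gen"]
    phenotypes = {}
    genotypes = {}
    header_seen = False
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not header_seen:
            if fields != expected_header:
                raise ValueError("unexpected header: %r" % (fields,))
            header_seen = True
            continue
        # The gen column is two alleles, so each data row has seven fields.
        animal, phen, marker, chrom, bp, allele_1, allele_2 = fields
        phenotypes[animal] = float(phen)
        genotypes.setdefault(animal, {})[marker] = (chrom, int(bp), allele_1 + allele_2)
    return phenotypes, genotypes


# Usage with the two example rows from this page.
sample = io.StringIO(
    "id phen marker chr bp gen\n"
    "NL_Hsd_NMRIp001 53.3 mm37-1-131648336 1 131648336 G G\n"
    "NL_Hsd_NMRIp001 53.3 mm33-1-129773886 1 131821425 A G\n"
)
sample_phenotypes, sample_genotypes = read_long_format(sample)
```

If missing phenotypes or genotypes are encoded with a placeholder in the real files, the float conversion and the seven-field unpacking would need to be relaxed accordingly.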