Commercially available outbred mice for genome-wide association studies
Updated March 2010
The Background
Mouse geneticists have reason to envy the success of human genome-wide association studies (GWAS), but not necessarily to adopt their practice, for example by using wild mice. Doing so entails the same drawbacks that afflict human GWAS:
tens of thousands of subjects are needed for robust detection of common causal variants, and most of the genetic variance remains unexplained even with these large sample sizes. What are the alternatives?
Commercial mouse breeders, such as Harlan and Charles River Laboratories, maintain large colonies of outbred mice that may have the necessary genetic structure.
Linkage disequilibrium (LD) in some outbred stocks has been shown to allow high-resolution mapping, sufficient to identify genes. Importantly, most outbred stocks are known to derive from animals from a
single population, such as the Swiss stocks, which descend from two male and seven female mice imported from Lausanne, Switzerland.

The groundwork responsible for the successful application of human GWAS required both the development of sufficient markers and the
genetic characterization of different populations. Similar work is needed in mouse genetics. Dense marker sets and tools for their genotyping are now available, but they
have not been applied to analyse the genetic structure of outbred populations.
Here we provide data from approximately half a million markers obtained with the mouse diversity array, and from next-generation sequencing of one colony.
Colonies used
We use the term colony to mean a population of mice maintained as a mating population at a single location, and stock to mean a collection of colonies that are given the same stock
designation by the breeders. For example, HsdWin:CFW-1 and Crl:CFW(SW) are two colonies from the same stock (CFW). We follow the international standardized nomenclature for outbred stocks, but add two further pieces of information:
a two-letter code for the country of origin and, when several cohorts are available from the same site, a code for the production room, e.g. Crl:CFW(SW)-US_P08.
We wrote to commercial suppliers across the world and asked them to provide us with samples for an analysis of the genetic architecture
of outbred colonies. We obtained samples from 72 colonies, whose locations are shown below.
Inbreeding, genetic relatedness and genetic drift
High rates of inbreeding make colonies less suitable for mapping because they contain fewer (if any) segregating QTLs. Colonies that consist of a mixture of relatives (such as siblings, half siblings, cousins, second degree and third degree relatives) will be difficult to use for mapping because the differing degrees of genetic relatedness introduce population structure.
We evaluated genetic relationships between and within colonies with a number of measures.
Principal components and multidimensional scaling revealed population differentiation, but no single feature (neither stock, colony, producer nor country of origin) satisfactorily accounts for the distribution.

Multidimensional scaling of IBS pairwise distance matrices
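To make the input to the scaling concrete, here is a minimal sketch of how a pairwise identity-by-state (IBS) distance matrix can be computed. It assumes genotypes are coded as 0, 1 or 2 copies of one allele with None for missing calls; that coding and the function names `ibs_distance` and `ibs_matrix` are illustrative assumptions, not a description of the software used for the analysis on this page.

```python
def ibs_distance(g1, g2):
    """IBS distance between two individuals.

    Genotypes are allele counts (0, 1 or 2); None marks a missing call
    (an assumed coding). Two individuals share 2 - |a - b| alleles IBS
    at a marker, so the distance is the mean of |a - b| / 2 over markers
    typed in both individuals.
    """
    diffs = [abs(a - b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no markers typed in both individuals")
    return sum(diffs) / (2 * len(diffs))


def ibs_matrix(genotypes):
    """Symmetric matrix of pairwise IBS distances (zero diagonal)."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = ibs_distance(genotypes[i], genotypes[j])
    return dist
```

A matrix of this kind is what a multidimensional scaling routine takes as input; the scaling step itself is not shown here.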

Here we show a genetic genealogy for colonies, clustered according to Fst distances.

We attempted to determine genetic ancestry regardless of stock identity. We considered each colony as originating from K unknown ancestral populations and looked at values of K from 2 to 12 using a maximum likelihood method in the program FRAPPE.





We looked at allele frequency fluctuation over time, which is expected to occur due to unintended directional selection and random genetic drift.
The figure below compares LD measured at two time points in six colonies. With one exception (the MF1 colony), the results are stable.

LD decay and mean minor allele frequencies
We assess mapping resolution by the LD decay radius, defined as the average physical separation
in base pairs (bp) between SNPs beyond which the squared correlation coefficient (R²) drops below 0.5. However, it is also important
to take into account the level of genetic variation in the colonies. Fewer quantitative trait loci (QTLs) will be
segregating in colonies with less variation, and if the alleles are rare then the QTLs will be relatively
difficult to detect. We therefore also show the mean minor allele frequency for each colony.
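The two summary statistics described above can be sketched as follows. This is one possible reading of the definition, not necessarily the procedure used to produce the figures: SNP pairs are grouped into fixed-width distance bins, and the radius is taken as the start of the first bin whose mean R² falls below the threshold. The bin width of 10 kb and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def ld_decay_radius(pairs, bin_size=10_000, threshold=0.5):
    """Estimate the LD decay radius from (distance_bp, r2) pairs.

    Pairs are grouped into distance bins of width bin_size (an assumed
    value). Returns the lower edge, in bp, of the first bin whose mean
    r2 is below threshold, or None if no bin falls below it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        b = int(distance // bin_size)
        sums[b] += r2
        counts[b] += 1
    for b in sorted(counts):
        if sums[b] / counts[b] < threshold:
            return b * bin_size
    return None


def mean_maf(allele_freqs):
    """Mean minor allele frequency from per-SNP frequencies of one allele."""
    return sum(min(p, 1.0 - p) for p in allele_freqs) / len(allele_freqs)
```

A smaller radius means finer mapping resolution, while a higher mean minor allele frequency means more segregating variation with which to detect QTLs, so the two numbers are best read together for each colony.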

LD maps
The data made available on this web page are from three colonies:
Crl:CFW(SW)-US_P08
Hsd:ICR(CD-1)-NL
HsdWin:NMRI(Han)-NL
Linkage disequilibrium data are provided as pairwise R² and D′ measures. This is the output of the HAPLOVIEW programme, to which we have added the marker positions.
Recombination maps
The recombination rate, estimated using LDHat, is given across the genome.
Blocks
Haplotype blocks were estimated using PLINK, which implements the block finding algorithm found in HAPLOVIEW.
Sequence
Sequence reads aligned to the reference strain (C57BL/6J) for four animals from the Crl:CFW(SW)-US_P08 colony can be viewed in LookSeq.
QTL Mapping - phenotypes and genotypes
Raw genotypes and phenotypes for the four phenotypes (concentration of high-density lipoproteins (HDL), mean red cell volume (MCV), serum alkaline phosphatase (ALP), and the ratio of blood CD4+ to CD8+ T-lymphocytes) mapped in three colonies (Crl:CFW(SW)-US_P08, HsdWin:CFW-NL and HsdWin:NMRI-NL) are available in this directory.
Each file has the following structure:
id phen marker chr bp gen
NL_Hsd_NMRIp001 53.3 mm37-1-131648336 1 131648336 G G
NL_Hsd_NMRIp001 53.3 mm33-1-129773886 1 131821425 A G
Note that the phenotype value is repeated for each marker.
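Because the phenotype is repeated on every row, a reader will usually want to collapse the file into one phenotype per animal plus a table of genotypes. The following is a minimal sketch of such a reader, based only on the example rows above. It assumes whitespace-separated fields, that the gen column holds two alleles as separate tokens (seven tokens per row), and that phenotypes are numeric; how missing values are coded is not stated here, so they are not handled. The function names are hypothetical.

```python
def parse_records(lines):
    """Yield one dict per data row, skipping blank lines and the header."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "id":
            continue
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        sample, phen, marker, chrom, bp, allele1, allele2 = fields
        yield {
            "id": sample,
            "phen": float(phen),   # assumes a numeric phenotype value
            "marker": marker,
            "chr": chrom,          # kept as text, in case of non-numeric names
            "bp": int(bp),
            "alleles": (allele1, allele2),
        }


def collapse(records):
    """Return (phenotype per animal, genotypes per animal and marker)."""
    phenotypes, genotypes = {}, {}
    for rec in records:
        phenotypes[rec["id"]] = rec["phen"]
        genotypes.setdefault(rec["id"], {})[rec["marker"]] = rec["alleles"]
    return phenotypes, genotypes


example = [
    "id phen marker chr bp gen",
    "NL_Hsd_NMRIp001 53.3 mm37-1-131648336 1 131648336 G G",
    "NL_Hsd_NMRIp001 53.3 mm33-1-129773886 1 131821425 A G",
]
phenotypes, genotypes = collapse(parse_records(example))
```

For a real file, the same functions can be applied to an open file handle in place of the `example` list.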