happy R package
Problem Statement and Requirements
- HAPPY is designed to map QTL in Heterogeneous Stocks (HS), i.e.
populations founded from known inbred lines that have interbred
over many generations. No pedigree information is required.
- Phenotypic values for the trait must be known for all individuals. These should preferably be normally distributed, because HAPPY uses analysis-of-variance F statistics to test for linkage; if they are not, a permutation test can be used instead.
- For each genotyped marker, it is necessary to know the ancestral
alleles in the inbred founders (which by definition must be
homozygous), and the genotypes of the individuals in the final
generation.
- The chromosomal position in centiMorgans of each marker must be known.
- Missing data are accommodated provided they are due to
random failures in the genotyping and not to selective genotyping
based on the trait values (however, it is permissible to selectively
genotype all the markers provided the same individuals are genotyped at each locus).
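The permutation test mentioned in the requirements can be illustrated with a minimal Python sketch. This is not HAPPY's own code: the function names `anova_f` and `permutation_pvalue` are hypothetical, and a simple one-way ANOVA on group labels stands in for HAPPY's regression on ancestral haplotype probabilities. The idea is only that shuffling the phenotypes relative to the genotype-derived predictors gives a reference distribution for the F statistic that does not depend on normality.

```python
import random


def anova_f(values, groups):
    """One-way ANOVA F statistic for values labelled by group."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for vs in by_group.values():
        mean = sum(vs) / len(vs)
        ss_between += len(vs) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(values, groups, n_perm=1000, seed=0):
    """Empirical p-value: share of shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = anova_f(values, groups)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between phenotype and group
        if anova_f(shuffled, groups) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The sketch assumes at least two groups and some within-group variation, otherwise the F statistic is undefined.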
What HAPPY does
HAPPY's analysis has essentially two stages, ancestral haplotype reconstruction using dynamic programming followed by QTL testing by linear regression:
- Assume that at a QTL, a chromosome originating from
the progenitor strain labelled $s$ contributes an unknown additive
amount $T_s$ to the phenotype, so that the expected genetic effect for a
diploid individual with ancestral alleles labelled $s, t$ at the trait locus
is $T_s + T_t$; a test for a QTL is equivalent to testing for differences between
the $T_s$.
- A dynamic-programming algorithm is used to compute the
probability $F_n(s,t)$ that a given individual has the ancestral alleles
$s, t$ at the locus labelled $n$, conditional upon all the genotype data for
that individual. Then the expected phenotype is
$2 \sum_s T_s \sum_t F_n(s,t)$, and the $T_s$ are estimated by a linear regression of the
observed phenotypes on these expected values across all individuals,
followed by an analysis of variance to test whether the progenitor
estimates differ significantly.
- The method's power depends on the ability to distinguish
ancestral haplotypes across the interval; clearly the power will be
lower if all markers in a region have the same type of non-informative
allele distribution, but the markers can share information where there
is a mixture.
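How neighbouring markers share information can be illustrated with a small forward-backward calculation. The sketch below is a deliberately simplified stand-in, not HAPPY's algorithm: it treats a single haploid chromosome rather than an unphased diploid genotype, and the function name `posterior_strains`, the uniform switching model, and the genotyping error rate are all assumptions made for illustration.

```python
def posterior_strains(founder_alleles, observed, recomb, error=0.01):
    """Posterior probability of each founder strain at each marker.

    founder_alleles[n][s]: allele carried by strain s at marker n.
    observed[n]: allele seen in the individual, or None if missing.
    recomb[n]: assumed probability of a switch between markers n and n+1.
    """
    n_markers, n_strains = len(observed), len(founder_alleles[0])

    def emit(n, s):
        if observed[n] is None:
            return 1.0  # missing data carry no information
        return 1.0 - error if founder_alleles[n][s] == observed[n] else error

    def step(n, s, t):
        # Assumed model: after a switch, every strain is equally likely.
        stay = 1.0 - recomb[n] if s == t else 0.0
        return stay + recomb[n] / n_strains

    strains = range(n_strains)
    # Forward pass: evidence from markers up to and including n.
    fwd = [[emit(0, s) / n_strains for s in strains]]
    for n in range(1, n_markers):
        prev = fwd[-1]
        fwd.append([emit(n, t) * sum(prev[s] * step(n - 1, s, t) for s in strains)
                    for t in strains])
    # Backward pass: evidence from markers after n.
    bwd = [[1.0] * n_strains for _ in range(n_markers)]
    for n in range(n_markers - 2, -1, -1):
        bwd[n] = [sum(step(n, s, t) * emit(n + 1, t) * bwd[n + 1][t] for t in strains)
                  for s in strains]
    posterior = []
    for n in range(n_markers):
        weights = [fwd[n][s] * bwd[n][s] for s in strains]
        total = sum(weights)
        posterior.append([w / total for w in weights])
    return posterior
```

With two strains, informative markers on either side, and an uninformative or missing marker in the middle, the middle marker's posterior is pulled towards the strain supported by its neighbours, which is the sharing of information described above.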
A more detailed mathematical description of the algorithm and method is available here (MS Word format) or in Proc. Natl. Acad. Sci. USA (doi:10.1073/pnas.230304397).
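As a rough illustration of the regression stage, the following Python sketch turns per-individual haplotype probabilities at one locus into expected strain doses, fits the strain effects by least squares, and forms an F statistic against a mean-only model. It is a sketch under stated assumptions rather than HAPPY's implementation: the names `strain_doses`, `solve` and `fit_locus` are hypothetical, the dose of strain s is taken as the sum over t of F(s,t) + F(t,s) (which equals 2 times the sum over t of F(s,t) when F is symmetric), and no intercept is fitted because the doses of every individual sum to 2.

```python
def strain_doses(F):
    """Expected number of chromosomes from each strain, given F[s][t]."""
    S = len(F)
    return [sum(F[s][t] + F[t][s] for t in range(S)) for s in range(S)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_locus(F_list, phenotypes):
    """Regress phenotypes on strain doses; return (strain effects, F statistic)."""
    X = [strain_doses(F) for F in F_list]
    n, S = len(X), len(X[0])
    # Normal equations (X'X) T = X'y for the no-intercept model.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(S)] for i in range(S)]
    Xty = [sum(row[i] * y for row, y in zip(X, phenotypes)) for i in range(S)]
    T = solve(XtX, Xty)
    fitted = [sum(t * d for t, d in zip(T, row)) for row in X]
    rss_full = sum((y - f) ** 2 for y, f in zip(phenotypes, fitted))
    mean = sum(phenotypes) / n
    rss_null = sum((y - mean) ** 2 for y in phenotypes)  # all strain effects equal
    f_stat = ((rss_null - rss_full) / (S - 1)) / (rss_full / (n - S))
    return T, f_stat
```

The sketch assumes more individuals than strains, residual variation that is not exactly zero, and doses that vary enough for the normal equations to be non-singular. The F statistic has S - 1 and n - S degrees of freedom under these assumptions.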
Please send questions, comments, and bug reports to Richard Mott.