Wellcome Trust Centre for Human Genetics

HAPPY V1.2


Problem Statement and Requirements

  • HAPPY is designed to map QTL in Heterogeneous Stocks (HS), i.e. populations that were founded from known inbred lines and have since interbred over many generations. No pedigree information is required.
  • Phenotypic values for the trait must be known for all individuals. Preferably these are normally distributed, because HAPPY uses analysis-of-variance F statistics to test for linkage, although a permutation test can be used instead.
  • For each genotyped marker, it is necessary to know the ancestral alleles in the inbred founders (which by definition must be homozygous), and the genotypes from the individuals in the final generation.
  • The chromosomal position in centiMorgans of each marker must be known.
  • Missing data are accommodated provided they are due to random genotyping failures and not to selective genotyping based on the trait values. Selective genotyping is nevertheless permissible if the same individuals are genotyped at every marker.

What HAPPY does

HAPPY's analysis is essentially two-stage: ancestral haplotype reconstruction using dynamic programming, followed by QTL testing by linear regression:

  • Assume that at a QTL, a chromosome originating from the progenitor strain labelled s contributes an unknown additive amount T_s to the phenotype, so that the expected genetic effect for a diploid individual with ancestral alleles s, t at the trait locus is T_s + T_t. A test for a QTL is then equivalent to a test for differences between the T_s.
  • A dynamic-programming algorithm is used to compute the probability F_n(s,t) that a given individual has the ancestral alleles s, t at the locus labelled n, conditional upon all the genotype data for that individual. The expected phenotype is then 2 Σ_s T_s Σ_t F_n(s,t), and the T_s are estimated by a linear regression of the observed phenotypes on these expected values across all individuals, followed by an analysis of variance to test whether the progenitor estimates differ significantly.
  • The method's power depends on the ability to distinguish ancestral haplotypes across the interval; clearly the power will be lower if all markers in a region have the same type of non-informative allele distribution, but the markers can share information where there is a mixture.
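
The haplotype-reconstruction stage can be illustrated with a small forward-backward sketch. This is not HAPPY's own code: it assumes a single (haploid) chromosome rather than the diploid F_n(s,t), a uniform prior over founders, a fixed genotyping error rate, and a transition model in which the founder is redrawn uniformly with a given probability between adjacent markers. The function and parameter names (founder_posteriors, recomb, error) are hypothetical.

```python
def normalise(values):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def founder_posteriors(founder_alleles, observed, recomb, error=0.01):
    """Posterior probability of each founder at each marker (haploid sketch).

    founder_alleles[n][s]: allele carried by founder s at marker n
    observed[n]: allele seen in the individual, or None if missing
    recomb[n]: assumed probability that the founder is redrawn
               between marker n and marker n + 1
    """
    n_markers = len(observed)
    n_founders = len(founder_alleles[0])

    def emission(n, s):
        if observed[n] is None:          # missing genotype: no information
            return 1.0
        match = founder_alleles[n][s] == observed[n]
        return 1.0 - error if match else error

    def step(weights, r):
        # Keep the same founder with probability 1 - r, else redraw uniformly.
        total = sum(weights)
        return [(1.0 - r) * w + r * total / n_founders for w in weights]

    # Forward pass: evidence from markers 0..n.
    forward = [normalise([emission(0, s) for s in range(n_founders)])]
    for n in range(1, n_markers):
        moved = step(forward[-1], recomb[n - 1])
        forward.append(normalise(
            [moved[s] * emission(n, s) for s in range(n_founders)]))

    # Backward pass: evidence from markers n+1..end.
    backward = [[1.0] * n_founders for _ in range(n_markers)]
    for n in range(n_markers - 2, -1, -1):
        weighted = [backward[n + 1][s] * emission(n + 1, s)
                    for s in range(n_founders)]
        backward[n] = normalise(step(weighted, recomb[n]))

    # Combine both passes into per-marker posteriors.
    return [normalise([f * b for f, b in zip(forward[n], backward[n])])
            for n in range(n_markers)]


# Two founders, three markers; only the middle marker distinguishes them.
alleles = [["A", "A"], ["A", "G"], ["C", "C"]]
posteriors = founder_posteriors(alleles, ["A", "G", "C"], recomb=[0.1, 0.1])
```

In this toy example the two outer markers carry the same allele in both founders, yet their posteriors still favour the second founder because the informative middle marker is linked to them. This is the sense in which markers can share information.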

A more detailed mathematical description of the algorithm and method is available here (MS Word format) or in Proc. Natl. Acad. Sci. USA (doi: 10.1073/pnas.230304397).
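
The regression stage can be sketched in the same spirit. The code below is a minimal illustration, not HAPPY's implementation: it assumes each individual's F_n(s,t) is given as a symmetric matrix summing to 1, turns it into the expected number of chromosomes from each strain, 2 Σ_t F_n(s,t), fits the T_s by ordinary least squares, and compares the fit with a null model in which all T_s are equal. The function names are hypothetical, and the design matrix is assumed to be non-singular (every strain distinguishable).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_dosage(f_matrix):
    """Expected number of chromosomes from each strain: 2 * sum_t F(s, t)."""
    return [2.0 * sum(row) for row in f_matrix]


def qtl_f_test(dosages, phenotypes):
    """Least-squares estimates of the T_s and the ANOVA F statistic.

    Each row of dosages sums to 2, so the null model (all T_s equal)
    reduces to fitting the phenotype mean.
    """
    n, k = len(phenotypes), len(dosages[0])
    xtx = [[sum(row[a] * row[b] for row in dosages) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(dosages, phenotypes))
           for a in range(k)]
    effects = solve_linear(xtx, xty)

    rss_full = sum((y - sum(t * x for t, x in zip(effects, row))) ** 2
                   for row, y in zip(dosages, phenotypes))
    mean = sum(phenotypes) / n
    rss_null = sum((y - mean) ** 2 for y in phenotypes)

    # F statistic on (k - 1, n - k) degrees of freedom.
    f_stat = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))
    return effects, f_stat


# Made-up data: two strains, six individuals, small symmetric noise.
dosages = [[2, 0], [2, 0], [1, 1], [1, 1], [0, 2], [0, 2]]
phenotypes = [2.1, 1.9, 4.1, 3.9, 6.1, 5.9]
effects, f_stat = qtl_f_test(dosages, phenotypes)
```

With these illustrative numbers the estimated effects are close to 1 and 3 for the two strains, and the large F statistic reflects the clear difference between them. When the phenotypes are far from normal, the same statistic could be recomputed on permuted phenotypes to obtain an empirical significance level.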


Please send questions, comments, and bug reports to Richard Mott.
