Wellcome Trust Centre for Human Genetics

HAPPY FILE FORMATS


HAPPY requires two input text files, in the formats described below. A Perl script, qtlData.pl, which helps generate data in the correct format, is included in the EXAMPLES subdirectory of the source distribution.

  • an alleles file describing the alleles for each marker in the founder populations, and the marker positions. Here is an example alleles file for 3 markers and 8 strains (=founder populations).
    • Lines starting with a # are treated as comments.
    • The first line says how many markers and strains there are.
    • The second line gives the names of the strains (which may not contain spaces).
    • There then follow entries for the 3 markers, i.e. the lines starting with the word "marker". For example, the first marker, D10MIT237, has 4 alleles and is at chromosomal position 0.1 centiMorgans.
    • The 4 lines starting with the word "allele" following the marker line describe how the alleles are distributed among the founders. For example, allele "96" is found in strains 1, 3, 6, 7 and 8, i.e. A/J, BALB, DBA, I and RIII. The numbers are the probability that the allele is found in each strain; currently these should be either 0 or 1/(number of founders with the allele), which is 1/5 = 0.2 in the case of allele 96.
    • Missing values are coded as ND, which is treated as an allele occurring with equal probability in each strain.

    markers 3 strains 8
    strain_names A/J AKR BALB C3H C57 DBA I RIII
    marker D10MIT237 4 0.1
    allele ND  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
    allele 96  0.200 0.000 0.200 0.000 0.000 0.200 0.200 0.200
    allele 98  0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
    allele 87  0.000 0.000 0.000 0.500 0.500 0.000 0.000 0.000
    marker D10MIT267 5 0.50
    allele ND  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
    allele 95  0.333 0.000 0.333 0.000 0.000 0.333 0.000 0.000
    allele 132 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
    allele 103 0.000 0.000 0.000 0.500 0.500 0.000 0.000 0.000
    allele 127 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.500
    marker D10MIT102 3 1.31
    allele ND  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
    allele 143 0.250 0.000 0.250 0.000 0.000 0.000 0.250 0.250
    allele 149 0.000 0.250 0.000 0.250 0.250 0.250 0.000 0.000
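The layout rules above can be checked mechanically. The following is a minimal Python sketch, not part of HAPPY itself: the names `Marker` and `parse_alleles` and the rounding tolerance `tol` are assumptions made for illustration. It reads an alleles file, checks the declared counts, and verifies that each nonzero probability equals 1/(number of founders carrying the allele).

```python
from dataclasses import dataclass, field

EXAMPLE = """\
markers 3 strains 8
strain_names A/J AKR BALB C3H C57 DBA I RIII
marker D10MIT237 4 0.1
allele ND 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
allele 96 0.200 0.000 0.200 0.000 0.000 0.200 0.200 0.200
allele 98 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
allele 87 0.000 0.000 0.000 0.500 0.500 0.000 0.000 0.000
marker D10MIT267 5 0.50
allele ND 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
allele 95 0.333 0.000 0.333 0.000 0.000 0.333 0.000 0.000
allele 132 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
allele 103 0.000 0.000 0.000 0.500 0.500 0.000 0.000 0.000
allele 127 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.500
marker D10MIT102 3 1.31
allele ND 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
allele 143 0.250 0.000 0.250 0.000 0.000 0.000 0.250 0.250
allele 149 0.000 0.250 0.000 0.250 0.250 0.250 0.000 0.000
"""


@dataclass
class Marker:
    name: str
    n_alleles: int        # allele count declared on the "marker" line
    position_cm: float    # chromosomal position in centiMorgans
    alleles: dict = field(default_factory=dict)  # allele -> per-strain probabilities


def parse_alleles(text, tol=1e-3):
    """Parse alleles-file text into (strain_names, markers) and validate it.

    tol is an assumed tolerance for rounded values such as 0.333.
    """
    n_markers = n_strains = None
    strains, markers = [], []
    for raw in text.splitlines():
        tok = raw.split()
        if not tok or tok[0].startswith("#"):
            continue  # blank line or comment
        if tok[0] == "markers":
            n_markers, n_strains = int(tok[1]), int(tok[3])
        elif tok[0] == "strain_names":
            strains = tok[1:]
        elif tok[0] == "marker":
            markers.append(Marker(tok[1], int(tok[2]), float(tok[3])))
        elif tok[0] == "allele":
            markers[-1].alleles[tok[1]] = [float(x) for x in tok[2:]]
        else:
            raise ValueError(f"unrecognised line: {raw!r}")

    if len(strains) != n_strains or len(markers) != n_markers:
        raise ValueError("header counts do not match file contents")
    for mk in markers:
        if len(mk.alleles) != mk.n_alleles:
            raise ValueError(f"{mk.name}: declared allele count is wrong")
        for allele, probs in mk.alleles.items():
            carriers = sum(p > 0 for p in probs)
            if len(probs) != n_strains or carriers == 0:
                raise ValueError(f"{mk.name}/{allele}: bad probability row")
            # every nonzero entry should be 1/(number of carrier strains)
            if any(p > 0 and abs(p - 1 / carriers) > tol for p in probs):
                raise ValueError(f"{mk.name}/{allele}: expected 1/{carriers}")
    return strains, markers


strains, markers = parse_alleles(EXAMPLE)
carriers_96 = [s for s, p in zip(strains, markers[0].alleles["96"]) if p > 0]
print(carriers_96)
```

For the example file, `carriers_96` lists the five strains that carry allele 96 of D10MIT237, which is why each of its nonzero entries is 0.200.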
  • a data file containing the observed phenotypes and genotypes for the individuals. Data files may be in one of two formats.
  • Ped-file format
    • If there are M markers and N individuals, then the data file contains N rows and 2*M+6 columns.
    • Each row contains the data for one individual. Not all of these columns are used by HAPPY. The first six columns (only those in bold are actually read) are family-id, individual-id, mother-id, father-id, sex and phenotype. Columns 2*m+5 and 2*m+6 contain the genotypes for the m'th marker.
    • The genotypes for a given marker must be among the alleles listed for that marker in the alleles file.
    • Lines starting with a # are treated as comments and are ignored.
    • An example fragment of a ped-file is given here:

        1_3 A048005080 H2.3:C5.2(3) H2.3:G2.2(3) 2 NA A A G G A A
        1_5 A048006063 E5.2:H5.1(4) E5.2:D4.1(4) 1 NA A A G G A A
        1_1 A048006555 E1.3:H1.2(3) E1.3:D1.2(3) 1 NA A A G G A A
        1_1 A048007096 D3.2:G2.1(5) D3.2:C5.1(5) 1 NA A C G A A T
        1_3 A048010273 G5.2:B5.1(4) G5.2:F5.1(2) 2 NA A A G G A A
        1_1 A048010371 H4.2:C5.1(4) H4.2:G1.1(7) 1 NA A A G G A A
        1_81 A048011040 G2.2:B4.1(5) G2.2:F1.1(7) 1 NA A A G G A A
        1_5 A048011287 B4.3:E5.2(3) B4.3:A1.2(3) 1 NA A A G G A A
        1_3 A048011567 C2.2:F3.1(5) C2.2:B5.1(4) 1 NA A A G G A A
        1_3 A048013559 C5.2:F3.1(5) C5.2:B5.1(4) 2 NA A C G A A T
        1_2 A048015047 B5.2:E5.1(4) B5.2:A1.1(4) 1 NA A A G G A A
        1_5 A048017615 E5.2:H5.1(4) E5.2:D4.1(4) 1 NA A A G G A A
        1_2 A048019267 F5.2:A1.1(4) F5.2:E2.1(5) 1 NA A A G G A A
        1_7 A048021023 H2.2:C3.1(4) H2.2:G3.1(3) 1 NA A C G A A T
        1_7 A048022858 E2.3:H2.2(3) E2.3:D4.2(2) 2 NA A C G A A T
        1_1 A048023355 H5.2:C1.1(5) H5.2:G1.1(7) 2 NA A A G G A A
        1_1 A048023581 H4.2:C5.1(4) H4.2:G1.1(7) 1 NA A A G G A A
        1_1 A048028854 E5.3:H5.2(6) E5.3:D3.2(5) 2 NA A C G A A T
        1_76 A048028871 C1.3:F3.2(3) C1.3:B1.2(3) 2 NA A A G G A A
        1_2 A048029086 G1.3:B1.2(3) G1.3:F3.2(3) 2 NA A A G G A A
        1_1 A048030529 H4.2:C5.1(4) H4.2:G1.1(7) 1 NA A A G G A A
        1_5 A048031067 A3.2:D2.1(5) A3.2:H4.1(5) 2 NA A C G A A T
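To make the ped-file column arithmetic concrete, here is a small Python sketch that splits one row into its fields. It is illustrative only: `parse_ped_line` is a hypothetical helper, not a HAPPY function, and treating the phenotype value `NA` as missing is an assumption based on the example rows. Counting columns from 1, the six leading fields occupy columns 1 to 6, so the genotypes for marker m fall in columns 2*m+5 and 2*m+6 (zero-based list indices 2*m+4 and 2*m+5).

```python
def parse_ped_line(line, n_markers):
    """Split one ped-file row into its ID, phenotype and genotype pairs."""
    fields = line.split()
    expected = 2 * n_markers + 6
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")

    family_id, individual_id, mother_id, father_id, sex, phenotype = fields[:6]

    genotypes = []
    for m in range(1, n_markers + 1):
        first_col, second_col = 2 * m + 5, 2 * m + 6  # 1-based column numbers
        genotypes.append((fields[first_col - 1], fields[second_col - 1]))

    return {
        "individual_id": individual_id,
        # Assumption: "NA" marks a missing phenotype, as in the fragment above.
        "phenotype": None if phenotype == "NA" else float(phenotype),
        "genotypes": genotypes,
    }


row = parse_ped_line(
    "1_1 A048007096 D3.2:G2.1(5) D3.2:C5.1(5) 1 NA A C G A A T", n_markers=3
)
print(row["individual_id"], row["genotypes"])
```

The column-count check catches rows whose genotype fields were truncated or merged, which is the most common way such whitespace-separated files go wrong.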
  • HAPPY format (the C version of HAPPY reads only this format; the R version reads both).
    • If there are M markers and N individuals, then the data file contains N rows and 2*M+2 columns.
    • Each row contains the data for one individual. The first column is the sample ID. The second is the trait phenotype, which must be a real number (i.e. not a categorical variable). Columns 2*m+1 and 2*m+2 contain the genotypes for the m'th marker.
    • The genotypes for a given marker must be among the alleles listed for that marker in the alleles file.
    • Lines starting with a # are treated as comments and are ignored.
    • A fragment data file consistent with the example alleles file is shown below.
    # SAMPLE_ID   PHENOTYPE D10MIT237 D10MIT267 D10MIT102
    3/19/96_01     0.70590  ND ND     95 127    ND ND
    11/15/96_250  -0.63409  ND ND     ND ND     ND ND
    3/19/96_02    -0.31980  ND ND     95 95     ND ND
    2/29/96_20     0.46876  ND ND     95 127    ND ND
    11/15/96_251  -0.59370  96 96     ND ND     143 149
    3/19/96_03    -0.80497  96 96     95 95     143 149
    11/15/96_252  -0.94475  ND ND     ND ND     ND ND
    3/19/96_04     0.04782  ND ND     95 127    ND ND
    11/15/96_253   0.20961  ND ND     ND ND     ND ND
    3/19/96_05    -0.96132  96 96     95 127    143 149
    3/19/96_06    -0.16538  ND ND     95 95     ND ND
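A HAPPY-format data file can be checked against the alleles file before running an analysis. The sketch below is a hedged illustration in Python: `parse_happy_data` and the hand-written `ALLELES` list are hypothetical, introduced here only to show the column layout and the rule that every genotype must be one of the alleles listed for its marker.

```python
# Allowed alleles per marker, in file order (copied by hand from the example
# alleles file; in practice these would come from parsing that file).
ALLELES = [
    {"ND", "96", "98", "87"},           # D10MIT237
    {"ND", "95", "132", "103", "127"},  # D10MIT267
    {"ND", "143", "149"},               # D10MIT102
]

DATA = """\
# SAMPLE_ID PHENOTYPE D10MIT237 D10MIT267 D10MIT102
3/19/96_01 0.70590 ND ND 95 127 ND ND
11/15/96_251 -0.59370 96 96 ND ND 143 149
3/19/96_03 -0.80497 96 96 95 95 143 149
"""


def parse_happy_data(text, marker_alleles):
    """Return a list of (sample_id, phenotype, genotype_pairs) records."""
    n_markers = len(marker_alleles)
    records = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if len(fields) != 2 * n_markers + 2:
            raise ValueError(f"wrong number of columns: {raw!r}")

        sample_id = fields[0]
        phenotype = float(fields[1])  # must be a real number

        genotypes = []
        for m in range(n_markers):  # m is zero-based here
            pair = (fields[2 * m + 2], fields[2 * m + 3])
            unknown = set(pair) - marker_alleles[m]
            if unknown:
                raise ValueError(f"{sample_id}: unknown allele(s) {sorted(unknown)}")
            genotypes.append(pair)
        records.append((sample_id, phenotype, genotypes))
    return records


records = parse_happy_data(DATA, ALLELES)
for sample_id, phenotype, genotypes in records:
    print(sample_id, phenotype, genotypes)
```

Because the loop variable is zero-based, the list indices 2*m+2 and 2*m+3 correspond to the 1-based columns 2*m+1 and 2*m+2 quoted in the format description.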

Please send Questions, Comments, and Bug Reports to Richard Mott

 