##### # # #### # #### #### # # # # # # # # # # # # # # # # # # # # # ## # # # # # # # # ## ## # # # # # # # # # # #### ###### #### ####
May 1998 - Authors: Martin Farrall and Heather Cordell
A SOFTWARE PACKAGE FOR ANALYSING TWO-LOCUS SUSCEPTIBILITY GENE MODELS IN AFFECTED SIB-PAIR DATA
VERSION 2 INCLUDES RISCH MULTIPLICATIVE MODELS AS IMPLEMENTED IN THE GENERAL VARIANCE COMPONENTS MODEL BY HEATHER CORDELL
INTRODUCTION
In this sub-directory (pub/genetics/twoloc) you will find a series of utility programs, analysis programs and an example dataset that accompany a paper that appeared in Genetic Epidemiology in 1997 entitled "Affected Sibpair Linkage Tests for Multiple Linked Susceptibility Genes" by Martin Farrall (volume 14, number 2, pages 103 - 115). The abstract for this paper follows:
Genome-wide searches for susceptibility genes using pairs of affected siblings are being undertaken to dissect out individual polygenes that contribute to human multifactorial diseases. Efficient identity-by-descent (IBD) based sibpair linkage tests are available that test individual markers or maps of linked markers for linkage to a single putative susceptibility gene. In order to assess the support for linkage to a second putative susceptibility gene that happens to map close to an established susceptibility gene, it is necessary to use a method that correctly allows for the IBD distortion that directly results from the linkage between the two genes. A maximum likelihood based, multilocus linkage test is proposed which accounts for this interdependency and evaluates the support for an interaction between constituent susceptibility genes. The size and power of a test for a second linked susceptibility gene is investigated by simulation studies.
Please note that these programs are intended to be used by linkage analysis wizards who have a sound knowledge of the LINKAGE programs and experience with compiling, linking and running pascal, fortran and C programs on a UNIX based computer. In due course, I hope to polish up this software into a much more user friendly package (a "ready-to-microwave" TV dinner) but for the time being you will have to make do with a variety of prepared ingredients that you will have to combine and cook to make your meal! You will need to have the linkage analysis program VITESSE (and ideally the fortran numerical analysis library from NAG {Numerical Algorithm Group}) up and running on your computer before you will be able to use the software to its full extent as well as a fortran 77 and pascal compiler. I have included executable binary versions of the fortran and pascal programs for three Unix platforms, Solaris 2.5, IRIX 6.2 and DEC OSF/1 v3.0.
FILES INCLUDED IN THIS RELEASE
README.FIRST (this file) TWOLOC.TAR.Z (compressed tarfile containing the following files)
alpha.dat (output from "calc_alpha 0.2 0.3") awk (subdirectory containing three awk programs) bin (subdirectory with executables for various UNIX platforms) calc_alpha (Csh program to calculate alphas) calc_ibd (Csh program to calculate taus) datafile.dat (output of "mkp", input for "calc_ibd") joint_prob.dat (output from "calc_ibd", input for "twoloc") pedcount.dat (output from "mkp", input for "calc_ibd") pedfile.dat (output from "mkp", input for "calc_ibd") pedin.dat (input for "mkp") src (subdirectory with sources for "mkp" and "twoloc")
Use ftp to retrieve TWOLOC.TAR.Z and uncompress it and "untar" it as follows:
either
uncompress TWOLOC.TAR.Z; tar xvf TWOLOC.TAR
or
zcat TWOLOC.TAR.Z | tar xvf -
TWOLOC
The main analysis program, twoloc, is written in fortran (f77) (the source code is called twoloc.f), and this program computes MLS statistics for various user defined two-locus models. The user is interactively interrogated to specify the male and female recombination fractions separating the two susceptibility genes and to choose from various optional constraints (e.g. two locus additive model, single locus model, no dominance variance at one or both loci etc.). The sibpair data is read in from a single text file (joint_prob.dat) which contains the number of sibpairs to be read in and a list of probabilities (referred to as taus in the Genetic Epidemiology paper) for each affected sibling pair (the probability that a sibpair shares i and j genes IBD for two susceptibility genes at the two locations). The program also reads in a list of 16 sex-specific joint IBD prior probabilities (referred to as alphas in the Genetic Epidemiology paper) from a textfile called alpha.dat.
The first line in joint_prob.dat is the number of families to be analysed. Each subsequent line contains 16 joint IBD probabilities (taus) (that obviously have to sum to 1!), each sibpair has it's own list of 16 taus . The list order for the taus is shown below where each joint IBD probability is listed as IBD(i,j) where i and j can be 1 (IBD=0), 2 (IBD = 1 inherited from father), 3 (IBD = 1 inherited from mother) or 4 (IBD = 2).
IBD(1,1), IBD(1,2), IBD(1,3), IBD(1,4), IBD(2,1), IBD(2,2), IBD(2,3), IBD(2,4), IBD(3,1), IBD(3,2), IBD(3,3), IBD(3,4), IBD(4,1), IBD(4,2), IBD(4,3), IBD(4,4).
So for example, a family in which the joint IBD at two locations is exactly 2 and 2 will be entered as follows:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
As another example, consider a family which shares exactly 1 gene IBD (inherited from the father) at the first susceptibility gene and 1 gene IBD (inherited from the mother) at the second susceptibility gene, this data will be entered as follows:
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
The file alpha.dat, contains 16 sex-specific prior IBD probabilities (alphas), these can be calculated using the script calc_alpha (see below) and are listed in a similar order to that shown for the observed joint IBD probabilities (taus).
twoloc, calls the quasi-Newtonian minimization subroutine E04JAF from the Numerical Algorithms Group Ltd. (NAG) to maximize the likelihood (and MLS). It should be fairly straightforward to replace this function with your own favourite minimization function and I will try to assist you with this if you are unable to unfathom the twoloc source code. NAG fortran libraries are readily available and I suggest that for details you try contacting your local computing experts (as site licences are often held by universities). NAG's home page is http://www.nag.co.uk/ where online documentation for E04JAF as well as distributor information can be found.
Compiling twoloc with a suitably installed fortran compiler and NAG library can be achieved by:
f77 -O -extend_source -o twoloc twoloc.f -lnag
The "-extend_source" switch (some compilers use "-e") is necessary as some of the source code lines are longer than 72 characters. The -lnag switch for the linker is necessary for the NAG subroutines, you may need an additional -L flag if the libraries are not installed in your default search path.
An example dataset (joint_prob.dat) is included to try out some illustrative analyses. The two susceptibility genes are assumed to be linked (theta = 0.2 in males and 0.3 in females). Results from fitting several models are listed below:
Option Model MLS --------------------------------------------------------------------- 1 general model - all 8 variance components iterated 5.942 5 additive model (includes dominance components) 5.726 9 Locus 1 on its own with dominance variance 4.777 11 Locus 2 on its own with dominance variance 2.611 ---------------------------------------------------------------------
The additional support for locus 2 assuming that locus 1 is linked is calculated by comparing model 5 with model 9 (5.726 - 4.777 = 0.95). The additional support for locus 1 assuming that locus 2 is linked is calculated by comparing model 5 with model 11 (5.726 - 2.611 = 3.12). Assuming that both loci are linked, then there is no support for an epistatic interaction (i.e. compare model 1 with model 5, 5.942 - 5.726 = 0.22). I would conclude from this analysis that there is only support for locus 1 being linked and the "pairwise" MLS at locus 2 is due to its linkage to a major susceptibility gene (locus 1).
The final output of twoloc is written to the file mls.out (an example follows):
========================================================================
TWO-LOCUS SIBPAIR LINKAGE ANALYSIS RESULTS
IFAIL = 7
Total number of sibpairs in analysis = 312
Number of function evaluations = 347
2:1:1:0 joint IBD sharing matrix
0 1 (male) 1 (female) 2 0 0.063518 0.036150 0.055627 0.037694 1 (male) 0.032837 0.088857 0.030280 0.093000 1 (female) 0.050529 0.030280 0.088857 0.060437 2 0.025912 0.079839 0.051884 0.174300
Note: locus 1 in columns, locus 2 in rows
MLS = 5.942
option = 1 - general model - all 8 variance components iterated
========================================================================
IFAIL is a diagnostic code generated by the NAG subroutine E04JAF The program will most often generate a code of 5, 6, 7, 8 indicating that convergence has been acheived (to varying degrees of confidence). For more details see the NAG documentation on the WWW pages mentioned above.
The joint IBD sharing matrix contains maximum likelihood estimates.
CALC_ALPHA
calc_alpha is a (hopefully) useful little script that works in conjunction with vitesse (as a likelihood computation engine) in order to compute 16 sex-specific prior IBD sharing probabilities (alphas) for an affected sibpair. The script expects just two arguments, the male and female recombination fractions separating the two susceptibility genes. It then writes out 16 nuclear families, one for each of the 16 sex-specific joint IBD configurations (pedfile.dat), and writes a datafile.dat file with the pertinent parameters to calculate the relevant likelihoods (in the format required for a LINKMAP run). vitesse is then called to calculate the relevant likelihoods. I have added an awk "program" to read in the voutfile.dat file generated by a successful vitesse run and to write out the 16 joint IBD probabilities to a file alpha.dat (which is in a suitable format to be directly read in by twoloc). The order of listing the alphas is the same as discussed in the context of the "twoloc" taus.
MKP
mkp is a utility program (written in pascal) that reads in a LINKAGE formatted pedigree file (which must be called pedin.dat and must have been been processed through makeped so that it contains the first offspring and next sibling pointers and so on). The pedigree file must contain the affection status of each individual (0, 1 or 2) as the first phenotype (and no liability code please!) and a list of genotypes for each marker in the analysis (in LINKAGE terms, locus type 3 for codominant markers). The program will write out 16 new pedigrees for each original sibpair, with two new loci (dummy markers) that specify exactly each of the 16 possible sex-specific joint IBD configurations. If there are more than 2 affected sibs in a particular family, then the program will write out 16 pedigrees for each pair of sibs that can be constructed.
For example, here is a LINKAGE formatted pedigree file for two affected siblings and a third sib whose phenotype status is unknown:
disease marker marker status one two 7 1 0 0 3 0 0 1 1 0 1 2 0 0 Ped: 7 Per: 1 7 2 0 0 3 0 0 2 0 0 0 0 3 4 Ped: 7 Per: 2 7 3 1 2 0 4 4 1 0 2 1 3 1 3 Ped: 7 Per: 3 7 4 1 2 0 5 5 1 0 2 2 4 2 4 Ped: 7 Per: 4 7 5 1 2 0 0 0 2 0 0 1 4 1 3 Ped: 7 Per: 5
mkp has generated 16 families (one for each sex-specific joint IBD configuration) as follows:
marker four ****************************************************** three *********************************************** **** two **************************************** **** **** one ********************************* **** **** **** **** **** **** **** **** **** **** **** 1 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 1 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 1 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 1 4 1 2 0 5 5 1 0 2 4 2 4 2 4 2 4 Original ped no. 7 1 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 2 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 2 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 2 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 2 4 1 2 0 5 5 1 0 2 4 2 4 2 4 1 4 Original ped no. 7 2 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 3 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 3 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 3 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 3 4 1 2 0 5 5 1 0 2 4 2 4 2 4 2 3 Original ped no. 7 3 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 4 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 4 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 4 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 4 4 1 2 0 5 5 1 0 2 4 2 4 2 4 1 3 Original ped no. 7 4 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 5 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 5 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 5 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 5 4 1 2 0 5 5 1 0 2 4 2 4 1 4 2 4 Original ped no. 7 5 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 6 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 6 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 6 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 6 4 1 2 0 5 5 1 0 2 4 2 4 1 4 1 4 Original ped no. 7 6 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 7 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 7 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 7 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 7 4 1 2 0 5 5 1 0 2 4 2 4 1 4 2 3 Original ped no. 7 7 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 8 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 8 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 8 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 8 4 1 2 0 5 5 1 0 2 4 2 4 1 4 1 3 Original ped no. 7 8 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 9 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 9 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 9 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 9 4 1 2 0 5 5 1 0 2 4 2 4 2 3 2 4 Original ped no. 7 9 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 10 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 10 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 10 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 10 4 1 2 0 5 5 1 0 2 4 2 4 2 3 1 4 Original ped no. 7 10 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 11 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 11 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 11 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 11 4 1 2 0 5 5 1 0 2 4 2 4 2 3 2 3 Original ped no. 7 11 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 12 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 12 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 12 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 12 4 1 2 0 5 5 1 0 2 4 2 4 2 3 1 3 Original ped no. 7 12 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 13 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 13 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 13 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 13 4 1 2 0 5 5 1 0 2 4 2 4 1 3 2 4 Original ped no. 7 13 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 14 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 14 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 14 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 14 4 1 2 0 5 5 1 0 2 4 2 4 1 3 1 4 Original ped no. 7 14 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 15 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 15 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 15 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 15 4 1 2 0 5 5 1 0 2 4 2 4 1 3 2 3 Original ped no. 7 15 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7 16 1 0 0 3 0 0 1 1 1 2 0 0 1 2 1 2 Original ped no. 7 16 2 0 0 3 0 0 2 0 0 0 3 4 3 4 3 4 Original ped no. 7 16 3 1 2 0 4 4 1 0 1 3 1 3 1 3 1 3 Original ped no. 7 16 4 1 2 0 5 5 1 0 2 4 2 4 1 3 1 3 Original ped no. 7 16 5 1 2 0 0 0 2 0 1 4 1 3 0 0 0 0 Original ped no. 7
Markers one and two are as before, markers three and four are new dummy markers, one for each susceptibility gene.
mkp also writes out a suitable datafile (in LINKMAP format) to accompany the pedigree file. In order to set up a LINKMAP run to calculate likelihoods for susceptibility genes being separated by theta = 0.2 (males) and theta = 0.3 (females) tightly linked to markers one and two, then specify the order of loci as 1 3 2 4 and the male and female recombination fractions between the loci as:
4 0 0 4 0 0.0 0.0 0 1 3 2 4
3 4 # marker no: 1 0.25000 0.25000 0.25000 0.25000
3 4 # marker no: 2 0.25000 0.25000 0.25000 0.25000
3 4 # first susceptibility gene 0.25 0.25 0.25 0.25
3 4 # second susceptibility gene 0.25 0.25 0.25 0.25
2 0 0.00000 0.20000 0.00000 0.00000 0.30000 0.00000 1 0.00000 1
I have not tested this part of the program extensively and I would not be surprised if problems with the datafile.dat file were not encountered on occasion (especially with the last line). The idea is to compute a single likelihood for each family (with separate male and female recombination fractions) for the recombination fractions selected when running mkp. The last line may require editing by the user I'm afraid.
mkp also writes out the total number of families in pedfile.dat to a new file pedcount.dat
mkp is a very simple program that was designed to work with nuclear families (i.e. parental and sibling generations ONLY). It assumes that parents can be identified as they have no specified ancestors (i.e. their parents are missing) and everyone else in a family is assumed to belong to the sibling generation. An example pedin.dat file is included for you to try out, I have also included the output of mkp (pedcount.dat, datafile.dat and pedfile.dat) in case you wish to try out calc_ibd in the first instance. I have compiled and run the program under Solaris 2.5 and IRIX 6.2. Curiously, under OSF/1 v3.2, the program crashes with a segmentation violation which I have been unable to resolve to date. Also, mkp writes out equal frequencies in the datafile.dat file for your markers, you should edit the datafile.dat file using a text editor to change them if necessary.
CALC_IBD
calc_ibd is a script that uses vitesse as a likelihood computation engine to calculate sex-specific joint IBD probabilities (taus). An "awk" program is included to read in the resulting voutfile.dat and apply Bayes's theorem and write out the joint IBD matrix to the file joint_prob.dat.
VITESSE
Authors: Jeff O'Connell and Dan Weeks.
For more details:
"The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance," O'Connell JR, Weeks DE, Nature Genetics 11:402-408, December 1995
This program can be obtained from ftp://watson.hgen.pitt.edu/pub (USA) or ftp://ftp.ebi.ac.uk/pub/software/linkage_and_mapping/hgen_pitt (EUROPE).
PLEASE ENSURE THAT IF YOU USE VITESSE THAT YOU CITE O'CONNELL AND WEEKS
A brief outline to two-locus sibpair analysis.
1) Run calc_alpha to create a file alpha.dat that contains 16 sex-specific, joint IBD probabilities (alphas). vitesse must have been installed on your system as this is called to calculate the relevant likelihoods.
INPUT FILES: awk/awk.alpha.0, awk/awk.alpha.1 OUTPUT FILE: alpha.dat
2) Create a pedigree file (pedin.dat) in LINKAGE format containing the affection status (no more than 1 liability class please!), followed by at least two markers.
3) Process pedigree file with the program mkp to split up multiplex families into affected sibpairs and add dummy markers to specify all 16 possible, sex-specific, joint IBD configurations; edit the resultant datafile.dat file to check that the marker allele frequencies are correct:
INPUT FILES: pedin.dat OUTPUT FILES: pedfile.dat, datafile.dat, pedcount.dat
3) Run calc_ibd to create a file (joint_prob.dat) containing the sex-specific, joint IBD probabilities (taus). vitesse must have been installed on your system as this is called to calculate the relevant likelihoods.
INPUT FILES: pedfile.dat, datafile.dat, pedcount.dat, awk/awk.ibd OUTPUT FILES: joint_prob.dat
4) Finally run twoloc to fit various two-locus models to your sibpair data.
INPUT FILES: joint_prob.dat (see step 3), alpha.dat (see step 1) OUTPUT FILE: mls.out
Conditions of use:
This software is provided as a "gift" to a non-profit making research organization, no attempts should be made to sell or patent this program or incorporate it into another program. The software is provided "as is", no warranty as to the accuracy or reliability of the results can be provided or the fitness of the program for any particular application. Please feel free to modify or adapt the code as you wish.
For further enquiries, bug-reports and general advice on sib-pair linkage analysis, feel free to contact (preferably by e-mail):
Martin Farrall is ........
Dept. Cardiovascular Medicine The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN UK
phone: +44 (0)1865 740 012 (direct line) phone: +44 (0)1865 742 441 (general enquiries) FAX: +44 (0)1865 742 196 e-mail: mfarrall@well.ox.ac.uk
Dr. Heather Cordell may be contacted by e-mail at cordell@darwin.cwru.edu
If you send me your e-mail message, I will send you notice of updates, bugs etc. that will no doubt arise. Curiously no one has bother to register so far although from our anonymous FTP log, a fair number of people have downloaded the software.