Valbreed User Guide

William Valdar

October 8, 2008

1 Getting started

1.1 Tutorial: simulating the Archie cross

We start by simulating an example cross of founders A, B, D and E to produce subjects Fred and Archie. Below is a simplified ped-format file, archiebreeding.ped, describing the relationships between the individuals. The columns are Family, Individual, Father and Mother. Valbreed ignores the Family (i.e., first) column. As with all the files read or produced by Valbreed, the format is white space delimited: items are separated by white space, with multiple spaces (or other white space characters such as tabs) treated as a single delimiter.

1       Fred    Sam     Mary  
1       Archie  John    Mary  
1       John    A       B  
1       Mary    A       B  
1       Sam     D       E  
1       A       NA      NA  
1       B       0       0  
1       D       NA      NA  
1       E       NA      NA

In the above pedigree, Fred is the child of Sam and Mary, Archie is the child of John and Mary, and so on. The names NA and 0 are special and denote missing data. Because A, B, D and E all have missing parents, Valbreed assumes these are founders.
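The founder rule can be illustrated with a short Python sketch. This is not Valbreed's own code: the function names parse_ped and find_founders are hypothetical, and treating "both parents missing" as the test for a founder is my reading of the paragraph above.

```python
# Illustrative sketch only; parse_ped and find_founders are hypothetical names,
# not part of Valbreed.
PED_TEXT = """\
1       Fred    Sam     Mary
1       Archie  John    Mary
1       John    A       B
1       Mary    A       B
1       Sam     D       E
1       A       NA      NA
1       B       0       0
1       D       NA      NA
1       E       NA      NA
"""

MISSING = {"NA", "0"}  # the two names the guide says denote missing data


def parse_ped(text):
    """Return {individual: (father, mother)}, using None for a missing parent."""
    pedigree = {}
    for line in text.splitlines():
        fields = line.split()  # any run of white space counts as one delimiter
        if not fields:
            continue
        _family, individual, father, mother = fields[:4]  # Family is ignored
        pedigree[individual] = tuple(
            None if parent in MISSING else parent for parent in (father, mother)
        )
    return pedigree


def find_founders(pedigree):
    """Assumption: a founder is an individual with both parents missing."""
    return sorted(
        name for name, parents in pedigree.items() if parents == (None, None)
    )


pedigree = parse_ped(PED_TEXT)
founders = find_founders(pedigree)
print(founders)  # ['A', 'B', 'D', 'E']
```

Note that B is recognised as a founder even though its parents are written as 0 rather than NA.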

In order to generate the genotypes for the individuals above, Valbreed needs the genotypes and marker positions of the founders in the founder file archiefounders.md, which is in my own MarkerData or “md” format:

#begin#MarkerData::CMarkerSet  
m1      1       0  
m2      1       20  
m3      1       50  
q1      1       45  
q2      2       5  
#end#MarkerData::CMarkerSet  
#begin#MarkerData::CPhaseKnownGenome  
id:     A  
m1      11      A       11      A  
m2      10      A       10      A  
m3      11      A       11      A  
q1      1       A       1       A  
q2      1       A       1       A  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     B  
m1      10      B       10      B  
m2      11      B       11      B  
m3      11      B       11      B  
q1      1       B       1       B  
q2      1       B       1       B  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     D  
m1      10      D       10      D  
m2      10      D       10      D  
m3      10      D       10      D  
q1      1       D       1       D  
q2      1       D       1       D  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     E  
m1      11      E       11      E  
m2      10      E       10      E  
m3      10      E       10      E  
q1      0       E       0       E  
q2      0       E       0       E  
#end#MarkerData::CPhaseKnownGenome

The first section between the #begin and #end delimiters describes the location of genetic variants that will be tracked during the simulation. In this case, the founders have genotypes at five loci: m1, m2, m3, q1 and q2. The loci m1, m2, m3 and q1 are spread along chromosome 1 at 0cM, 20cM, 50cM and 45cM, and the locus q2 is on chromosome 2 at 5cM. The remaining sections specify the alleles and descent at each locus. Specifically, the line

m2   10   A    11   B

refers to the m2 locus and says that this individual has allele “10” descended from ancestor A on the first haplotype and allele “11” descended from ancestor B on the second haplotype. (Note that in the above founders file, all founders happen to be inbred.) The alleles, founders and markers can be named using any alphanumeric string, and there is no restriction on the number of alleles per locus. Moreover, the order in which the loci are listed for a given individual is ignored: Valbreed internally reorders everything using the chromosome names and cM distances, as we will see later.
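To make the reordering concrete, here is a minimal Python sketch that reads the CMarkerSet section and sorts the loci by chromosome and then by cM position. The function names are hypothetical, and comparing chromosome names as plain strings is an assumption; the guide does not say how Valbreed orders chromosomes.

```python
# Illustrative sketch only; parse_marker_set and map_order are hypothetical
# names, not part of Valbreed.
MARKER_SET = """\
#begin#MarkerData::CMarkerSet
m1      1       0
m2      1       20
m3      1       50
q1      1       45
q2      2       5
#end#MarkerData::CMarkerSet
"""


def parse_marker_set(text):
    """Return a list of (name, chromosome, cM) tuples in file order."""
    markers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the #begin/#end delimiters
        name, chrom, cm = fields
        markers.append((name, chrom, float(cm)))
    return markers


def map_order(markers):
    """Sort by chromosome name (as a string, an assumption), then by cM."""
    return sorted(markers, key=lambda marker: (marker[1], marker[2]))


ordered = [name for name, _chrom, _cm in map_order(parse_marker_set(MARKER_SET))]
print(ordered)  # ['m1', 'm2', 'q1', 'm3', 'q2']
```

Because q1 sits at 45cM and m3 at 50cM, q1 moves ahead of m3 even though it is listed after it in the file.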

Lastly, we create a configuration file for Valbreed, archie.config, which in this case just names the ped file that specifies the breeding:

breeding_strategy                Pedigree  
Pedigree_file                    archiebreeding.ped

Running at the command line

valbreed archiefounders.md archie.config 1 --write_md=out.md

simulates new individuals based on the founders and the pedigree. The third argument is the seed used for the random number generator and can be any integer. The --write_md option writes the whole population to out.md. When I ran it, this gave a file like archiefounders.md but with the following additional lines:

#begin#MarkerData::CPhaseKnownGenome  
id:     Archie  
m1      10      B       11      A  
m2      11      B       11      B  
q1      1       B       1       B  
m3      11      B       11      B  
q2      1       A       1       B  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     Fred  
m1      10      B       11      E  
m2      11      B       10      E  
q1      1       B       0       E  
m3      11      B       10      E  
q2      1       A       1       D  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     John  
m1      10      B       11      A  
m2      11      B       10      A  
q1      1       B       1       A  
m3      11      B       11      A  
q2      1       B       1       A  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     Mary  
m1      10      B       11      A  
m2      11      B       10      A  
q1      1       B       1       A  
m3      11      B       11      A  
q2      1       B       1       A  
#end#MarkerData::CPhaseKnownGenome  
#begin#MarkerData::CPhaseKnownGenome  
id:     Sam  
m1      11      E       10      D  
m2      10      E       10      D  
q1      0       E       1       D  
m3      10      E       10      D  
q2      0       E       1       D  
#end#MarkerData::CPhaseKnownGenome
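One way to sanity-check a file like out.md is to confirm that every allele is actually carried by the founder it is labelled as descending from. The Python sketch below does this for founders A and B and for Archie. The function names are hypothetical, and the check assumes that alleles are passed down unchanged (no mutation), which this tutorial does not state explicitly.

```python
# Illustrative sketch only; parse_genomes and descent_mismatches are
# hypothetical names, not part of Valbreed.
MD_TEXT = """\
#begin#MarkerData::CPhaseKnownGenome
id:     A
m1      11      A       11      A
m2      10      A       10      A
m3      11      A       11      A
q1      1       A       1       A
q2      1       A       1       A
#end#MarkerData::CPhaseKnownGenome
#begin#MarkerData::CPhaseKnownGenome
id:     B
m1      10      B       10      B
m2      11      B       11      B
m3      11      B       11      B
q1      1       B       1       B
q2      1       B       1       B
#end#MarkerData::CPhaseKnownGenome
#begin#MarkerData::CPhaseKnownGenome
id:     Archie
m1      10      B       11      A
m2      11      B       11      B
q1      1       B       1       B
m3      11      B       11      B
q2      1       A       1       B
#end#MarkerData::CPhaseKnownGenome
"""

GENOME_TAG = "MarkerData::CPhaseKnownGenome"


def parse_genomes(text):
    """Return {id: {locus: ((allele1, ancestor1), (allele2, ancestor2))}}."""
    genomes = {}
    current = None  # loci of the block being read, or None outside a block
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#begin#" + GENOME_TAG:
            current = {}
        elif fields[0] == "#end#" + GENOME_TAG:
            current = None
        elif current is not None and fields[0] == "id:":
            genomes[fields[1]] = current
        elif current is not None:
            locus, allele1, ancestor1, allele2, ancestor2 = fields
            current[locus] = ((allele1, ancestor1), (allele2, ancestor2))
    return genomes


def descent_mismatches(genomes):
    """List (individual, locus, allele, ancestor) entries whose allele is not
    carried by the named ancestor at that locus (assumes no mutation)."""
    problems = []
    for name, loci in genomes.items():
        for locus, haplotypes in loci.items():
            for allele, ancestor in haplotypes:
                ancestor_loci = genomes.get(ancestor, {})
                carried = {a for a, _ in ancestor_loci.get(locus, ())}
                if allele not in carried:
                    problems.append((name, locus, allele, ancestor))
    return problems


genomes = parse_genomes(MD_TEXT)
print(descent_mismatches(genomes))  # []
```

An empty list means every allele agrees with its descent label. An ancestor that is absent from the file is reported as a mismatch rather than raising an error.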

2 Input Files