Assessing linkage disequilibrium
The -compute-ld-with option can be used to compute LD metrics between the genotype data and a separate set of genotypes contained in a second file. E.g.:
For each biallelic variant in example.bgen, and each biallelic variant in the second file,
QCTOOL constructs the pairwise table of non-missing genotypes, uses an EM algorithm to resolve phase
of the double heterozygotes, and then outputs the frequency of each haplotype and the D'
and r2 statistics. (Multiallelic variants are currently not handled.)
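The estimation step described above can be sketched in Python as follows. This is an illustrative sketch only, not QCTOOL's implementation: the function names haplotype_frequencies and ld_stats are hypothetical, the 3x3 table layout (rows indexed by the alternate-allele count at the first variant, columns by the count at the second) is an assumption, no prior is applied, and D' is reported here as the signed ratio D / Dmax, which may differ from QCTOOL's convention.

```python
def haplotype_frequencies(table, max_iter=200, tol=1e-10):
    """Estimate haplotype frequencies [p00, p01, p10, p11] by EM.

    table[g1][g2] is the number of samples carrying g1 alternate alleles at
    the first variant and g2 at the second (non-missing genotypes only).
    """
    total = 2 * sum(sum(row) for row in table)  # number of haplotypes
    if total == 0:
        raise ValueError("genotype table is empty")

    # Haplotype counts that are unambiguous (every cell except the double hets).
    known = [0.0, 0.0, 0.0, 0.0]
    for g1 in range(3):
        for g2 in range(3):
            n = table[g1][g2]
            if g1 == 1 and g2 == 1:
                continue  # phase unknown, handled in the EM loop
            if g1 != 1:
                a = g1 // 2  # allele at variant 1 on both haplotypes
                known[2 * a + 1] += n * g2
                known[2 * a] += n * (2 - g2)
            else:
                b = g2 // 2  # allele at variant 2 on both haplotypes
                known[b] += n
                known[2 + b] += n

    n_double_het = table[1][1]
    freqs = [0.25] * 4
    for _ in range(max_iter):
        # E-step: probability that a double het carries haplotypes 00 + 11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add expected double-het haplotypes to the known counts.
        counts = [
            known[0] + n_double_het * w,
            known[1] + n_double_het * (1 - w),
            known[2] + n_double_het * (1 - w),
            known[3] + n_double_het * w,
        ]
        new = [c / total for c in counts]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


def ld_stats(freqs):
    """Return (D, D', r2) from haplotype frequencies [p00, p01, p10, p11]."""
    p_a = freqs[2] + freqs[3]  # alternate allele frequency, variant 1
    p_b = freqs[1] + freqs[3]  # alternate allele frequency, variant 2
    d = freqs[3] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one variant is monomorphic: D' and r2 are undefined.
        return d, float("nan"), float("nan")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, d * d / denom
```

For a table such as [[10, 0, 0], [0, 20, 0], [0, 0, 10]], the double heterozygotes are resolved almost entirely into the 00 and 11 haplotypes, so both statistics approach 1.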
Note: to compute LD, samples are matched between datasets using the primary ID (the first column
in the sample files) by default. How to alter this is described further on the page on merging datasets.
Currently LD output is always written to a SQLite database file. In the above command,
results are placed in the file ld.sqlite, in a table called LD; a convenience view called
LDView is also constructed. The simplest way to view the data is to use the sqlite3
command-line client (which is already installed on most Linux systems by default). E.g. the command:
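As an alternative to the command-line client, the same view can be read with Python's standard sqlite3 module. The sketch below assumes only the file, table and view names given above (ld.sqlite, LD, LDView); the helper name preview_ld is hypothetical, and the column names are read from the database at run time rather than assumed.

```python
import sqlite3
from contextlib import closing


def preview_ld(db_path="ld.sqlite", view="LDView", limit=10):
    """Return (column_names, rows) for the first `limit` rows of `view`."""
    # The view name is interpolated into the SQL text, so restrict it to a
    # plain identifier; the row limit is passed as a bound parameter.
    if not view.isidentifier():
        raise ValueError("view must be a plain SQL identifier")
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(f"SELECT * FROM {view} LIMIT ?", (limit,))
        names = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    return names, rows
```

Calling preview_ld() in the directory containing ld.sqlite returns the column names of the view along with the first ten rows, which is a quick way to see which columns are available before writing a more specific query.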
The -max-ld-distance option can be used. Distances can be specified in base pairs (e.g.
-max-ld-distance 1000), in kilobases (e.g.
-max-ld-distance 1kb), or in megabases as in the example above.
The -prior-ld-weight option can be used to adjust the strength of this prior, e.g. the command:
The -stratify option tells QCTOOL to compute LD statistics stratified over subsets of the data. E.g. suppose the sample file contains a column called
POP which reflects the population group of each sample. Then the command:
will compute LD statistics for all pairs of variants in each population. Output columns will be named
in the form
For example, suppose we've extracted data for the O blood group mutation rs8176719 from the
1000 Genomes Project data into a file named
rs8176719.vcf. This command will
find all tagging SNPs in the flanking region: