QCTOOL is a command-line utility program for basic quality control of GWAS datasets. It supports the same file formats used by the WTCCC studies, as well as the binary file format described here, and is designed to work seamlessly with SNPTEST and related tools. QCTOOL computes per-sample and per-SNP summary statistics, and uses these to filter out samples and SNPs from the dataset (either by removing them from the files or by writing exclusion lists).
QCTOOL can be used to:
- Calculate per-sample and per-SNP summary statistics;
- Filter out samples and SNPs, either by creating new data files or by creating exclusion lists;
- Populate sample files with missingness and heterozygosity information;
- Convert files between different GEN file formats.
QCTOOL has a number of features designed to make it easy to use:
- Automatically run across the whole genome (using the chromosomal wildcard character '#' to find matching files).
- Robust file reading and careful error checking: QCTOOL will tell you if it thinks you are doing something wrong.
- Informative output and progress bars giving an estimate of remaining runtime.
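To illustrate the idea behind the chromosomal wildcard, here is a minimal Python sketch of how a '#' in a filename pattern could be expanded into per-chromosome files. It is not QCTOOL's own implementation: the function name `expand_chromosome_wildcard`, the restriction to autosomes 1 to 22, and the acceptance of both `1` and `01` style labels are assumptions made for this example.

```python
from pathlib import Path


def expand_chromosome_wildcard(pattern, chromosomes=range(1, 23)):
    """Return (chromosome, path) pairs for files that exist on disk.

    Hypothetical helper: '#' in `pattern` is replaced by each chromosome
    number in turn, trying both unpadded ("1") and zero-padded ("01") forms.
    """
    if pattern.count("#") != 1:
        raise ValueError("pattern must contain exactly one '#' wildcard")

    matches = []
    for chrom in chromosomes:
        # Try "1" first, then "01"; for chrom >= 10 both labels are identical.
        for label in (f"{chrom}", f"{chrom:02d}"):
            candidate = Path(pattern.replace("#", label))
            if candidate.is_file():
                matches.append((chrom, candidate))
                break  # at most one file per chromosome
    return matches


if __name__ == "__main__":
    # Example pattern (hypothetical file names).
    for chrom, path in expand_chromosome_wildcard("example_#.gen"):
        print(chrom, path)
```

Chromosomes with no matching file are simply skipped, so the same pattern works whether or not every chromosome is present.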
Acknowledgements. The following people contributed to the design and implementation of qctool:
In addition, QCTOOL contains the SNP-HWE code by Jan Wigginton et al., described in "A Note on Exact Tests of Hardy-Weinberg Equilibrium", Wigginton et al., Am. J. Hum. Genet. (2005) 76:887-93.
Contact. For more information or questions, please contact me at the following email address:
gavin.band (at) well.ox.ac.uk.
QCTOOL works according to the schematic on the right. A detailed list of options is given by the command
$ qctool -help
which produces this output.
QCTOOL works with the following per-sample summary statistics, calculated using the -sample-stats option:
- Missing data proportion
- The total proportion of missing genotype data for this sample across all SNPs. For each SNP, the missing data is the probability not assigned to any of the three genotypes (one minus the sum of the three genotype probabilities); these values are summed across all SNPs and divided by the total number of SNPs. A large missing data proportion might be due, for example, to a badly prepared sample. You can filter on missingness using the -sample-missing-rate option.
- Heterozygosity
- This is the sum of heterozygote call probabilities across all SNPs divided by the total number of SNPs. A high value of heterozygosity might indicate, for example, that the DNA from this sample was accidentally mixed with another sample during processing; a low value might indicate a higher degree of relatedness than expected among the ancestors of the individual. You can filter on heterozygosity using the -heterozygosity option.
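As a rough illustration of the two per-sample statistics, the following Python sketch computes them from a list of genotype probability triples. It is not QCTOOL's code. It assumes each SNP is given as `(p_AA, p_AB, p_BB)` and that the missing data for a SNP is the probability mass not assigned to any genotype, i.e. one minus the sum of the three probabilities; the function name `sample_summary` is made up for this example.

```python
def sample_summary(genotype_probs):
    """Per-sample missingness and heterozygosity (illustrative sketch).

    genotype_probs: list of (p_AA, p_AB, p_BB) triples, one per SNP,
    all belonging to a single sample.
    """
    n_snps = len(genotype_probs)
    if n_snps == 0:
        raise ValueError("at least one SNP is required")

    # Assumption: missing mass at a SNP = 1 - (p_AA + p_AB + p_BB).
    missing = sum(1.0 - (aa + ab + bb) for aa, ab, bb in genotype_probs) / n_snps

    # Heterozygosity: sum of heterozygote probabilities over the number of SNPs.
    heterozygosity = sum(ab for _, ab, _ in genotype_probs) / n_snps

    return {"missing": missing, "heterozygosity": heterozygosity}


if __name__ == "__main__":
    probs = [
        (1.0, 0.0, 0.0),     # confident homozygote
        (0.0, 1.0, 0.0),     # confident heterozygote
        (0.0, 0.0, 0.0),     # completely missing genotype
        (0.25, 0.5, 0.25),   # uncertain genotype, nothing missing
    ]
    print(sample_summary(probs))
```

Note that the uncertain genotype in the last row contributes to heterozygosity but not to missingness, since its three probabilities still sum to one.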
QCTOOL works with the following per-SNP summary statistics:
- Missing data proportion
- The proportion of missing genotype data (null genotype call probabilities) across all samples for the SNP. A high value indicates that the SNP is not well called. You can filter these SNPs out using the -snp-missing-rate option.
- Missing call proportion
- The proportion of individuals for which the maximum genotype probability is less than a threshold of 0.9. You can filter these SNPs out using the -snp-missing-call-rate option.
- Minor allele frequency
- The estimated frequency of the less common allele. The -maf option can be used to retain only SNPs within a given range of minor allele frequencies.
- Information
- A measure of how much statistical information the genotypes provide about the allele frequency, ranging from 0 (least information) to 1 (most information). Use the -info option to filter on this statistic.
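The first three per-SNP statistics can be sketched in Python as follows. This is an illustration only, not QCTOOL's implementation. It assumes each sample's genotype at the SNP is given as `(p_AA, p_AB, p_BB)`, that missing data is one minus the sum of the three probabilities, and that the minor allele frequency is estimated from expected allele counts; the function name `snp_summary` is hypothetical. The information measure is left out because its formula is not given here.

```python
def snp_summary(genotype_probs, call_threshold=0.9):
    """Per-SNP missingness, missing call proportion and MAF (illustrative sketch).

    genotype_probs: list of (p_AA, p_AB, p_BB) triples, one per sample,
    all for a single SNP.
    """
    n_samples = len(genotype_probs)
    if n_samples == 0:
        raise ValueError("at least one sample is required")

    # Assumption: missing mass for a sample = 1 - (p_AA + p_AB + p_BB).
    missing = sum(1.0 - (aa + ab + bb) for aa, ab, bb in genotype_probs) / n_samples

    # A genotype counts as uncalled if no probability reaches the threshold.
    uncalled = sum(1 for triple in genotype_probs if max(triple) < call_threshold)
    missing_calls = uncalled / n_samples

    # Assumption: allele frequencies estimated from expected allele counts.
    a_count = sum(2.0 * aa + ab for aa, ab, _ in genotype_probs)
    b_count = sum(2.0 * bb + ab for _, ab, bb in genotype_probs)
    total = a_count + b_count
    maf = min(a_count, b_count) / total if total > 0 else float("nan")

    return {"missing": missing, "missing_calls": missing_calls, "maf": maf}


if __name__ == "__main__":
    probs = [
        (1.0, 0.0, 0.0),
        (1.0, 0.0, 0.0),
        (0.0, 1.0, 0.0),
        (0.0, 0.0, 0.0),   # completely missing
        (0.5, 0.5, 0.0),   # uncertain: below the 0.9 call threshold
    ]
    print(snp_summary(probs))
```

The example shows why the two missingness measures differ: the uncertain genotype in the last row counts as a missing call but carries no missing data.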
In general, qctool tries to warn you if it thinks you are doing something wrong. In these cases you can override qctool using the -force option.
For more information, see the tutorial.
QCTOOL is available either as binaries or as source code.
Binaries
Pre-compiled binaries are available for the following platforms.
| Platform | File |
|---|---|
| Linux x86-64 static build | qctool_v1.0-static-linux-x86-64.bz2 (1000kb) |
| Mac OS X 10.6.3 | qctool_v1.0-static-osx-10.6.3.bz2 (432kb) |
To run qctool, download the relevant file and extract it using the following commands:
$ bunzip2 qctool-static-<machine>.bz2
$ chmod u+x qctool-static-<machine>
$ ./qctool-static-<machine> -help
Source
The source code to qctool is available as a mercurial repository here. Assuming you have mercurial installed, a basic download and compilation sequence (for the currently released version) would be:
$ hg clone --rev release https://gavinband@bitbucket.org/gavinband/qctool
destination directory: qctool
requesting all changes
adding changesets
adding manifests
adding file changes
added 232 changesets with 1170 changes to 270 files
updating to branch default
201 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd qctool
$ ./waf-1.5.8 configure
$ ./waf-1.5.8
./build/release/qctool
You will need boost and zlib installed. More detailed build instructions can be found on the QCTOOL wiki.