QCTOOL is a command-line utility program for manipulation and quality control of GWAS datasets and other genome-wide data. QCTOOL can be used:
- To compute per-variant and per-sample QC metrics.
- To filter out samples or variants.
- To merge datasets in various ways.
- To convert datasets between file formats. (In particular, QCTOOL can read and write BGEN files, including full support for the BGEN v1.2 format that has been used for the UK Biobank imputed data full release.)
- To manipulate datasets in various ways - e.g. by updating data fields or aligning alleles to a reference sequence based on information in a strand file.
- To annotate variants with information from BED files, sequence from FASTA files, or with genetic map positions.
- To compute LD metrics between variants.
- To compare genotypes for individuals typed or imputed or phased in different datasets.
- To compute between-sample relatedness and principal components.
- To compute 'genetic risk predictor' scores.
QCTOOL is designed to be as easy to use as possible, and we hope you find it so. See the documentation page for a description of how QCTOOL works.
IMPORTANT: this page documents QCTOOL version 2, which differs in several important ways from the original QCTOOL v1. Version 1 is still available but is now unsupported.
Contact. Please send any questions or reports of issues to the OXSTATGEN mailing list.
Change history. QCTOOL version 2 is currently in 'release candidate' state. This means some features may not work, or not work well, or work wrongly, or destroy your computer, or your sanity. The best place to receive support is on the OXSTATGEN mailing list. QCTOOL also has a public issue tracker. QCTOOL v2 also differs in several important ways from the v1 release series. Some important changes in QCTOOL v2 relative to v1 are:
- Support for more file formats: QCTOOL v2 supports a diverse array of common file formats - see the file formats page for more information.
- Support for more features: QCTOOL v2 has a number of features not found in v1. For example, it can compute LD metrics, apply strand alignments, and annotate variants with information from external sources.
- Removal of on-the-fly filtering options: The options for direct filtering based on summary statistics (-snp-missing-rate, etc.) have been removed. Instead, you are expected to inspect the summary statistics, create lists of variants and/or samples for removal by hand, and exclude them with the -excl- options in a separate QCTOOL run, as described here and here. (That is often what you want anyway, since it is useful to have a record of what you have removed.)
- Treatment of chromosomes: QCTOOL v1 always converted chromosomes to a two-digit form (02, ...) and treated chromosomes as missing if they were not of specific forms pertinent to human datasets. QCTOOL v2 instead allows arbitrary strings to be used as chromosomes. This change brings QCTOOL into line with other tools, e.g. those that use contig identifiers from a reference genome build. However, it also breaks some workflows that previously worked, namely those that match between datasets with differently encoded chromosome names. A possible workaround is to use the -map-id-data option to replace chromosome identifiers on the fly during analysis.
- Changes to output of summary stats: QCTOOL performs several types of per-variant summary computation, which are specified using options like the -annotate- options. When outputting results, all output is sent to a single output file that is specified using the -osnp option. This file automatically inherits columns from each requested computation. Similarly, all per-sample summary computations are output to the file specified by
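The inspect-then-exclude workflow from the "Removal of on-the-fly filtering options" item can be sketched in Python. This is only a minimal sketch under stated assumptions: the per-variant summary file is taken to be tab-delimited text with a header row and optional `#` comment lines, and the column names `rsid` and `missing_proportion`, the threshold, and the function names are hypothetical placeholders. Check them against the columns in your own output file before relying on it.

```python
import csv
import io


def variants_to_exclude(stats_file, id_column="rsid",
                        metric_column="missing_proportion", threshold=0.05):
    """Return IDs of variants whose metric exceeds the threshold.

    Column names and threshold are hypothetical; adjust them to your file.
    Lines starting with '#' are treated as comments and skipped.
    """
    rows = (line for line in stats_file if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    excluded = []
    for row in reader:
        value = row[metric_column]
        if value in ("", "NA"):
            continue  # no metric available: leave for manual review
        if float(value) > threshold:
            excluded.append(row[id_column])
    return excluded


def write_exclusion_list(ids, out_file):
    """Write one variant ID per line."""
    for variant_id in ids:
        out_file.write(variant_id + "\n")


if __name__ == "__main__":
    # Made-up example data standing in for a per-variant summary file.
    example = io.StringIO(
        "# made-up example data\n"
        "rsid\tmissing_proportion\n"
        "rs1\t0.01\n"
        "rs2\t0.20\n"
        "rs3\tNA\n"
    )
    out = io.StringIO()
    write_exclusion_list(variants_to_exclude(example), out)
    print(out.getvalue(), end="")
```

The resulting one-ID-per-line file is the kind of list that would then be supplied to an exclusion option in a separate QCTOOL run. Keeping the file also gives you the record of removed variants mentioned above.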
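The chromosome-naming mismatch from the "Treatment of chromosomes" item can be illustrated in the same way. The sketch below normalises a few common spellings to one canonical form and prints an old-to-new table. The function names, the chosen canonical form, and the two-column output are assumptions made for illustration only; consult the QCTOOL documentation for the file layout that -map-id-data actually expects.

```python
def canonical_chromosome(name):
    """Normalise spellings such as '02', 'chr2', and '2' to '2'.

    Names that are not plain integers after stripping an optional 'chr'
    prefix (e.g. 'X', 'MT', contig identifiers) are returned unchanged
    apart from the prefix removal.
    """
    stripped = name.strip()
    if stripped.lower().startswith("chr"):
        stripped = stripped[3:]
    if stripped.isascii() and stripped.isdigit():
        return str(int(stripped))  # drops leading zeros: '02' -> '2'
    return stripped


def build_mapping(names):
    """Map each original name that changes to its canonical form."""
    mapping = {}
    for name in names:
        canonical = canonical_chromosome(name)
        if canonical != name:
            mapping[name] = canonical
    return mapping


if __name__ == "__main__":
    # Chromosome names as they might appear across two datasets.
    observed = ["01", "02", "chr2", "2", "chrX", "X"]
    for old, new in sorted(build_mapping(observed).items()):
        print(f"{old}\t{new}")
```

Building the table explicitly, rather than rewriting names silently, makes it easy to review which identifiers will be treated as the same chromosome before matching datasets.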
Acknowledgements. The following people contributed to the design and implementation of QCTOOL:
In addition, QCTOOL contains the SNP-HWE code by Jan Wigginton et al., described in "A Note on Exact Tests of Hardy-Weinberg Equilibrium", Wigginton et al, Am. J. Hum. Genet (2005) 76:887-93. Further acknowledgements can be found on the QCTOOL wiki.