qctool v2
A tool for quality control and analysis of GWAS datasets.

Filtering variants

Filtering variants based on an external file
QCTOOL has a set of options to filter variants, namely: -[in|ex]cl-rsids, -[in|ex]cl-snpids, -[in|ex]cl-positions, -[in|ex]cl-variants, -[in|ex]cl-variants-matching. Here are examples of these options:
$ qctool -g example.bgen -og subsetted.bgen -excl-rsids <filename>
Here the specified file should contain a whitespace-separated list of rsids that will be excluded from processing.
$ qctool -g example.bgen -og subsetted.bgen -excl-snpids <filename>
As above, but the file should list alternate IDs (SNPIDs) rather than rsids.
$ qctool -g example.bgen -og subsetted.bgen -excl-positions <filename>
The specified file should contain a list of genomic positions in the format [chromosome:]position. The chromosome should be omitted if you want to specify variants that have missing chromosome information.
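To make the [chromosome:]position format concrete, here is a minimal Python sketch of how such an entry could be split into its parts. It is an illustration only, not QCTOOL's own parser; the function name parse_position is hypothetical, and a missing chromosome is modelled here as None.

```python
from typing import Optional, Tuple


def parse_position(entry: str) -> Tuple[Optional[str], int]:
    """Split '[chromosome:]position' into (chromosome, position).

    Hypothetical helper for illustration; not part of QCTOOL.
    """
    chromosome, _, position = entry.rpartition(":")
    # An entry without a chromosome is modelled as chromosome=None.
    return (chromosome or None, int(position))


entries = ["1:12345", "X:500", "98765"]
parsed = [parse_position(entry) for entry in entries]
print(parsed)
```

The last entry has no chromosome part, so in this sketch it stands for a variant with missing chromosome information.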
$ qctool -g example.bgen -og subsetted.bgen -excl-variants <filename>
The specified file should contain a list of variants. Currently this must be a text file with six named columns; the first four must be SNPID, rsid, chromosome, position, followed by columns containing the first and second alleles. The -compare-variants-by option controls how variants are matched to this file - see the page on sorting data for more information on this option.
Range filtering
The -[in|ex]cl-range option filters variants by range. E.g.:
$ qctool -g example.bgen -og subsetted.bgen -[in|ex]cl-range [chromosome]:[start]-[end]
This includes / excludes any variant in the given range. Ranges are treated as closed ranges, i.e. the range includes both start and end positions. Optionally you can omit the chromosome; this will additionally capture variants that have missing chromosome information. Also, either the start or the end position can be omitted, in which case the range is treated as containing all positions up to the end coordinate, or all positions from the start coordinate onwards, respectively. Examples of valid ranges are: 1:100-200, 1:-200, or X:1000000-.
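The closed-range semantics can be modelled with a short Python sketch. This is not QCTOOL's implementation; the names Range, parse_range and in_range are hypothetical. It also assumes that a range written without a chromosome matches variants on every chromosome as well as those with missing chromosome information, which is one reading of the description above.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    chromosome: Optional[str]  # None: no chromosome given
    start: Optional[int]       # None: open at the lower end
    end: Optional[int]         # None: open at the upper end


def parse_range(spec: str) -> Range:
    """Parse '[chromosome]:[start]-[end]' (hypothetical helper)."""
    chromosome, _, span = spec.rpartition(":")
    start, dash, end = span.partition("-")
    if not dash:
        raise ValueError(f"expected [start]-[end] in {spec!r}")
    return Range(
        chromosome or None,
        int(start) if start else None,
        int(end) if end else None,
    )


def in_range(rng: Range, chromosome: Optional[str], position: int) -> bool:
    # Assumption: a range without a chromosome matches any chromosome,
    # including a missing one (None).
    if rng.chromosome is not None and rng.chromosome != chromosome:
        return False
    # Closed range: both endpoints count as inside.
    if rng.start is not None and position < rng.start:
        return False
    if rng.end is not None and position > rng.end:
        return False
    return True


closed = parse_range("1:100-200")
print(in_range(closed, "1", 100), in_range(closed, "1", 200))
print(in_range(closed, "1", 201))
print(in_range(parse_range("1:-200"), "1", 1))
print(in_range(parse_range("X:1000000-"), "X", 5_000_000))
```

Positions 100 and 200 both fall inside 1:100-200, while 201 does not; the two open-ended ranges accept everything on their open side.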
Wildcard variant filtering
You can filter variants based on a wildcard match of ID fields. E.g.:
$ qctool -g example.bgen -og subsetted.bgen -incl-variants-matching rsid~rs1%
This command will retain all variants that have rsid starting with 'rs1'. The general format of this command is:
$ qctool -g example.bgen -og subsetted.bgen -incl-variants-matching [field~][value]
Here field can be 'snpid' (matching all alternate IDs) or 'rsid' (matching the first ID, i.e. the rsid), or it can be omitted to match any ID. The value can optionally contain a single '%' character, which will expand to match any string value. A complete match is required, hence the value 'a%b' will match the ID 'ab', 'a1b', etc., but not 'zab' or 'ab2'.
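The matching rule can be sketched in Python as follows. This only models the behaviour described above and is not QCTOOL's code; wildcard_match and variant_matches are hypothetical names. The sketch assumes that 'snpid' matches if any one of the alternate IDs matches.

```python
from typing import List


def wildcard_match(value: str, identifier: str) -> bool:
    """Complete match of identifier against value with at most one '%'."""
    if "%" not in value:
        return identifier == value
    prefix, suffix = value.split("%", 1)
    # The length check stops prefix and suffix from overlapping.
    return (
        len(identifier) >= len(prefix) + len(suffix)
        and identifier.startswith(prefix)
        and identifier.endswith(suffix)
    )


def variant_matches(spec: str, rsid: str, alternate_ids: List[str]) -> bool:
    """Apply a '[field~][value]' spec to one variant's IDs."""
    if "~" in spec:
        field, value = spec.split("~", 1)
    else:
        field, value = None, spec
    if field == "rsid":
        candidates = [rsid]
    elif field == "snpid":
        candidates = alternate_ids  # assumption: any alternate ID may match
    elif field is None:
        candidates = [rsid] + alternate_ids
    else:
        raise ValueError(f"unknown field {field!r}")
    return any(wildcard_match(value, candidate) for candidate in candidates)


for identifier in ["ab", "a1b", "zab", "ab2"]:
    print(identifier, wildcard_match("a%b", identifier))
```

With the value 'a%b', the loop reports a match for 'ab' and 'a1b' and no match for 'zab' and 'ab2', in line with the examples in the text.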
Combining multiple filters in the same command

The logic for processing multiple inclusion/exclusion options is as follows. First, if any inclusion option is specified multiple times, the results are logically ORed together (thus, for example, specifying -incl-range twice results in including variants in either range). Second, the resulting conditions are ANDed together. This means that a variant will be included only if it is included by each of the inclusion options and is not excluded by any exclusion option.

For example, the following command includes any variant that is in either range and that is not in the given file:

$ qctool -g example.bgen -og subsetted.bgen -incl-range 1:-1000 -incl-range 2:-1000 -excl-rsids rsids.txt

while the following command includes only variants that are in the given range and have rsid starting with "rs1":

$ qctool -g example.bgen -og subsetted.bgen -incl-range 1:-1000 -incl-variants-matching rsid~rs1%
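The combination logic can be modelled with the following Python sketch. It is a simplified illustration, not QCTOOL's implementation: the Variant type, the keep function, the hand-written predicates and the contents of rsids.txt are all hypothetical. Repeated uses of one inclusion option are ORed, different inclusion options are ANDed, and any exclusion removes the variant.

```python
from typing import Callable, Dict, List, NamedTuple


class Variant(NamedTuple):
    rsid: str
    chromosome: str
    position: int


Predicate = Callable[[Variant], bool]


def keep(
    variant: Variant,
    inclusions: Dict[str, List[Predicate]],
    exclusions: List[Predicate],
) -> bool:
    # OR within one inclusion option, AND across different options.
    included = all(
        any(test(variant) for test in tests) for tests in inclusions.values()
    )
    # A single matching exclusion is enough to drop the variant.
    excluded = any(test(variant) for test in exclusions)
    return included and not excluded


# Models: -incl-range 1:-1000 -incl-range 2:-1000 -excl-rsids rsids.txt
inclusions = {
    "-incl-range": [
        lambda v: v.chromosome == "1" and v.position <= 1000,
        lambda v: v.chromosome == "2" and v.position <= 1000,
    ],
}
excluded_rsids = {"rs2"}  # hypothetical contents of rsids.txt
exclusions = [lambda v: v.rsid in excluded_rsids]

variants = [
    Variant("rs1", "1", 500),
    Variant("rs2", "2", 900),
    Variant("rs3", "3", 10),
    Variant("rs4", "1", 2000),
]
kept = [v.rsid for v in variants if keep(v, inclusions, exclusions)]
print(kept)
```

In this toy data only rs1 survives: rs2 lies in an included range but is listed for exclusion, and rs3 and rs4 fall outside both ranges.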