Filtering variants based on an external file
QCTOOL has a set of options that filter variants against a list held in an external file; each option takes the name of that file as its argument.
Here are examples of these options:
$ qctool -g example.bgen -og subsetted.bgen -excl-rsids <filename>
Here the specified file should contain a whitespace-separated list of rsids that will be excluded from processing.
$ qctool -g example.bgen -og subsetted.bgen -excl-snpids <filename>
As above, but the file lists alternate IDs (SNPIDs) instead of rsids.
$ qctool -g example.bgen -og subsetted.bgen -excl-positions <filename>
The specified file should contain a list of genomic positions, each given as a chromosome and a position.
Omit the chromosome to specify variants that have missing chromosome information.
$ qctool -g example.bgen -og subsetted.bgen -excl-variants <filename>
The specified file should contain a list of variants. Currently this must be a text file with six named
columns, the last two of which contain the first and second alleles.
A separate option controls how variants are matched to this file; see the page on sorting data
for more information on this option.
The -[in|ex]cl-range option filters variants by range. E.g.:
$ qctool -g example.bgen -og subsetted.bgen -[in|ex]cl-range [chromosome]:[start]-[end]
This includes or excludes any variant in the given range. Ranges are treated as closed,
i.e. the range includes both the start and end positions.
Optionally you can omit the chromosome; the range will then also capture variants that
have missing chromosome information. Either the start or the end position can also be omitted, in which case
the range is treated as containing all positions up to the end coordinate, or all positions from the start coordinate onwards, respectively.
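Examples of valid ranges, following the rules above, are 1:1000-2000 (positions 1000 to 2000 on chromosome 1, inclusive), 1:-1000 (all positions up to and including 1000) and 1:1000- (position 1000 onwards).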
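The range rules described above can be sketched in Python. This is only an illustrative model, not QCTOOL's own code: the names Range, parse_range and in_range are hypothetical, and the sketch assumes that the chromosome is whatever precedes the last ':' in the range string and that a range with no chromosome matches variants on any chromosome, including variants whose chromosome is missing. Chromosome names are compared as plain strings here.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    chromosome: Optional[str]  # None: chromosome omitted
    start: Optional[int]       # None: no lower bound
    end: Optional[int]         # None: no upper bound


def parse_range(spec: str) -> Range:
    """Parse a '[chromosome]:[start]-[end]' string; any part may be empty."""
    # Assumption: the chromosome is whatever precedes the last ':' (if any).
    chromosome, _, positions = spec.rpartition(":")
    start, dash, end = positions.partition("-")
    if not dash:
        raise ValueError(f"range needs a '-' separator: {spec!r}")
    return Range(
        chromosome or None,
        int(start) if start else None,
        int(end) if end else None,
    )


def in_range(r: Range, chromosome: Optional[str], position: int) -> bool:
    """Closed-interval test: both endpoints count as inside the range."""
    # Assumption: with no chromosome given, variants on any chromosome
    # (including a missing one, passed as None) are candidates.
    if r.chromosome is not None and r.chromosome != chromosome:
        return False
    if r.start is not None and position < r.start:
        return False
    if r.end is not None and position > r.end:
        return False
    return True


r = parse_range("1:1000-2000")
# Both endpoints are inside the range; 2001 is not.
print(in_range(r, "1", 1000), in_range(r, "1", 2000), in_range(r, "1", 2001))
# True True False
```

An omitted start or end simply removes that bound, so parse_range("1:-1000") accepts every position on chromosome 1 up to and including 1000.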
Wildcard variant filtering
You can filter variants based on a wildcard match of ID fields. E.g.:
$ qctool -g example.bgen -og subsetted.bgen -incl-variants-matching rsid~rs1%
This command will retain all variants that have an rsid starting with 'rs1'. The general format of this command is:
$ qctool -g example.bgen -og subsetted.bgen -incl-variants-matching [field~][value]
The field can be 'snpid' (matching all alternate IDs) or 'rsid' (matching the first ID, i.e. the rsid),
or it can be omitted to match any ID. The value can optionally contain a single '%' character, which will
expand to match any string. A complete match is required, hence the value 'a%b'
will match the IDs 'ab', 'a1b', etc., but not 'zab' or 'ab2'.
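The matching rule can be modelled with a short Python sketch. The function names matches and variant_matches are hypothetical and this is not QCTOOL's implementation; it only mirrors the behaviour described above, treating '%' as "any string, possibly empty" and requiring the whole ID to match.

```python
from typing import Iterable


def matches(pattern: str, value: str) -> bool:
    """Whole-string match where a single optional '%' stands for any string."""
    if pattern.count("%") > 1:
        raise ValueError("at most one '%' is allowed in the value")
    if "%" not in pattern:
        return value == pattern
    prefix, _, suffix = pattern.partition("%")
    # The length check stops prefix and suffix from overlapping in the ID.
    return (
        len(value) >= len(prefix) + len(suffix)
        and value.startswith(prefix)
        and value.endswith(suffix)
    )


def variant_matches(spec: str, rsid: str, alternate_ids: Iterable[str]) -> bool:
    """Apply a '[field~][value]' spec to one variant's IDs."""
    field, _, value = spec.rpartition("~")
    if field == "rsid":
        candidates = [rsid]
    elif field == "snpid":
        candidates = list(alternate_ids)
    elif field == "":
        # Field omitted: any ID may match.
        candidates = [rsid, *alternate_ids]
    else:
        raise ValueError(f"unknown field: {field!r}")
    return any(matches(value, candidate) for candidate in candidates)


for candidate in ["ab", "a1b", "zab", "ab2"]:
    print(candidate, matches("a%b", candidate))
```

Under this model 'ab' and 'a1b' match 'a%b' while 'zab' and 'ab2' do not, which agrees with the examples given in the text.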
Combining multiple filters in the same command
The logic for processing multiple inclusion/exclusion options is as follows.
First, if the same inclusion option is specified multiple times, its results are logically ORed together.
(Thus, for example, specifying
-incl-range twice results in including variants in either range.)
Second, the resulting conditions are ANDed together. This means that a variant is
included only if it is included by each of the inclusion options and is not
excluded by any exclusion option.
For example, the following command includes any variant that is in either range and that is not in the given file:
$ qctool -g example.bgen -og subsetted.bgen -incl-range 1:-1000 -incl-range 2:-1000 -excl-rsids rsids.txt
while the following command includes only
variants that are in the given range and have an rsid starting with "rs1":
$ qctool -g example.bgen -og subsetted.bgen -incl-range 1:-1000 -incl-variants-matching rsid~rs1%
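The combination logic in this section can be sketched as a small Python model. Everything here is illustrative: keep_variant and up_to are hypothetical names, variants are represented as plain dictionaries, and the sketch assumes that when no inclusion option is given every variant counts as included. It mirrors the first example command (two -incl-range options plus -excl-rsids).

```python
from typing import Callable, Dict, List

Variant = Dict[str, object]
Predicate = Callable[[Variant], bool]


def keep_variant(
    variant: Variant,
    inclusions: Dict[str, List[Predicate]],
    exclusions: List[Predicate],
) -> bool:
    # Step 1: OR together repeats of the same inclusion option.
    # Step 2: AND across the different inclusion options.
    included = all(
        any(test(variant) for test in tests) for tests in inclusions.values()
    )
    # A match on any exclusion option removes the variant.
    excluded = any(test(variant) for test in exclusions)
    return included and not excluded


def up_to(chromosome: str, end: int) -> Predicate:
    """Model of a range like 1:-1000 (all positions up to and including end)."""
    return lambda v: v["chromosome"] == chromosome and v["position"] <= end


excluded_rsids = {"rs2"}  # stand-in for the contents of rsids.txt

inclusions = {"-incl-range": [up_to("1", 1000), up_to("2", 1000)]}
exclusions = [lambda v: v["rsid"] in excluded_rsids]

variants = [
    {"rsid": "rs1", "chromosome": "1", "position": 500},
    {"rsid": "rs2", "chromosome": "2", "position": 500},
    {"rsid": "rs3", "chromosome": "3", "position": 500},
]
kept = [v["rsid"] for v in variants if keep_variant(v, inclusions, exclusions)]
print(kept)  # ['rs1']
```

Here rs2 falls in the second range but is listed in the exclusion set, and rs3 lies in neither range, so only rs1 is kept.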