Filtering samples by identifier
The -incl-samples and -excl-samples options can be used to select a subset of samples to process. For example:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples samples.txt
This command excludes all samples whose identifier is in the file samples.txt (which should contain a whitespace-separated list of identifiers). Samples are identified by the first identifier field (often ID_1) in the sample file or, if no sample file is specified, by the sample identifiers given in the genotype data source (e.g. by the header in vcf or bgen formats). The option -incl-samples behaves similarly but includes only samples whose identifier is in the given file.
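As a rough sketch of how such an identifier list might be produced, the following shell snippet writes the first identifier field of selected rows of a sample file to samples.txt. It assumes the usual sample file layout of two header lines (column names, then column types) followed by one row per sample. The file contents, the identifiers, and the 0.05 threshold on the "missing" column are all made up for illustration.

```shell
# Hypothetical sample file: two header lines, then one row per sample.
cat > example.sample <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
s1 s1 0.01 male
s2 s2 0.10 female
s3 s3 0.20 male
EOF

# Skip the two header lines, then print the first identifier field of
# every sample whose "missing" value (third column) exceeds 0.05.
# The threshold is an arbitrary choice for this example.
awk 'NR > 2 && $3 > 0.05 { print $1 }' example.sample > samples.txt

cat samples.txt
```

The resulting samples.txt holds one identifier per line, which is a valid whitespace-separated list, and could then be passed to -excl-samples or -incl-samples as in the command above.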
Filtering samples by sample file column
It's also possible to filter samples based on the value of a column in the sample file. The general command is:
$ qctool -g <genotype file> -s <sample file> -[in|ex]cl-samples-where <column>[=|==|!=]<value> [+other options]
For example, the following command will write a new filtered genotype file excluding all samples listed as "male" in the "sex" column of the sample file:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples-where 'sex = male'
You can currently filter sample file column values for equality (using = or ==) or inequality (using !=).
Note: depending on the column names and expression used, it may be necessary
to place quotation marks around the expression to stop it being expanded by the shell.
As in the example above, quoting also allows you to use whitespace to format the condition.
-incl-samples-where behaves similarly, but includes only samples matching the condition.
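To illustrate the note on quoting, here is a minimal shell sketch showing how the same condition reaches a program with and without quotation marks. The count_args helper is hypothetical and exists only for this demonstration; it is not part of qctool.

```shell
# Hypothetical helper, for illustration only: print the number of
# arguments the shell passed to it.
count_args() { echo "$#"; }

# Quoted: the whole condition arrives as a single argument.
quoted=$(count_args 'sex = male')

# Unquoted: the shell splits the condition at whitespace,
# so it arrives as three separate arguments.
unquoted=$(count_args sex = male)

echo "quoted: $quoted, unquoted: $unquoted"
```

With quotes the condition is passed through as one argument, which is why the whitespace-formatted form 'sex = male' works in the example above.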
In general, inclusion and exclusion options come in pairs of the form -incl-<option> and -excl-<option>. If multiple overlapping conditions are specified, the logic is that samples that satisfy any of the specified exclusion criteria will be excluded, and likewise samples that do not satisfy any of the specified inclusion criteria will be excluded. Thus, in the following command:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples-where 'sex = male' -incl-samples samples.txt
The output will contain only samples that are not male and that are also listed in the file samples.txt.
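The logic of the command above (-excl-samples-where 'sex = male' combined with -incl-samples samples.txt) can be mimicked with a small awk sketch. This only imitates the stated rules on made-up data; it is not how qctool is implemented, and the file contents and the kept.txt output name are assumptions for illustration.

```shell
# Hypothetical inputs, made up for this illustration.
cat > example.sample <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
s1 s1 0 male
s2 s2 0 female
s3 s3 0 female
EOF
printf 's1 s2\n' > samples.txt

# First file (samples.txt): remember every whitespace-separated identifier.
# Second file (example.sample): skip the two header lines, then keep a
# sample only if it is not male (exclusion criterion) and its first
# identifier is listed in samples.txt (inclusion criterion).
awk '
  NR == FNR { for (i = 1; i <= NF; i++) listed[$i] = 1; next }
  FNR > 2 && $4 != "male" && ($1 in listed) { print $1 }
' samples.txt example.sample > kept.txt

cat kept.txt
```

Here s1 is dropped because it matches the exclusion condition, s3 is dropped because it is not in the inclusion list, and only s2 remains.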