Filtering samples
Filtering samples by identifier
The -incl-samples and -excl-samples options can be used to select a subset of
samples to process. For example:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples samples.txt
This command excludes all samples whose identifier is in the file samples.txt
(which should contain a whitespace-separated list of identifiers).
Samples are identified by the first identifier field (often ID_1) in the sample file or,
if no sample file is specified, by the sample identifiers given in the genotype data
source (e.g. the header in VCF or BGEN formats). The option -incl-samples
behaves similarly but includes only samples whose identifier is in the given file.
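It can help to preview an identifier list before running the filter. The following is a minimal sketch using standard tools, not part of qctool. It assumes a sample file that starts with a header line and a column-type line, and the file contents and identifiers (s1, s2, s3) are invented for illustration.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# Hypothetical sample file (assumed layout): header line, column-type line,
# then one row per sample.
cat > "$workdir/example.sample" <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
s1 s1 0 male
s2 s2 0 female
s3 s3 0 male
EOF

# Whitespace-separated list of identifiers, as described for -excl-samples.
printf 's1\ns3\n' > "$workdir/samples.txt"

# Emulate the exclusion: read the list first, then print every first-column
# identifier from the sample file that is not in the list.
kept=$(awk 'NR == FNR { for (i = 1; i <= NF; i++) listed[$i] = 1; next }
            FNR > 2 && !($1 in listed) { print $1 }' \
       "$workdir/samples.txt" "$workdir/example.sample")

printf '%s\n' "$kept"
```

Removing the `!` from the awk condition previews the complementary selection, i.e. the samples an inclusion list would keep.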
Filtering samples by sample file column
It's also possible to filter samples based on the value of a column in the sample file. The general command is:
$ qctool -g <genotype file> -s <sample file> -[in|ex]cl-samples-where <column>[=|==|!=]<value> [+other options]
For example, the following command will write a new filtered genotype file excluding
all samples listed as "male" in the "sex" column of the sample file:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples-where 'sex = male'
You can currently filter sample file column values for equality (using = or ==)
or inequality (using !=).
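To see which samples a condition would match, the comparison can be approximated outside qctool. This is a rough sketch under stated assumptions: the sample file layout (header line followed by a column-type line) and its contents are invented, match_samples is a hypothetical helper, and the comparison is a plain awk comparison that may differ from qctool's own matching rules.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# Hypothetical sample file (assumed layout): header line, column-type line, rows.
cat > "$workdir/example.sample" <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
s1 s1 0 male
s2 s2 0 female
s3 s3 0 male
EOF

# match_samples <column> <op> <value> <sample file>
# Prints first-column identifiers whose <column> value equals <value>
# (any op other than "!=") or differs from it (op "!=").
match_samples() {
  awk -v col="$1" -v op="$2" -v val="$3" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # locate the column
    NR == 2 { next }                                                   # skip the type line
    c && ((op == "!=") ? ($c != val) : ($c == val)) { print $1 }
  ' "$4"
}

equal=$(match_samples sex '=' male "$workdir/example.sample")
unequal=$(match_samples sex '!=' male "$workdir/example.sample")

printf 'sex = male:\n%s\n' "$equal"
printf 'sex != male:\n%s\n' "$unequal"
```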
Note: depending on the column names and expression used, it may be necessary
to place quotation marks around the expression to stop it being expanded by the shell.
As in the example above, this also allows you to use whitespace to format the condition.
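The effect of quoting can be checked directly in the shell. The sketch below uses a hypothetical helper, count_args, that only reports how many arguments it received. It shows how the shell passes the expression and makes no claim about how qctool parses it.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() { echo "$#"; }

# Quoted: the option name plus a single expression argument.
quoted=$(count_args -excl-samples-where 'sex = male')

# Unquoted: the shell splits the expression at whitespace into three words.
unquoted=$(count_args -excl-samples-where sex = male)

echo "quoted: $quoted, unquoted: $unquoted"
```

With quotes the whole condition reaches the program as one argument; without them it arrives as three separate words.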
The -incl-samples-where option behaves similarly, but includes only samples matching the condition.
Combining filters
In general, inclusion and exclusion options come in pairs of the form
-incl-<option>, -excl-<option>.
If multiple overlapping conditions are specified, samples that satisfy any of the specified exclusion criteria will
be excluded, and likewise samples that do not satisfy any of the specified inclusion criteria will be excluded. For example, consider the following command:
$ qctool -g example.bgen -s example.sample -og filtered.bgen -excl-samples-where 'sex = male' -incl-samples samples.txt
The output will contain only those samples listed in the file samples.txt
whose value in the sex column is not male.
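The combined logic for this command can be sketched with standard tools: a sample is kept only if it appears in the inclusion list and does not match the exclusion condition. The sample file layout (header line followed by a column-type line) is an assumption, the file contents are invented, and the awk comparison is only an approximation of qctool's behaviour.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# Hypothetical sample file (assumed layout): header line, column-type line, rows.
cat > "$workdir/example.sample" <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
s1 s1 0 male
s2 s2 0 female
s3 s3 0 male
s4 s4 0 female
EOF

# Hypothetical inclusion list.
printf 's1 s2\n' > "$workdir/samples.txt"

# Keep a sample only if it is in the inclusion list AND its sex column is not "male".
kept=$(awk -v col=sex -v val=male '
  NR == FNR { for (i = 1; i <= NF; i++) listed[$i] = 1; next }          # read inclusion list
  FNR == 1  { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }    # locate the column
  FNR == 2  { next }                                                    # skip the type line
  c && $c != val && ($1 in listed) { print $1 }
' "$workdir/samples.txt" "$workdir/example.sample")

printf '%s\n' "$kept"
```

Here s1 is listed but excluded as male, s4 is not male but is not listed, and s3 fails both tests, so only s2 remains.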