qctool v2
A tool for quality control and analysis of GWAS datasets.

Merging variants from one dataset into another

Merging data
The -merge-in option can be used to merge variants in one dataset into another. For example:
$ qctool -g first.bgen -s first.sample -merge-in second.bgen second.sample -og merged.bgen

This command produces a dataset that contains one record for each variant in first.bgen and one record for each variant in second.bgen; that is, it has L1+L2 variants, where L1 and L2 are the numbers of variants in the two datasets.

Data is output for the set of samples in the first dataset; any other samples in the merged-in dataset are ignored.

Controlling how samples are matched between datasets

By default, samples are matched by the first ID column in each dataset. The -match-sample-ids option can be used to change this. For example:

$ qctool -g first.bgen -s first.sample -merge-in second.bgen second.sample -og merged.bgen -match-sample-ids column1~column2

Here column1 and column2 are columns in first.sample and second.sample respectively, containing the fields to match on. We recommend that the sample file columns used for matching contain unique sample identifiers.

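
One way to check this recommendation before merging is to look for repeated values in the column you plan to match on. The sketch below is not part of qctool; the helper name duplicate_ids is hypothetical, and it assumes the usual .sample layout (whitespace-separated, with a line of column names followed by a line of column types).

```shell
# duplicate_ids FILE COLUMN
# Print every value that occurs more than once in the named column of a
# .sample file. Assumes two header lines: column names, then column types.
duplicate_ids() {
  awk -v col="$2" '
    NR == 1 {
      # Find the index of the requested column in the header line.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "no such column: " col > "/dev/stderr"; exit 2 }
      next
    }
    NR == 2 { next }   # skip the column-type line
    { print $c }
  ' "$1" | sort | uniq -d
}
```

For example, `duplicate_ids first.sample ID_1` prints nothing if every value in a column named ID_1 is unique (ID_1 is only a placeholder here; use the column you pass to -match-sample-ids).
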
Controlling what variants appear in the output

The -merge-strategy option controls what happens when the same variant appears in both datasets. Possible values are keep-all (the default) and drop-duplicates. For example:

$ qctool -g first.bgen -s first.sample -merge-in second.bgen second.sample -og merged.bgen -merge-strategy drop-duplicates

In this command, if the same variant appears in both first.bgen and second.bgen, only the record from first.bgen is output. As when combining datasets, the -compare-variants-by option controls how variants are compared, and variants are assumed to be sorted by these fields in each input dataset.
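
The difference between the two strategies can be illustrated with a toy model that stands in for the real data: two sorted text files holding one variant key per line, plus a label recording the source dataset. This is only a sketch of the behaviour described above, not of how qctool is implemented; the file names and the key format are made up for illustration.

```shell
# Toy inputs: a "chromosome:position:alleles" key, then the source dataset.
printf '%s\n' '01:1000:A:G first' '01:2000:C:T first' '01:3000:G:A first' > first.keys
printf '%s\n' '01:2000:C:T second' '01:4000:T:C second' > second.keys

# keep-all: every record from both inputs is retained (3 + 2 = 5 lines).
cat first.keys second.keys | LC_ALL=C sort > keep_all.keys

# drop-duplicates: where a key occurs in both inputs, keep only the record
# from the first dataset, then restore key order (4 lines here).
cat first.keys second.keys | awk '!seen[$1]++' | LC_ALL=C sort -k1,1 > drop_duplicates.keys
```

In this toy model the key 01:2000:C:T appears twice in keep_all.keys and once, labelled first, in drop_duplicates.keys.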

To further help disambiguate the source of data in the output file, the -merge-prefix option can also be used to add a prefix to the identifier of each merged-in variant, e.g.:

$ qctool -g first.bgen -s first.sample -merge-in second.bgen second.sample -og merged.bgen -merge-prefix "merged:"

Currently this only affects the 'alternate' identifier fields (e.g. the SNPID field of GEN or BGEN files).
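
As an illustration of the expected effect, the sketch below applies the prefix to the first column of a single GEN-style line, which holds the SNPID, and leaves the rsid in the second column unchanged. It uses awk on a made-up record purely to show which field changes; it is not how qctool applies the prefix.

```shell
# A made-up GEN-style record: SNPID rsid position alleleA alleleB probabilities
line='SNP1 rs123 1000 A G 1 0 0'

# Prefix only the SNPID (column 1); all other fields are printed unchanged.
prefixed=$(printf '%s\n' "$line" | awk -v prefix='merged:' '{ $1 = prefix $1; print }')
printf '%s\n' "$prefixed"
```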