Merging variants from one dataset into another
The -merge-in option can be used to merge variants from one dataset into another. For example:
This command produces a dataset that contains a record for each variant from first.bgen and a record for each variant from the merged-in dataset, i.e. it has L1+L2 variants, where L1 and L2 are the numbers of variants in the two datasets.
Data is output for the set of samples in the first dataset; any other samples in the merged-in dataset are ignored.
By default, samples are matched by the first ID column in each dataset.
The -match-sample-ids option can be used to change this. For example:
Here the first named column and column2 are columns in the first dataset's sample file and in second.sample respectively, containing the fields to match on. We recommend that the sample file columns used for matching contain unique sample identifiers.
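The sample-matching rule above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name match_samples and the example identifiers are hypothetical, and reporting an unmatched sample as None is an assumption, since the text does not say how such samples are handled.

```python
from typing import List, Optional


def match_samples(first_ids: List[str], second_ids: List[str]) -> List[Optional[int]]:
    """For each sample in the first dataset, return its row index in the
    merged-in dataset, or None if it has no match (assumed behaviour)."""
    index_in_second = {}
    for row, sample_id in enumerate(second_ids):
        # A repeated identifier makes the match ambiguous, which is why
        # unique identifiers are recommended for the matching columns.
        if sample_id in index_in_second:
            raise ValueError(f"duplicate sample identifier: {sample_id!r}")
        index_in_second[sample_id] = row
    # The output follows the first dataset's samples; extra samples in the
    # merged-in dataset are never referenced and so are ignored.
    return [index_in_second.get(sample_id) for sample_id in first_ids]


# Hypothetical identifiers taken from the chosen column of each sample file.
first_ids = ["sample_1", "sample_2", "sample_3"]
second_ids = ["sample_3", "sample_9", "sample_1"]
mapping = match_samples(first_ids, second_ids)
```

Here mapping has one entry per sample in the first dataset, and sample_9, which appears only in the merged-in dataset, does not appear in the result.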
The -merge-strategy option controls what happens when the same variant appears in both datasets. Possible values are -keep-all (the default) or a strategy that drops the duplicate record from the merged-in dataset.
In this command, if the same variant appears in first.bgen and in the merged-in dataset, only the record from first.bgen will be output. As when combining datasets, the -compare-variants-by option is used to control how variants are compared, and it is assumed that variants are sorted by these fields in each input dataset.
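The merge behaviour described above can be sketched in Python as a standard merge of two sorted lists. This is only an illustration under stated assumptions: the Variant type, the merge_variants function, and the keep_all flag are hypothetical names, and variants are compared here by chromosome and position only, whereas the real comparison fields are set by -compare-variants-by.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    rsid: str
    source: str  # which dataset the record came from (illustrative only)


def compare_key(v: Variant) -> Tuple[str, int]:
    # Assumed comparison fields; both inputs must already be sorted by them.
    return (v.chromosome, v.position)


def merge_variants(first: List[Variant], second: List[Variant],
                   keep_all: bool = True) -> List[Variant]:
    """Merge two sorted variant lists. With keep_all=True every record is
    kept; otherwise a variant present in both keeps only the first's record."""
    merged = []
    i = j = 0
    while i < len(first) and j < len(second):
        if compare_key(first[i]) < compare_key(second[j]):
            merged.append(first[i])
            i += 1
        elif compare_key(second[j]) < compare_key(first[i]):
            merged.append(second[j])
            j += 1
        else:
            # Same variant in both datasets: the first dataset's record is
            # always output, the merged-in record only when keeping all.
            merged.append(first[i])
            if keep_all:
                merged.append(second[j])
            i += 1
            j += 1
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged


first = [Variant("01", 100, "rs1", "first"), Variant("01", 200, "rs2", "first")]
second = [Variant("01", 150, "rs3", "second"), Variant("01", 200, "rs2", "second")]
kept = merge_variants(first, second, keep_all=True)
dropped = merge_variants(first, second, keep_all=False)
```

With keep_all=True the result has as many records as the two inputs combined, matching the L1+L2 count given earlier. With keep_all=False the record at position 200 appears once, taken from the first dataset.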
To further help disambiguate the source of data in the output file,
the -merge-prefix option can also be used to add a prefix to the identifier of each merged-in variant, e.g.:
Currently this only affects the 'alternate' identifier fields (e.g. the SNPID field of GEN or BGEN files).