Merging variants from one dataset into another
The -merge-in option can be used to merge the variants in one dataset into another. For example:
This command produces a dataset that contains a record for each variant from first.bgen and a record for each variant from second.bgen, i.e. it has L1+L2 variants, where L1 and L2 are the numbers of variants in the two datasets.
Data is output for the set of samples in the first dataset; any other samples in the merged-in dataset are ignored.
By default, samples are matched by the first ID column in each dataset.
The -match-sample-ids option can be used to change this. For example:
column1 and column2 are columns in first.sample and second.sample respectively, containing the fields to match on.
We recommend that the sample file columns used for matching contain unique sample identifiers.
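The sample matching described above can be illustrated with a minimal Python sketch. This is not QCTOOL's implementation: the function name match_samples and the sample IDs are hypothetical, and representing an unmatched sample as None is an assumption made for illustration only.

```python
def match_samples(first_ids, second_ids):
    """For each sample in the first dataset, return the index of the
    matching sample in the merged-in dataset, or None if there is none.

    Samples that appear only in the merged-in dataset never show up in
    the result, mirroring the fact that output covers the first
    dataset's samples only.
    """
    index_by_id = {}
    for i, sample_id in enumerate(second_ids):
        # Keep the first occurrence of each ID. Duplicate IDs would make
        # the match ambiguous, which is why unique identifiers are advised.
        index_by_id.setdefault(sample_id, i)
    return [index_by_id.get(sample_id) for sample_id in first_ids]


# Hypothetical values of the matching columns in the two sample files.
first_ids = ["s1", "s2", "s3"]
second_ids = ["s3", "s9", "s1"]
mapping = match_samples(first_ids, second_ids)
```

In this sketch, mapping has one entry per sample in the first dataset; "s9" is ignored because it is not present in the first dataset.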
The -merge-strategy option controls what happens when the same variant appears in both datasets. Possible values are -keep-all (the default) or -drop-duplicates.
For example:
In this command, if the same variant appears in first.bgen and in second.bgen, only the first will be output. As when combining datasets, the -compare-variants-by option is used to control how variants are compared, and it is assumed that variants are sorted by these fields in each input dataset.
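The effect of the two merge strategies on sorted inputs can be sketched in Python as follows. This is an illustration under stated assumptions, not QCTOOL's actual code: merge_variants, the tuple layout, and the choice of (chromosome, position) as the comparison key are all hypothetical stand-ins for whatever fields are selected for comparison.

```python
def merge_variants(first, second, key, strategy="keep-all"):
    """Merge two variant lists that are each already sorted by `key`.

    strategy="keep-all" emits every record from both inputs;
    strategy="drop-duplicates" skips a merged-in record whose key
    matches a record from the first input.
    """
    if strategy not in ("keep-all", "drop-duplicates"):
        raise ValueError("unknown strategy: " + strategy)
    merged = []
    i = j = 0
    while i < len(first) or j < len(second):
        # On ties, take the record from the first dataset first.
        take_first = j >= len(second) or (
            i < len(first) and key(first[i]) <= key(second[j])
        )
        if take_first:
            merged.append(first[i])
            i += 1
        else:
            # Because inputs are sorted, any equal-keyed record from the
            # first dataset was emitted immediately before this point.
            duplicate = i > 0 and key(first[i - 1]) == key(second[j])
            if not (duplicate and strategy == "drop-duplicates"):
                merged.append(second[j])
            j += 1
    return merged


# Hypothetical records: (chromosome, position, rsid).
first = [("01", 100, "rs1"), ("01", 200, "rs2")]
second = [("01", 150, "rs3"), ("01", 200, "rs2")]
by_position = lambda v: (v[0], v[1])

kept = merge_variants(first, second, by_position, "keep-all")
deduped = merge_variants(first, second, by_position, "drop-duplicates")
```

With these inputs, kept contains all four records (L1+L2), while deduped contains three, because the variant at position 200 is taken from the first dataset only. The sketch relies on both inputs being sorted by the comparison key, which is the same assumption stated above.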
To further help disambiguate the source of data in the output file, the -merge-prefix option can also be used to add a prefix to the identifier of each merged-in variant, e.g.:
Currently this only affects the 'alternate' identifier fields (e.g. the SNPID field of GEN or BGEN files).
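As a rough illustration of what prefixing only the 'alternate' identifier means, consider the following Python sketch. The Variant class, its field names, and the prefix value "merged:" are hypothetical and chosen for illustration; they do not reflect QCTOOL's internal data structures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    snpid: str     # 'alternate' identifier (e.g. the SNPID field)
    rsid: str      # primary identifier, left untouched by the prefix
    position: int


def add_merge_prefix(variant, prefix):
    """Return a copy of `variant` with `prefix` added to its alternate ID only."""
    return replace(variant, snpid=prefix + variant.snpid)


merged_in = Variant(snpid="SNP1", rsid="rs1", position=100)
tagged = add_merge_prefix(merged_in, "merged:")
```

Here tagged.snpid carries the prefix while tagged.rsid and tagged.position are unchanged, so downstream tools that key on the primary identifier still see the same value.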