Computing annotations
The -annotate-sequence option can be used to extract sequence bases from FASTA file(s) and annotate the output file with them. E.g.:
Used together with -annotate-sequence, the -flanking option tells QCTOOL to also annotate the output with flanking sequence from the FASTA files. For example:
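This Python sketch illustrates the underlying lookup only. It does not show QCTOOL's command-line syntax or its implementation. It reads FASTA text and returns the reference base at a 1-based position, together with up to N bases of flanking sequence on each side. Several details are assumptions made for this example: the function names read_fasta and sequence_at, truncating flanks at the ends of the sequence, and using the first word of each FASTA header as the sequence name.

```python
# Illustrative sketch, not QCTOOL code: reference base plus flanking sequence.
# Function names and flank handling are hypothetical choices for this example.

def read_fasta(lines):
    """Parse FASTA text lines into a dict of {sequence name: sequence}."""
    sequences = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Assumption: the first word of the header is the sequence name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def sequence_at(sequences, chromosome, position, flank=0):
    """Return (left flank, base, right flank) around a 1-based position.

    Flanks are truncated at the ends of the sequence (an assumption).
    """
    seq = sequences[chromosome]
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    i = position - 1  # convert the 1-based position to a 0-based index
    left = seq[max(0, i - flank):i]
    right = seq[i + 1:i + 1 + flank]
    return left, seq[i], right


fasta_lines = [">1 example chromosome", "ACGTAC", "GTTA"]
seqs = read_fasta(fasta_lines)
left, base, right = sequence_at(seqs, "1", 5, flank=2)
print(left, base, right)  # GT A CG
```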
The -annotate-genetic-map option can be used to output genetic (recombination) map coordinates for each variant, e.g.:
The genetic map files should be in the 'hapmap' format. This means one file per chromosome, with three columns giving the position, the recombination rate in centimorgans per megabase, and the cumulative genetic map position in centimorgans. The chromosome is inferred from the filename. Suitable genetic map files for human build 37 can be found on the IMPUTE2 website.
The output will contain columns cM_per_Mb and cM_from_start_of_chromosome.
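The sketch below shows one plausible way to derive the two output columns from a 'hapmap'-format map. It assumes the cumulative map is linearly interpolated between the two map points that surround a variant. That is an assumption: the text above does not say how QCTOOL interpolates or what it reports for variants outside the map. The helper names read_hapmap_map and genetic_map_annotation are hypothetical, and the header line in the example data is only illustrative.

```python
from bisect import bisect_right

# Illustrative sketch, not QCTOOL code. Linear interpolation and the
# out-of-range behaviour below are assumptions made for this example.


def read_hapmap_map(lines):
    """Parse 'hapmap'-format genetic map lines into sorted
    (position, cM_per_Mb, cumulative_cM) tuples, skipping the header."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or blank line
        rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
    rows.sort()
    return rows


def genetic_map_annotation(rows, position):
    """Return (cM_per_Mb, cM_from_start_of_chromosome) for a position.

    Assumption: the cumulative map is linearly interpolated between the two
    surrounding map points, and the rate is taken from the left-hand point.
    At or beyond either end of the map, the cumulative value is clamped to
    the end point and the rate is reported as 0.0.
    """
    positions = [row[0] for row in rows]
    k = bisect_right(positions, position)  # number of map points <= position
    if k == 0:
        return 0.0, rows[0][2]
    if k == len(rows):
        return 0.0, rows[-1][2]
    p0, rate0, cm0 = rows[k - 1]
    p1, _, cm1 = rows[k]  # p1 > position >= p0, so p1 - p0 > 0
    fraction = (position - p0) / (p1 - p0)
    return rate0, cm0 + fraction * (cm1 - cm0)


map_lines = [
    "position COMBINED_rate(cM/Mb) Genetic_Map(cM)",
    "1000 2.0 0.0",
    "501000 1.0 1.0",
    "1501000 0.0 2.0",
]
rows = read_hapmap_map(map_lines)
print(genetic_map_annotation(rows, 251000))  # (2.0, 0.5)
```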
The -annotate-bed3 and -annotate-bed4 options compute two different annotations at each input variant. The first reports whether the variant lies in any interval of a BED file. The second reports the value(s) assigned to the BED intervals that contain the variant:
Output will contain a column with the same name as the BED file (minus the .bed or .bed.gz extension).
For -annotate-bed3, this column will contain 1 if the variant was contained in an interval in the file, or 0 otherwise. For -annotate-bed4, the column will contain a comma-separated list of values from the fourth column of the BED file, taken from the intervals that contain the variant.
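Here is a minimal Python sketch of the per-variant logic behind the two column formats. It assumes the intervals have already been read and converted to 1-based, closed coordinates, and it ignores chromosome names for brevity. The function name is hypothetical. A linear scan stands in for whatever interval lookup QCTOOL actually uses. Returning an empty string when no interval matches is an assumption, not documented QCTOOL output.

```python
# Illustrative sketch, not QCTOOL code. Intervals are assumed to be
# (start, end, value) in 1-based, closed coordinates on a single chromosome.

def annotate_with_intervals(intervals, position):
    """Return (flag, values) for one variant position.

    flag:   1 if any interval contains the position, else 0 (bed3-style).
    values: comma-separated fourth-column values of the containing
            intervals, in file order (bed4-style); '' if none match
            (an assumption made for this example).
    """
    hits = [value for start, end, value in intervals if start <= position <= end]
    flag = 1 if hits else 0
    return flag, ",".join(hits)


intervals = [(100, 200, "enhancer"), (151, 300, "promoter"), (500, 600, "exon")]
print(annotate_with_intervals(intervals, 160))  # (1, 'enhancer,promoter')
print(annotate_with_intervals(intervals, 400))  # (0, '')
```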
Note: BED files are assumed to contain intervals in 0-based, right-open coordinates, while QCTOOL by convention assumes genotype data is expressed in 1-based coordinates. QCTOOL handles this internally by adding 1 to the start coordinate of each interval.
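The conversion works because a BED interval [s, e) covers the 0-based bases s, ..., e - 1. Those are the 1-based positions s + 1, ..., e. Adding 1 to the start and keeping the end therefore gives the equivalent closed, 1-based interval. Here is a small Python sketch of that conversion (the helper names are hypothetical):

```python
# Illustrative sketch of the BED coordinate convention; helper names are hypothetical.

def bed_to_one_based(start, end):
    """Convert a BED interval (0-based, right-open) to 1-based, closed
    coordinates by adding 1 to the start and keeping the end."""
    return start + 1, end


def bed_contains(start, end, position):
    """True if a 1-based variant position lies in the BED interval [start, end)."""
    first, last = bed_to_one_based(start, end)
    return first <= position <= last


# The BED record "chr1 99 200" covers 0-based bases 99..199,
# i.e. 1-based positions 100..200.
print(bed_to_one_based(99, 200))   # (100, 200)
print(bed_contains(99, 200, 99))   # False
print(bed_contains(99, 200, 200))  # True
```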
It is also possible to compute membership of intervals from several BED files in a single column. The general syntax is -annotate-bed[3|4] file1.bed[,file2.bed[,...]][+<N>bp]. This internally concatenates file1.bed, file2.bed, etc. into a single list of intervals. If the +<N>bp modifier is added, where N is an integer, every interval is first expanded by N bases on both the left and the right. For example:
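This Python sketch illustrates the concatenate-then-pad behavior described above. It is an illustration only, not QCTOOL's command-line syntax. It works on in-memory BED lines rather than files. Several details are assumptions for illustration and not documented QCTOOL behavior: the helper names, the skipping of comment, track and browser lines, and clamping padded starts at 0. Membership uses the same +1 start conversion described in the note on BED coordinates.

```python
# Illustrative sketch, not QCTOOL code: concatenate several BED files and
# pad every interval by pad_bp bases. Helper names are hypothetical.

def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end, value) tuples, keeping
    BED's 0-based, right-open coordinates; value is None for BED3 lines."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split("\t")
        value = fields[3] if len(fields) > 3 else None
        intervals.append((fields[0], int(fields[1]), int(fields[2]), value))
    return intervals


def combine_bed_files(files, pad_bp=0):
    """Concatenate the intervals of several BED files and widen each one
    by pad_bp bases on the left and right (start clamped at 0, an assumption)."""
    combined = []
    for lines in files:
        for chrom, start, end, value in parse_bed(lines):
            combined.append((chrom, max(0, start - pad_bp), end + pad_bp, value))
    return combined


def membership(intervals, chrom, position):
    """1 if the 1-based position lies in any interval, else 0."""
    return int(any(c == chrom and start + 1 <= position <= end
                   for c, start, end, _ in intervals))


file1 = ["1\t99\t200"]
file2 = ["1\t1000\t1100"]
plain = combine_bed_files([file1, file2])
padded = combine_bed_files([file1, file2], pad_bp=50)
print(membership(plain, "1", 240), membership(padded, "1", 240))  # 0 1
```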