qctool v2
A tool for quality control and analysis of GWAS datasets.

Computing annotations

Annotating with sequence bases
The -annotate-sequence option can be used to extract sequence bases from FASTA file(s) and annotate the output file with them. E.g.:
$ qctool -g <input file> -osnp output.txt -annotate-sequence chr#.fa reference
Note: Currently, FASTA files used with this option must be split into one file per chromosome; the chromosome is then inferred from the file name (using the chromosomal wildcard character #), as in the command above. It is also assumed that the FASTA file starts at base pair 1, so that bases can be looked up by position. I have used this to annotate alleles from the human reference sequence (e.g. this one) or ancestral sequence (e.g. this one).
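To make the two assumptions in the note concrete, here is a minimal Python sketch of a per-chromosome file name built from the # wildcard and a base looked up at a 1-based position. It is not QCTOOL's implementation: the function names are hypothetical, and the exact chromosome string substituted for # (e.g. "1" rather than "01") is an assumption.

```python
import io

def fasta_path(template: str, chromosome: str) -> str:
    """Replace the chromosomal wildcard '#' with a chromosome name (assumed form)."""
    return template.replace("#", chromosome)

def read_fasta_sequence(handle) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    return "".join(line.strip() for line in handle if not line.startswith(">"))

def base_at(sequence: str, position: int) -> str:
    """Return the base at a 1-based position; the sequence is assumed to start at base pair 1."""
    if position < 1 or position > len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    return sequence[position - 1]

# Toy single-chromosome FASTA content standing in for a real file.
example = io.StringIO(">chr1\nACGT\nTTGA\n")
seq = read_fasta_sequence(example)

print(fasta_path("chr#.fa", "1"))  # file name that would be opened for chromosome 1
print(base_at(seq, 5))             # base at 1-based position 5
```

If the FASTA file did not start at base pair 1, the simple position - 1 lookup above would return the wrong base, which is why the note states that assumption.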
Annotating with flanking sequence
When using -annotate-sequence, the -flanking option tells QCTOOL to additionally annotate output with flanking sequence from FASTA files. For example:
$ qctool -g <input file> -osnp output.txt -annotate-sequence chr#.fa reference -flanking 200 200
This outputs the 200bp of sequence from the FASTA file preceding each variant and the 200bp following it, as well as the bases covered by the variant's alleles.
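The following Python sketch illustrates what "flanking sequence" means in terms of coordinates. It is an illustration under assumptions, not QCTOOL's code: the function name is hypothetical, the covered bases are taken to span the length of a single allele, and flanks are simply truncated at the ends of the sequence.

```python
def flanking_sequence(sequence: str, position: int, allele_length: int,
                      left: int, right: int):
    """Return (left flank, covered bases, right flank) for a variant.

    `position` is the 1-based position of the first covered base.
    """
    start = position - 1          # 0-based index of the first covered base
    end = start + allele_length   # 0-based exclusive end of the covered bases
    left_flank = sequence[max(0, start - left):start]
    covered = sequence[start:end]
    right_flank = sequence[end:end + right]
    return left_flank, covered, right_flank

# Toy sequence; a real call would use a chromosome read from a FASTA file.
seq = "AACCGGTTAACC"
print(flanking_sequence(seq, 5, 2, 3, 3))
```

With -flanking 200 200, the `left` and `right` arguments would both be 200.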
Annotating with genetic map information
The -annotate-genetic-map option can be used to output genetic (recombination) map coordinates for each variant, e.g.:
$ qctool -g <input file(s)> -annotate-genetic-map genetic_map_chr#.txt -osnp output.txt

The genetic map files should be in the 'hapmap' format, i.e. one file per chromosome, each with three columns specifying position, recombination rate in centimorgans per megabase, and the accumulated recombination map position. The chromosome is inferred from the file name. Suitable genetic map files for human build 37 can be found on the IMPUTE2 website. The output will contain the columns cM_per_Mb and cM_from_start_of_chromosome.
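As an illustration of the three-column format, the Python sketch below parses a small map and computes an accumulated map position for a variant that falls between two map entries. Linear interpolation is an assumption made for this example; the text above does not say how QCTOOL treats positions between entries. The header line in the toy data and the function names are also assumptions.

```python
import bisect
import io

def read_genetic_map(handle):
    """Parse position, rate (cM/Mb) and accumulated position (cM) columns."""
    positions, rates, cms = [], [], []
    for line in handle:
        fields = line.split()
        # Skip blank lines and any header line (first field not an integer).
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        positions.append(int(fields[0]))
        rates.append(float(fields[1]))
        cms.append(float(fields[2]))
    return positions, rates, cms

def interpolate_cm(positions, cms, position):
    """Linearly interpolate the accumulated map position (assumed behaviour)."""
    if position <= positions[0]:
        return cms[0]
    if position >= positions[-1]:
        return cms[-1]
    i = bisect.bisect_right(positions, position)
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = cms[i - 1], cms[i]
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)

# Toy map: 2 cM/Mb over 1000bp adds 0.002 cM, then 1 cM/Mb over 2000bp adds 0.002 cM.
toy_map = io.StringIO(
    "position COMBINED_rate(cM/Mb) Genetic_Map(cM)\n"
    "1000 2.0 0.0\n"
    "2000 1.0 0.002\n"
    "4000 0.5 0.004\n"
)
positions, rates, cms = read_genetic_map(toy_map)
print(interpolate_cm(positions, cms, 1500))
```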

Interval annotations
The -annotate-bed3 and -annotate-bed4 options can be used to compute membership of the intervals in a BED file, or the value(s) assigned to intervals in a BED file, at each input variant:
$ qctool -g <input file(s)> -annotate-bed3 file1.bed -annotate-bed4 file2.bed

Output will contain a column with the same name as the BED file (minus the .bed or .bed.gz extension). For -annotate-bed3, this column will contain a 1 if the variant was contained in an interval in the file, or 0 otherwise. For -annotate-bed4, the column will contain a comma-separated list of values from the fourth column of the BED file, for those intervals which the variant is in.

Note: BED files are assumed to contain intervals in 0-based, right-open coordinates, while QCTOOL by convention assumes genotype data is expressed in 1-based coordinates. QCTOOL handles this internally by adding 1 to the start coordinate of each interval.
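The coordinate conversion described in the note can be checked with a few lines of Python. This is a sketch of the arithmetic only, with hypothetical function names: a 0-based, right-open interval [start, end) covers the same bases as the 1-based, closed interval [start + 1, end].

```python
def bed_to_one_based(start: int, end: int):
    """Convert a 0-based, right-open BED interval to a 1-based, closed interval."""
    return start + 1, end

def contains(interval, position: int) -> bool:
    """Test whether a 1-based position lies in a 1-based, closed interval."""
    lower, upper = interval
    return lower <= position <= upper

# BED interval 100-200 covers 0-based bases 100..199, i.e. 1-based bases 101..200.
interval = bed_to_one_based(100, 200)
for position in (100, 101, 200, 201):
    print(position, contains(interval, position))
```

Only the start coordinate changes; the end coordinate keeps its value because the right-open 0-based end and the closed 1-based end refer to the same base.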

It's also possible to compute membership of intervals in a set of BED files in the same column. The general syntax is -annotate-bed[3|4] file1.bed[,file2.bed[,...]][+<N>bp]. This internally concatenates file1.bed, file2.bed, etc. into a single list of intervals. Further, if the +<N>bp modifier is added, where N is an integer, then all intervals are expanded by N bases to the left and the right before processing. For example:

$ qctool -g <input file(s)> -annotate-bed4 file1.bed,file2.bed+100bp
This command will annotate each variant with the values of all intervals that it lies within 100bp of.
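The Python sketch below walks through the steps described above for the file1.bed[,file2.bed[,...]][+<N>bp] syntax: split the argument, concatenate the intervals, expand them by N bases on each side, and list the fourth-column values of every interval containing a variant. It is an illustration rather than QCTOOL's implementation; the function names, the toy BED contents, and the order in which values are listed are assumptions.

```python
import re

def parse_spec(spec: str):
    """Split 'file1.bed,file2.bed+100bp' into file names and a margin in bp."""
    match = re.fullmatch(r"(.+?)(?:\+(\d+)bp)?", spec)
    files = match.group(1).split(",")
    margin = int(match.group(2)) if match.group(2) else 0
    return files, margin

def parse_bed4(text: str):
    """Read BED4 lines into (chromosome, start, end, value), 1-based and closed."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        intervals.append((fields[0], int(fields[1]) + 1, int(fields[2]), fields[3]))
    return intervals

def expand(intervals, margin: int):
    """Extend every interval by `margin` bases to the left and to the right."""
    return [(c, max(1, s - margin), e + margin, v) for c, s, e, v in intervals]

def annotate(intervals, chromosome: str, position: int) -> str:
    """Comma-separated values of all intervals containing the variant."""
    return ",".join(v for c, s, e, v in intervals
                    if c == chromosome and s <= position <= e)

# Toy stand-ins for the contents of file1.bed and file2.bed.
contents = {
    "file1.bed": "1\t1000\t2000\tgeneA\n",
    "file2.bed": "1\t2050\t3000\tgeneB\n2\t500\t600\tgeneC\n",
}
files, margin = parse_spec("file1.bed,file2.bed+100bp")
combined = expand([iv for name in files for iv in parse_bed4(contents[name])], margin)
print(annotate(combined, "1", 2040))
```

In this toy data, position 2040 on chromosome 1 lies outside both original intervals on that chromosome but within 100bp of each, so both values are reported.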