Illuminus version 2.0

Documentation

 

NAME

            illuminus – an Illumina genotype calling algorithm

 

 

SYNOPSIS

            illuminus [options] [-i INPUT_FILE] [-o OUTPUT_FILE]

 

 

DESCRIPTION

 

The code reads in a text file (columns: rs, coord, allelesAB, id_1a, id_1b, id_2a, id_2b, etc.) and iterates with an EM algorithm until the genotype calls converge. Illumina microarrays that do not contain a certain beadtype (SNP) have the corresponding intensities represented by 'NaN'; these are handled automatically. Negative intensities are treated as missing.
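
The missing-value rules above can be illustrated with a short Python sketch. It is not part of illuminus: the function names are hypothetical, and it assumes the input fields are whitespace-delimited and that the two values per sample (id_Na, id_Nb) appear in consecutive columns after the first three.

```python
import math


def parse_intensity(token):
    """Convert one intensity field to a float, or None if it is missing.

    Both 'NaN' (beadtype absent from the array) and negative values
    are treated as missing, as described in the text above.
    """
    value = float(token)  # float("NaN") parses to a NaN value
    if math.isnan(value) or value < 0:
        return None
    return value


def parse_input_line(line):
    """Split one data line into rs, coord, allelesAB and per-sample pairs.

    Assumption: fields are whitespace-delimited (not confirmed by the source).
    """
    fields = line.split()
    rs, coord, alleles = fields[0], fields[1], fields[2]
    values = [parse_intensity(tok) for tok in fields[3:]]
    # Pair consecutive values: (id_1a, id_1b), (id_2a, id_2b), ...
    pairs = list(zip(values[0::2], values[1::2]))
    return rs, coord, alleles, pairs
```

The header line of the input file would need to be skipped before calling parse_input_line on the data lines.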

 

Please see the academic paper for a detailed description, and cite this reference if using data generated by the software:

 

Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, Clark TG. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 2007, Oct 15;23(20):2741-6.

 

 

OPTIONS

 

            -i FILE

 

The input file name. Please see the example input file, example.txt.

 

            -o FILE

 

The output file name. The suffix '_calls' is appended to it for the genotype calls, and '_probs' for the posterior probabilities (if the -p option is chosen). The output format is space-delimited with columns: coordinate, rs, perturbation score, allelesAB, call_1, call_2, call_3, etc. The order of the calls is the same as in the header of the input file. The calls are encoded as 1 = AA, 2 = AB (heterozygote), 3 = BB, 4 = NN (no call).
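
As an illustration of the call encoding, the following Python sketch decodes one line of a '_calls' file. It is not part of illuminus: the function name is hypothetical, and it assumes that the four leading columns (coordinate, rs, perturbation score, allelesAB) are always present, including when -a is not used, which the documentation does not confirm.

```python
# Call encoding as documented: 1 = AA, 2 = AB, 3 = BB, 4 = NN (no call).
CALL_CODES = {"1": "AA", "2": "AB", "3": "BB", "4": "NN"}


def decode_calls_line(line):
    """Decode one space-delimited line of a '_calls' file.

    Assumption: the first four columns are coordinate, rs,
    perturbation score and allelesAB, followed by one call per sample.
    """
    fields = line.split()
    coord, rs, score, alleles = fields[:4]
    calls = [CALL_CODES[code] for code in fields[4:]]
    return {
        "coord": coord,
        "rs": rs,
        "perturbation_score": score,
        "alleles": alleles,
        "calls": calls,
    }
```

If the file begins with a header line, it would have to be skipped before decoding.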

 

            -t NUM

 

The no-call threshold. The default value is 0.95.
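
The documentation does not state exactly how the threshold is applied. One plausible reading, sketched below in Python, is that a genotype is called only when its posterior probability reaches the threshold, and that the sample is otherwise reported as 4 (no call). The function name, the use of '>=' rather than '>', and the ordering of the four probabilities are assumptions for illustration only.

```python
def call_genotype(posteriors, threshold=0.95):
    """Return a call code (1, 2, 3 or 4) from four posterior probabilities.

    posteriors: probabilities for calls 1 (AA), 2 (AB), 3 (BB), 4 (NN),
    in that order (assumed ordering).
    Assumption: a genotype is called only if it is the most probable
    class and its posterior is at least the threshold.
    """
    best = max(range(4), key=lambda i: posteriors[i])
    if best < 3 and posteriors[best] >= threshold:
        return best + 1  # codes are 1-based
    return 4  # no call
```

Under this reading, raising the value given to -t would produce more no-calls and fewer, more confident genotype calls.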

 

            -p

 

Output the posterior probabilities for each possible call (1, 2, 3, 4) for each SNP.

 

            -w

 

Optimise the algorithm for whole genome amplified DNA. Please see paper for details.

 

            -a

 

Perform perturbation analysis on each SNP. Briefly, this adds an error term to the input intensities, and each SNP is re-called with the 'perturbed' X/Y values. The concordance rate between the original and perturbed genotypes, the perturbation score, is then written adjacent to the SNP 'rs' number in the output file. Currently, we recommend a perturbation score of >0.95 to represent 'stable' genotypes. Please see the paper for details.
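
The concordance rate described above can be sketched in Python as follows. This is an illustration only, not the illuminus implementation: the function names are hypothetical, and it assumes the score is the plain fraction of samples whose call is unchanged, with no-calls (4) compared like any other code.

```python
def perturbation_score(original_calls, perturbed_calls):
    """Fraction of samples whose call is identical before and after perturbation.

    Assumption: every sample counts equally and no-calls (4) are
    compared like any other call code.
    """
    if len(original_calls) != len(perturbed_calls):
        raise ValueError("call lists must have the same length")
    if not original_calls:
        raise ValueError("at least one sample is required")
    matches = sum(o == p for o, p in zip(original_calls, perturbed_calls))
    return matches / len(original_calls)


def is_stable(score, cutoff=0.95):
    """Apply the recommended criterion: a score above 0.95 is 'stable'."""
    return score > cutoff
```

For example, a SNP whose calls change in one of four samples would score 0.75 and would not be considered stable.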

 

            -x  FILE

 

A file with indicators of sex (=1 for male), so that chromosome X may be genotyped. Unlike for the autosomal chromosomes, Hardy-Weinberg equilibrium is not assumed when calling chromosome X genotypes.

 

 

            -s NUM1 NUM2

 

Only cluster intensities for a range of SNPs (from NUM1 to NUM2). This option is essential for parallelising illuminus, since each SNP is clustered independently of the others. It is also very useful for controlling memory usage.
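
One way to use -s for parallelisation is to split the SNPs into consecutive ranges and build one command per range, as in the Python sketch below. The helper names and the per-range output naming are hypothetical, and the sketch assumes NUM1 and NUM2 are 1-based and inclusive, which the documentation does not state.

```python
def snp_ranges(total_snps, chunk_size):
    """Yield consecutive (start, end) ranges covering 1..total_snps.

    Assumption: SNP indices are 1-based and both ends are inclusive.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = 1
    while start <= total_snps:
        end = min(start + chunk_size - 1, total_snps)
        yield start, end
        start = end + 1


def build_commands(input_file, output_prefix, total_snps, chunk_size):
    """Build one illuminus argument list per SNP range.

    The output name '<prefix>_<start>_<end>' is a hypothetical convention
    chosen so that the jobs do not overwrite each other's files.
    """
    return [
        ["./illuminus", "-i", input_file,
         "-o", f"{output_prefix}_{start}_{end}",
         "-s", str(start), str(end)]
        for start, end in snp_ranges(total_snps, chunk_size)
    ]
```

Each argument list could then be submitted to a job scheduler or run locally, for example with subprocess.run, and the resulting per-range output files concatenated afterwards.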

 

BUGS

 

Please report any bugs or problems in the software to Taane Clark (tgc@well.ox.ac.uk), YY Teo (yy.teo@well.ox.ac.uk) or Mike Inouye (mi1@sanger.ac.uk).

 

 

EXAMPLE OF USAGE

 

./illuminus -i example.txt -o out -c -a -p

 

This will run illuminus on example.txt, outputting out_calls and out_probs, as well as performing a perturbation analysis reported in both files.

 

 

 

CLICK HERE TO DOWNLOAD A LINUX (-NOT WINDOWS) EXECUTABLE

 

CLICK HERE TO DOWNLOAD TARRED-GZIPPED SOURCE CODE

 

CLICK HERE TO DOWNLOAD A GZIPPED EXAMPLE DATASET

 

 

Software and Page Last Dated: 23/10/2008