Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data

Authors:  Andy Rimmer, Hang Phan, Iain Mathieson, Gerton Lunter, Gil McVean


Platypus is a tool designed for efficient and accurate variant detection in high-throughput sequencing data. It uses local realignment of reads and local assembly to detect variants, and is both sensitive and specific. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted-capture data. It has been run on very large datasets as part of the Thousand Genomes and WGS500 projects, and is being used in clinical sequencing trials in the Mainstreaming Cancer Genetics programme. Platypus has been thoroughly tested on data mapped with Stampy and BWA. I have not tested it with other mappers, but believe it should behave well. Platypus has been used to detect variants in Human, Mouse, Rat, and Chimpanzee samples, amongst others, and it should perform well on data from any diploid organism. It has also been used to find somatic mutations in cancer, and mosaic mutations in human exome data.

Capabilities

Platypus reads data from BAM files, and outputs a single VCF file containing a list of identified variants, and genotype calls and likelihoods for all samples. It can identify SNPs, MNPs and short (less than one read length) indels, and has experimental support for larger variants (deletions up to several kb, and insertions of perhaps 200bp) using local assembly. Platypus can process large amounts of BAM data very efficiently, and can handle samples spread across multiple BAM files. Duplicate read marking, local re-alignment, and variant identification and filtering are performed on-the-fly using a single command. Platypus will run on any input data in BAM format, but has only been properly tested on Illumina data.

Reference:  The Platypus paper is currently out for review. For the time being, if you use Platypus, please reference this site, with the following text: Rimmer A, Mathieson I, Lunter G, McVean G, (2012) Platypus: An Integrated Variant Caller (

Download Platypus:

You can download the latest stable version of Platypus here.


Platypus is written in Python, Cython and C. It requires only Python (>= 2.6) and a C compiler to build; these are standard on most Linux and Mac OS distributions, and Platypus should build and run without problems for most people.

Building Platypus

To build Platypus, simply un-pack the tar-ball and run the script provided:

tar -xvzf Platypus_x.x.x.tgz

cd Platypus_x.x.x


This will take a minute or so, and generate quite a lot of warnings. If the build is successful, you will see a message, 'Finished building Platypus'. Platypus is then ready for variant-calling.

Running Platypus

Platypus can be run from the command-line, using Python. It needs one or more BAM input files, and a FASTA reference file. The BAM file(s) must be indexed using Samtools or an equivalent program, and the FASTA file must also be indexed using 'samtools faidx' or equivalent.
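Because both the BAM file(s) and the FASTA file must be indexed, it can be useful to check for the index files before starting a long run. The following is a minimal sketch, not part of Platypus: the function name missing_indexes is hypothetical, and it assumes the usual Samtools naming conventions ('input.bam.bai' or 'input.bai' next to the BAM file, and 'ref.fa.fai' next to the FASTA file).

```python
import os


def missing_indexes(bam_paths, ref_path):
    """Return the input files that have no index file next to them.

    Hypothetical helper, not part of Platypus. It only checks that an
    index file exists; it does not check that the index is up to date.
    """
    missing = []
    for bam in bam_paths:
        # Assumed naming: 'sample.bam.bai' (Samtools) or 'sample.bai'.
        candidates = [bam + ".bai", os.path.splitext(bam)[0] + ".bai"]
        if not any(os.path.exists(c) for c in candidates):
            missing.append(bam)
    # 'samtools faidx ref.fa' writes 'ref.fa.fai' beside the FASTA file.
    if not os.path.exists(ref_path + ".fai"):
        missing.append(ref_path)
    return missing


if __name__ == "__main__":
    for path in missing_indexes(["input.bam"], "ref.fa"):
        print("No index found for " + path)
```

If the function returns an empty list, index files were found for all inputs. Any file it does return needs to be indexed with Samtools or an equivalent program first.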

The simplest way to run Platypus is as follows:

python Platypus.py callVariants --bamFiles=input.bam --refFile=ref.fa --output=VariantCalls.vcf

The output will be a single VCF file containing all the variants that Platypus identified, and a 'log.txt' file, containing log information. The last line in the log file, and on the command-line output, should be 'Finished variant calling'. This means that the calling has completed without major errors. It is a good idea to also check the log output for warnings or errors.
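The log check described above can be scripted, which helps when many runs are launched in batch. This is a minimal sketch, not part of Platypus: the function name check_log is hypothetical, and it assumes that warnings and errors appear in the log as lines containing the words 'WARNING' or 'ERROR'. The exact log format may differ, so adjust the matching to what your log actually contains.

```python
def check_log(log_path="log.txt"):
    """Return (finished, problems) for a Platypus log file.

    Hypothetical helper, not part of Platypus.
    finished: True if the last non-blank line contains the completion message.
    problems: lines that look like warnings or errors (assumed format).
    """
    with open(log_path) as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    # The last non-blank line should report that calling has finished.
    finished = bool(lines) and "Finished variant calling" in lines[-1]
    # Assumption: problems are flagged with the words WARNING or ERROR.
    problems = [
        line for line in lines
        if "WARNING" in line.upper() or "ERROR" in line.upper()
    ]
    return finished, problems
```

For example, 'finished, problems = check_log("log.txt")' gives a flag saying whether the run completed, together with the lines that should be looked at by hand.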


Examples of how to run Platypus in different settings are here. For Frequently Asked Questions (FAQ), see here. A README file, describing the various options you can set when running Platypus, as well as the VCF output information, is included in the download, and the same information is documented here.

Contact:  Bug reports, comments, and feature requests (positive feedback also greatly appreciated) can be sent to Andy Rimmer (