Platypus FAQ

Frequently asked questions relating to various aspects of the Platypus Variant Caller. These questions are split into sections in a hopefully obvious way.

Installation and Setup

What do I need to run Platypus?

You need Python 2.6 or later (I haven't tested Platypus on Python 3.X yet) and a C compiler, both of which should be standard on most Linux and Mac OS distributions.
   

What platforms does Platypus work on?

Platypus has been tested on Linux and Mac OS, but not Windows. I do not expect it to work on Windows, though I'd be happy to hear if anyone has managed to do this.
   

Can I install Platypus without being a root user/administrator?

Yes. Platypus does not install any libraries/files anywhere on your system except in the directory created when you unpack the downloaded file. Everything is built locally, which means that you can download and install Platypus without needing root/administrative privileges.
   

How do I uninstall/remove Platypus?

Simply delete the Platypus_X.X.X folder and all its contents.
   

Can I run multiple versions of Platypus on the same system without problems?

Yes. As explained above, nothing is installed in any system folder; everything is contained in the original Platypus_X.X.X folder. So you can have several of these folders, containing different versions of Platypus, on the same system without any problems.
   
Can I run Platypus using multiple processors/cores?

Yes. Platypus is designed for parallel processing, but the default is to run on only one processor. To parallelise Platypus, use the '--nCPU=X' option, where 'X' is the number of processors/cores you want to use. The calling will then be run in 'X' jobs, and Platypus will merge the output at the end. This should result in a speedup of roughly 'X' times.
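Platypus does the splitting and merging itself when you pass --nCPU, so you never need to do this by hand. Purely to illustrate the idea of dividing the work into 'X' jobs, here is a minimal Python sketch that cuts one region into 'X' contiguous, near-equal chunks written in the chr:start-end notation used by --regions. It is not taken from the Platypus source: the helper name split_region is hypothetical, and it assumes half-open start-end coordinates.

```python
def split_region(chrom, start, end, n_jobs):
    """Split [start, end) on chrom into at most n_jobs contiguous chunks.

    Assumption (hypothetical): start-end is treated as half-open, so
    neighbouring chunks share a boundary coordinate but no bases.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    base, extra = divmod(end - start, n_jobs)
    chunks = []
    pos = start
    for i in range(n_jobs):
        # The first `extra` chunks get one extra base, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more jobs than bases: skip empty chunks
        chunks.append("%s:%d-%d" % (chrom, pos, pos + size))
        pos += size
    return chunks


print(split_region("chr1", 0, 10, 3))
```

For example, split_region("chr1", 0, 10, 3) gives chunks of 4, 3 and 3 bases. Equal-sized chunks do not necessarily take equal time (coverage varies along the genome), which is one reason the speedup is only "roughly" 'X' times.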

 

Running Platypus

How do I run Platypus?

The easiest way to run Platypus is like this:

python Platypus.py callVariants --bamFiles=data.bam --refFile=REF.fa --output=Calls.vcf

By default, Platypus will process all the data and output calls in all regions.
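If you drive Platypus from a Python script, a minimal sketch like the one below builds the same command as an argument list. The helper name build_call_variants_cmd is hypothetical, and it only uses options mentioned in this FAQ (--bamFiles, --refFile, --output, --regions, --nCPU); the actual subprocess call is left as a comment so the snippet runs without Platypus installed.

```python
import shlex


def build_call_variants_cmd(bam_files, ref_file, output, regions=None, n_cpu=1):
    """Return the argument list for a Platypus callVariants run.

    Hypothetical helper: it only uses options described in this FAQ.
    """
    args = [
        "python", "Platypus.py", "callVariants",
        "--bamFiles=" + ",".join(bam_files),
        "--refFile=" + ref_file,
        "--output=" + output,
    ]
    if regions:
        args.append("--regions=" + ",".join(regions))
    if n_cpu > 1:
        args.append("--nCPU=%d" % n_cpu)
    return args


cmd = build_call_variants_cmd(["data.bam"], "REF.fa", "Calls.vcf")
# Show the equivalent shell command, quoted safely.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: import subprocess; subprocess.check_call(cmd)
```

Passing a list of arguments rather than one shell string avoids quoting problems with unusual file names.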
   

How do I run on multiple BAM files?

For the --bamFiles option, either specify a comma-separated list of BAMs (--bamFiles=file1.bam,file2.bam,file3.bam,...) or use a text file with one BAM file name per line (--bamFiles=listOfFiles.txt).
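A minimal Python sketch of both forms, assuming the list file simply contains one path per line as described above. The helper names are hypothetical. The inline form rejects paths that contain commas, since those would presumably be split into separate file names; a list file avoids that problem and keeps long command lines short.

```python
import os
import tempfile


def bam_files_inline(bam_paths):
    """Build --bamFiles=a.bam,b.bam,... (hypothetical helper)."""
    for path in bam_paths:
        if "," in path:
            raise ValueError("comma in BAM path %r; use a list file instead" % path)
    return "--bamFiles=" + ",".join(bam_paths)


def write_bam_list(bam_paths, list_path):
    """Write one BAM path per line and return the matching --bamFiles option."""
    with open(list_path, "w") as handle:
        for path in bam_paths:
            handle.write(path + "\n")
    return "--bamFiles=" + list_path


# Demo: write the list into a temporary directory.
tmp_dir = tempfile.mkdtemp()
list_file = os.path.join(tmp_dir, "listOfFiles.txt")
option = write_bam_list(["file1.bam", "file2.bam", "file3.bam"], list_file)
print(option)
```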
   

How do I run only on one chromosome?

Use the --regions option to specify a single chromosome (--regions=chrX).
   

How do I run on several chromosomes?

Use the --regions option to specify a comma-separated list of chromosomes (--regions=chr1,chr2,chr3).
   

How do I run on a specific region of one or more chromosomes?

Use the --regions option to specify a comma-separated list of regions, where each region looks like chr:start-end (--regions=chr1:0-100000,chr2:1000-2000,chrY:0-100).
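Before launching a long run it can help to check region strings for typos. The sketch below validates the chr:start-end format described above, plus bare chromosome names such as chrX, using a regular expression. It is a hypothetical helper, not the parser Platypus itself uses, so Platypus may accept or reject strings differently.

```python
import re

# chrom, optionally followed by :start-end (digits only)
REGION_RE = re.compile(r"^([^:\s]+)(?::(\d+)-(\d+))?$")


def parse_region(text):
    """Parse 'chr' or 'chr:start-end' into (chrom, start, end).

    start and end are None when a whole chromosome is given.
    """
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError("bad region: %r" % text)
    chrom, start, end = match.groups()
    if start is None:
        return chrom, None, None
    start, end = int(start), int(end)
    if end < start:
        raise ValueError("end before start in region: %r" % text)
    return chrom, start, end


def parse_regions_option(value):
    """Split a comma-separated --regions value and parse each part."""
    return [parse_region(part) for part in value.split(",") if part.strip()]


print(parse_regions_option("chr1:0-100000,chr2:1000-2000,chrY:0-100"))
```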
   

What if I want to specify lots of regions?

Use a text file containing a list of regions, one per line, in the format chr:start-end, and specify it using --regions=regions.txt.
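A minimal sketch for generating such a file from (chrom, start, end) tuples. The function name is hypothetical, and the demo regions are simply the ones from the --regions example above.

```python
import os
import tempfile


def write_regions_file(regions, path):
    """Write (chrom, start, end) tuples as chr:start-end, one per line.

    Returns the matching --regions option. Hypothetical helper.
    """
    with open(path, "w") as handle:
        for chrom, start, end in regions:
            if end < start:
                raise ValueError("end before start for %s" % chrom)
            handle.write("%s:%d-%d\n" % (chrom, start, end))
    return "--regions=" + path


# Demo: write into a temporary directory, using the regions from the FAQ example.
out_dir = tempfile.mkdtemp()
regions_path = os.path.join(out_dir, "regions.txt")
regions_option = write_regions_file(
    [("chr1", 0, 100000), ("chr2", 1000, 2000), ("chrY", 0, 100)], regions_path
)
print(regions_option)
```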
   

How can I get a list of all Platypus options?

Running the command python Platypus.py callVariants --help will give you a list of all command-line options. These are also documented in the README file distributed with Platypus.

Running Platypus On Cancer Tumour/Normal Data

Can Platypus detect low frequency somatic mutations?

Yes. Platypus is quite sensitive to low-frequency mutations (i.e. those present at significantly below 50% in a single sample). Sensitivity to very low-frequency mutations increases with coverage. Platypus requires at least 2 reads covering a mutation before it will consider it for calling. Some low-frequency mutations will be flagged as 'alleleBiased' rather than 'PASS' in the VCF file, so make sure not to filter these out if you're looking for somatic changes.
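If you post-filter the VCF yourself, one way to keep both 'PASS' and 'alleleBiased' calls is sketched below. It assumes standard tab-separated VCF data lines, with FILTER in the seventh column and multiple filters separated by semicolons, and an uncompressed file. It keeps a record only if every filter on it is in the allowed set, so a record carrying alleleBiased plus some other filter is dropped; that is a conservative choice you may want to change. The function names are hypothetical.

```python
ALLOWED_FILTERS = frozenset(["PASS", "alleleBiased"])


def keep_record(vcf_line, allowed=ALLOWED_FILTERS):
    """Keep header lines and records whose FILTER values are all allowed."""
    if vcf_line.startswith("#"):
        return True
    fields = vcf_line.rstrip("\n").split("\t")
    filters = fields[6].split(";")  # FILTER is the 7th VCF column
    return all(f in allowed for f in filters)


def filter_vcf_lines(lines, allowed=ALLOWED_FILTERS):
    """Return only the lines that keep_record accepts."""
    return [line for line in lines if keep_record(line, allowed)]


def filter_vcf_file(in_path, out_path, allowed=ALLOWED_FILTERS):
    """Stream an uncompressed VCF, writing the kept lines to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if keep_record(line, allowed):
                dst.write(line)
```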

 

 

Should I call tumour/normal pairs together in the same Platypus run?

Yes, absolutely. If you want to detect somatic changes, it is essential that the tumour and normal samples are run together, so that you have information for all the same sites in both samples.
   

What about large variants?

Platypus can detect anything up to about one read length in size by default. If you enable the assembly option (--assemble=1), then larger variants, up to around 1kb (see assembly options), can be called. Platypus does not currently deal with very large structural variants; anything larger than a few kb will not be identified.

 

Using The Assembly Option

Can Platypus assemble large variants?

Yes. Platypus uses local assembly in regions of 1.5kb across the genome to assemble variants which are too large to fit into a single read. The assembly is not as thoroughly tested as the default variant calling, but seems to give good results. To enable the assembly, set the option --assemble=1. Platypus will run more slowly with this enabled (maybe a factor of 2-3 slower, depending on the data, but I haven't timed it properly).
   

How large?

The size limit on what it can detect is given by a parameter, --assemblyRegionSize, which defaults to 1.5kb.
   

Does it use distantly-mapped mates for assembly?

Yes, but you have to enable this. If you want Platypus to recover the mates of reads which are mapped to different chromosomes or far apart on the same chromosome, and use these for local assembly, set the option --assembleBrokenPairs=1. This will slow things down even more, but will increase sensitivity for larger insertions and deletions.

 

Outputting Reference Calls With Platypus

Can Platypus output reference call blocks?

Yes. You can enable this with the option --outputRefCalls=1. Platypus will output hard reference calls in blocks of 1kb. Platypus attempts to quantify the likelihood that the data covering each block/region supports only the reference allele, using coverage and quality information.
   
How are these calls reported?

Each reference call is output in the VCF file, marked with 'REFCALL' in the FILTER column and an 'END' tag in the INFO column. The calls are in blocks, starting at the position listed in the normal VCF position (POS) column and ending at the position given by the 'END' tag.
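To pull the reference blocks out of the VCF, a minimal Python sketch is below. It assumes the standard VCF layout (POS in the second column, FILTER in the seventh, INFO in the eighth, with semicolon-separated key=value entries) and, following the VCF convention, treats END as an inclusive 1-based position, so a block covers END - POS + 1 bases. The function names are hypothetical.

```python
def parse_refcall(vcf_line):
    """Return (chrom, start, end) for a REFCALL record, else None.

    start is the VCF POS and end the INFO END value, both 1-based and
    inclusive by VCF convention.
    """
    if vcf_line.startswith("#"):
        return None
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, filters, info = fields[0], int(fields[1]), fields[6], fields[7]
    if "REFCALL" not in filters.split(";"):
        return None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "END":
            return chrom, pos, int(value)
    raise ValueError("REFCALL record without an END tag: %r" % vcf_line)


def block_length(block):
    """Number of bases in a (chrom, start, end) block with inclusive ends."""
    chrom, start, end = block
    return end - start + 1
```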
   
What does the quality score mean?

These are PHRED scores. If the score is high, then there is good support for the reference allele in that block; if this score is low, then there is either some evidence for variation within the block, or insufficient good-quality coverage to make a strong reference call.
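Under the usual PHRED convention, a score Q corresponds to a probability p = 10^(-Q/10), so Q = 10 means 1 in 10, Q = 20 means 1 in 100, and so on. Exactly which probability Platypus puts behind a REFCALL quality is not spelled out here, so treat the sketch below as the generic conversion only.

```python
import math


def phred_to_prob(q):
    """Probability encoded by a PHRED score q: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


def prob_to_phred(p):
    """PHRED score for probability p (0 < p <= 1): q = -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_prob(q))
```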

 

Running Platypus On Haloplex/Targeted Capture Data
(See here for more information on this)




Using Platypus For Genotyping A List Of Known Variants

Can I use Platypus to genotype my data for a list of known sites/alleles?

Yes. See here for the exact commands. You can give Platypus a compressed, indexed VCF of known variants, and it will output genotypes for each sample for each variant, with likelihoods.
   
Can I do this at the same time as normal variant detection?

Yes, if you like. Platypus can use the input VCF to augment its own list of variant candidates. This can work particularly well for larger variants, which might be hard to detect from the read alignments.
   
Can I supply variants from several sources, i.e. multiple VCFs?

Yes, just specify a comma-separated list of VCFs on the command line, as shown here.
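A small pre-flight check can catch common mistakes before building that comma-separated list. The sketch assumes each VCF is bgzip-compressed with a '.gz' name and indexed with tabix, which writes the index alongside it as '<file>.gz.tbi'. It only checks file names and the presence of the index, not the file contents. The helper name and demo file names are hypothetical, and the result is meant for whichever option the genotyping instructions use.

```python
import os
import tempfile


def known_vcfs_value(vcf_paths):
    """Check each VCF name ends in .gz and has a .tbi index, then comma-join."""
    for path in vcf_paths:
        if "," in path:
            raise ValueError("comma in VCF path %r" % path)
        if not path.endswith(".gz"):
            raise ValueError("expected a compressed .gz VCF: %r" % path)
        if not os.path.exists(path + ".tbi"):
            raise ValueError("no tabix index found: %s.tbi" % path)
    return ",".join(vcf_paths)


# Demo with empty placeholder files standing in for real bgzipped VCFs.
demo_dir = tempfile.mkdtemp()
demo_vcfs = []
for name in ("known_a.vcf.gz", "known_b.vcf.gz"):
    path = os.path.join(demo_dir, name)
    for suffix in ("", ".tbi"):
        open(path + suffix, "w").close()
    demo_vcfs.append(path)
value = known_vcfs_value(demo_vcfs)
print(value)
```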