A fast, accurate tool for sequence analysis

Whole-genome sequencing promises to transform the search for genetic variants that might aid diagnosis in patients with a wide variety of diseases. A software tool developed at WTCHG brings this promise a step closer to reality by offering a quick and simple route to finding these variants.


Called ‘Platypus’, the open-source software has been available for the past couple of years. Now Andy Rimmer, Gerton Lunter and their colleagues have published a paper in Nature Genetics demonstrating its high sensitivity and accuracy in a number of clinical applications.

Finding mutations is harder than it sounds, even with the reference sequence database generated by the Human Genome Project. There are 3 billion ‘letters’ in the human genome and sequencing machines are not 100 per cent accurate, so the ‘signals’ of a handful of variants have to be picked out from the ‘noise’ generated by the errors in sequencing.

Scientists reduce the noise by sequencing each region at least 30 times over. It is then a matter of lining up an individual’s sequence against the reference sequence and looking for the differences. Most other software tools that do this examine the differences letter by letter. Platypus instead works with 1000 letters at a time, directly from the raw sequence reads, and looks for variants in the overlaps as it reassembles them. This reveals not only single-letter changes, but also larger insertions and deletions that the letter-by-letter approach can misinterpret.
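For readers who want to see the idea in code, the following is a minimal sketch of the simpler letter-by-letter approach described above. It is not Platypus's algorithm. It assumes the reads have already been lined up against the reference, and the function name, the input layout and both thresholds are illustrative choices rather than anything taken from the paper.

```python
from collections import Counter


def call_variants(reference, pileup, min_depth=30, min_fraction=0.2):
    """Naive letter-by-letter caller (illustrative only, not Platypus).

    reference: the reference sequence as a string.
    pileup: pileup[i] lists the letters observed at position i across
            all reads covering that position (hypothetical layout).
    min_depth, min_fraction: assumed thresholds, chosen for illustration.
    """
    variants = []
    for pos, (ref_base, observed) in enumerate(zip(reference, pileup)):
        depth = len(observed)
        if depth < min_depth:
            continue  # too few reads to separate signal from noise
        for base, count in Counter(observed).items():
            # Report a non-reference letter only if enough reads support it.
            if base != ref_base and count / depth >= min_fraction:
                variants.append((pos, ref_base, base, count, depth))
    return variants


reference = "ACGT"
pileup = [
    ["A"] * 30,               # matches the reference
    ["C"] * 15 + ["T"] * 15,  # half the reads differ: a likely real variant
    ["G"] * 29 + ["A"],       # one odd read: a likely sequencing error
    ["T"] * 30,               # matches the reference
]
calls = call_variants(reference, pileup)
print(calls)  # [(1, 'C', 'T', 15, 30)]
```

With 30 reads at each position, a single stray letter is ignored as noise, while a change seen in half the reads is reported. Because each position is judged on its own, a sketch like this cannot represent insertions or deletions spanning several letters, which is the limitation the paragraph above describes.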

To demonstrate the clinical usefulness of the software, Rimmer, Lunter and their colleagues successfully identified mutations in 15 children who had a serious disease but whose parents were unaffected. DNA from these families had already been sequenced as part of the WGS500 project.

In another application they used Platypus to examine changes in the highly variable HLA region of the genome, which determines whether or not transplanted tissue will be rejected. Platypus was able to identify the correct HLA type from a database of over 1000 such types, holding out the prospect of a simpler, faster and less labour-intensive method of typing blood and tissue before operations. ‘Compared with conventional methods of typing, which progressively distinguish types to a finer and finer level, Platypus gives you all the levels at once’, says Lunter. ‘It is very simple to use and very fast, and the quality is at least as good as existing tools.’


Andy Rimmer, Hang Phan, Iain Mathieson, Zamin Iqbal, Stephen R F Twigg, WGS500 Consortium, Andrew O M Wilkie, Gil McVean & Gerton Lunter. Integrating mapping, assembly and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics 2014; published online 13 July. doi:10.1038/ng.3036