Dr Sha (Joe) Zhu

Post-doctoral researcher


Wellcome Trust Centre for Human Genetics, Roosevelt Dr. 
Oxford, OX3 7BN


Research summary

My research interests lie in phylogenetics: developing methodologies and solving mathematical problems raised by phylogenetic studies, particularly phylogeny reconstruction using coalescent techniques.

My current projects include:

  1. Addressing the mixed infection problem in the PF3K study. My objective is to infer the number of mixed strains and their proportions using a population reference graph of the PF3K data, which also allows us to impute missing data reliably. I am also interested in understanding Plasmodium falciparum population structure and differentiation.
  2. Developing tools to infer demographic structures and histories from whole-genome data via the particle filtering technique.


S. J. Zhu, J. A. Garcia, and G. McVean. (2017) Deconvoluting multiple infections in Plasmodium falciparum from high throughput sequencing data. bioRxiv 099499. doi: 10.1101/099499.

S. Zhu and J. H. Degnan. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent. Systematic Biology, 2016. doi: 10.1093/sysbio/syw097.

S. Zhu, J. H. Degnan, S. J. Goldstien, and B. Eldon. Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees. BMC Bioinformatics, 16(292), 2015. doi: 10.1186/s12859-015-0721-y.

P. R. Staab, S. Zhu, D. Metzler, and G. Lunter. scrm: efficiently simulating long sequences using the approximated coalescent with recombination. Bioinformatics, 31(10):1680–1682, 2015.

S. Zhu, C. Than, and T. Wu. Clades and clans: a comparison study of two evolutionary models. Journal of Mathematical Biology, 71(1):99–124, 2014.

S. Zhu and M. A. Steel. Does Random Tree Puzzle produce Yule–Harding trees in the many-taxon limit? Mathematical Biosciences, 243(1):109–116, 2013.

S. Zhu, J. H. Degnan, and M. A. Steel. Clades, clans and reciprocal monophyly under neutral evolutionary models. Theoretical Population Biology, 79:220–227, 2011.


DEploid is designed for deconvoluting mixed genomes with unknown proportions. Traditional ‘phasing’ programs are limited to diploid organisms. Our method modifies Li and Stephens’ algorithm with Markov chain Monte Carlo (MCMC) approaches and builds a generic framework that allows haplotype searches in a multiple-infection setting.
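To make the idea of "mixed genomes with unknown proportions" concrete, here is a minimal Python sketch of the forward direction only: given strain haplotypes and their proportions, it computes the expected within-sample alternative-allele frequency at each site. Deconvolution is the inverse problem of recovering the haplotypes and proportions from such frequencies. The function name, the 0/1 haplotype encoding, and the toy data are illustrative assumptions; this is not DEploid's interface, and it does not implement its MCMC inference.

```python
def expected_alt_frequency(haplotypes, proportions):
    """Expected alternative-allele frequency per site in a mixed sample.

    haplotypes  -- list of strains, each a list of 0/1 alleles (hypothetical encoding)
    proportions -- list of strain proportions, assumed to sum to 1
    """
    if len(haplotypes) != len(proportions):
        raise ValueError("need exactly one proportion per haplotype")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    # At each site, the mixture frequency is the proportion-weighted sum of alleles.
    return [
        sum(p * h[site] for h, p in zip(haplotypes, proportions))
        for site in range(n_sites)
    ]


# Toy example: two strains mixed 75% / 25% across four sites.
haplotypes = [[0, 1, 1, 0],
              [1, 1, 0, 0]]
proportions = [0.75, 0.25]
mixture = expected_alt_frequency(haplotypes, proportions)
```

Sites where the strains agree give frequencies of 0 or 1, while sites where they differ give intermediate values that carry the information about the proportions.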

scrm is a coalescent simulator for biological sequences. Unlike similar programs, it can approximate the Ancestral Recombination Graph as closely as needed while still incurring only linear runtime cost for long sequences. It allows you to rapidly simulate chromosome-scale sequences with essentially correct genetic linkage.

hybrid-coal computes gene tree probabilities given a species network under the coalescent process. We use a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees.
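Schematically, the linear combination can be written as follows. The notation is an assumption introduced here for illustration: N is the species network, S_1, ..., S_m the set of species trees, g a gene tree topology, and w_i the coefficients, which would presumably depend on the network's parameters.

```latex
P(G = g \mid N) \;=\; \sum_{i=1}^{m} w_i \, P(G = g \mid S_i)
```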

hybrid-Lambda is a software package that simulates gene trees within a rooted species network or a rooted species tree under Kingman's coalescent or the Lambda coalescent process.

Grant involvement

Collaborator, Icelandic Centre for Research Grant of Excellence 185151-051, Population genomics of highly fecund codfish. PIs: Einar Arnason (corresponding PI, University of Iceland), Katrin Halldorsdottir (University of Iceland), Alison Etheridge (University of Oxford), Wolfgang Stephan and Bjarki Eldon (Berlin Natural History Museum).