It is now straightforward to sequence the DNA in a person's genome, and databases that link genetic data to a range of phenotypes are becoming ever larger. What is less straightforward is to process and interpret these data. My group is interested in developing computational and statistical methods to help use these growing resources to answer questions in medical genomics and population genetics.
The questions we address range from data processing to the interpretation of genetic variants and understanding the history of our species. This range is reflected in the variety of projects we take on, which recently have included:
- Improving the accuracy of reads from the Oxford Nanopore (ONT) single-molecule portable sequencing device;
- Inferring demographic events such as migrations and population bottlenecks from whole-genome sequencing data;
- Understanding the impact of non-coding mutations on disease by building sequence-to-phenotype models;
- Charting the differentiation of B cells in response to vaccination and infection.
We draw on a range of sources for our methods, but key recurring ingredients are Bayesian statistics, machine learning, and algorithm design. We are particularly interested in applying deep learning methods such as convolutional neural networks, which show a lot of promise for interpreting non-coding mutations. We also use more traditional statistical and machine-learning methods, such as hidden Markov models and particle filters, and design novel algorithms, for example ones based on the Burrows-Wheeler transform, to deal with the often very large data sets.
An Equivariant Bayesian Convolutional Network predicts recombination hotspots and accurately resolves binding motifs.
Brown R. and Lunter G., (2018), Bioinformatics
Haplotype matching in large cohorts using the Li and Stephens model.
Lunter G., (2018), Bioinformatics
A high throughput screen for active human transposable elements.
Kvikstad EM. et al, (2018), BMC Genomics, 19, 115
Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN.
Fowler A. et al, (2016), Wellcome Open Research, 1, 20
OpEx - a validated, automated pipeline optimised for clinical exome sequence analysis.
Ruark E. et al, (2016), Scientific Reports, 6, 31029