In the last decade, biology has become a data-rich science. However, turning these data into an understanding of biology and disease remains challenging. Our group develops novel analytical methods to address a range of specific problems in genetics and sequence analysis, with the eventual goal of better understanding the genetic basis of human biology in health and disease.
We are interested in understanding how genomes evolve through mutations and evolutionary pressures, and we use this understanding to inform research ranging from disease to human ancestry.
Our current work aims to understand non-coding functional DNA by analysing functional genomic data with deep neural networks; to classify tumors using genome-wide signatures obtained from sequencing tumor samples; to describe past demographic events such as population splits, bottlenecks and migrations by analysing modern and ancient whole-genome sequences; and to understand how the three-dimensional structure of DNA facilitates interactions between features that lie far apart on the linear genome. We also develop software tools, such as the variant caller Octopus, which accurately calls and phases a range of variant types, and methods for decoding Oxford Nanopore long-read sequencing data.
We draw on a range of techniques, mostly from the field of machine learning. Examples include Bayesian inference to deal with uncertainty in the data, neural networks to learn from complex data in an unsupervised way, particle filters to do inference on complex models, latent Dirichlet allocation to model the hidden structure of data, and computational techniques such as the Burrows-Wheeler transform, which combines huge compression factors with fast lookups and so enables the analysis of large data sets.
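As a rough illustration of the last of these techniques, the Python sketch below builds a Burrows-Wheeler transform and uses it to count pattern occurrences by backward search. It is a toy example and not code from the group's tools: the function names `bwt` and `count_occurrences` are hypothetical, the `$` sentinel is an assumed end marker, and the construction sorts all rotations naively, whereas practical implementations rely on suffix arrays and sampled count tables.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations (naive, illustrative)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text via backward search."""
    counts = Counter(bwt_text)

    # first[c]: number of characters in the text that sort strictly before c.
    first = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i]: number of times c appears in bwt_text[:i].
    occ = {c: [0] * (len(bwt_text) + 1) for c in counts}
    for i, ch in enumerate(bwt_text):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Narrow the half-open interval [lo, hi) of sorted rotations that start
    # with the current pattern suffix, reading the pattern right to left.
    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(count_occurrences(transformed, "ana"))
```

The transform tends to group identical characters into runs, which is what makes the result compressible, while the backward search needs only one step per pattern character regardless of the length of the text.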
Some projects are described in more detail below.
Gerton Lunter has a PhD in maths, and has worked in bioinformatics since 2002. He has contributed to the 1000 Genomes Project and various primate sequencing projects. In 2014 he co-founded Genomics plc, which analyses large-scale genomic data sets to speed up drug discovery. He divides his time between the company and his research group.