Bioinformatics Group

Richard Mott's Home Page


Wellcome Trust Centre for Human Genetics

Domain Classification Page

This page contains resources for a project to classify protein domains and sequences by subcellular localisation into nuclear, cytoplasmic and secreted locales. The work is a collaboration with Chris Ponting, Jörg Schultz and Peer Bork.

The work has been published as: Richard Mott, Jörg Schultz, Peer Bork and Chris P. Ponting (2002). Predicting Protein Cellular Localization Using a Domain Projection Method. Genome Res. 12: 1168-1174.

Abstract

We investigate the co-occurrence of domain families in eukaryotic proteins in order to predict protein cellular localisation. Approximately half (300) of SMART domains form a ‘small-world network’, linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic and nuclear proteins. The projection method takes into account the existence of ‘bridging’ domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbours in the projection. While the majority of domains are specific to a compartment (‘locale’), and hence may be used to localise any protein that contains such a domain, a small subset of domains are either present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localisations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
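The co-occurrence idea in the abstract can be illustrated with a small sketch. It is not the published implementation: the toy proteins and domain names (A to E) are hypothetical, the two-dimensional projection step is omitted, and locales are assigned here by a simple majority vote over a protein's domains, which is an assumption rather than the rule used in the paper. Domains A and C never occur together but both co-occur with B, so they sit two links apart, which is the ‘bridging’ situation described above.

```python
from collections import Counter, deque
from itertools import combinations


def cooccurrence_graph(proteins):
    """Link two domain families whenever they occur together in at least one protein."""
    graph = {}
    for domains in proteins.values():
        unique = sorted(set(domains))
        for d in unique:
            graph.setdefault(d, set())  # single-domain families still become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: number of co-occurrence links from source to each reachable domain."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def predict_locale(domains, domain_locale):
    """Hypothetical rule: majority vote over domains with a known locale; None if no votes or a tie."""
    votes = Counter(domain_locale[d] for d in domains if d in domain_locale)
    if not votes:
        return None
    (locale, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return None  # tied votes: leave the protein unassigned
    return locale


# Toy data (hypothetical): protein -> list of domain families
proteins = {"p1": ["A", "B"], "p2": ["B", "C"], "p3": ["D", "E"]}
# Toy locale assignments for domains (hypothetical); C has no known locale
domain_locale = {"A": "nuclear", "B": "nuclear", "D": "secreted", "E": "secreted"}

g = cooccurrence_graph(proteins)
print(degrees_of_separation(g, "A"))             # {'A': 0, 'B': 1, 'C': 2}
print(predict_locale(["A", "C"], domain_locale))  # nuclear
print(predict_locale(["A", "D"], domain_locale))  # None (tie)
```

Proteins whose domains carry no locale information stay unassigned, which is one way to see why such an approach trades coverage for accuracy.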

Download some data:

Java Applet Viewer for Domain Projection

Drop me an email for more details.
