Marker Selection Page
This page contains resources for a project to identify markers, in particular SNPs,
for genotyping. The objective is, given a set of markers and their associated
haplotypes with haplotype population frequencies, to choose a subset of
markers that best approximates the haplotypic diversity in the
population.
The idea is to compute the marker subset with maximum
entropy.
Suppose we have a set of k markers typed
across n individuals, producing 2n observed haplotypes, which are
assumed known. The markers can be SNPs, microsatellites or any mixture of the two. Suppose the haplotype with label i occurs
f_i times in the sample, so that the counts f_i sum to 2n. We are interested in
finding subsets of markers that capture as much of the haplotypic
diversity as possible, weighted by the population frequencies of the
haplotypes.
Let $p_i = f_i / 2n$. Then the entropy of the data is defined as
$$E = -\sum_i p_i \log_2 p_i$$
(where $0 \log_2 0$ is interpreted as 0).
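As a minimal sketch of this definition (not the project's C code), the entropy can be computed directly from the haplotype counts. The function name `haplotype_entropy` and the sample counts are assumptions made for illustration.

```python
import math

def haplotype_entropy(counts):
    """Entropy in bits of a list of haplotype counts f_i, with p_i = f_i / sum(f)."""
    total = sum(counts)  # equals 2n when every observed haplotype is counted
    entropy = 0.0
    for f in counts:
        if f > 0:  # a term with p_i = 0 contributes 0, as in the definition
            p = f / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical sample: n = 4 individuals, so 2n = 8 haplotypes of 3 distinct types
print(haplotype_entropy([4, 2, 2]))     # 1.5
# Four equally frequent haplotypes reach the maximum, log2(4) = 2 bits
print(haplotype_entropy([2, 2, 2, 2]))  # 2.0
```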
Entropy is a good measure of haplotypic diversity, attaining a
maximum if all haplotypes are present in equal quantities. The marker
subset of some fixed size that maximises the entropy will merge some
haplotypes together, because they cannot be distinguished within the
subset, but will attempt to equalise the population frequencies of the
merged haplotypes.
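The merging step can be illustrated with a short sketch: restrict each haplotype to the chosen markers, pool the haplotypes that become identical, and compute the entropy of the pooled counts. The function name `subset_entropy`, the string encoding of haplotypes and the data are all hypothetical.

```python
import math
from collections import Counter

def subset_entropy(haplotypes, subset):
    """Entropy in bits of the haplotypes when only the marker indices in `subset` are typed."""
    # Haplotypes that agree on every marker in the subset are merged into one class
    merged = Counter(tuple(h[m] for m in subset) for h in haplotypes)
    total = sum(merged.values())
    return -sum((c / total) * math.log2(c / total) for c in merged.values())

# Hypothetical data: 8 observed haplotypes over 3 SNPs (one character per marker)
haplotypes = ["AAG"] * 4 + ["ACG"] * 2 + ["GCT"] * 2

print(subset_entropy(haplotypes, (0, 1, 2)))  # 1.5, all three haplotypes distinguished
print(subset_entropy(haplotypes, (1,)))       # 1.0, merged classes split 4/4
print(subset_entropy(haplotypes, (0,)))       # lower, merged classes split 6/2
```

In this toy example marker 1 is the best single marker because it divides the sample into two equally frequent classes.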
- A more extensive description is available in this PDF Document
- The method has been applied in the paper
"Haplotypic analysis of the TNF locus by association efficiency and
entropy" by Hans Ackerman, Stanley Usen, Richard Mott, Anna Richardson,
Fatoumatta Sisay-Joof, Pauline Katundu, Terrie Taylor, Ryk Ward,
Malcolm Molyneux, Margaret Pinder and Dominic P Kwiatkowski, Genome Biology 2003.
- C source code is available for download
- You can analyse your data remotely on our Web Server.
- The work is in collaboration with Dominic Kwiatkowski
There are three programs available. They work for SNP data, microsatellite data, or any mixture of markers:
- span_haplotype: Finds the optimal subset of markers by exhaustive search. Only suitable for datasets with fewer than about 20 SNPs.
- span_haplotype2: Finds a near-optimal subset of markers by greedy search. On our test datasets span_haplotype2 produces results very similar or identical to those of span_haplotype. It is suitable for larger datasets.
- big_haplotype: Finds good sets of markers over very large regions such as whole chromosomes, involving hundreds or thousands of markers.
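The difference between exhaustive and greedy search can be sketched as follows. This is an illustration only, not the code of span_haplotype or span_haplotype2: the function names, the forward-selection form of the greedy search and the data are assumptions. The exhaustive version scores every subset of a given size, which becomes costly because k markers have 2^k subsets in total (over a million for k = 20); the greedy version adds one marker at a time, keeping whichever gives the largest entropy.

```python
import math
from collections import Counter
from itertools import combinations

def subset_entropy(haplotypes, subset):
    """Entropy in bits after merging haplotypes that agree on the markers in `subset`."""
    merged = Counter(tuple(h[m] for m in subset) for h in haplotypes)
    total = sum(merged.values())
    return -sum((c / total) * math.log2(c / total) for c in merged.values())

def exhaustive_search(haplotypes, size):
    """Score every marker subset of the given size and return the best one."""
    k = len(haplotypes[0])
    return max(combinations(range(k), size),
               key=lambda s: subset_entropy(haplotypes, s))

def greedy_search(haplotypes, size):
    """Forward selection (an assumed greedy scheme): add the best marker at each step."""
    k = len(haplotypes[0])
    chosen = []
    for _ in range(size):
        remaining = [m for m in range(k) if m not in chosen]
        best = max(remaining,
                   key=lambda m: subset_entropy(haplotypes, chosen + [m]))
        chosen.append(best)
    return tuple(sorted(chosen))

# Hypothetical data: 8 haplotypes over 4 biallelic markers coded 0/1
haplotypes = ["0000"] * 3 + ["1000"] + ["1100"] * 2 + ["1110"] + ["1111"]

print(exhaustive_search(haplotypes, 2))  # (0, 2)
print(greedy_search(haplotypes, 2))      # (1, 2)
```

On this toy dataset the greedy search picks marker 1 first (an even 4/4 split) and ends with entropy 1.5 bits, slightly below the exhaustive optimum of about 1.56 bits for markers 0 and 2, which shows what a near-optimal result can look like.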
Drop me an email
for more details.