Bioinformatics Group



Wellcome Trust Centre for Human Genetics

SNP Selection Page

Running span_haplotype, greedy_haplotype and big_haplotype

There are three programs available:

  • span_haplotype, which finds the optimal subsets of markers by exhaustive search. Because the number of candidate subsets grows exponentially with the number of markers, it is only suitable for datasets with fewer than 30 markers if you want the program to finish within a few minutes.
  • greedy_haplotype, which finds possibly suboptimal subsets of markers by greedy search. It is suitable for datasets of any size. The two programs produce the same answers on our test data, but we have not been able to prove whether they are equivalent (can you?).
  • Both programs have the same command-line options:
          usage: span_haplotype
    	      -max            integer            The maximum subset size to consider. Default is # of markers
    	      -top            integer            Report the top N subsets for each subset size (not available for greedy_haplotype)
    	      -minfreq        float              Ignore haplotypes with less than this frequency
    	      -haplotypes     Readable File      Input file name
    	      -help           switch             This help
    
  • big_haplotype works slightly differently from span_haplotype and greedy_haplotype. It is designed for very long regions, such as whole chromosomes: it moves a window along the genome and performs a greedy SNP search within each window. Each SNP is given a score, and the best SNPs are then selected by score. The result is a global set of SNPs that uniformly approximates the local haplotypic structure. A by-product is the entropy of each window, which gives a local measure of linkage disequilibrium (low entropy implies high LD).
  • big_haplotype has the command-line options:
          usage: big_haplotype
    	      -max            integer              The maximum subset size to consider. Default is # of markers
    	      -minfreq        float                Ignore haplotypes with less than this frequency
    	      -window         integer              The moving window size (defaults to 20)
    	      -step           integer              The step size for the moving window (defaults to 1)
    	      -ptop           float                The proportion of SNPs to select (defaults to 0.5, i.e. 50%)
    	      -haplotypes     Readable File        Input file name
    	      -help           switch               This help
    
    

Drop me an email for more details.
