Sequence Alignment with monotonic gap penalties
STATISTICS

The programs ariadne and prospero test the statistical significance of similarity scores using the following model (download this for more details). A score S in a comparison between two sequences, or between a sequence and a profile, has pairwise p-value P given by

    P = 1 - exp(-K*m*n*exp(-L*S))

where m, n are the sequence lengths and K, L are parameters depending on the compositions, the scoring scheme and (slightly) on the sequence lengths.

K and L are calculated using the formula described in Mott, 2000, which takes account of sequence composition, substitution matrix, gap penalty and sequence length:

    L = L_u*( 1.013 - 2.61*alpha + f(m,n)*( -0.76 + 9.34*alpha + 1.12/H ) )
    K = K_u*exp( 0.26 - 18.92*alpha + f(m,n)*( -1.76 + 32.69*alpha + 192.52*alpha^2 + 3.24/H ) )

where:

    f(m,n) = log(m*n)*(1/m + 1/n)

    L_u, K_u are the parameters for ungapped alignments

    H is the entropy of ungapped alignments (e.g. as defined in Karlin-Altschul PNAS 1990, or Mott and Tribe 1999)

    alpha is a parameter depending on the gap penalty A+B*k:

        alpha = 2*s*exp(-L_u*(A+B))/(1 - exp(-L_u*B))

    with s = sqrt( (K_u/H) * [ delta / exp(L_u*delta) ] ), where delta is the smallest span of score values (usually 1).

In the example above, K = 1.361261e-01, L = 3.478619e-01, m = 51, n = 43, S = 125, so

    P = 1-exp(-0.1361*51*43*exp(-0.3478*125)) = 5.95e-16.

The database size was set at 1000000 = 1.0e6 sequences, so the evalue (the p-value multiplied by the database size) is 5.95e-10.
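The formulas above can be sketched in Python as follows. This is only an illustrative transcription of the equations as written on this page, not the code used by ariadne or prospero. The function names (f, alpha, gapped_parameters, pvalue, evalue) and the argument order are assumptions made for this example.

```python
import math


def f(m, n):
    """Length-dependent term f(m,n) = log(m*n)*(1/m + 1/n)."""
    return math.log(m * n) * (1.0 / m + 1.0 / n)


def alpha(L_u, K_u, H, A, B, delta=1.0):
    """Gap-penalty parameter for the penalty A + B*k, as written above."""
    s = math.sqrt((K_u / H) * (delta / math.exp(L_u * delta)))
    return 2.0 * s * math.exp(-L_u * (A + B)) / (1.0 - math.exp(-L_u * B))


def gapped_parameters(L_u, K_u, H, A, B, m, n, delta=1.0):
    """Return (K, L) for gapped alignments from the ungapped L_u, K_u, H."""
    a = alpha(L_u, K_u, H, A, B, delta)
    fmn = f(m, n)
    L = L_u * (1.013 - 2.61 * a + fmn * (-0.76 + 9.34 * a + 1.12 / H))
    K = K_u * math.exp(
        0.26 - 18.92 * a
        + fmn * (-1.76 + 32.69 * a + 192.52 * a * a + 3.24 / H)
    )
    return K, L


def pvalue(S, K, L, m, n):
    """Pairwise p-value P = 1 - exp(-K*m*n*exp(-L*S)).

    math.expm1 is used because for very small exponents the direct form
    1 - exp(-x) rounds to exactly 0 in double precision.
    """
    return -math.expm1(-K * m * n * math.exp(-L * S))


def evalue(p, database_size):
    """E-value taken as the p-value multiplied by the database size."""
    return p * database_size


if __name__ == "__main__":
    # Parameter values quoted in the example on this page.
    p = pvalue(125, 1.361261e-01, 3.478619e-01, 51, 43)
    print(p, evalue(p, 1.0e6))
```

The printed values depend on exactly which K, L, m and n the programs use internally, so this sketch is not guaranteed to reproduce the figures quoted in the example.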