Profiles.
The format for each profile is a 3-line header followed by the profile data. Blank lines are ignored.
The header looks like:
  >name                            name of the profile, e.g. "globin"
  profile-length alphabet-length   e.g. "141 21"
  alphabet-order                   space-separated list of amino acids, in the order they occur in the profile
The profile data is a series of lines, one per profile position, with the format:
  [position] [consensus] [profile-scores in the order defined by alphabet-order]
For example, part of a profile looks like this:
>globin
141 21
A R N D C Q E G H I L K M F P S T W Y V X
1 H -4 -1 2 -2 -5 -1 -1 -3 10 -5 -5 -2 -4 -3 -4 -2 -3 -4 0 -5 -1
2 L -4 -5 -6 -7 0 -5 -6 -6 -5 -1 6 -5 0 0 -6 -5 -4 6 -3 -2 -1
3 S -1 -4 -1 -1 -4 -2 -2 -2 -4 -4 -4 -3 -2 -5 -4 5 6 -5 -5 -3 -1
4 A 4 -3 0 3 -1 0 2 -1 -2 -3 -3 -1 -2 -3 0 0 -2 -6 -3 -3 -1
5 E 2 -1 -1 3 -3 0 3 -1 0 -5 -3 1 -4 -3 -1 -1 -1 -6 -4 -3 -1
6 E -3 -3 0 4 -6 4 5 -4 -3 -3 -5 -2 -4 -6 -4 -2 -1 -6 -5 -3 -1
7 K 0 3 -2 -4 -2 -1 1 -4 -1 0 -3 5 -2 -1 -4 -2 -3 4 -4 -1 -1
8 A 3 -1 0 0 -2 2 0 -2 1 -2 -2 1 -3 -2 -4 1 1 -5 -4 -3 -1
9 L 2 -2 2 -2 -2 -2 -1 -4 0 1 3 0 -1 -2 -5 -2 0 -5 -2 0 -1
10 V -2 -6 -6 -6 -4 -5 -5 -6 -6 5 1 -5 -1 -2 -5 -4 -3 -6 -4 6 -1
11 K -1 3 1 -2 -3 2 -2 -3 0 -3 0 5 0 -5 -4 -1 1 -5 -4 -2 -1
12 A 2 -1 0 1 -3 0 0 0 1 -4 -3 1 -2 -4 -4 3 0 -6 -4 -3 -1
13 L 0 -4 -2 -3 0 0 -2 -4 -1 1 1 -3 -1 -1 -4 3 2 -5 -3 1 -1
14 W -3 -6 -6 -5 -2 -5 -4 -4 -5 -2 -4 -6 -2 2 -7 -4 -4 13 0 -2 -1
........... etc etc
So, for example, at position 2 the consensus is L, and the score for matching an A at this position is -4.
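
As an illustration of the layout described above, here is a minimal Python sketch of a reader for this format. The function name parse_profiles and the returned dictionary layout are hypothetical and are not part of any distributed tool. The sketch assumes that scores are integers and that a new profile starts at the next line beginning with ">". It does not check the number of rows against the declared profile length, because the sample used here is truncated.

```python
from typing import Dict, List


def parse_profiles(text: str) -> List[Dict]:
    """Parse one or more profiles (hypothetical helper, not an official API)."""
    # Blank lines are ignored, as the format description states.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    profiles = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            raise ValueError(f"expected '>name' header line, got {lines[i]!r}")
        if i + 2 >= len(lines):
            raise ValueError("incomplete 3-line header")
        # Header line 1: >name
        name = lines[i][1:].strip()
        # Header line 2: profile-length alphabet-length
        length, alphabet_length = (int(x) for x in lines[i + 1].split())
        # Header line 3: alphabet-order
        alphabet = lines[i + 2].split()
        if len(alphabet) != alphabet_length:
            raise ValueError("alphabet-order does not match alphabet-length")
        i += 3
        rows = []
        # Assumption: the profile data runs until the next '>' line or end of input.
        while i < len(lines) and not lines[i].startswith(">"):
            fields = lines[i].split()
            position, consensus = int(fields[0]), fields[1]
            scores = [int(s) for s in fields[2:]]  # assumption: integer scores
            if len(scores) != alphabet_length:
                raise ValueError(f"wrong number of scores at position {position}")
            rows.append((position, consensus, dict(zip(alphabet, scores))))
            i += 1
        profiles.append(
            {"name": name, "length": length, "alphabet": alphabet, "rows": rows}
        )
    return profiles


# First two positions of the globin example (truncated on purpose).
sample = """
>globin
141 21
A R N D C Q E G H I L K M F P S T W Y V X
1 H -4 -1 2 -2 -5 -1 -1 -3 10 -5 -5 -2 -4 -3 -4 -2 -3 -4 0 -5 -1
2 L -4 -5 -6 -7 0 -5 -6 -6 -5 -1 6 -5 0 0 -6 -5 -4 6 -3 -2 -1
"""

profiles = parse_profiles(sample)
position, consensus, scores = profiles[0]["rows"][1]
print(position, consensus, scores["A"])  # prints: 2 L -4
```

Keying each row's scores by residue letter keeps lookups independent of the column order, which may differ between profile sets.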
A library of profiles can contain multiple profiles.
Profiles and substitution matrices should be such that the smallest
span of score values is 1. Also, for reasons of efficiency, it is best
if the total range of score values is quite small, say < 30.
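
The range guideline can be checked with a few lines of Python. This is only a sketch: the name score_range is hypothetical, "total range" is taken here to mean the maximum score minus the minimum score, and the limit of 30 is the rough figure suggested above rather than a hard rule.

```python
from typing import Iterable, List, Tuple


def score_range(score_rows: Iterable[List[int]]) -> Tuple[int, int, int]:
    """Return (lowest, highest, highest - lowest) over all scores (hypothetical helper)."""
    values = [s for row in score_rows for s in row]
    if not values:
        raise ValueError("no scores given")
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


# Scores for positions 1 and 14 of the globin example.
example_rows = [
    [-4, -1, 2, -2, -5, -1, -1, -3, 10, -5, -5, -2, -4, -3, -4, -2, -3, -4, 0, -5, -1],
    [-3, -6, -6, -5, -2, -5, -4, -4, -5, -2, -4, -6, -2, 2, -7, -4, -4, 13, 0, -2, -1],
]

lowest, highest, total_range = score_range(example_rows)
print(lowest, highest, total_range, total_range < 30)  # prints: -7 13 20 True
```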
Some example profile sets derived from PFAM-A are provided for download.