The scores for the alignments (see below) between the query sequence and each of the 82 library sequences are sorted and then normalised by the mean and standard deviation (of the same 82 scores) to give the so-called Z-score. Ideally, correctly identified folds would rank highest and their alignments would have Z-scores greater than, say, 3.0. The fold recognition trials could then be assessed by the number of correct vs. incorrect predictions above such a threshold. However, using the different methodologies presented in this and the following chapter, we observed wide variations in the distributions of alignment scores, so a single Z-score threshold cannot be used to give equivalent numbers of predictions. Furthermore, the numbers of such predictions are small, leading to discretised and noise-prone success rates. Instead, a more continuous and robust measure was employed which does not depend upon a threshold. The important issue of prediction confidence is discussed in Section 4.4.4, following the refinement of the methods.
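The normalisation itself is straightforward; a minimal sketch in Python follows (the function name and the use of the sample standard deviation are illustrative assumptions rather than details of the original software):

\begin{verbatim}
import statistics

def z_scores(alignment_scores):
    """Normalise raw alignment scores by the mean and standard
    deviation of the same score list, giving one Z-score per
    library fold."""
    mean = statistics.mean(alignment_scores)
    sd = statistics.stdev(alignment_scores)
    return [(score - mean) / sd for score in alignment_scores]
\end{verbatim}

Under the thresholding scheme discussed above, library folds whose Z-score exceeds, say, 3.0 would then be taken as predictions.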
For a query sequence which has $n$ recognisable non-self folds in the library, the mean rank (in the list of library folds sorted by alignment score) of the correct folds is simply calculated as $\bar{r} = \frac{1}{n} \sum_{i=1}^{n} r_i$, where $r_i$ is the rank of recognisable fold $i$. However, values of $\bar{r}$ are dependent on $n$: a perfect ranking when $n = 1$ yields $\bar{r} = 1$, whilst a perfect ranking when $n = 3$ yields $\bar{r} = 2$. Both predictions should ideally score the same. Therefore $\bar{r}$ is divided by the best possible score ($(n+1)/2$) to give the mean adjusted rank, $\bar{r}_{\mathrm{adj}}$, which equals 1.0 for a perfect prediction, and is higher for worse predictions.
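A short sketch of this measure, assuming the ranks of the recognisable folds have already been extracted from the sorted score list (the function name and input representation are illustrative):

\begin{verbatim}
def mean_adjusted_rank(ranks):
    """Mean rank of the n recognisable (non-self) folds divided by
    the best possible mean rank, (n + 1) / 2.  Equals 1.0 for a
    perfect prediction; larger values indicate worse predictions."""
    n = len(ranks)
    mean_rank = sum(ranks) / n
    return mean_rank / ((n + 1) / 2)

# A perfect prediction scores 1.0 regardless of n:
assert mean_adjusted_rank([1]) == 1.0        # n = 1
assert mean_adjusted_rank([1, 2, 3]) == 1.0  # n = 3
\end{verbatim}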
In order to compare results with other methods [Rost et al., 1997, for example], we also calculate the number of correct top-ranking predictions, $N_1$ (excluding self predictions). For each query fold there can only be one correct top-ranking prediction, hence the maximum possible value of $N_1$ is 27, the number of recognisable queries.
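As a sketch of how $N_1$ might be tallied, assuming each query is represented by the list of ranks of its recognisable folds (this representation is an assumption made for illustration):

\begin{verbatim}
def count_top_ranking(per_query_ranks):
    """Number of queries whose top-ranked (rank 1, excluding self)
    library fold is correct.  Each query contributes at most one,
    so the maximum equals the number of recognisable queries
    (27 in these trials)."""
    return sum(1 for ranks in per_query_ranks
               if ranks and min(ranks) == 1)
\end{verbatim}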