
Evaluation of fold recognition performance

The scores for the alignments (see below) between the query sequence and each of the 82 library sequences are sorted and then normalised, by subtracting the mean and dividing by the standard deviation (of the same 82 scores), to give the so-called Z-score. Ideally, the correct folds would rank highest and their alignments would have Z-scores greater than, say, 3.0. The fold recognition trials could then be assessed by the number of correct versus incorrect predictions above such a threshold. However, across the different methodologies presented in this and the following chapter, we observed wide variations in the distributions of alignment scores, so a single Z-score threshold would not give comparable numbers of predictions for each method. Furthermore, the numbers of such predictions are small, leading to discretised and noise-prone success rates. Instead, we used a more continuous and robust measure that does not depend on a threshold. The important issue of prediction confidence is discussed in Section 4.4.4, following the refinement of the methods.
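As an illustration of the normalisation step, here is a minimal sketch in Python. It assumes the raw alignment scores for one query are held in a plain list indexed by library position, and it uses the population standard deviation; whether the original work used the population or sample form is not stated, so treat that choice as an assumption. The function name `z_scores` is hypothetical.

```python
import statistics


def z_scores(scores):
    """Return (library_index, z) pairs sorted by descending raw score.

    Each score is normalised against the mean and standard deviation of
    the full set of scores for this query. The population standard
    deviation is an assumption made for this sketch.
    """
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    # Rank library entries from best to worst raw alignment score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, (scores[i] - mean) / sd) for i in ranked]
```

For example, `z_scores([2, 4, 4, 4, 5, 5, 7, 9])` puts library entry 7 first with a Z-score of 2.0, which would fall below a threshold of 3.0.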

For a query sequence which has $N$ recognisable non-self folds in the library, the mean rank of the correct folds (in the list of library folds sorted by alignment score) is simply $\overline{R} = \frac{1}{N}\sum_{n=1}^{N} R_n$, where $R_n$ is the rank of recognisable fold $n$. However, values of $\overline{R}$ depend on $N$: a perfect ranking when $N=7$ yields $\overline{R}=4$, whilst a perfect ranking when $N=2$ yields $\overline{R}=1.5$. Both predictions should ideally score the same. Therefore $\overline{R}$ is divided by the best possible value, $(N+1)/2$ (the mean of the ranks $1, \dots, N$), to give the mean adjusted rank, $\overline{R}_{adj}$, which equals 1.0 for a perfect prediction and is higher for worse predictions.
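The adjusted rank can be sketched in a few lines of Python. The input is assumed to be the list of 1-based ranks $R_n$ of the $N$ recognisable folds; the divisor is the mean of the ranks $1, \dots, N$, i.e. $(N+1)/2$, which is the value that makes a perfect ranking score exactly 1.0 as described above. The function name is hypothetical.

```python
def mean_adjusted_rank(ranks):
    """Mean adjusted rank of the N recognisable folds for one query.

    ranks: 1-based positions of the recognisable folds in the
    score-sorted library list (self match assumed already excluded).
    Returns 1.0 for a perfect ranking (ranks 1..N); larger is worse.
    """
    n = len(ranks)
    if n == 0:
        raise ValueError("query has no recognisable folds")
    mean_rank = sum(ranks) / n
    best_possible = (n + 1) / 2  # mean of 1, 2, ..., N
    return mean_rank / best_possible
```

Both perfect cases from the text give 1.0: `mean_adjusted_rank([1, 2, 3, 4, 5, 6, 7])` and `mean_adjusted_rank([1, 2])`. A poorer ranking such as `[1, 5]` gives 2.0.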

In order to compare results with other methods [Rost et al., 1997, for example], we also calculate the number of correct top-ranking predictions, $T$ (excluding self predictions). Even for queries where $N>1$ there can be only one correct top-ranking prediction, hence the maximum possible value of $T$ is 27, the number of recognisable queries.
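A minimal sketch of how $T$ might be counted, assuming a hypothetical data layout in which each query is given as a tuple of its own identifier, its library hits sorted by descending alignment score, and the set of library identifiers that share its fold. Each query contributes at most one, which is why $T$ cannot exceed the number of recognisable queries.

```python
def count_top_hits(queries):
    """Count queries whose highest-scoring non-self hit is a correct fold.

    queries: iterable of (query_id, ranked_hits, correct), where
    ranked_hits lists library ids by descending alignment score and
    correct is the set of library ids sharing the query's fold.
    This layout is an assumption made for illustration.
    """
    t = 0
    for query_id, ranked_hits, correct in queries:
        # Skip the self match and take the best remaining hit, if any.
        top = next((hit for hit in ranked_hits if hit != query_id), None)
        if top is not None and top in correct:
            t += 1
    return t
```

For instance, with three queries of which only the first has a correct fold as its top non-self hit, `count_top_hits` returns 1.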


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.