A visual examination of the Matthews correlation coefficients in
Table 3.1 reveals that in almost every case, the coefficient
for the prediction of mainly-β is greater than those for the other
two classes. As mentioned above, this indicates that the prediction
algorithm is best able to predict the presence or absence of helix. Since
even the best prediction accuracies (69% correct expected for 81% of
blind predictions) are not remarkable, we decided to 'move the goalposts'
and apply the prediction to just two secondary structural classes:
mainly-β and helix-containing (formerly mainly-α and mixed-αβ).
Using the same dataset and i,i+3 duplets as in
Result 20, we obtain the correct two-class prediction for 83% of the
sequences, whilst the Matthews correlation coefficients for the two
classes are (by definition) equal, at 0.56 (the corresponding
coefficient was 0.55 in the three-class prediction).
Furthermore, when the reliability measure is used to rank the predictions,
the top 50% of predictions are approximately 95% accurate, whilst the top
33% of predictions are a remarkable 99% correct.
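
The two quantities used above can be illustrated with a minimal sketch. It shows that, in a two-class problem, the Matthews correlation coefficient is unchanged when the "positive" class is swapped, and how accuracy can be measured on the top-ranked fraction of predictions. The function names, confusion-matrix counts and reliability scores are invented for illustration; they are not taken from Table 3.1 or Result 20, and the reliability measure itself is not reproduced here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def accuracy_at_coverage(predictions, fraction):
    """Accuracy of the top `fraction` of predictions, ranked by reliability.

    `predictions` is a list of (reliability, is_correct) pairs.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))
    return sum(1 for _, ok in ranked[:n] if ok) / n


# Hypothetical counts with class A treated as "positive".
tp, tn, fp, fn = 40, 43, 7, 10

# Treating class B as "positive" swaps tp with tn and fp with fn;
# the numerator and the denominator of the coefficient are both unchanged.
mcc_a = mcc(tp, tn, fp, fn)
mcc_b = mcc(tn, tp, fn, fp)
print(mcc_a, mcc_b)

# Hypothetical (reliability, is_correct) pairs.
example = [(0.9, True), (0.8, True), (0.7, True),
           (0.4, False), (0.3, True), (0.1, False)]
print(accuracy_at_coverage(example, 0.5))  # accuracy of the top half
print(accuracy_at_coverage(example, 1.0))  # accuracy of all predictions
```

Swapping the roles of the two classes exchanges true positives with true negatives and false positives with false negatives, which leaves the formula unchanged; this is why a single value is quoted for both classes.
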
To summarise, we have developed and rigorously tested an algorithm which not only predicts useful structural information from sequence information (the presence of helix) but does so with high accuracy and with a reliable measure of confidence.