The published results of Nakashima et al. [Nakashima et al., 1986]
state an overall accuracy of 70% for a five-class prediction
(α, β, α/β, α+β and irregular) using a dataset of 135 proteins.
Chou and Zhang [Chou and Zhang, 1995] repeated this method on a
dataset of 120 proteins to obtain a four-class accuracy
(α, β, α/β, α+β) of 63%. We apply our implementation of
the method to 113 of the 120 proteins (7 were obsolete) and obtain 69%
accuracy (see Result 3 in Table 3.1). Clearly the outcome
of the prediction is sensitive to the choice of dataset. With a
larger dataset these effects should be diluted and the results will
reflect more accurately the amount of global structural information in
the amino acid composition of protein sequences.
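
As an illustration of what a composition-based class prediction involves, the following is a minimal sketch and not the implementation used here. It assumes a nearest-centroid rule with Euclidean distance in the 20-dimensional composition space; the distance measure, the function names and the toy sequences are all assumptions made for the example.

```python
from collections import Counter
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the amino acid composition as 20 fractions summing to 1."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def class_centroids(training):
    """Average the composition vectors of each class.

    training: iterable of (sequence, class_label) pairs.
    """
    sums, sizes = {}, {}
    for sequence, label in training:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(sequence)):
            acc[i] += value
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(sequence, centroids):
    """Assign the class whose centroid is closest (Euclidean distance, assumed)."""
    vec = composition(sequence)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy, hypothetical training data: not real proteins.
toy_training = [("AAAA", "alpha"), ("CCCC", "beta")]
toy_centroids = class_centroids(toy_training)
toy_prediction = predict("AAAC", toy_centroids)
```

Because the class centroids are estimated from the training proteins, a small dataset gives centroids that depend heavily on which proteins happen to be included, which is one way the sensitivity to the choice of dataset can arise.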

Using a program by Alex Michie [Michie et al., 1996], 403 of the 470
homologous superfamily representatives (86%) of the 1996 CATH domain
classification can be classified automatically using 3D structural
information into one of four classes (α, β, α/β, α+β). The remaining
14% are borderline cases requiring visual inspection. Fully automated
sequence-based prediction methods are therefore not expected to
predict more than 86% of domains correctly. With the dataset of 403
unambiguously classified domains, the overall accuracy is 52%
(Result 4 in Table 3.1), clearly much worse than the original
published figures of 70-80% [Nakashima et al., 1986].
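
The percentages quoted above follow from simple counts. The sketch below reproduces the 86% ceiling from the 403 and 470 figures and shows how an overall accuracy is computed from predicted and true labels; the function name and the example labels are hypothetical.

```python
def overall_accuracy(predicted, actual):
    """Fraction of domains whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no domains to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Counts taken from the text.
automatically_classified = 403
all_representatives = 470

# Share of domains with an unambiguous, automatically assigned class.
ceiling_percent = 100 * automatically_classified / all_representatives
borderline_percent = 100 - ceiling_percent

print(round(ceiling_percent))     # 86
print(round(borderline_percent))  # 14

# Hypothetical labels, only to show the accuracy calculation.
example_accuracy = overall_accuracy(["alpha", "beta"], ["alpha", "alpha"])
```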