
Secondary structural content prediction

As mentioned in Section 3.2.10, three-class predictions are not very accurate. This poor accuracy, combined with the subjectivity of class definitions, understandably calls the whole approach into question. A possibly more robust alternative is to predict secondary structural content instead of structural class. Eisenhaber et al. [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b] have recently published two papers discussing this in depth. They developed a secondary structural content prediction algorithm known as SSCP, which can also be used to predict structural class indirectly, when class is defined by secondary structure composition cutoffs [Nakashima et al., 1986].

Their basic method attempts to decompose the amino-acid composition vector into three idealised component vectors (for helix, sheet and coil) whose magnitudes are estimates of secondary structural content. Applied to a dataset of 475 protein sequences, it gives average absolute errors for the fraction of helix, sheet and coil of 14.7%, 12.1% and 12.8% respectively (with standard deviations of 11.9%, 10.0% and 9.4% respectively).
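The decomposition idea can be illustrated with a minimal sketch. It assumes an unconstrained least-squares fit, which is not necessarily the fitting procedure used in SSCP, and the three "idealised" vectors in the usage lines are made up from toy strings purely for illustration. The function names (`composition`, `decompose`) are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Return the 20-dimensional amino-acid composition (fractions summing to 1)."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [n / total for n in counts]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("component vectors are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = residual / a[i][i]
    return x

def decompose(comp, helix_vec, sheet_vec, coil_vec):
    """Least-squares coefficients (helix, sheet, coil) such that
    comp ~= h*helix_vec + s*sheet_vec + c*coil_vec (assumed fitting scheme)."""
    basis = [helix_vec, sheet_vec, coil_vec]
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]  # normal equations
    rhs = [dot(bi, comp) for bi in basis]
    return solve3(gram, rhs)

# Toy component vectors, NOT the published idealised vectors.
helix_vec = composition("AELKAELKAMQR")
sheet_vec = composition("VIYFVIYFTWC")
coil_vec = composition("GPSNGPSNDGP")
query = composition("AELKAELKVIYFGPSNGPSN")
print(decompose(query, helix_vec, sheet_vec, coil_vec))
```

In this sketch nothing forces the three coefficients to be non-negative or to sum to one; a practical implementation would probably need such constraints or a renormalisation step.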

Using code kindly donated by the authors, we have recreated, as far as possible, the method of Eisenhaber et al. [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b] on our own datasets. Using 393 domain sequences of 80 residues or more, we obtain very similar results: mean absolute errors of 14.6% (s.d. 11.5%), 12.2% (9.4%) and 13.1% (10.2%) for helix, sheet and coil respectively.
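For reference, the error figures quoted here are the mean and standard deviation of the per-protein absolute difference between predicted and observed content. A minimal sketch follows; the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev

def content_error_summary(predicted, observed):
    """Mean and standard deviation of |predicted - observed| over a set of proteins.

    Both arguments are percentages of one secondary structure type
    (for example helix), one value per protein.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    abs_errors = [abs(p - o) for p, o in zip(predicted, observed)]
    # Population standard deviation is an assumption; swap in
    # statistics.stdev for the sample form.
    return mean(abs_errors), pstdev(abs_errors)

# Made-up percentages for two proteins, for illustration only.
mean_err, sd_err = content_error_summary([50.0, 20.0], [40.0, 40.0])
print(mean_err, sd_err)
```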

Do the modifications we made to structural class prediction algorithms also improve secondary structural content prediction? The results in Table 3.2 show that they do. The combined use of multiple sequences (with indels not removed, because removing them would affect the prediction of coil) and i,i+3 amino-acid duplets reduces the mean absolute errors by around 2-3 percentage points.
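The 400-dimensional input can be pictured as the frequencies of all 20 x 20 ordered residue pairs found at positions i and i+3. The sketch below is an illustration under stated assumptions, not the code used for Table 3.2: pairs involving a gap or non-standard character are simply skipped, and multiple sequences are combined by plain averaging of their vectors. Both choices, and the function names, are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def duplet_composition(sequence, separation=3):
    """400-dimensional vector of frequencies of residue pairs at i and i+separation."""
    counts = [0] * (20 * 20)
    total = 0
    for first, second in zip(sequence, sequence[separation:]):
        # Assumption: pairs containing a gap or unknown residue are ignored.
        if first in INDEX and second in INDEX:
            counts[INDEX[first] * 20 + INDEX[second]] += 1
            total += 1
    if total == 0:
        raise ValueError("no usable residue pairs in sequence")
    return [n / total for n in counts]

def average_over_sequences(sequences, separation=3):
    """Average the duplet vectors of several sequences (assumed combination rule)."""
    vectors = [duplet_composition(s, separation) for s in sequences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Toy aligned sequences, for illustration only.
profile = average_over_sequences(["ACDEFACDEF", "ACD-FACDEF"])
print(len(profile))
```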




Table 3.2: Secondary structural content prediction using a modified version of SSCP [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b]

input information             dimensions   $\Delta\alpha$¹   $\Delta\beta$   $\Delta$coil
original authors              20           14.7 (11.9)       12.1 (10.0)     12.8 (9.4)
this work (repeat of above)   20           14.6 (11.5)       12.2 (9.4)      13.1 (10.2)
multiple sequences            20           13.1 (10.3)       11.0 (8.8)      12.0 (9.8)
i,i+3 amino-acid duplets      400          11.3 (9.2)        10.6 (8.4)      10.6 (8.9)

¹ mean error in prediction of percentage composition of helix (standard deviation in parentheses)


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.