Secondary structural content prediction

As mentioned in Section 3.2.10, three-class predictions are not very accurate. This poor accuracy, combined with subjective class definitions, understandably calls the whole approach into question. A possibly more robust alternative is to predict secondary structural content instead of structural class. Eisenhaber et al. [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b] have recently published two papers discussing this in depth. They developed a secondary structural content prediction algorithm known as SSCP, which can also be used to predict structural class indirectly, with classes defined by secondary structure composition cutoffs [Nakashima et al., 1986].

Their basic method attempts to decompose the amino-acid composition vector into three idealised component vectors (for helix, sheet and coil) whose magnitudes are estimates of secondary structural content. Applied to a dataset of 475 protein sequences, it gives average absolute errors for the fraction of helix, sheet and coil of 14.7%, 12.1% and 12.8% respectively (with standard deviations of 11.9%, 10.0% and 9.4%).
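
The decomposition idea can be illustrated with a minimal sketch. It assumes an ordinary, unconstrained least-squares fit of the composition vector onto three component vectors, which may differ from the fitting procedure actually used in SSCP. The function names (`composition`, `decompose`) are hypothetical, and the component vectors must be supplied by the caller; none are taken from the original papers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid composition as fractions."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard characters are ignored
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [c / total for c in counts]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def decompose(comp, basis):
    """Least-squares coefficients of comp on three component vectors.

    Assumption: plain least squares solved through the normal equations
    (Cramer's rule), with no constraint that the coefficients be
    non-negative or sum to one.
    """
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, comp) for bi in basis]
    d = det3(gram)
    if abs(d) < 1e-12:
        raise ValueError("component vectors are linearly dependent")
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in gram]
        for i in range(3):
            mk[i][k] = rhs[i]  # replace column k with the right-hand side
        coeffs.append(det3(mk) / d)
    return coeffs
```

Calling `decompose(composition(seq), [helix_vec, sheet_vec, coil_vec])` returns three coefficients, which in this sketch play the role of the estimated helix, sheet and coil fractions.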

Using code kindly donated by the authors, we have recreated, as far as possible, the method of Eisenhaber et al. [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b] using our datasets. Using 393 domain sequences of 80 residues or more, we obtain very similar results: 14.6% (s.d. 11.5%), 12.2% (9.4%) and 13.1% (10.2%) for helix, sheet and coil.
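
For reference, the error figures quoted here can be computed along the following lines. This is a sketch only: the function name is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or sample form was used.

```python
from statistics import mean, pstdev


def absolute_error_summary(predicted, observed):
    """Mean and standard deviation of |predicted - observed|.

    Both arguments are sequences of content percentages, one value per
    protein, for a single state (helix, sheet or coil).
    Assumption: population standard deviation.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return mean(errors), pstdev(errors)
```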

Do the modifications we made to structural class prediction algorithms also improve secondary structural content prediction? The results in Table 3.2 show that they do. The combined use of multiple sequences (with indels not removed, because removing them would affect the prediction of coil) and i,i+3 amino-acid duplets improves the mean absolute errors by around 2-3 percentage points.
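
The 400-dimensional i,i+3 duplet representation can be sketched as below. The function name and the handling of non-standard characters (pairs containing them are skipped) are assumptions made for illustration; the treatment of multiple sequences and indels is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Map each ordered residue pair to one of 20 * 20 = 400 positions.
PAIR_INDEX = {a + b: i
              for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def duplet_composition(seq, gap=3):
    """Return the 400-dimensional composition of (i, i+gap) residue pairs.

    Assumption: pairs containing a non-standard character (for example
    a gap symbol) are skipped rather than counted.
    """
    counts = [0] * len(PAIR_INDEX)
    for i in range(len(seq) - gap):
        idx = PAIR_INDEX.get(seq[i] + seq[i + gap])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # sequence too short or no valid pairs
    return [c / total for c in counts]
```

The resulting vector would replace the 20-dimensional composition vector as input, at the cost of far more parameters per component vector.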

**Table 3.2:** Secondary Structural Content Prediction using a modified version of SSCP [Eisenhaber *et al.*, 1996a, Eisenhaber *et al.*, 1996b]

| input information | dimensions | $\Delta\alpha$ ¹ | $\Delta\beta$ | $\Delta$ coil |
|---|---|---|---|---|
| original authors | 20 | 14.7 (11.9) | 12.1 (10.0) | 12.8 (9.4) |
| this work, repeat of above | 20 | 14.6 (11.5) | 12.2 (9.4) | 13.1 (10.2) |
| multiple sequences | 20 | 13.1 (10.3) | 11.0 (8.8) | 12.0 (9.8) |
| i,i+3 amino-acid duplets | 400 | 11.3 (9.2) | 10.6 (8.4) | 10.6 (8.9) |

¹ mean absolute error in the predicted percentage of helix (standard deviation in parentheses); $\Delta\beta$ and $\Delta$ coil are the corresponding values for sheet and coil