
Secondary structure predictions

The secondary structure prediction program DSC [King & Sternberg, 1996; see also Section 1.4.4] was used to generate helix and strand probabilities for each residue in a domain sequence, using the domain's associated multiple sequence alignment as input. The property vector, $P$, therefore has two components. The DSC method is approximately 70% accurate by the $Q_3$ measure, as tested in blind trials at CASP2. We assume that the DSC algorithm does not significantly memorise the secondary structural states of residues in its learning set of protein structures (many of which will have homologues in our dataset). Our reason is that DSC has far fewer parameters than the PHD method (DSC: 1,000; PHD: 25,000) relative to the number of residues in the training set (23,000). Furthermore, when tested without jack-knifing on its training set of 126 proteins, $Q_3$ increases by only a few percent (R. King, personal communication).
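For readers unfamiliar with the $Q_3$ measure quoted above, the following minimal Python sketch shows how it is usually computed: the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The helper `states_from_probabilities` is hypothetical and not part of DSC; it assumes that the coil probability is whatever remains after helix and strand, and that the predicted state is simply the most probable one.

```python
def states_from_probabilities(probs):
    """Turn per-residue (p_helix, p_strand) pairs into a three-state string.

    Assumption (not from DSC itself): p_coil = 1 - p_helix - p_strand,
    and the predicted state is the one with the highest probability.
    """
    states = []
    for p_helix, p_strand in probs:
        p_coil = max(0.0, 1.0 - p_helix - p_strand)
        scores = {"H": p_helix, "E": p_strand, "C": p_coil}
        states.append(max(scores, key=scores.get))
    return "".join(states)


def q3(predicted, observed):
    """Percentage of residues whose H/E/C state is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

With made-up input, `q3("HHHEEECCCC", "HHHEEECCHH")` counts eight matching positions out of ten.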

The results for fold recognition using alignments of two-component vectors of helix and strand probabilities are mixed (Table 4.4). The number of correct top-ranking predictions is unremarkable ($T=8$), but the mean adjusted rank is the best so far ($\overline{R}_{adj}=9.0$). The alignment quality is, however, the worst so far ($\overline{S}=26.8$). It seems likely that the improved average ranking results from non-specific recognition of domains with similar predicted secondary structure content. The poor alignments are probably due to the lack of phase information in the secondary structure predictions, in contrast to the hydrophobicity information which, as already discussed, frequently alternates in magnitude.

As with the alignment of sequence conservation information, we found that combining hydrophobicity with predicted secondary structure information was cooperative ($P$ is now a three-component vector). Figure 4.2 shows the effect of the ratio between the two kinds of component. The minima for $\overline{R}_{adj}$ and $\overline{S}$ are found, surprisingly, at different ratios: 2:1 and 1:2 respectively (hydrophobicity:secondary structure prediction), both giving better performance than either measure alone. The numerical results for these combinations and the intervening ratio of 1:1 are given in Tables 4.4 and 4.9.
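As an illustration only, the three-component vector can be sketched in Python as below. The Kyte and Doolittle hydropathy values are the published ones, but everything else is an assumption: the function name `property_vectors` is hypothetical, the ratio is applied here as simple multiplicative weights, and hydrophobicity is taken from a single sequence rather than from a multiple sequence alignment. The text does not specify how the ratio or any rescaling was actually implemented.

```python
# Kyte and Doolittle (1982) hydropathy index, one value per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vectors(sequence, helix_probs, strand_probs,
                     hydro_weight=1.0, ss_weight=1.0):
    """Build one (hydrophobicity, helix, strand) vector per residue.

    hydro_weight:ss_weight plays the role of the
    hydrophobicity:secondary structure ratio (e.g. 2:1 or 1:2).
    The helix:strand ratio is fixed at 1:1, as in Figure 4.2.
    Applying the ratio as plain multipliers is an assumption.
    """
    if not (len(sequence) == len(helix_probs) == len(strand_probs)):
        raise ValueError("all inputs must have one entry per residue")
    return [
        (hydro_weight * KYTE_DOOLITTLE[aa], ss_weight * h, ss_weight * e)
        for aa, h, e in zip(sequence, helix_probs, strand_probs)
    ]
```

For example, `property_vectors("AI", [0.5, 0.1], [0.2, 0.6], hydro_weight=2.0)` would weight hydrophobicity twice as heavily as the two secondary structure components, corresponding to a 2:1 ratio under these assumptions.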





Table 4.9: Summary of fold recognition results using alignments of combined raw hydrophobicity (Kyte and Doolittle) and DSC secondary structure prediction probabilities for helix and strand.
rank by              domain           CATH      length
all  query  Z-score  query   library  topology  query  library  $\overline{S}$
1    1      2.069    5p2100  1hurA0   3.40.330  166    180      0.5
2    1      1.847    1hurA0  5p2100   3.40.330  180    166      0.5
3    1      1.027    1atr03  1atnA2   3.30.420  107    108      4.5
5    1      1.008    1dgd02  1aam02   3.40.640  264    271      19.5
6    1      0.989    1atnA2  1atr03   3.30.420  108    107      6.0
8    1      0.874    1cgt04  1exg00   2.60.40   104    110      17.1
11   1      0.854    4fxn00  1ntr00   3.40.330  138    124      2.2
12   1      0.843    1aam02  1dgd02   3.40.640  271    264      18.3
13   1      0.841    1ntr00  4fxn00   3.40.330  124    138      2.4
14   1      0.815    1gdhA2  1raaA1   3.40.330  184    152      3.3
15   1      0.796    1cnd01  1pkm03   2.40.90   106    103      21.9
18   3      0.781    1exg00  1cgt04   2.60.40   110    104      15.1
20   1      0.758    1raaA1  1gdhA2   3.40.330  152    184      3.7
28   2      0.722    1sxaA0  1exg00   2.60.40   151    110      15.3
35   1      0.670    1pii01  1tpfA0   3.20.40   261    250      24.7

Figure 4.2: Fold recognition performance using alignments of three-component vectors of hydrophobicity and DSC helix and strand prediction probabilities, using different ratios of the components (helix:strand ratio was always 1:1). (a) Fold recognition ranking. (b) Alignment quality. Combinations outperform individual components in both ranking and alignment quality, although the ratios at the minima are different.

The low $\overline{R}_{adj}$ results for ratios 1:1 and 1:2 may be due in part to the improved recognition of Ig-like topologies (2.60.40). There are $T=13$ correct top-ranking predictions with a ratio of 1:1, almost twice as many as with the basic Smith-Waterman method.


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.