Secondary structure predictions

The secondary structure prediction program DSC [King & Sternberg, 1996; see also Section 1.4.4] was used to generate probabilities for the helix and strand secondary structural states of each residue in a domain sequence, using the associated multiple sequence alignment as input. The property vector therefore has two components. The DSC method is approximately 70% accurate using the $Q_3$ measure, as tested in blind trials at CASP2. We assume that the DSC algorithm does not significantly memorise the secondary structural states of residues in its learning set of protein structures (many of which will have homologues in our dataset). Our reason is that DSC has far fewer parameters than the PHD method (DSC: 1000, PHD: 25,000) relative to the number of residues in the training set (23,000). Furthermore, when tested without jack-knifing on its training set of 126 proteins, $Q_3$ increases by only a few percent (R. King, personal communication).
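As a point of reference, the per-residue accuracy quoted above can be illustrated with a minimal sketch. It assumes the measure is the usual three-state accuracy ($Q_3$): the percentage of residues whose predicted state (helix, strand or coil) matches the observed state. The function name `q3` and the toy strings are hypothetical and are not DSC output.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label
    (H = helix, E = strand, C = coil) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Toy example (assumed data): 17 residues, two of them mispredicted.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3(predicted, observed):.1f}%")
```

Under this definition, comparing the jack-knifed and non-jack-knifed values on the same training set gives a rough indication of how much a method has memorised its training data.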

The results for fold recognition using alignments of two-component vectors of helix and strand probabilities are mixed (Table 4.4). The number of correct top-ranking predictions is unremarkable, but the mean adjusted rank is the best so far ( $\overline{R}_{adj}=9.0$ ). The alignment quality is, however, the worst so far ( $\overline{S}=26.8$ ). It seems likely that the improved average ranking results from non-specific recognition of domains with similar predicted secondary structure content. The poor alignments are probably due to the lack of phase information in the secondary structure predictions, in contrast to the hydrophobicity information which, as already discussed, frequently alternates in magnitude.

As with the alignment of sequence conservation information, we found that combining hydrophobicity with predicted secondary structure information was cooperative (the property vector now has three components). Figure 4.2 shows the effect of varying the ratio between hydrophobicity and secondary structure prediction. The minima for $\overline{R}_{adj}$ and $\overline{S}$ are found, surprisingly, at different ratios: 2:1 and 1:2 respectively (hydrophobicity:secondary structure prediction), both giving better performance than either measure alone. The numerical results for these combinations and the intervening ratio of 1:1 are given in Tables 4.4 and 4.9.
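The construction of the three-component vectors can be sketched as follows. This is only an illustration under stated assumptions: the ratio is applied as a simple multiplicative weight on each component, helix and strand share one weight (a 1:1 helix:strand ratio), and no rescaling is applied between hydrophobicity values and probabilities. The function name `combine` and the example numbers are hypothetical; the text above does not specify how the ratio was applied.

```python
from typing import List, Sequence, Tuple

Vector3 = Tuple[float, float, float]


def combine(hydro: Sequence[float],
            helix: Sequence[float],
            strand: Sequence[float],
            w_hydro: float = 1.0,
            w_ss: float = 1.0) -> List[Vector3]:
    """Build one (hydrophobicity, P(helix), P(strand)) vector per residue.

    w_hydro:w_ss is the hydrophobicity:secondary-structure ratio.
    Helix and strand use the same weight (assumed 1:1 ratio).
    """
    if not (len(hydro) == len(helix) == len(strand)):
        raise ValueError("all per-residue profiles must have the same length")
    return [(w_hydro * h, w_ss * ph, w_ss * pe)
            for h, ph, pe in zip(hydro, helix, strand)]


# Assumed toy profiles for two residues (not taken from the dataset).
hydro = [4.5, -3.5]    # e.g. Kyte and Doolittle values for Ile and Asn
helix = [0.8, 0.1]     # hypothetical helix probabilities
strand = [0.1, 0.6]    # hypothetical strand probabilities

# 2:1 ratio (hydrophobicity:secondary structure prediction)
for vec in combine(hydro, helix, strand, w_hydro=2.0, w_ss=1.0):
    print(vec)
```

Swapping the weights (`w_hydro=1.0, w_ss=2.0`) would give the 1:2 ratio under the same assumptions.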

**Table 4.9:** Summary of fold recognition results using alignments of combined raw hydrophobicity (Kyte and Doolittle) and DSC secondary structure prediction probabilities for helix and strand.

| Rank (all) | Rank (query) | Z-score | Query domain | Library domain | CATH topology | Query length | Library length | $\overline{S}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2.069 | 5p2100 | 1hurA0 | 3.40.330 | 166 | 180 | 0.5 |
| 2 | 1 | 1.847 | 1hurA0 | 5p2100 | 3.40.330 | 180 | 166 | 0.5 |
| 3 | 1 | 1.027 | 1atr03 | 1atnA2 | 3.30.420 | 107 | 108 | 4.5 |
| 5 | 1 | 1.008 | 1dgd02 | 1aam02 | 3.40.640 | 264 | 271 | 19.5 |
| 6 | 1 | 0.989 | 1atnA2 | 1atr03 | 3.30.420 | 108 | 107 | 6.0 |
| 8 | 1 | 0.874 | 1cgt04 | 1exg00 | 2.60.40 | 104 | 110 | 17.1 |
| 11 | 1 | 0.854 | 4fxn00 | 1ntr00 | 3.40.330 | 138 | 124 | 2.2 |
| 12 | 1 | 0.843 | 1aam02 | 1dgd02 | 3.40.640 | 271 | 264 | 18.3 |
| 13 | 1 | 0.841 | 1ntr00 | 4fxn00 | 3.40.330 | 124 | 138 | 2.4 |
| 14 | 1 | 0.815 | 1gdhA2 | 1raaA1 | 3.40.330 | 184 | 152 | 3.3 |
| 15 | 1 | 0.796 | 1cnd01 | 1pkm03 | 2.40.90 | 106 | 103 | 21.9 |
| 18 | 3 | 0.781 | 1exg00 | 1cgt04 | 2.60.40 | 110 | 104 | 15.1 |
| 20 | 1 | 0.758 | 1raaA1 | 1gdhA2 | 3.40.330 | 152 | 184 | 3.7 |
| 28 | 2 | 0.722 | 1sxaA0 | 1exg00 | 2.60.40 | 151 | 110 | 15.3 |
| 35 | 1 | 0.670 | 1pii01 | 1tpfA0 | 3.20.40 | 261 | 250 | 24.7 |

**Figure 4.2:** Fold recognition performance using alignments of three-component vectors of hydrophobicity and DSC helix and strand prediction probabilities, using different ratios of the components (helix:strand ratio was always 1:1). (a) Fold recognition ranking. (b) Alignment quality. Combinations outperform individual components in both ranking and alignment quality, although the ratios at the minima are different.

The low $\overline{R}_{adj}$ results for ratios 1:1 and 1:2 may be due in part to improved recognition of Ig-like topologies (2.60.40). With a ratio of 1:1, the number of correct top-ranking predictions is almost twice that of the basic Smith-Waterman method.