

Sequence Conservation

The measure of sequence conservation, $g$, used in this work is adapted from Taylor [taylor:mst]. For position $i$ in a multiple sequence alignment, $g_i$ is defined as follows:

\begin{displaymath}
g_i = \frac{2}{n^2 - n} \sum_{j=1}^{n-1}{\sum_{k=j+1}^{n}{D_{aa_{ij},aa_{ik}}}}
\end{displaymath} (7)

where $n$ is the number of sequences in the multiple sequence alignment, and $D_{aa_{ij},aa_{ik}}$ is the score from the substitution matrix (PAM250) between the amino acids at position $i$ in sequences $j$ and $k$. The measure is therefore the mean substitution score over all $n(n-1)/2$ pairs of sequences at that position. Two special cases account for gaps: $D_{gap,aa}=-10$ and $D_{gap,gap}=-12$ (the PAM250 scores range from -8 to +17). Other measures of conservation (for example Sander and Schneider [sander:hssp]) take the relatedness of sequences into consideration, down-weighting the contributions made by pairs of more closely related sequences. Since the multiple sequence alignments used in this study (see Appendix A) have had sequences removed until no pair has more than 70% identity, the simpler measure was judged adequate.
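As an illustration of Equation 7, the following minimal Python sketch computes $g_i$ for a single alignment column. The function names, the gap character `-` and the tiny `TOY` score table are assumptions made for this example only; in particular, `TOY` does not contain PAM250 values, and a real calculation would pass in the full PAM250 matrix.

```python
from itertools import combinations

GAP = "-"  # assumed gap character; not specified in the text


def pair_score(a, b, matrix):
    """Score one residue pair, applying the two gap special cases from the text."""
    if a == GAP and b == GAP:
        return -12
    if a == GAP or b == GAP:
        return -10
    # Substitution matrices are symmetric, so accept either key order.
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def conservation(column, matrix):
    """Mean pairwise substitution score for one alignment column (g_i)."""
    n = len(column)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(pair_score(a, b, matrix) for a, b in combinations(column, 2))
    return 2.0 * total / (n * n - n)


# Toy scores for illustration only; these are NOT the PAM250 values.
TOY = {("C", "C"): 10, ("C", "A"): -2, ("A", "A"): 2}

g = conservation("CCA-", TOY)  # one column from a four-sequence alignment
```

Because every pair contributes equally, a single gap in a column of $n$ sequences adds $n-1$ penalties of -10, which can pull $g_i$ down sharply for otherwise well-conserved positions.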

Overall fold recognition results are poor using only sequence conservation information (Table 4.4). This might be expected, since sequence conservation is a secondary consequence of the structural and functional characteristics of proteins. One interesting outcome, however, is that the hit with the second highest Z-score correctly identifies library domain 1hcnB0 for query domain 2tgi00 (one of only two correct top hits, data not shown). This (2.10.90) topology contains the cysteine knot motif, a cluster of disulphide bonds connecting $\beta $-strands. Cysteines making such bonds are known to be well conserved, and in this fold the pattern of conservation appears to be clear enough for recognition purposes.

Combinations of conservation and hydrophobicity give understandably better results: it is the conserved hydrophobic residues that are expected to be in the core of protein folds (but also in the active sites of some), and amphipathic patterns of hydrophobicity ought to be conserved in core secondary structure elements. A measure of conserved hydrophobicity, after Taylor [taylor:mst], can be calculated as follows:

\begin{displaymath}
H_i = (\overline{h}_i + c_h)(g_i + c_g)
\end{displaymath} (8)

where $c_h$ and $c_g$ are constants which shift all values into the positive domain.
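The shift matters because, without it, a position with negative mean hydrophobicity and negative conservation would give a positive product and look like a conserved hydrophobic site. The short Python sketch below applies Equation 8 along an alignment. The function name and the constants used in the example call are hypothetical; the values of $c_h$ and $c_g$ actually used are not given here.

```python
def conserved_hydrophobicity(h_mean, g, c_h, c_g):
    """Return H_i = (h_i + c_h) * (g_i + c_g) for each alignment position.

    h_mean: mean hydrophobicity per position
    g:      conservation score per position
    c_h, c_g: shifts chosen so that both factors are positive
    """
    if len(h_mean) != len(g):
        raise ValueError("h_mean and g must have the same length")
    shifted = [(h + c_h, gi + c_g) for h, gi in zip(h_mean, g)]
    if any(a <= 0 or b <= 0 for a, b in shifted):
        raise ValueError("constants do not shift all values into the positive domain")
    return [a * b for a, b in shifted]


# Hypothetical inputs and constants, for illustration only.
H = conserved_hydrophobicity(
    h_mean=[4.5, -4.5, 0.0],
    g=[17.0, -12.0, 2.0],
    c_h=5.0,
    c_g=13.0,
)
```

With these example numbers the hydrophobic, well-conserved first position scores far higher than the hydrophilic, poorly conserved second position, which is the intended ordering.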

The fold recognition results using conserved hydrophobicity, given in Tables 4.4 and 4.8, are the best so far in terms of the number of top-ranking correct folds ($T=9$), and $\overline{R}_{adj}$ is also good. Remarkably, 8 of the 9 highest-scoring predictions overall (Table 4.8) are correct, and each of these is ranked first for its query. Using this small fold library and a Z-score threshold of 1.0, the alignment of conserved hydrophobicity could give 80-90% correct first hits above the threshold, with a coverage of about 30% (the chance of getting a result above the threshold; see Section 4.4.4 for more discussion).
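The two quantities quoted above, the fraction of correct first hits above a Z-score threshold and the coverage, can be sketched in Python as follows. This is only an illustration of the definitions: the function name and the example hit list are hypothetical and are not data from this study.

```python
def threshold_summary(top_hits, threshold=1.0):
    """Summarise first-ranked hits against a Z-score threshold.

    top_hits: one (z_score, is_correct) pair per query, for that query's
              first-ranked library domain.
    Returns (accuracy, coverage), where coverage is the fraction of queries
    whose top hit exceeds the threshold and accuracy is the fraction of
    those hits that are correct.
    """
    above = [ok for z, ok in top_hits if z > threshold]
    coverage = len(above) / len(top_hits) if top_hits else 0.0
    accuracy = sum(above) / len(above) if above else 0.0
    return accuracy, coverage


# Hypothetical top hits for six queries, for illustration only.
example_hits = [
    (2.4, True), (1.2, True), (1.05, False),
    (0.8, True), (0.7, False), (0.5, False),
]
accuracy, coverage = threshold_summary(example_hits, threshold=1.0)
```

Raising the threshold generally trades coverage for accuracy, so the two numbers are best reported together.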





Table 4.8: Summary of fold recognition results using alignments of conserved hydrophobicity score.

rank by              domain           CATH      length
all  query  Z-score  query   library  topology  query  library  $\overline{S}$
  1      1    2.365  1hurA0  5p2100   3.40.330    180      166   0.5
  2      1    2.269  5p2100  1hurA0   3.40.330    166      180   0.5
  3      1    1.230  1atr03  1atnA2   3.30.420    107      108   1.1
  4      1    1.141  1ntr00  4fxn00   3.40.330    124      138   3.0
  5      1    1.117  4fxn00  1ntr00   3.40.330    138      124   2.5
  6      1    1.115  1atnA2  1atr03   3.30.420    108      107   1.1
  7      1    1.075  1pii01  1tpfA0   3.20.40     261      250  23.1
  9      1    1.045  1dgd02  1aam02   3.40.640    264      271  16.8
 23      2    0.830  1tpfA0  1pii01   3.20.40     250      261  25.8
 28      2    0.792  2tgi00  1hcnB0   2.10.90     112      110  20.1
 34      4    0.753  1raaA1  4fxn00   3.40.330    152      138  66.1
 35      1    0.748  1hcnB0  2tgi00   2.10.90     110      112  20.8

With similar goals in mind, sequences encoded by a two-component vector $P$, of hydrophobicity ($\overline{h}$) and sequence conservation ($g$), were also aligned. Figure 4.1 shows the results for mean adjusted rank and mean alignment error using a range of different weightings for the sequence conservation component vs. the hydrophobicity component. Mean adjusted rank results were better than those from either of the two measures alone or the conserved hydrophobicity measure, when the ratio of hydrophobicity to conservation was 20:1 or 10:1. A further improvement in the number of top ranking correct folds ($T=10$) is seen with both these ratios. Alignment quality did not improve, however, beyond that already obtained using hydrophobicity alone (Figure 4.1(b)).
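The weighted two-component encoding can be pictured with the short Python sketch below. The names `encode_positions` and `position_score` are hypothetical, and the use of a negative Euclidean distance to compare two positions is purely an assumption for illustration; the comparison function used by the alignment method is not described in this section.

```python
import math


def encode_positions(h_mean, g, w_h, w_g):
    """Build one weighted vector P_i = (w_h * h_i, w_g * g_i) per position."""
    if len(h_mean) != len(g):
        raise ValueError("h_mean and g must have the same length")
    return [(w_h * h, w_g * gi) for h, gi in zip(h_mean, g)]


def position_score(p, q):
    """Assumed similarity between two encoded positions.

    Negative Euclidean distance: identical vectors score 0, and the score
    falls as the vectors diverge. This choice is an assumption.
    """
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 10:1 hydrophobicity-to-conservation weighting on hypothetical values.
P = encode_positions(h_mean=[1.0, -2.0], g=[3.0, 4.0], w_h=10.0, w_g=1.0)
```

Under any distance-like comparison of this kind, increasing the weight of one component makes differences in that component dominate the score, which is why the ratio of weights has to be tuned empirically.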

Figure 4.1: Fold recognition performance using alignments of two-component vectors of hydrophobicity and conservation, with different ratios (relative weights). (a) Fold recognition ranking. (b) Alignment quality. Ranking performance improves beyond trials using hydrophobicity or conservation alone or the combined measure of conserved hydrophobicity. Alignment quality for these combinations is worse than that obtained using either hydrophobicity or conserved hydrophobicity.


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.