Sequence Conservation

The measure of sequence conservation, $g_i$, adopted for this work is adapted from Taylor [taylor:mst]. For residue position $i$ in a multiple sequence alignment, $g_i$ is defined as follows:

$$ g_i = \frac{2}{n^2 - n} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} D_{aa_{ij},aa_{ik}} \qquad (7) $$

where $n$ is the number of sequences in the alignment, and $D_{aa_{ij},aa_{ik}}$ is the score from the substitution matrix (PAM250) between the amino acids at position $i$ in sequences $j$ and $k$ of the multiple sequence alignment. The measure is therefore the mean substitution score over all $n(n-1)/2$ pairs of sequences at that position. Two special cases account for gaps: $D_{gap,aa}=-10$ and $D_{gap,gap}=-12$ (the PAM250 scores range from -8 to +17). Other measures of conservation (for example Sander and Schneider [sander:hssp]) take the relatedness of sequences into consideration, down-weighting the contributions made by pairs of more closely related sequences. Since the multiple sequence alignments used in this study (see Appendix A) have had sequences removed until no pair has more than 70% identity, it is felt that the simpler measure is adequate.
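The calculation in Equation 7 can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the `-` gap character and the `TOY_MATRIX` entries are assumptions made for illustration, and the handful of scores shown should be replaced by a full published PAM250 table in any real use. Only the two gap scores are taken from the text.

```python
from itertools import combinations

GAP = "-"          # assumed gap character
D_GAP_AA = -10     # gap against residue (from the text)
D_GAP_GAP = -12    # gap against gap (from the text)

# Toy, symmetric subset of substitution scores, for illustration only.
# Replace with a complete PAM250 table before drawing any conclusions.
TOY_MATRIX = {
    ("A", "A"): 2, ("A", "C"): -2, ("A", "W"): -6,
    ("C", "C"): 12, ("C", "W"): -8, ("W", "W"): 17,
}

def pair_score(a, b, matrix):
    """Score one pair of alignment symbols, handling the two gap cases."""
    if a == GAP and b == GAP:
        return D_GAP_GAP
    if a == GAP or b == GAP:
        return D_GAP_AA
    # The matrix is symmetric, so try both key orders.
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]

def conservation(column, matrix):
    """Mean pairwise score over one alignment column (Equation 7)."""
    n = len(column)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(pair_score(a, b, matrix) for a, b in combinations(column, 2))
    return 2.0 * total / (n * n - n)

print(conservation("CCC", TOY_MATRIX))  # fully conserved cysteine column
print(conservation("CC-", TOY_MATRIX))  # one gap lowers the score sharply
```

With these toy scores, a column of three cysteines gives 12.0, while replacing one of them with a gap gives a negative value, because two of the three pairs then take the gap penalty.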

Overall fold recognition results are poor using only sequence conservation information (Table 4.4). This might be expected, since sequence conservation is a secondary consequence of the structural and functional characteristics of proteins. One interesting outcome, however, is that the second highest Z-score correctly identifies library domain 1hcnB0 for query domain 2tgi00 (one of only two correct top hits, data not shown). This (2.10.90) topology contains the cysteine knot motif, a cluster of disulphide bonds connecting $\beta$-strands. Cysteines making such bonds are known to be well conserved, and in this fold the pattern of conservation appears to be clear enough for recognition purposes.

Combinations of conservation and hydrophobicity give understandably better results: it is the conserved hydrophobic residues that are expected to be in the core of protein folds (but also in the active sites of some), and amphipathic patterns of hydrophobicity ought to be conserved in core secondary structure elements. A measure of conserved hydrophobicity, after Taylor [taylor:mst], can be calculated as follows:

$$ H_i = (\overline{h}_i + c_h)(g_i + c_g) \qquad (8) $$

where $c_h$ and $c_g$ are constants which shift all values into the positive domain.
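As a rough illustration of Equation 8, the sketch below multiplies the shifted hydrophobicity and conservation values position by position. The function name and the example constants are assumptions, not the values used in this work: $c_g = 12$ is chosen only because the lowest possible conservation score is -12 (a column of gaps), and $c_h = 4.5$ assumes a hydrophobicity scale whose minimum is -4.5. Without such shifts, two negative factors would multiply to a misleadingly large positive product, so the sketch rejects negative shifted values.

```python
def conserved_hydrophobicity(h_mean, g, c_h, c_g):
    """Return H_i = (h_i + c_h) * (g_i + c_g) for each alignment position.

    h_mean: per-position mean hydrophobicity values
    g:      per-position conservation values (Equation 7)
    c_h, c_g: shift constants (illustrative choices, see lead-in)
    """
    if len(h_mean) != len(g):
        raise ValueError("h_mean and g must have the same length")
    result = []
    for h_i, g_i in zip(h_mean, g):
        h_shifted = h_i + c_h
        g_shifted = g_i + c_g
        # Both factors must be non-negative for the product to be meaningful.
        if h_shifted < 0 or g_shifted < 0:
            raise ValueError("shift constants do not make all values non-negative")
        result.append(h_shifted * g_shifted)
    return result

# A conserved hydrophobic position scores higher than a variable polar one.
print(conserved_hydrophobicity([1.0, -2.0], [3.0, -4.0], c_h=4.5, c_g=12.0))
```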

The fold recognition results using conserved hydrophobicity, given in Tables 4.4 and 4.8, are the best so far in terms of the number of top-ranking correct folds, and $\overline{R}_{adj}$ is also good. Remarkably, 8 of the top 9 ranking predictions from this trial (Table 4.8) are correct, each at rank one on a per-query basis. Using this small fold library and a Z-score threshold of 1.0, the alignment of conserved hydrophobicity could give 80-90% correct first hits above the threshold, with a coverage of about 30% (the chance of getting a result above the threshold; see Section 4.4.4 for more discussion).
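The relationship between the Z-score threshold, the fraction of correct first hits and the coverage can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, the input is taken to be each query's first-ranked hit as a `(z_score, is_correct)` pair, and the example values are invented rather than taken from the results of this work.

```python
def threshold_summary(top_hits, threshold):
    """Summarise first-ranked hits against a Z-score threshold.

    top_hits: one (z_score, is_correct) pair per query, for its rank-one hit
    Returns (accuracy, coverage):
      coverage = fraction of queries whose top hit scores above the threshold
      accuracy = fraction of those above-threshold hits that are correct,
                 or None if no hit passes the threshold
    """
    if not top_hits:
        raise ValueError("no hits supplied")
    above = [is_correct for z, is_correct in top_hits if z > threshold]
    coverage = len(above) / len(top_hits)
    accuracy = sum(above) / len(above) if above else None
    return accuracy, coverage

# Invented example: ten queries, three of which have a top hit above 1.0.
example_hits = [
    (2.4, True), (1.2, True), (1.1, False), (0.9, True), (0.7, False),
    (0.5, False), (0.4, True), (0.3, False), (0.2, False), (0.1, False),
]
accuracy, coverage = threshold_summary(example_hits, threshold=1.0)
print(accuracy, coverage)
```

Raising the threshold in such a scheme would be expected to trade coverage for accuracy, which is the balance discussed in the text.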

**Table 4.8:** Summary of fold recognition results using alignments of conserved hydrophobicity score.

| Rank (all) | Rank (per query) | Z-score | Query domain | Library domain | CATH topology | Query length | Library length | $\overline{S}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2.365 | 1hurA0 | 5p2100 | 3.40.330 | 180 | 166 | 0.5 |
| 2 | 1 | 2.269 | 5p2100 | 1hurA0 | 3.40.330 | 166 | 180 | 0.5 |
| 3 | 1 | 1.230 | 1atr03 | 1atnA2 | 3.30.420 | 107 | 108 | 1.1 |
| 4 | 1 | 1.141 | 1ntr00 | 4fxn00 | 3.40.330 | 124 | 138 | 3.0 |
| 5 | 1 | 1.117 | 4fxn00 | 1ntr00 | 3.40.330 | 138 | 124 | 2.5 |
| 6 | 1 | 1.115 | 1atnA2 | 1atr03 | 3.30.420 | 108 | 107 | 1.1 |
| 7 | 1 | 1.075 | 1pii01 | 1tpfA0 | 3.20.40 | 261 | 250 | 23.1 |
| 9 | 1 | 1.045 | 1dgd02 | 1aam02 | 3.40.640 | 264 | 271 | 16.8 |
| 23 | 2 | 0.830 | 1tpfA0 | 1pii01 | 3.20.40 | 250 | 261 | 25.8 |
| 28 | 2 | 0.792 | 2tgi00 | 1hcnB0 | 2.10.90 | 112 | 110 | 20.1 |
| 34 | 4 | 0.753 | 1raaA1 | 4fxn00 | 3.40.330 | 152 | 138 | 66.1 |
| 35 | 1 | 0.748 | 1hcnB0 | 2tgi00 | 2.10.90 | 110 | 112 | 20.8 |

With similar goals in mind, sequences encoded as a two-component vector of hydrophobicity ($\overline{h}$) and sequence conservation ($g$) were also aligned. Figure 4.1 shows the results for mean adjusted rank and mean alignment error using a range of different weightings for the sequence conservation component vs. the hydrophobicity component. Mean adjusted rank results were better than those from either of the two measures alone, or from the conserved hydrophobicity measure, when the ratio of hydrophobicity to conservation was 20:1 or 10:1. A further improvement in the number of top-ranking correct folds is seen with both these ratios. Alignment quality did not improve, however, beyond that already obtained using hydrophobicity alone (Figure 4.1(b)).

**Figure 4.1:** Fold recognition performance using alignments of two-component vectors of hydrophobicity and conservation, with different ratios (relative weights). (a) Fold recognition ranking. (b) Alignment quality. Ranking performance improves beyond trials using hydrophobicity or conservation alone or the combined measure of conserved hydrophobicity. Alignment quality for these combinations is worse than that obtained using either hydrophobicity or conserved hydrophobicity.