Fold recognition and self-organising maps

We have tried to use self-organising maps to filter sequences of residue properties, such as hydrophobicity, using three-dimensional structure information. The output most closely resembles the input where the input sequence properties are clustered in three-dimensional space, and is smoothed where they are not. The mapped (or filtered) sequences should describe the essence of the fold in terms of the properties used. In other words, self-organised sequence property `fields' are generated within the structure. To test how well this abstraction could describe the common features of distantly homologous and analogous folds, the structure-mapped sequences were used in fold recognition trials. Their poor performance in this task was probably due to the loss or corruption of detailed magnitude information in the property vectors.

An approach is needed in which structurally relevant information is aligned more stringently than structurally irrelevant information. One way to achieve this is to align the query sequence with unmapped library sequences which have been masked in some way according to how well the residue property vectors map. This can be measured by the quantisation error (Equation 5.3), which is large when the input vector represents a residue with structurally anomalous properties. One could also set position-specific gap penalties based on the quantisation error. Both approaches might be equivalent to the already common use of secondary structure dependent gap penalties [Barton & Sternberg, 1987]. This highlights the major limitation of the approach: at best we can hope to create a field of hydropathy for library folds and/or define core/non-core residues.
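The position-specific gap penalty idea can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the quantisation error is the Euclidean distance from an input vector to its closest map vector (the exact form of Equation 5.3 is not reproduced here), and the function names, the `base` and `floor` penalty values and the linear scaling are all hypothetical.

```python
import math

def quantisation_error(vector, codebook):
    """Euclidean distance from an input vector to its closest map vector (assumed form)."""
    return min(math.dist(vector, node) for node in codebook)

def position_gap_penalties(vectors, codebook, base=10.0, floor=2.0):
    """Assign one gap penalty per residue (hypothetical linear scheme).

    Residues that map well (low error) keep a penalty near `base`, so they are
    aligned stringently; poorly mapped residues drop towards `floor`.
    """
    errors = [quantisation_error(v, codebook) for v in vectors]
    worst = max(errors) or 1.0  # avoid dividing by zero when every residue maps exactly
    return [floor + (base - floor) * (1.0 - e / worst) for e in errors]

# Toy two-node map and two residue property vectors (made-up numbers).
codebook = [(0.0, 0.0), (1.0, 1.0)]
residues = [(0.0, 0.0), (1.0, 3.0)]
penalties = position_gap_penalties(residues, codebook)
```

The same per-residue errors could equally be thresholded to mask library positions instead of setting penalties; which of the two works better is not something this sketch addresses.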

If these ideas could somehow be transformed into a true threading method, taking into account the three-dimensional interactions of query residues placed `inside' the library fold, further progress might be made. By creating maps of structure, sequence properties and sequence position (distance along the chain), it would be possible to perform a double dynamic `structural' alignment using the map of the library sequence and structure. For example, a pair of vectors, one for the query and one for the library, could be compared in a low level alignment to determine $S_{i,j}$ in a similar way to SSAP. The query vector would be a two-dimensional vector between the map nodes corresponding to the mapping of input vectors $v_i$ and $v_k$ respectively. The library vector would be likewise defined from $v_j$ and $v_l$. The input vectors for these mappings, $v_i$, $v_j$, $v_k$ and $v_l$, are defined below. Their components, in order, are carbon-$\alpha$ coordinates ($x$, $y$, $z$) and component(s) of the sequence property vector ($P$).

$$ v_i = \left( \begin{array}{c} - \\ - \\ - \\ P^A_i \end{array} \right) \tag{16} $$

$$ v_j = \left( \begin{array}{c} x_j \\ y_j \\ z_j \\ P^B_j \end{array} \right) \tag{17} $$

$$ v_k = \left( \begin{array}{c} - \\ - \\ - \\ P^A_k \end{array} \right) \tag{18} $$

$$ v_l = \left( \begin{array}{c} x_l \\ y_l \\ z_l \\ P^B_l \end{array} \right) \tag{19} $$

Dashes indicate that the atomic coordinates are not used to find the closest map vector. In the mapping of query sequence input vectors $v_i$ and $v_k$ using the library sequence-structure map, the sequence property vectors should contain sufficient information to direct the mapping in the absence of three-dimensional information. Such information could include separate components of hydrophobicity, conservation and secondary structure prediction as already discussed, and one-dimensional sequence position (with a low weighting) might also help.
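A minimal sketch of this partial-vector mapping is given below. It assumes a small rectangular map stored as a dictionary from (row, column) grid positions to weight vectors ordered $(x, y, z, P)$, and an unweighted squared Euclidean distance over the components in use; the names `best_matching_node`, `map_vector` and `use`, and all the numbers, are hypothetical and chosen only for illustration.

```python
import math

def best_matching_node(vector, som, use):
    """Grid position of the node closest to `vector`, comparing only
    the component indices listed in `use` (the dashed ones are skipped)."""
    best, best_d = None, math.inf
    for pos, weights in som.items():
        d = sum((vector[c] - weights[c]) ** 2 for c in use)
        if d < best_d:
            best, best_d = pos, d
    return best

def map_vector(v_a, v_b, som, use):
    """Two-dimensional vector between the nodes that v_a and v_b map to."""
    r1, c1 = best_matching_node(v_a, som, use)
    r2, c2 = best_matching_node(v_b, som, use)
    return (r2 - r1, c2 - c1)

# Toy 2x2 library map: (row, col) -> (x, y, z, P), made-up values.
som = {
    (0, 0): (0.0, 0.0, 0.0, 0.1),
    (0, 1): (4.0, 0.0, 0.0, 0.9),
    (1, 0): (0.0, 4.0, 0.0, 0.5),
    (1, 1): (4.0, 4.0, 0.0, -0.5),
}

# Query residues i and k: coordinates unknown, only the property is compared.
v_i = (None, None, None, 0.85)
v_k = (None, None, None, -0.4)
query_vec = map_vector(v_i, v_k, som, use=(3,))

# Library residues j and l: all four components are compared.
v_j = (3.8, 0.2, 0.0, 0.7)
v_l = (3.9, 4.1, 0.0, -0.3)
library_vec = map_vector(v_j, v_l, som, use=(0, 1, 2, 3))
```

The two resulting map vectors are the kind of quantities that a low level, SSAP-like comparison could score. In practice the coordinate and property components would need relative weighting, which this sketch ignores.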

Another approach might be to use self-organising maps in a completely different way, to generate multiple sequence/profile based distance potentials for threading, which cannot currently be represented algorithmically [Taylor, 1997]. The implementation would be slow, because using the mapping to generate preferred distances from pairs of profiles would involve a complete search of the potential map, instead of the quickly accessed array implementation of single sequence distance potentials. Testing on a few alignments would be feasible, however, and processing power will inevitably increase to make full library searches possible. A similar, and possibly faster, approach could be to use feed-forward neural networks, trained on pairs of profiles with known distance separations.
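The cost difference between the two lookups can be illustrated with a short sketch. It assumes, purely for illustration, that each node of a trained potential map stores a pair of profile weight vectors together with a preferred distance, and that the preferred distance for a query pair is read from the node whose profile weights are closest. The layout of `nodes` and the function name are hypothetical, and the numbers are invented.

```python
def preferred_distance(profile_a, profile_b, nodes):
    """Exhaustive search of the map: every node is visited for every query pair.

    `nodes` is a list of (weights_a, weights_b, distance) tuples (assumed layout).
    A single-sequence potential would instead be one array access indexed by
    the two residue types.
    """
    query = list(profile_a) + list(profile_b)

    def squared_error(node):
        weights = list(node[0]) + list(node[1])
        return sum((q - w) ** 2 for q, w in zip(query, weights))

    return min(nodes, key=squared_error)[2]

# Two toy nodes over two-component profiles (made-up values, distances in angstroms).
nodes = [
    ((1.0, 0.0), (0.0, 1.0), 5.5),
    ((0.0, 1.0), (0.0, 1.0), 9.0),
]
d = preferred_distance((0.9, 0.1), (0.1, 0.9), nodes)
```

Each call is linear in the number of map nodes, and a threading run makes one such call per residue pair considered, which is why only a few alignments could be tested this way.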

Self-organising maps have not been shown here to improve alignments using structural information. This work has, however, shown that the self-organising principle has interesting potential in sequence and structural studies.
