Fold recognition algorithms are concerned with the identification of distant homologues whose structural similarity is usually confined to the topological arrangement of secondary structures in the core. Details such as the length and conformation of loops and termini, and the precise numbers and lengths of secondary structures are generally different, as a consequence of both neutral and functionally constrained evolution. Most published algorithms have taken this into account, either by concentrating at the level of secondary structure elements [Russell et al., 1996], by down-weighting matches in loop regions, or by eliminating non-core residues from alignments [Jones et al., 1992; Madej et al., 1995].
The core of a protein fold can be defined in many ways: from single structures, from alignments of multiple structures, and from alignments of multiple sequences. A number of methods have been proposed to locate the cores of single structures [Zehfus, 1995; Swindells, 1995b; Tsai & Nussinov, 1997] and to address the closely related problem of automatic domain definition [Swindells, 1995a; Siddiqui & Barton, 1995; Islam et al., 1995]. These methods attempt to optimally cluster hydrophobic, solvent-inaccessible, and secondary structure-forming residues, but this is an indirect route to the identification of the key interactions necessary for the integrity of the fold. The clusters found may not always be equivalent between distant homologues and structural analogues. Multiple structure-based alignments produce the best results but rely on the availability of suitable structures. Uninterrupted blocks of conserved amino acids in multiple sequence alignments often give a reasonable indication of the core of a protein. However, patterns of conservation can also result from family-specific functional constraints, rather than from the global cross-family structural features that fold recognition methods attempt to detect. A means by which the `essence' of a fold could be elucidated from a single structure and a multiple sequence alignment would be an important refinement to existing fold-recognition algorithms. In this chapter, we investigate whether self-organising maps [Kohonen & Makisara, 1989; Kohonen, 1990] are able to extract the essential features of protein structures and their associated sequence information. This is partly tested by applying the SIVA fold recognition algorithm (explained in Chapter 4) to `mapped' sequence information.
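For readers unfamiliar with self-organising maps, the following is a minimal sketch of a generic one-dimensional Kohonen map trained on toy two-dimensional data. It is not the encoding or the map configuration used in this chapter: the function names (`train_som`, `best_matching_unit`), the toy data set `DATA`, and all parameter values (number of units, learning rate, neighbourhood width, linear decay schedules) are assumptions chosen only to illustrate the general technique of competitive learning with a shrinking neighbourhood.

```python
import math
import random

# Toy input vectors (hypothetical): two loose clusters in the unit square.
DATA = [
    (0.10, 0.10), (0.15, 0.05), (0.05, 0.15), (0.10, 0.20),
    (0.90, 0.90), (0.85, 0.95), (0.95, 0.85), (0.90, 0.80),
]


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((wk - xk) ** 2 for wk, xk in zip(weights[j], x)),
    )


def train_som(data, n_units=5, n_epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organising map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each unit starts with a random weight vector in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present the samples in random order each epoch
        for idx in order:
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    # Move the unit towards the sample, scaled by lr and h.
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    som = train_som(DATA)
    for point in DATA:
        print(point, "-> unit", best_matching_unit(som, point))
```

In this sketch, similar input vectors end up assigned to the same or neighbouring units, which is the property that makes such maps a candidate for summarising structural and sequence features in a lower-dimensional form.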