Introduction

Fold recognition and sequence database searching methods share the common aim of identifying distant ancestral relationships between sequences. The success of these methods can only be measured when these relationships are accurately known. Structural comparisons give the least ambiguous measures of relatedness; sequence database annotations are often erroneous or absent.

The field of fold recognition has grown around the idea that a query sequence may be compared to known structures and their sequences (the fold library). Using query sequences of known structure, the performance of fold recognition algorithms can, of course, be measured fairly unambiguously. In Section 1.4.2 we discussed the possibility that threading methods might not make full use of the detailed structural information, but instead rely, as 3D-1D methods do, only on general features such as hydrophobic burial and local secondary structure preferences.

Solvent accessibility and secondary structure preferences are, to a certain extent, encoded linearly in the amino acid sequence. Can sequence-only methods use this information to detect these distant similarities quickly and simply? Certainly, profile-based methods (see Section 1.4.1) make effective use of the information in multiple sequence alignments. However, it is not clear how well these methods perform in comparison to threading and structure-based profile (3D-1D) methods, since they are not always rigorously tested on sequences of known structure. In this chapter, a novel sequence-only alignment protocol called SIVA (Sequence-derived Information Vector Alignment) is presented, which makes use of multiple sequence alignments and sequence-derived information. The effectiveness of SIVA is estimated using sets of sequences of known structural relatedness, and it is found to be far superior to the standard single-sequence Smith-Waterman method. Comparisons with other fold recognition methods are discussed. Unfortunately, a comparison with other profile-based methods has not been possible. This method, or one of its descendants, will be publicly tested in the blind fold recognition trials at the next CASP meeting.

At the heart of SIVA is the standard global dynamic programming alignment algorithm [Needleman & Wunsch, 1970; see also Section 1.2.1]. Its major novel feature is the use of the Euclidean distance between property vectors in the calculation of suitability scores, in place of the log-odds score matrix used in the majority of methods. The alignment of sequences on the basis of physico-chemical amino acid properties is not new, however. Kubota et al. [1981] and Argos et al. [1983] explored correlations between the physico-chemical properties of sequence segments. Argos [1987] then developed the method to include automatic alignment generation and the use of multiple sequence alignments. Five published sequence properties were chosen that gave optimal correlations between structurally aligned sequences: hydrophobicity, turn preference, residue bulk, refractivity index (closely correlated to molecular weight), and anti-parallel strand preference [see Argos, 1987, for further details and references]. Because of computational limitations, the method was restricted to the re-evaluation of marginal sequence hits obtained using conventional techniques. Rohde and Bork [1993] developed an algorithm to align sequences on the basis of conservation of binary property vectors in multiple sequence alignments. Taylor and Thornton [1984] searched $\alpha/\beta$ type proteins for patterns of hydrophobicity and predicted secondary structure resembling observed patterns in the $\beta\alpha\beta$ super-secondary motif.
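The central idea of the paragraph above, global dynamic programming scored by the Euclidean distance between property vectors, can be illustrated with a minimal sketch. It is not the SIVA implementation: the two-component property values, the linear gap penalty `gap`, and the conversion of a distance into a score (`shift` minus distance) are all assumptions made for illustration only.

```python
import math

# Hypothetical two-component property vectors (illustrative values only,
# not the published scales): (hydrophobicity, bulk).
PROPERTIES = {
    "A": (0.6, 0.2),
    "L": (1.0, 0.8),
    "K": (-1.0, 0.7),
    "D": (-0.9, 0.3),
    "G": (0.0, 0.0),
}


def euclidean(u, v):
    """Euclidean distance between two property vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_score(u, v, shift=1.0):
    """Assumed conversion: a small distance gives a high (positive) score."""
    return shift - euclidean(u, v)


def global_align(seq_a, seq_b, gap=-1.0, shift=1.0):
    """Needleman-Wunsch global alignment with distance-based scores.

    Returns (score, aligned_a, aligned_b). Uses a linear gap penalty,
    which is an assumption of this sketch.
    """
    va = [PROPERTIES[r] for r in seq_a]
    vb = [PROPERTIES[r] for r in seq_b]
    n, m = len(va), len(vb)

    # Fill the dynamic programming matrix.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(va[i - 1], vb[j - 1], shift)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            score[i][j],
            score[i - 1][j - 1] + pair_score(va[i - 1], vb[j - 1], shift),
        ):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(score[i][j], score[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_align("ALKD", "AKD")` returns the alignment score together with the two gapped sequences. When multiple sequence alignments are used, each position could instead carry a vector averaged over the residues in its column; that extension is likewise an assumption and is not shown here.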

In the following section, the types of sequence-derived information suitable for remote homology detection are discussed. The SIVA method is then described and its performance in fold recognition is measured.

