Fold recognition and sequence database searching methods share the common aim of identifying distant ancestral relationships between sequences. The success of these methods can only be measured when these relationships are accurately known. Structural comparisons give the least ambiguous measures of relatedness; sequence database annotations are often erroneous or absent.
The field of fold recognition has grown around the idea that a query sequence may be compared to known structures and their sequences (the fold library). Using query sequences of known structure, the performance of fold recognition algorithms can, of course, be measured fairly unambiguously. In Section 1.4.2 we discussed the possibility that threading methods might not make full use of the detailed structural information, but instead rely only on general features, such as hydrophobic burial and local secondary structure preferences (like 3D-1D methods).
Solvent accessibility and secondary structure preferences are, to a certain extent, linearly encoded in the amino acid sequence. Can sequence-only methods effectively use this information and detect these distant similarities quickly and simply? Certainly, profile-based methods (see Section 1.4.1) make effective use of the information in multiple sequence alignments. However, it is not clear exactly how well these methods perform in comparison to threading and structure-based profile methods (3D-1D), since they are not always rigorously tested on sequences of known structure. In this chapter, a novel sequence-only alignment protocol called SIVA (Sequence-derived Information Vector Alignment) is presented, which makes use of multiple sequence alignments and sequence-derived information. The effectiveness of SIVA is estimated using sets of sequences of known structural relatedness and is found to be far superior to that of the standard Smith-Waterman single-sequence method. Comparisons with other fold recognition methods are discussed. Unfortunately, a comparison with other profile-based methods has not been possible. This method, or one of its descendants, will be publicly tested in the blind fold recognition trials at the next CASP meeting.
At the heart of SIVA is the standard global dynamic programming alignment algorithm [Needleman & Wunsch, 1970, and see Section 1.2.1]. Its major novel feature is the use of the Euclidean distance between property vectors in the calculation of suitability scores, in place of the log-odds score matrix used in the majority of methods. The alignment of sequences on the basis of physico-chemical amino acid properties is not new, however. Kubota et al. [1981] and Argos et al. [1983] explored correlations between the physico-chemical properties of sequence segments. Argos [1987] then developed the method to include automatic alignment generation and the use of multiple sequence alignments. Five published sequence properties were chosen which gave optimal correlations between structurally aligned sequences: hydrophobicity, turn preference, residue bulk, refractivity index (closely correlated to molecular weight), and anti-parallel strand preference [see Argos, 1987, for further details and references]. Owing to computational limitations, the method was restricted to the re-evaluation of marginal sequence hits obtained using conventional techniques. Rohde and Bork [1993] developed an algorithm to align sequences on the basis of conservation of binary property vectors in multiple sequence alignments. Taylor and Thornton [1984] searched type proteins for patterns of hydrophobicity and predicted secondary structure resembling observed patterns in the super-secondary motif.
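The central idea described above, global dynamic programming in which the match score comes from the Euclidean distance between property vectors rather than from a log-odds matrix, can be illustrated with a minimal sketch. This is not the SIVA implementation: the function name `align_score`, the conversion of distance to score as `offset - distance`, the linear gap penalty, and the toy vectors are all assumptions made purely for illustration.

```python
import math


def align_score(vecs_a, vecs_b, offset=1.0, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score for two sequences of
    per-residue property vectors.

    Assumptions (not from the source): a position pair scores
    `offset - Euclidean distance`, and every gap position costs `gap`.
    """
    n, m = len(vecs_a), len(vecs_b)
    # dp[i][j] = best score aligning the first i vectors of a
    # with the first j vectors of b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similar property vectors lie close together, so a small
            # distance gives a high suitability score.
            match = offset - math.dist(vecs_a[i - 1], vecs_b[j - 1])
            dp[i][j] = max(
                dp[i - 1][j - 1] + match,  # align position i with j
                dp[i - 1][j] + gap,        # gap in b
                dp[i][j - 1] + gap,        # gap in a
            )
    return dp[n][m]


# Toy two-component property vectors (made-up values, illustration only).
seq_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
seq_b = [(0.0, 0.0), (0.0, 1.0)]
score_self = align_score(seq_a, seq_a)
score_gapped = align_score(seq_a, seq_b)
```

In this sketch, identical vectors contribute the full `offset` and increasingly dissimilar vectors contribute less (eventually a negative score), which plays the role that positive and negative log-odds entries play in a conventional substitution matrix.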
In the following section, the types of sequence-derived information suitable for remote homology detection are discussed. The SIVA method is then described and its performance in fold recognition is measured.