Most of the folds correctly recognised by SIVA have lengths within ±10% of the length of the query domain. The alignment and scoring parameters do not permit high-scoring alignments between sequences that differ greatly in length or that require large insertions. This may not sound particularly serious, but with small fold libraries and global alignments, the fold recognition algorithm is in effect using sequence length as a discriminatory factor. With larger fold libraries, there will be more folds with similar lengths but different topologies, and so the fold recognition task will become less length-oriented. The results from the 78 vs. 197 trials were comparable with those from the smaller trial, which indicates that the algorithm is using more than just length information, but an even larger trial ought to be performed.
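One way to gauge how much discrimination length alone provides is to count how many library folds fall inside the ±10% window around the query length. The sketch below is only an illustration of that baseline: the function name `length_candidates`, the `tol` parameter and the library lengths are invented for the example and are not taken from SIVA or from the trials described here.

```python
def length_candidates(query_len, library_lens, tol=0.10):
    """Return the library lengths within +/- tol of the query length.

    Hypothetical helper: a length-only filter, not part of SIVA.
    """
    return [n for n in library_lens if abs(n - query_len) <= tol * query_len]


# Invented domain lengths for a toy fold library (assumption, not real data).
toy_library = [60, 91, 100, 109, 150, 300]
hits = length_candidates(100, toy_library)
print(hits, "-> length alone narrows", len(toy_library), "folds to", len(hits))
```

The smaller the surviving candidate set, the more a global-alignment method can benefit from length alone; a larger library with many similar-length folds would leave more candidates for the alignment score to separate.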
It would be desirable to detect similarities between sequences of different lengths using local alignments. We modified the alignment code to do just this and found that, with the Euclidean distance measure, the local alignments were similar in extent to the global alignments (i.e. they were not local at all!). This could be because our Euclidean similarity score behaves like a very low-contrast (tolerant) substitution matrix, so that the dynamic programming algorithm is always able to trace a complete path through the matrix.
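The effect can be reproduced with a toy Smith-Waterman alignment. The sketch below is not the SIVA code: it assumes that a Euclidean distance is turned into a similarity as `offset - distance` (the text does not say how the conversion is done), uses a linear gap penalty, and aligns two invented lists of 2-D points that share only a central three-residue segment. The names `local_alignment`, `offset` and `gap` are hypothetical.

```python
import math


def local_alignment(query, target, offset, gap=5.0):
    """Smith-Waterman with similarity = offset - Euclidean distance.

    Returns (best_score, start_cell, end_cell) in DP-matrix coordinates,
    where cell (i, j) lies after query[i-1] and target[j-1].
    """
    n, m = len(query), len(target)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    T = [[0] * (m + 1) for _ in range(n + 1)]  # 0 stop, 1 diag, 2 up, 3 left
    best, end = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = offset - math.dist(query[i - 1], target[j - 1])
            cand = (0.0,
                    H[i - 1][j - 1] + sim,
                    H[i - 1][j] - gap,
                    H[i][j - 1] - gap)
            move = max(range(4), key=lambda k: cand[k])
            H[i][j], T[i][j] = cand[move], move
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Trace back from the best cell until a zero (stop) cell is reached.
    i, j = end
    while T[i][j] != 0:
        if T[i][j] == 1:
            i, j = i - 1, j - 1
        elif T[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return best, (i, j), end


# Invented feature vectors: only positions 2..4 are identical in both.
query = [(8, 0), (0, 8), (1, 1), (2, 2), (3, 3), (8, 8), (0, 0)]
target = [(0, 0), (8, 8), (1, 1), (2, 2), (3, 3), (0, 8), (8, 0)]

print("stringent:", local_alignment(query, target, offset=1.0))
print("tolerant: ", local_alignment(query, target, offset=20.0))
```

With the small offset, most cells score negatively and the alignment is confined to the shared segment. With the large offset, every cell scores positively, so the running score never drops to zero and the traceback runs from one edge of the matrix to the opposite edge, which is the "not local at all" behaviour described above. Under this assumed scoring, locality depends on the similarity scores having a negative expected value for unrelated positions.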