Alignment quality

In a paper on alignment accuracy and the incorporation of multiple sequence information into threading algorithms, Taylor [taylor:mst] found that a combination of 3D-1D environment scoring and full 3D threading produced the best alignments. Setting aside the pairwise packing term, the double dynamic programming and other details, that algorithm is much like ours: it scores local matches of conserved hydrophobicity and (all-or-nothing) secondary structure predictions without using log-odds matrices. The contribution of the pairwise threading term was found to be small but important for stable alignments. We have not thoroughly investigated alignment stability (but see Chapter 5). Fortunately, one of Taylor's three example alignments, for which alignment shift data are given, has a close equivalent in this work, allowing a comparison to be made. Using the SCOP classification, Taylor chose to align a flavodoxin fold, 4fxn, with a chemotaxis-Y protein-like fold, 3chy. 4fxn00 is in the fold library of this study, as is another chemotaxis-Y protein-like fold, 1ntr00, from the same SCOP family. Taylor reported a mean alignment shift of 2.79 over all equivalences. We report a shift of 2.4 (Table 4.9) using the same structural comparison algorithm (SSAP), though necessarily with a slightly different pair of proteins. The tentative conclusion is that our sequence-only alignments are comparable with those from much more sophisticated methods involving double dynamic threading. Our alignments of immunoglobulin-like folds are much worse ( mean shift) than those reported by Taylor ( $\approx 5$ ), although interestingly both algorithms perform worse on these folds than on the flavodoxin example.

Alignment quality has, in general, been very variable. Mainly- $\beta$ proteins and the larger domains were aligned least well ( $\beta\alpha\beta$ register shifts were observed in a number of high-scoring TIM barrel alignments). Amphipathic helices presumably provide strong enough phase information in the hydrophobicity pattern to steer the alignments towards the correct equivalences. In Figure 4.4(c) the alignment shifts are plotted against the fraction structurally aligned by SSAP. The two quantities appear to be inversely correlated, but there is a very wide spread of alignment qualities for the more dissimilar pairs. In particular, good alignments can be obtained for pairs of proteins with a small fraction of alignable carbon- $\alpha$ atoms, but these may be occurring by chance. Figure 4.4(d) shows that above a Z-score threshold of 1.0, the majority of alignments have mean shifts of less than 20. Below this threshold the alignment shifts again have a large spread.
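The kind of summary read off Figure 4.4(d) can be sketched in a few lines of Python. This is only an illustration: the function name `fraction_low_shift`, the representation of results as (Z-score, mean shift) pairs, and the toy values are all assumptions made here, not data or code from this work.

```python
from typing import Iterable, Optional, Tuple


def fraction_low_shift(
    results: Iterable[Tuple[float, float]],
    z_threshold: float = 1.0,
    shift_cutoff: float = 20.0,
) -> Optional[float]:
    """Fraction of alignments scoring above z_threshold whose mean shift
    is below shift_cutoff. Returns None if no alignment passes the
    Z-score threshold."""
    selected = [shift for z, shift in results if z > z_threshold]
    if not selected:
        return None
    return sum(1 for shift in selected if shift < shift_cutoff) / len(selected)


# Hypothetical (Z-score, mean shift) pairs, invented purely for illustration.
toy_results = [(2.5, 3.1), (1.8, 12.0), (1.2, 25.0), (0.6, 4.0), (0.3, 48.0)]

# Three alignments pass Z > 1.0; two of them have a mean shift below 20.
print(fraction_low_shift(toy_results))
```

Reporting a fraction rather than an average keeps a handful of very badly shifted alignments from dominating the summary.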

With hindsight, more robust measures of alignment quality should have been used. The percentage of correctly aligned residues (within, say, 4 residues to allow for shifts along helices) would reward correct alignment without penalising neighbouring sections of non-specific alignment. The mean shift measure is dominated by this non-specific noise and, worse still, is sequence-length dependent. The mean shift is still useful when the alignment is generally correct. The fraction of alignments with low mean shifts (say, less than 10) would have been less noise-prone than the average mean shift over all alignments. Agreement with the structural alignment is not always the ultimate goal. Wholesale shifts of secondary or super-secondary structures may occur in alignments, but the alignment could still be used to produce an approximate 3D model (for example, with TIM barrels). Measures of alignment quality at the secondary structure level would be desirable in these situations.
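The difference between the two per-alignment measures can be made concrete with a minimal sketch. It assumes an alignment is stored as a mapping from residue indices in one protein to residue indices in the other, and that the shift of an equivalence is the absolute difference between the predicted and structurally aligned partner. Both the representation and the function names are assumptions for illustration, not the implementation used in this work.

```python
from typing import Dict, List, Optional

# Assumed representation: residue index in protein A -> residue index in protein B.
Alignment = Dict[int, int]


def shifts(predicted: Alignment, reference: Alignment) -> List[int]:
    """Absolute shift for every reference equivalence that the predicted
    alignment also aligns."""
    return [abs(predicted[i] - j_ref)
            for i, j_ref in reference.items() if i in predicted]


def mean_shift(predicted: Alignment, reference: Alignment) -> Optional[float]:
    """Mean shift over the shared equivalences, or None if there are none."""
    s = shifts(predicted, reference)
    return sum(s) / len(s) if s else None


def fraction_within(predicted: Alignment, reference: Alignment,
                    tolerance: int = 4) -> float:
    """Fraction of reference equivalences whose predicted partner lies
    within `tolerance` residues. Positions left unaligned by the
    prediction count as incorrect."""
    if not reference:
        return 0.0
    correct = sum(1 for i, j_ref in reference.items()
                  if i in predicted and abs(predicted[i] - j_ref) <= tolerance)
    return correct / len(reference)


# Toy example: eight residues aligned exactly, two misplaced by 30 positions.
reference = {i: i for i in range(10)}
predicted = {**{i: i for i in range(8)}, 8: 38, 9: 39}

print(mean_shift(predicted, reference))       # 6.0
print(fraction_within(predicted, reference))  # 0.8
```

In this toy case two badly placed residues raise the mean shift to 6.0 even though 80% of the equivalences are exactly right, which is the noise sensitivity described above.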
