In a paper concerning alignment accuracy and the incorporation of multiple
sequence information into threading algorithms,
Taylor [taylor:mst] found that a combination of 3D-1D environment
scoring and full 3D threading produced the best alignments. Disregarding
the pairwise packing term, the double dynamic programming and other
details, the algorithm is much like ours in its scoring of local matches of
conserved hydrophobicity and (all or nothing) secondary structure
predictions without the use of log-odds matrices. The contribution of the
pairwise threading term was found to be small but important for stable
alignments. We have not thoroughly investigated alignment stability (but
see Chapter 5). Fortunately, one of Taylor's three
example alignments, for which alignment shift data are given, has a close
equivalent in this work, allowing a comparison to be made. Using the SCOP
classification, Taylor chose to align a flavodoxin fold (4fxn) with a
chemotaxis-Y protein-like fold (3chy). 4fxn00 is in the fold library of this
study, as is another chemotaxis-Y protein-like fold, 1ntr00, belonging to
the same SCOP family. Taylor reported a mean alignment shift of 2.79 over
all equivalences. We report a shift of 2.4
(Table 4.9) using the same structural comparison
algorithm (SSAP), albeit with a slightly different pair of proteins.
The tentative conclusion is that our sequence-only alignments are
comparable in quality to those from much more sophisticated methods involving double dynamic
threading. Our alignments of immunoglobulin-like folds are much worse
( mean shift) than those reported by Taylor (
), although
interestingly both algorithms perform worse on these folds than the
flavodoxin example.
Alignment quality has, in general, been very variable. Mainly-
proteins and the larger domains were aligned least well (
register shifts were observed in a number of high-scoring TIM barrel
alignments). Amphipathic helices presumably provide strong enough phase
information in the hydrophobicity signal to steer the alignments
towards the correct equivalences. In Figure 4.4(c) the
alignment shifts are plotted against the fraction structurally aligned by
SSAP. The two quantities appear to be inversely correlated, but there is a
massive spread of alignment qualities for the more dissimilar pairs. In
particular, good alignments can be obtained for pairs of proteins with a
small fraction of alignable carbon-
atoms, but these may be
occurring by chance. Figure 4.4(d) shows that above a
Z-score threshold of 1.0, the majority of alignments have mean shifts less
than 20. Below this threshold the alignment shifts, again, have a large
spread.
With hindsight, more robust measures of alignment quality should have been used. The percentage of correctly aligned residues (within, say, 4 residues to allow for shifts along helices) would reward correct alignment without penalising neighbouring sections of non-specific alignment. The mean shift measure is dominated by this non-specific noise and, worse still, depends on sequence length. The mean shift is still useful when the alignment is generally correct. The fraction of alignments with low mean shifts (say, less than 10) would have been less noise-prone than the average mean shift over all alignments. Agreement with the structural alignment is not always the ultimate goal. Wholesale shifts of secondary or super-secondary structures may occur in alignments, but the alignment could still be used to produce an approximate 3D model (for example, with TIM barrels). Measures of alignment quality at the secondary structure level would be desirable in these situations.
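
The three measures discussed above can be sketched in a few lines of Python. This is an illustration only, not the code used in this work. It assumes that an alignment is stored as a dictionary mapping residue positions in one protein to positions in the other, that the shift of a residue is the absolute difference between its partner in the test alignment and its partner in the structural (reference) alignment, and that residues missing from either alignment are skipped. The function names and the example data are hypothetical.

```python
from statistics import mean


def residue_shifts(test, reference):
    """Absolute shift (in residues) for each position equivalenced in both alignments.

    Assumption: positions absent from either alignment are skipped.
    """
    return [abs(test[i] - j) for i, j in reference.items() if i in test]


def mean_shift(test, reference):
    """Mean alignment shift over all shared equivalences."""
    shifts = residue_shifts(test, reference)
    return mean(shifts) if shifts else float("nan")


def fraction_within(test, reference, tolerance=4):
    """Fraction of shared equivalences whose shift is at most `tolerance` residues."""
    shifts = residue_shifts(test, reference)
    if not shifts:
        return 0.0
    return sum(s <= tolerance for s in shifts) / len(shifts)


def fraction_low_mean_shift(mean_shifts, cutoff=10.0):
    """Fraction of alignments whose mean shift is below `cutoff`."""
    return sum(m < cutoff for m in mean_shifts) / len(mean_shifts)


# Hypothetical example: eight residues aligned correctly (or nearly so)
# and two residues placed 30 positions away from their structural partners.
reference = {i: i for i in range(10)}
test = {0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 6: 6, 7: 7, 8: 38, 9: 39}

print(mean_shift(test, reference))       # dominated by the two outliers
print(fraction_within(test, reference))  # rewards the eight good equivalences
```

In this made-up case two badly placed residues raise the mean shift to more than 6, although 80% of the equivalences lie within 4 residues of the structural alignment, which is the behaviour the paragraph above describes.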