Results -- fold recognition using structure mapped library sequences

Whilst the best results using raw information were obtained with secondary structure prediction and hydrophobicity combined (Table 4.9), the initial mapping experiments use hydrophobicity alone (Kyte and Doolittle) for simplicity. Figure 5.12 shows the results of 17 fold recognition trials, each conducted with an independent mapping and a different ratio of sequence to structure information. According to the mean adjusted rank measure (lower half of Figure 5.12(a)), fold recognition performance does not improve beyond the baseline value of $\overline{R}_{adj}=11.4$ obtained from alignments of the same information without mapping. In fact, the lowest (best) values of $\overline{R}_{adj}$, at ratios between 4.5:1 and 5.5:1, are only about equal to this baseline. The poor performance at low sequence:structure ratios is expected, since those maps are heavily biased towards structure.

Figure 5.12: Fold recognition performance using structure-mapped hydrophobicity information -- different ratios of sequence:structure information (see Section 5.4.1). (a) Upper half: mean alignment shift error over trials (27 queries, 82 library domains as above); lower half: mean adjusted rank over trials. (b) Number of correct top ranking predictions (out of 27). Baseline or benchmark values are shown for the same trials using unmapped hydrophobicity information. Alignment quality and the number of correct top ranking predictions appear to be marginally improved, whilst ranking is not.

The results for alignment quality are more promising. The upper part of Figure 5.12(a) shows that the mean alignment shift error, $\overline{S}$, is lower than the baseline value from alignments of unmapped information ($\overline{S}=22.4$) for nearly all mappings with sequence:structure ratios greater than 3:1.

The number of correct top predictions, $T$, from this experiment is shown in Figure 5.12(b). Values of $T=10$ occur three times in the region of the plot corresponding to the minima in Figure 5.12(a), suggesting that the mapping does improve the specificity of fold recognition. However, these data are very noisy (as expected for a discrete integer measure), and a result of $T=7$ occurs between two of these top scoring trials. We repeated this experiment on a larger dataset (78 queries, 197 library domains) using sequence:structure ratios of 4:1, 4.5:1 and 5:1, and obtained fewer correct hits than the unmapped sequence control (23). With the methods presented here, therefore, fold recognition is not improved by using mapped sequences.
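As a rough guide to the size of this noise, the count $T$ can be treated as a binomial variable. The sketch below makes that assumption explicit. It is not part of the original analysis: the function name `binomial_sd` is hypothetical, and the model treats the 27 queries as independent with a common probability of success, which is unlikely to hold exactly.

```python
import math


def binomial_sd(n_queries: int, n_correct: int) -> float:
    """Standard deviation of the number of correct top hits,
    ASSUMING each query is an independent success/failure trial
    with probability n_correct / n_queries (a simplifying assumption)."""
    p = n_correct / n_queries
    return math.sqrt(n_queries * p * (1.0 - p))


# T = 10 correct top predictions out of 27 queries (values from the text).
sd = binomial_sd(27, 10)
print(f"binomial s.d. for T = 10 of 27: {sd:.2f} hits")
```

Under these assumptions the standard deviation is about 2.5 hits, so the gap between $T=7$ and $T=10$ is only a little over one standard deviation.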

The problem with the results presented here is that we are comparing results from multiple mappings and alignments with a single result from alignments of unmapped data. It is clear from Figure 5.12 that the results are prone to noise, and it is not unreasonable to suppose that the result from unmapped sequence information is not perfectly stable either. Replicate fold recognition trials should therefore be performed with both mapped and unmapped data to assess the spread of likely results. For the mapping protocol this is easy: the self-organising mapping algorithm is non-deterministic, so each run provides an independent replicate. Alignments of unmapped information, however, are deterministic, so repeating them would simply reproduce the same result. Furthermore, since a full fold recognition trial, including map generation, takes a few hours, the number of repeat experiments cannot be very large. Alternative approaches were therefore devised to investigate the results more thoroughly.
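The kind of replicate summary suggested above could be computed along the following lines. This is a minimal sketch, not the procedure used in this work: the function `summarise_replicates` is hypothetical, and the replicate values are placeholders chosen only to make the example run, not measured results. Only the baseline $\overline{S}=22.4$ is taken from the text.

```python
from statistics import mean, stdev


def summarise_replicates(values, baseline):
    """Return (mean, sample s.d., offset) for a set of replicate results,
    where offset is (baseline - mean) expressed in units of the s.d."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    offset = (baseline - m) / s if s > 0 else float("nan")
    return m, s, offset


# PLACEHOLDER replicate values of the mean alignment shift error at one
# sequence:structure ratio. These are illustrative, not real data.
replicate_shift_errors = [21.0, 23.0, 22.0, 20.0, 24.0]

m, s, offset = summarise_replicates(replicate_shift_errors, baseline=22.4)
print(f"mean = {m:.2f}, s.d. = {s:.2f}, baseline offset = {offset:.2f} s.d.")
```

A baseline lying well within one standard deviation of the replicate mean, as in this made-up example, would indicate that the apparent improvement cannot be distinguished from noise.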


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.