
Baseline comparison: Smith-Waterman searches of the fold library

We hope to show that the new method developed here performs better than current methods, both threading-based and sequence-based. Starting with sequence-based methods, the best control experiment is to apply the Smith-Waterman [Smith & Waterman, 1981] local sequence alignment search program SSEARCH, which is part of the FASTA package [Pearson, 1990]. Unlike FASTA, which first screens sequences using a fast but approximate method for scoring diagonals in the alignment matrix, SSEARCH computes a full local alignment for every comparison. Whilst SSEARCH is only a single-sequence method, we improved its chances of success by building a sequence library from the sequence homologues of each fold library domain (1421 sequences in total). We also used multiple query sequences: for each query domain, every sequence homologue was scanned against all 1421 library sequences (except those derived from the query itself), the resulting local alignment scores were ranked, and the top-scoring alignment was selected. Gap penalties of 12 (opening) and 2 (extension) were used with the BLOSUM50 matrix, following the recommendations in the literature [Henikoff, 1996; Pearson, 1995].
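At the heart of SSEARCH is the Smith-Waterman dynamic programme with affine gap penalties. The sketch below computes only the best local alignment score (no traceback) and is an illustration, not SSEARCH itself. The function name `smith_waterman_affine` is hypothetical, the toy +5/-4 identity scoring stands in for BLOSUM50 (a real matrix could be loaded, for example, with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM50")`), and it assumes a gap of length k costs 12 + 2(k - 1), which is one common reading of "opening 12, extension 2".

```python
from typing import Callable


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +5 for identity, -4 otherwise."""
    return 5 if a == b else -4


def smith_waterman_affine(
    s: str,
    t: str,
    score: Callable[[str, str], int] = toy_score,
    gap_open: int = 12,
    gap_extend: int = 2,
) -> int:
    """Best local alignment score using Gotoh's affine-gap recurrences.

    Assumed gap model: a gap of length k costs gap_open + gap_extend * (k - 1).
    """
    neg_inf = float("-inf")
    n, m = len(s), len(t)
    # H[i][j]: best local alignment score ending at s[i-1], t[j-1].
    # E[i][j]: best score ending with a gap in s (horizontal move).
    # F[i][j]: best score ending with a gap in t (vertical move).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(
                0,  # local alignment: a negative prefix is discarded
                H[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                E[i][j],
                F[i][j],
            )
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman_affine("AAAAACCCCC", "AAAAAGGGCCCCC")` bridges the three extra residues with one gap, scoring 25 + 25 - (12 + 2 * 2) = 34. Because local scores are floored at zero, unrelated sequences score 0 rather than going negative.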

Using this protocol, the number of correct non-self top-ranking folds, $T$, was 7. Details of these hits are presented in Table 4.2. Mean adjusted rank and alignment shifts were not calculated. These results make it clear that the dataset is not strictly non-homologous. Whilst no pair of domains is more than 20% identical by global alignment, some pairs are clearly detectable using standard local alignment methods against a small library of similarly sized domain sequences. Six of the seven hits form three pairs that recognise each other, and each pair shares a similar function by either E.C. number or SCOP classification. 2ohx and 1gdh are both oxidoreductases acting on the CH-OH group of donors with NAD$^+$ or NADP$^+$ as the acceptor. 1tpf and 1pii are both isomerases that interconvert aldoses and ketoses. 5p21 and 1hur both belong to the G-protein family in SCOP. These two pairs of domains are placed in the same homologous superfamilies in the latest release of CATH. The recognition of 1exg00 by 1cgt04 is less easily explained, since SCOP gives them different fold classifications; however, 1exg is in the "carbohydrate-binding domain" superfamily, and the SCOP domain equivalent to 1cgt04 is in the "Starch-binding domain" superfamily. In CATH, both domains belong to the immunoglobulin-like topology (2.60.40). These domains may therefore be more closely related than their SCOP classification suggests.
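The multiple-query protocol and the count $T$ can be sketched as follows. All names here (`top_hit`, `count_correct`, the `homologues` and `fold_of` dictionaries, and the toy data) are hypothetical. The sketch assumes that query domains are themselves fold library domains, as with the reciprocal hits in Table 4.2, and that a top hit counts as correct when the library domain carries the same fold label as the query. Any pairwise scorer, such as a Smith-Waterman score, can be passed in.

```python
from typing import Callable, Dict, List, Optional

Scorer = Callable[[str, str], float]


def top_hit(query: str, homologues: Dict[str, List[str]], score: Scorer) -> Optional[str]:
    """Return the library domain owning the single best-scoring comparison.

    Every homologue of the query is compared with every library sequence,
    skipping the sequences derived from the query itself (non-self search).
    """
    best_domain: Optional[str] = None
    best_score = float("-inf")
    for q_seq in homologues[query]:
        for lib_domain, lib_seqs in homologues.items():
            if lib_domain == query:
                continue  # exclude the query's own homologues
            for l_seq in lib_seqs:
                s = score(q_seq, l_seq)
                if s > best_score:  # keep the top-ranked alignment only
                    best_domain, best_score = lib_domain, s
    return best_domain


def count_correct(
    homologues: Dict[str, List[str]], fold_of: Dict[str, str], score: Scorer
) -> int:
    """T: number of queries whose top non-self hit shares the query's fold label."""
    correct = 0
    for query in homologues:
        hit = top_hit(query, homologues, score)
        if hit is not None and fold_of[hit] == fold_of[query]:
            correct += 1
    return correct


# Toy example (invented data): d1 and d2 share fold "f1"; d3 is alone in "f2".
EXAMPLE_HOMOLOGUES = {"d1": ["AAAA", "AAAT"], "d2": ["AAAA"], "d3": ["CCCC"]}
EXAMPLE_FOLDS = {"d1": "f1", "d2": "f1", "d3": "f2"}


def identity_score(x: str, y: str) -> int:
    """Toy scorer: number of identical positions in an ungapped comparison."""
    return sum(a == b for a, b in zip(x, y))
```

With the toy data, d1 and d2 find each other as top hits, while d3 has no partner in its fold. `count_correct(EXAMPLE_HOMOLOGUES, EXAMPLE_FOLDS, identity_score)` therefore returns 2. Pooling all homologue-versus-library scores and taking the maximum is equivalent to ranking them and selecting the top alignment.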





Table 4.2: Details of correct top-ranking fold recognition results using the Smith-Waterman local alignment sequence search of the fold library of 82 domains. Lengths and overlaps are in residues; %id is percentage sequence identity by global or local alignment.

Query    Query    Library  Library  CATH      Global   Local    Overlap
domain   length   domain   length   topology  %id      %id
5p2100   166      1hurA0   180      3.40.330  19.8     30.4     125
1hurA0   180      5p2100   166      3.40.330  19.8     29.6     125
2ohxA2   139      1gdhA2   184      3.40.330  12.0     32.8     64
1gdhA2   184      2ohxA2   139      3.40.330  12.0     32.8     64
1tpfA0   250      1pii02   191      3.20.40   14.8     17.2     215
1pii02   191      1tpfA0   250      3.20.40   14.8     17.2     215
1cgt04   104      1exg00   110      2.60.40   13.4     25.8     62
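The %id and overlap columns in Table 4.2 can be read off a gapped pairwise alignment. The text does not state the exact convention, so this sketch assumes that overlap is the number of alignment columns (gaps included) and that %id is 100 times the number of identical residue pairs divided by the overlap. The function name is hypothetical.

```python
from typing import Tuple


def identity_and_overlap(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Percent identity and overlap for two equal-length gapped alignment rows.

    Assumed convention: overlap = number of alignment columns (gaps included);
    %id = 100 * identical residue pairs / overlap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    overlap = len(aligned_a)
    # Count columns where both rows hold the same residue (gap columns never match).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    pct = 100.0 * identical / overlap if overlap else 0.0
    return pct, overlap
```

Under this convention, every local %id in the table corresponds to a whole number of identities. For example, 21 identities over a 64-residue overlap gives 100 * 21 / 64 = 32.8125, or 32.8% to one decimal place, consistent with the value reported for 2ohxA2 and 1gdhA2.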


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.