Fold library and query sequences


As discussed in Chapter 1, the fold library is a set of known structures against which sequences of unknown structure, called queries, are searched for similarities. To test fold recognition methods, however, the query sequences must themselves have known structures, and some or all of these must be recognisable in the fold library. In these trials the query sequences are therefore taken from the fold library structures, and self-recognition is not counted as correct fold recognition.

Because of time restrictions, a dataset of 82 domains was used initially. A complete all-against-all fold recognition trial on this dataset takes around 1-3 hours, which allows a modest number of experiments to determine the behaviour of the system. The 82 domains of the fold library were selected objectively using the following criteria:

non-homologous CATH (homologous superfamily) representatives
100-300 residues
10 or more sequences in the multiple sequence alignment (with no more than 70% pairwise identity - see Appendix A)
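
The three criteria above can be read as a simple filter over candidate domains. The following Python fragment is a minimal sketch of that reading, not the procedure actually used: the Domain record, its field names and the select_fold_library function are hypothetical, the third criterion is taken to mean the number of sequences left in a domain's alignment after the 70% identity filter, and keeping the first qualifying domain of each superfamily is an assumption, since the text does not say how representatives were chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical record for one candidate domain."""
    name: str         # domain identifier (illustrative only)
    superfamily: str  # four-part CATH code, C.A.T.H
    length: int       # number of residues
    n_aligned: int    # sequences left in the alignment after the 70% identity filter


def select_fold_library(candidates):
    """Return one qualifying domain per CATH homologous superfamily."""
    chosen = {}
    for d in candidates:
        # Criterion 2: domain length between 100 and 300 residues inclusive.
        if not (100 <= d.length <= 300):
            continue
        # Criterion 3: at least 10 sequences in the filtered alignment.
        if d.n_aligned < 10:
            continue
        # Criterion 1: a single representative per superfamily.
        # Assumption: the first qualifying domain seen is kept.
        chosen.setdefault(d.superfamily, d)
    return list(chosen.values())
```

Keeping the superfamily code on each record makes the non-homology requirement a dictionary lookup, and the same code can later be truncated to the CAT level when grouping domains by topology.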

The 82 domains are shown in Table 4.1. To save time in the fold recognition trials, the query set was defined as the subset of the fold library for which non-self recognition of fold topology (to the CAT level) was possible. The query set therefore contains the domains belonging to topologies with more than one representative in the fold library (see Table 4.1). There are 27 domains in this query set, and a total of 96 non-self fold-recognisable pairs. Full trials involving queries not expected to yield correct fold recognitions (null predictions) are discussed in Section 4.4.4.
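
The definition of the query set can be illustrated by grouping the fold library by topology. The Python fragment below is a sketch under stated assumptions: the function names and the toy library in the usage note are hypothetical, CATH codes are assumed to be written as four dot-separated numbers (C.A.T.H), and pairs are counted as ordered (query, target) pairs. The text does not say whether its figure of 96 counts ordered or unordered pairs, so the pair count here should not be taken as reproducing it.

```python
from collections import defaultdict


def topology(cath_code):
    """Truncate a four-part C.A.T.H code to its C.A.T (topology) level."""
    return ".".join(cath_code.split(".")[:3])


def build_query_set(library):
    """Derive queries and non-self recognisable pairs from a fold library.

    library maps a domain name to its CATH superfamily code.
    Returns (queries, pairs): the sorted query names, and the ordered
    (query, target) pairs that share a topology, excluding self matches.
    """
    by_topology = defaultdict(list)
    for name, code in library.items():
        by_topology[topology(code)].append(name)

    queries = []
    pairs = []
    for members in by_topology.values():
        # A topology with a single member offers no non-self target.
        if len(members) < 2:
            continue
        queries.extend(members)
        pairs.extend((q, t) for q in members for t in members if q != t)
    return sorted(queries), pairs
```

For a toy library of six domains in which two share one topology, three share another and one is alone in its topology, build_query_set returns five queries and eight ordered pairs; the lone domain is excluded because any hit for it would be a null prediction.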
