FASTA3 [Pearson, 1990] was used to search
SWISS-PROT [Bairoch & Boeckmann, 1991; Bairoch & Apweiler, 1997] release 34
(release 33 was used in the first half of Chapter 3) for
sequences similar to each fragment extracted from the PDB file. Default
settings were used and the output was filtered as follows. Locally aligned
fragments were extracted and retained (as `hits') if the percentage
identity was above a threshold that depends on the
length of the alignment (overlap). This length-dependent function was
derived empirically and produces a conservative threshold for percent
sequence identity: for lengths 50, 100 and 200, the threshold is 38%, 32% and
27% respectively.
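The filtering step can be illustrated with a minimal sketch. The empirical threshold function itself is not reproduced here, so `standin_threshold` below is a hypothetical stand-in that linearly interpolates between the three quoted values (38%, 32% and 27% at lengths 50, 100 and 200) and refuses to extrapolate outside that range. The function names and the strict "above" comparison are assumptions made for illustration only.

```python
from bisect import bisect_right

# The three (overlap length, percent identity) values quoted in the text.
QUOTED_POINTS = [(50, 38.0), (100, 32.0), (200, 27.0)]


def standin_threshold(overlap):
    """Hypothetical stand-in for the empirical threshold function.

    Linear interpolation between the quoted points; this is NOT the
    original function, which is not available here.
    """
    lengths = [length for length, _ in QUOTED_POINTS]
    if overlap < lengths[0] or overlap > lengths[-1]:
        # Outside the quoted range nothing is known, so do not guess.
        raise ValueError("overlap outside the range covered by the quoted values")
    if overlap == lengths[-1]:
        return QUOTED_POINTS[-1][1]
    i = bisect_right(lengths, overlap)  # index of the first point to the right
    (x0, y0), (x1, y1) = QUOTED_POINTS[i - 1], QUOTED_POINTS[i]
    return y0 + (y1 - y0) * (overlap - x0) / (x1 - x0)


def is_hit(percent_identity, overlap, threshold=standin_threshold):
    """Retain a local alignment only if its identity is above the threshold."""
    return percent_identity > threshold(overlap)
```

Passing the threshold as a parameter keeps the filter usable with the real function, should it be substituted for the stand-in.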
When database searches have been completed for each sequence fragment, the hits are reassembled in order according to the SWISS-PROT identifier of the sequence from which they originated. Fragments are joined with a single `-' character, which is interpreted as a gap in the following steps. The original probe sequence is also joined in this way.
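The reassembly step might be sketched as follows. The input layout (a fragment index, a SWISS-PROT identifier and the aligned hit sequence per record), the function names and the identifiers used in the usage note are hypothetical; the sketch also assumes at most one hit per fragment for any given database sequence, which the text does not state.

```python
from collections import defaultdict

GAP = "-"  # single character used to join fragments, read as a gap later on


def join_fragments(fragments):
    """Join sequence fragments, already in order, with a single gap character."""
    return GAP.join(fragments)


def reassemble_hits(hits):
    """Group hits by SWISS-PROT identifier and join them in fragment order.

    `hits` is an iterable of (fragment_index, swissprot_id, sequence) tuples
    (an assumed layout, not the original data format).
    """
    by_id = defaultdict(list)
    for fragment_index, swissprot_id, sequence in hits:
        by_id[swissprot_id].append((fragment_index, sequence))
    return {
        swissprot_id: join_fragments(
            seq for _, seq in sorted(parts, key=lambda part: part[0])
        )
        for swissprot_id, parts in by_id.items()
    }
```

For example, `reassemble_hits([(1, "ID_A", "DEF"), (0, "ID_A", "ABC")])` yields `{"ID_A": "ABC-DEF"}`, and the probe fragments are joined in the same way with `join_fragments`.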
Finally, the reassembled hits are filtered so that only those whose lengths are within +/- 20% of the length of the probe sequence are retained, in order to improve the quality of the multiple sequence alignments described below.
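A minimal sketch of this length filter is given below. Two points are assumptions rather than statements from the text: lengths are counted in residues, excluding the `-' joining characters, and the 20% bounds are treated as inclusive. The function names are hypothetical.

```python
def residue_length(sequence):
    """Length of a reassembled sequence, not counting '-' joining characters."""
    return len(sequence.replace("-", ""))


def filter_by_length(probe, reassembled_hits, tolerance=0.20):
    """Keep hits whose length is within the given fraction of the probe length.

    `reassembled_hits` maps an identifier to its reassembled sequence.
    """
    target = residue_length(probe)
    return {
        identifier: sequence
        for identifier, sequence in reassembled_hits.items()
        if abs(residue_length(sequence) - target) <= tolerance * target
    }
```

With a probe of 10 residues, hits of 9 or 11 residues are kept, while hits of 7 or 13 residues are discarded.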