
Removal of redundant sequences (weeding)

The process by which sequences in a multiple sequence alignment were made non-redundant on the basis of their pairwise sequence identity is described in the legend to Figure A.2.

Figure A.2: Weeding multiple sequences to create a non-redundant set. The result of this process is a set of sequences with no more than $T$ pairwise sequence identity. In this example, $T=70\%$. See the referring chapters for the exact values used. (a) The multiple sequence alignment, from which percentage pairwise identities ($\%_{id}$), not counting aligned gaps, are calculated directly, as shown in (b). (c) The sequence pairs are sorted by $\%_{id}$, highest first. If the top pair (WZ in the example) has $\%_{id}>T$ (70%), then whichever sequence of this pair occurs next further down the list (W) is eliminated from consideration, i.e. all pairs containing W are removed, as shown by an asterisk. Step (c) is repeated until the highest remaining $\%_{id}$ no longer exceeds the threshold, $T$. All sequences belonging to the remaining pairs (d) form the non-redundant set shown in (e).
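The weeding procedure in the legend to Figure A.2 can be sketched in Python as follows. This is a minimal illustration, not the original implementation. The function names `percent_identity` and `weed` are hypothetical, and several details are assumptions: "not counting aligned gaps" is read as skipping any column where either sequence has a gap, ties in $\%_{id}$ keep their input order, and if neither member of the top pair occurs again further down the list the second member is dropped. The sketch also tracks the retained sequences directly rather than collecting them from the remaining pairs, so that a sequence is still reported when no pairs are left.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percentage identity of two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def weed(alignment, threshold=70.0):
    """Return the names of the sequences kept after weeding, in input order.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    # Step (b): all pairwise percentage identities.
    pairs = [
        (percent_identity(alignment[p], alignment[q]), p, q)
        for p, q in combinations(alignment, 2)
    ]
    # Step (c): sort the pairs, highest identity first (stable for ties).
    pairs.sort(key=lambda item: item[0], reverse=True)

    kept = list(alignment)
    while pairs and pairs[0][0] > threshold:
        _, p, q = pairs[0]
        # Find which member of the top pair occurs next further down the list.
        victim = q  # assumed fallback if neither member occurs again
        for _, r, s in pairs[1:]:
            if r in (p, q):
                victim = r
                break
            if s in (p, q):
                victim = s
                break
        # Eliminate that sequence: drop every pair that contains it.
        pairs = [item for item in pairs if victim not in item[1:]]
        kept.remove(victim)
    return kept
```

Calling `weed(alignment, threshold=70.0)` on a dictionary of aligned sequences returns the names of the non-redundant set. Recomputing the top pair after each elimination corresponds to repeating step (c) until no remaining pair exceeds the threshold.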


Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.