
Overview of algorithm

The method of Nakashima et al. [nakashima:cpred] has been applied to new datasets and adapted to use different information. In essence, the method involves calculating a mean amino acid composition vector, or centroid, for each secondary structural class of protein domain. The components of the amino acid composition vectors are normalised using means and standard deviations calculated from the sequences used in the predictions themselves (see below), rather than from a separate set of sequences as in the original paper. A normalised amino acid composition vector is calculated for each sequence to be predicted (the query). Class assignments (i.e. predictions) are then made according to the class of the centroid nearest to the query vector, using the Euclidean distance metric. In addition to the mean prediction accuracy over the whole dataset, $Q_c$, the Matthews correlation coefficient [Matthews, 1975], $C_x$, is presented for each class $x$ for most of the results. Near-zero values of $C_x$ indicate random predictions, whilst $C_x=1.0$ for a perfect prediction.
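The steps described above can be illustrated with a minimal Python sketch. It is not the original implementation: the function names, the use of the population standard deviation, the fallback of 1.0 for components with zero variance, and the four toy sequences with their class labels are all assumptions made purely for illustration. The coefficient $C_x$ is computed here in its usual form, $(TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$, treating class $x$ as positive and all other classes as negative.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in seq (a 20-component vector)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # guard against empty input
    return [c / total for c in counts]

def fit_normaliser(vectors):
    """Per-component mean and (population) standard deviation over all vectors."""
    n = len(vectors)
    columns = list(zip(*vectors))
    means = [sum(col) / n for col in columns]
    # Assumption: a component with zero variance is left unscaled (sd = 1.0).
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
           for col, m in zip(columns, means)]
    return means, sds

def normalise(vec, means, sds):
    """Subtract the mean and divide by the standard deviation, per component."""
    return [(v - m) / s for v, m, s in zip(vec, means, sds)]

def centroids(vectors, labels):
    """Mean normalised composition vector for each class."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[lab].append(vec)
    return {lab: [sum(col) / len(vs) for col in zip(*vs)]
            for lab, vs in groups.items()}

def predict(vec, cents):
    """Class of the centroid nearest to vec by Euclidean distance."""
    return min(cents, key=lambda lab: math.dist(vec, cents[lab]))

def matthews(true, pred, x):
    """Matthews correlation coefficient C_x for class x versus the rest."""
    pairs = list(zip(true, pred))
    tp = sum(t == x and p == x for t, p in pairs)
    tn = sum(t != x and p != x for t, p in pairs)
    fp = sum(t != x and p == x for t, p in pairs)
    fn = sum(t == x and p != x for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data: made-up sequences and labels, not taken from any real dataset.
seqs = ["AAELLKKAEELLKK", "ELLKKAAEELLRRA", "VTVTYSVKVTYTVS", "TVYVSVTTYVKVSV"]
labels = ["alpha", "alpha", "beta", "beta"]

comps = [composition(s) for s in seqs]
means, sds = fit_normaliser(comps)  # statistics from the predicted sequences
norm = [normalise(c, means, sds) for c in comps]
cents = centroids(norm, labels)
preds = [predict(v, cents) for v in norm]

q_c = sum(p == t for p, t in zip(preds, labels)) / len(labels)  # accuracy Q_c
c_alpha = matthews(labels, preds, "alpha")
print(preds, q_c, c_alpha)
```

In this toy usage the same four sequences are used both to build the centroids and as queries, which keeps the example short but says nothing about how training and test sequences were separated in the actual experiments.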



Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.