Overview of algorithm




The method of Nakashima et al. [nakashima:cpred] has been applied to new datasets and adapted to use different information. In essence, the method involves calculating a mean amino acid composition vector, or centroid, for each secondary structural class of protein domain. The components of the amino acid composition vectors are normalised using means and standard deviations calculated from the sequences used in the predictions themselves (see below), rather than from a separate set of sequences as in the original paper. A normalised amino acid composition vector is calculated for each sequence to be predicted (the query). Class assignments (i.e. predictions) are then made according to the class of the centroid nearest to the query vector, using the Euclidean distance metric. In addition to the mean prediction accuracy over the whole dataset, the Matthews correlation coefficient [Matthews, 1975] for each class is presented for most of the results. Near-zero values of the coefficient indicate random predictions, whilst a perfect prediction gives a value of 1.
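The procedure described above can be illustrated with a minimal Python sketch. It is not the original implementation: the function names, the use of the population standard deviation, the restriction to the 20 standard amino acids and the toy sequences in the usage section are all assumptions made for illustration. The Matthews coefficient is computed per class by treating that class as positive and all other classes as negative.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only


def composition(seq):
    """Return the fraction of each standard amino acid in seq."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def normalise(vectors):
    """Scale each component by the mean and standard deviation of the
    supplied vectors (assumption: population standard deviation)."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dims)]
    sds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n)
           for i in range(dims)]
    # Components with zero spread carry no information; map them to 0.
    return [[(v[i] - means[i]) / sds[i] if sds[i] > 0 else 0.0
             for i in range(dims)] for v in vectors]


def centroids(vectors, labels):
    """Return the mean vector (centroid) of each class."""
    groups = defaultdict(list)
    for v, lab in zip(vectors, labels):
        groups[lab].append(v)
    return {lab: [sum(col) / len(vs) for col in zip(*vs)]
            for lab, vs in groups.items()}


def predict(vector, cents):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda lab: math.dist(vector, cents[lab]))


def matthews(true, pred, cls):
    """Matthews correlation coefficient for one class versus the rest."""
    pairs = list(zip(true, pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences and class labels, for illustration only.
    seqs = ["AAELLKKAAELL", "LLEEAKKLAEAL", "VTVTYSVGTVYT", "TVYVSGTVTVSY"]
    labels = ["alpha", "alpha", "beta", "beta"]
    vecs = normalise([composition(s) for s in seqs])
    cents = centroids(vecs, labels)
    preds = [predict(v, cents) for v in vecs]
    accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
    print("accuracy:", accuracy)
    for cls in sorted(set(labels)):
        print(cls, matthews(labels, preds, cls))
```

In this sketch the centroids are built from the same sequences that are then predicted, purely to keep the example short; whether the query sequence is excluded from its own class centroid is a separate design choice that the sketch does not address.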