The method of Nakashima et al. [nakashima:cpred] has been
applied to new datasets and adapted to use different information. In
essence the method involves calculating mean amino acid composition
vectors, or centroids, for each secondary structural class of protein
domain. The normalisation of amino acid composition vector components has
been performed using means and standard deviations calculated from the
sequences used in the predictions themselves (see below), rather than from
a different set of sequences as in the original paper. A normalised amino
acid composition vector is calculated for each sequence to be predicted
(query). Class assignments (i.e. predictions) are then made according to
the class of the centroid nearest to the query vector using the Euclidean
distance metric. In addition to the mean prediction accuracy over the
whole dataset, the Matthews correlation coefficient [Matthews, 1975] for
each class is presented for most of the results. Near-zero values of the
coefficient indicate random predictions, whilst a perfect prediction gives
a value of 1.
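
The procedure described above can be illustrated with a minimal Python sketch. It is not the original implementation: the function names, the toy sequences and their class labels are invented for illustration, and several details not stated in the text are assumptions (the population standard deviation is used for normalisation, a zero standard deviation is replaced by 1, and centroids are built from the same sequences that are then predicted). The per-class Matthews coefficient is computed from the usual one-versus-rest counts of true and false positives and negatives.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in a sequence."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1
    return [seq.count(a) / total for a in AMINO_ACIDS]


def normalise(vectors):
    """Z-score every component using means and standard deviations
    taken from the supplied vectors themselves (assumption: population
    standard deviation; a zero deviation is replaced by 1)."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((v[j] - means[j]) ** 2 for v in vectors) / n
        stds.append(math.sqrt(var) or 1.0)
    return [[(v[j] - means[j]) / stds[j] for j in range(dims)] for v in vectors]


def class_centroids(vectors, labels):
    """Mean normalised composition vector for each structural class."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in groups.items()}


def predict(vector, centroids):
    """Class of the centroid nearest to the query (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(vector, centroids[c]))


def matthews(true, pred, cls):
    """Matthews correlation coefficient for one class versus the rest."""
    pairs = list(zip(true, pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data, invented purely for illustration.
sequences = ["AAAAAL", "AAALAA", "VVVVVT", "VVTVVV"]
labels = ["alpha", "alpha", "beta", "beta"]

vectors = normalise([composition(s) for s in sequences])
centroids = class_centroids(vectors, labels)
predictions = [predict(v, centroids) for v in vectors]

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
mcc_by_class = {c: matthews(labels, predictions, c) for c in centroids}
```

Here `accuracy` corresponds to the mean prediction accuracy over the whole dataset and `mcc_by_class` holds one Matthews coefficient per class. In a real evaluation the query sequence would normally be excluded from the centroid calculation; the text above does not say which protocol is used, so the sketch leaves it out.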