
Jack-knifing

The previous discussion on dataset bias has ignored the important distinction between measures of self-consistency and true measures of prediction accuracy. Traditionally, the method has been implemented so that the centroids are calculated once from the whole dataset and predictions are then made for each member of that dataset. Because each sequence contributes to the centroid of its own class, this produces only a measure of self-consistency. We have implemented full jack-knifing (also known as leave-one-out cross-validation) in the method, so that the class centroids contain no information from the sequence being predicted. The results obtained with the jack-knifed implementation for the full 3-class 1996 CATH database are shown in Table 3.1 (Result 7); the overall accuracy is 57%, worse by all measures than the non-jack-knifed Result 5. As dataset size increases, the need for jack-knifing diminishes, because the contribution of any single sequence is diluted in the calculation of the centroids (data not shown). Nevertheless, all the results which follow are jack-knifed.
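The difference between the two evaluation schemes can be illustrated with a minimal Python sketch. This is not the original implementation: the function names, the use of Euclidean distance to pick the nearest centroid, and the toy feature vectors are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import dist


def centroids(vectors, labels, skip=None):
    """Mean vector per class, optionally leaving out the item at index `skip`."""
    sums, counts = {}, defaultdict(int)
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        if i == skip:
            continue  # the left-out item contributes nothing to any centroid
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def nearest(vec, cents):
    """Label of the closest centroid (Euclidean distance is an assumption)."""
    return min(cents, key=lambda lab: dist(vec, cents[lab]))


def self_consistency(vectors, labels):
    """Centroids computed once from the whole dataset, then every member predicted."""
    cents = centroids(vectors, labels)
    hits = sum(nearest(v, cents) == y for v, y in zip(vectors, labels))
    return hits / len(vectors)


def jackknife_accuracy(vectors, labels):
    """Leave-one-out: centroids are recomputed without the item being predicted."""
    hits = 0
    for i, (v, y) in enumerate(zip(vectors, labels)):
        cents = centroids(vectors, labels, skip=i)
        hits += nearest(v, cents) == y
    return hits / len(vectors)


# Made-up toy data (not from the CATH dataset): the last point lies between
# the two classes and is only classified correctly when it helps to define
# its own class centroid.
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (2.9, 0.0)]
labels = ["a", "a", "b", "b", "b"]

print(self_consistency(vectors, labels))    # 1.0
print(jackknife_accuracy(vectors, labels))  # 0.8
```

On this small toy set the self-consistency score is higher than the jack-knifed score, which mirrors the qualitative effect described above. With many members per class, removing one item shifts the centroid only slightly, so the two scores converge.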



Copyright Bob MacCallum - DISCLAIMER: this was written in 1997 and may contain out-of-date information.