Prediction of mainly-β architectures
The compositional approach is suited to the prediction of any pre-defined
sub-grouping of sequences, not only secondary structural class. For
example, the prediction of sub-cellular location from amino-acid
composition was reviewed in Chapter 1. The CATH architectures of the
mainly-β domains (see Figure 3.3) have been predicted using exactly the
same methods as above. The dataset contains 124 domains (domains belonging
to five architectures with only a single representative were removed).
Overall, 50% of the 124 domains were predicted correctly, compared with an
expected accuracy of 25% for random prediction based on prior
probabilities. The accuracies for the four reliability quartiles are 61%,
58%, 48% and 32%. The detailed results for each architecture are given in
Table 3.3. Most notably, the best predictions are for the ribbon
architecture, with a Matthews correlation coefficient C = 0.634 and 71%
correct. The barrel and sandwich architectures are also predicted better
than random (C = 0.393 and C = 0.345 respectively). None of the 10
distorted sandwiches is predicted correctly; 4 of them are predicted as
(normal) sandwiches. This result is interesting because it suggests that
similarities in architecture are reflected by similarities in amino-acid
pattern composition. The remaining architectures have few representatives
and are likely to compromise the overall prediction accuracy (see
Section 3.2.3).
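The 25% baseline can be reproduced from the domain counts in Table 3.3, assuming that "random prediction based on prior probabilities" means assigning each domain architecture i with probability p_i = n_i / N regardless of its true architecture. Under that assumption a domain of architecture i is correct with probability p_i, so the expected accuracy is the sum of p_i squared. A minimal sketch (the names `COUNTS` and `random_baseline` are illustrative, not taken from the original work):

```python
# Domains per mainly-beta architecture, from the n column of Table 3.3.
COUNTS = {
    "Ribbon": 17,
    "Barrel": 25,
    "Sandwich": 52,
    "2 Solenoid": 3,
    "Single Sheet": 3,
    "Trefoil": 4,
    "Complex": 2,
    "7 Propeller": 2,
    "Distorted Sandwich": 10,
    "Roll": 2,
    "Aligned Prism": 2,
    "4 Propeller": 2,
}


def random_baseline(counts):
    """Expected accuracy when each domain is assigned architecture i with
    probability p_i = n_i / N, independently of its true architecture
    (assumed reading of 'random prediction based on prior probabilities').
    A domain of class i is then correct with probability p_i, giving
    sum_i p_i * p_i overall."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


print(sum(COUNTS.values()))               # 124 domains
print(round(random_baseline(COUNTS), 3))  # 0.245
```

The exact value is 3772/15376 ≈ 0.245, which rounds to the quoted 25%.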
Figure 3.3: Common mainly-β architectures: (a) Ribbon (2tgi00),
(b) Barrel (1mjc00), (c) Sandwich (1hlcA0), (d) Distorted Sandwich (1bcx00).
Figures generated using Molscript [Kraulis, 1991].
Table 3.3: Architecture prediction within the mainly-β class.

| Architecture | CATH | n [2] | Q% [3] | C [4] | Quartile 1 [1] | Quartile 2 | Quartile 3 | Quartile 4 |
|---|---|---|---|---|---|---|---|---|
| Ribbon | 2.10 | 17 | 71 | 0.634 | 3/3 | 1/2 | 3/5 | 5/7 |
| Barrel | 2.40 | 25 | 60 | 0.393 | 4/7 | 7/8 | 4/8 | 0/2 |
| Sandwich | 2.60 | 52 | 65 | 0.345 | 11/13 | 10/15 | 8/14 | 5/10 |
| 2 Solenoid | 2.150 | 3 | 33 | 0.158 | 1/2 | 0/0 | 0/0 | 0/1 |
| Single Sheet | 2.20 | 3 | 0 | -0.014 | 0/0 | 0/0 | 0/1 | 0/2 |
| Trefoil | 2.80 | 4 | 0 | -0.016 | 0/1 | 0/1 | 0/0 | 0/2 |
| Complex | 2.170 | 2 | 0 | -0.020 | 0/1 | 0/1 | 0/0 | 0/0 |
| 7 Propeller | 2.130 | 2 | 0 | -0.020 | 0/0 | 0/1 | 0/1 | 0/0 |
| Distorted Sandwich | 2.70 | 10 | 0 | -0.027 | 0/4 | 0/3 | 0/0 | 0/3 |
| Roll | 2.30 | 2 | 0 | undef | 0/0 | 0/0 | 0/1 | 0/1 |
| Aligned Prism | 2.100 | 2 | 0 | undef | 0/0 | 0/0 | 0/1 | 0/1 |
| 4 Propeller | 2.110 | 2 | 0 | undef | 0/0 | 0/0 | 0/0 | 0/2 |
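- [1] Predictions are sorted by the reliability measure and split into
quartiles (1 = most reliable, 4 = least reliable). Numbers show how many
domains of each architecture were predicted correctly out of the number of
its domains falling in that quartile.
- [2] Number of domains in the dataset.
- [3] Overall accuracy: percentage of the domains of each architecture
predicted correctly.
- [4] Matthews correlation coefficient for each architecture.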
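The Matthews correlation coefficient treats each architecture as a two-class problem (this architecture versus all others) and is computed from true/false positives and negatives as C = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below uses that standard formula; the function name `matthews` is illustrative. Table 3.3 gives true positives and false negatives (for the ribbon, 12 of 17 correct) but not false positives; FP = 6 is the value that reproduces C = 0.634, which is an inference rather than a number stated in the source. When an architecture is never predicted, TP + FP = 0, the denominator vanishes and C is undefined, which is presumably the meaning of the "undef" entries.

```python
import math


def matthews(tp, tn, fp, fn):
    """Matthews correlation coefficient for one architecture versus the rest.
    Returns None when any marginal total is zero (C is undefined)."""
    denom_sq = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom_sq == 0:
        return None
    return (tp * tn - fp * fn) / math.sqrt(denom_sq)


N = 124  # domains in the dataset

# Ribbon: 12 of 17 domains correct (3+1+3+5 in Table 3.3), so fn = 5.
# fp = 6 is inferred: it is the count that reproduces the tabulated 0.634.
tp, fn, fp = 12, 5, 6
tn = N - tp - fn - fp
print(round(matthews(tp, tn, fp, fn), 3))  # 0.634

# Never predicting an architecture (tp = fp = 0) leaves C undefined.
print(matthews(0, N - 2, 0, 2))  # None
```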
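The reliability quartiles can be sketched as follows: rank all predictions by their reliability score, cut the ranking into four equal groups and report the fraction correct in each. The reliability measure itself is defined earlier in the chapter and is not reproduced here, so `quartile_accuracies` (a hypothetical helper) takes made-up (score, correct) pairs. The second part pools the per-architecture counts transcribed from Table 3.3 and recovers the quoted quartile accuracies and the overall 62/124 = 50%.

```python
# Per-architecture (correct, total) for reliability quartiles 1-4,
# transcribed from Table 3.3 (quartile 1 = most reliable).
QUARTILES = {
    "Ribbon": [(3, 3), (1, 2), (3, 5), (5, 7)],
    "Barrel": [(4, 7), (7, 8), (4, 8), (0, 2)],
    "Sandwich": [(11, 13), (10, 15), (8, 14), (5, 10)],
    "2 Solenoid": [(1, 2), (0, 0), (0, 0), (0, 1)],
    "Single Sheet": [(0, 0), (0, 0), (0, 1), (0, 2)],
    "Trefoil": [(0, 1), (0, 1), (0, 0), (0, 2)],
    "Complex": [(0, 1), (0, 1), (0, 0), (0, 0)],
    "7 Propeller": [(0, 0), (0, 1), (0, 1), (0, 0)],
    "Distorted Sandwich": [(0, 4), (0, 3), (0, 0), (0, 3)],
    "Roll": [(0, 0), (0, 0), (0, 1), (0, 1)],
    "Aligned Prism": [(0, 0), (0, 0), (0, 1), (0, 1)],
    "4 Propeller": [(0, 0), (0, 0), (0, 0), (0, 2)],
}


def quartile_accuracies(predictions):
    """Rank (reliability, is_correct) pairs from most to least reliable,
    split them into four near-equal groups and return the fraction
    correct in each group (hypothetical reliability scores)."""
    if len(predictions) < 4:
        raise ValueError("need at least four predictions")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    n = len(ranked)
    cuts = [q * n // 4 for q in range(5)]  # for n = 124: 0, 31, 62, 93, 124
    return [
        sum(1 for _, ok in ranked[lo:hi] if ok) / (hi - lo)
        for lo, hi in zip(cuts, cuts[1:])
    ]


def pool_quartiles(table):
    """Sum correct and total counts over all architectures per quartile."""
    correct, total = [0] * 4, [0] * 4
    for rows in table.values():
        for q, (c, t) in enumerate(rows):
            correct[q] += c
            total[q] += t
    return correct, total


correct, total = pool_quartiles(QUARTILES)
print(total)                                                  # [31, 31, 31, 31]
print([round(100 * c / t) for c, t in zip(correct, total)])   # [61, 58, 48, 32]
print(sum(correct), sum(total))                               # 62 124

# Toy usage with made-up reliability scores:
toy = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
       (0.5, False), (0.4, False), (0.3, True), (0.2, False)]
print(quartile_accuracies(toy))  # [1.0, 0.5, 0.0, 0.5]
```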
Copyright Bob MacCallum
- DISCLAIMER: this was written in 1997 and may contain out-of-date information.