The CATH database from this laboratory [Orengo et al., 1997] is a more
ambitious project. Using semi-automated methods, a set of domains from the
single- and multi-domain
structures of the PDB is hierarchically classified at four levels: Class, Architecture, Topology and Homologous
superfamily. These levels are described both by names (for ease of human
understanding) and numbers (for easy computer manipulation). In this
latter form, CATH is similar to the EC enzyme classification
scheme [NC-IUBMB, 1992].
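The numeric form lends itself to a simple data structure. The following is a minimal sketch, assuming the four level numbers are written as a dot-separated string in the manner of EC numbers; the names `CathCode`, `parse_cath_code` and `format_cath_code` are hypothetical, and the example code value is illustrative only.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """Numeric form of a classification: one integer per level (assumed layout)."""
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(text: str) -> CathCode:
    """Parse a dotted four-level code such as '3.40.50.720' (illustrative value)."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {text!r}")
    # int() raises ValueError if a level is not a whole number.
    return CathCode(*(int(p) for p in parts))


def format_cath_code(code: CathCode) -> str:
    """Write the four level numbers back out in dotted form."""
    return ".".join(str(n) for n in code)
```

Because `CathCode` is a tuple, codes sort and compare level by level, which is the kind of easy computer manipulation the numbering is meant to allow.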
Class has already been described and can be one of: mainly-α, mainly-β,
mixed-αβ and irregular. The unique feature of CATH is the
architectural description at the next level in the hierarchy. Many of the
folds (in the mainly-β and mixed-αβ classes in particular)
appear to be constructed according to the same basic principles. For
example, a sizeable subset of mixed-αβ folds can be thought of as
having three layers: two of helix surrounding a single central
layer of β-sheet; this is the 3-layer αβα sandwich
architecture. Architectures are defined manually for the whole of fold
space. Folds within the same architectural subdivision may have different
numbers and ordered connections of secondary structures, and are
discriminated by the topology descriptor. The final level groups all
domains belonging to the same homologous superfamily. These are structures
whose sequences may be clearly or only weakly related but which share the
same function and are most likely evolutionarily related.
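Since each level subdivides the one above it, two domains can be compared by finding the deepest level at which their classifications still agree. The sketch below assumes each domain is represented by a tuple of four level numbers in the order Class, Architecture, Topology, Homologous superfamily; the function name `deepest_shared_level` and the example values are hypothetical.

```python
from typing import Optional, Sequence

# Level names in hierarchical order, as listed in the text.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def deepest_shared_level(a: Sequence[int], b: Sequence[int]) -> Optional[str]:
    """Return the name of the deepest level shared by two codes, or None.

    Agreement at a lower level only counts if every level above it also
    agrees, because the numbers are only meaningful within their parent.
    """
    shared = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = name
    return shared
```

For example, two domains with the hypothetical codes (3, 40, 50, 720) and (3, 40, 50, 300) would share a topology but belong to different homologous superfamilies.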
The topology and homology classifications of CATH are performed by the
automated application of the SSAP algorithm and sequence comparisons. The
numerical cutoffs are described in full on the CATH web site. In
certain dense regions of fold space (such as in the 3-layer αβα
sandwich architecture) stricter thresholds were
required to subdivide topologies into smaller and more informative
groupings. Only through consultation with users and a deeper understanding
of fold evolution will classifications of protein structures be completely
`error-free'.
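The idea of cutoff-based assignment with stricter thresholds in dense regions can be sketched as follows. This is an illustration only: the numerical values are placeholders and not the published CATH cutoffs, the names `Cutoffs`, `cutoffs_for` and `assign_level` are hypothetical, and the sketch ignores the functional evidence that the homologous superfamily level also takes into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    # Minimum structural similarity score (e.g. from a structure comparison
    # such as SSAP) for two domains to share a topology.
    topology_min_structure_score: float
    # Minimum percentage sequence identity for two domains to be placed
    # in the same homologous superfamily.
    homology_min_sequence_identity: float


# Placeholder values for illustration; NOT the real CATH cutoffs.
DEFAULT_CUTOFFS = Cutoffs(70.0, 35.0)

# Dense regions of fold space get a stricter structural threshold.
STRICT_CUTOFFS = {
    "3-layer αβα sandwich": Cutoffs(80.0, 35.0),
}


def cutoffs_for(architecture: str) -> Cutoffs:
    """Look up the cutoffs for an architecture, falling back to the default."""
    return STRICT_CUTOFFS.get(architecture, DEFAULT_CUTOFFS)


def assign_level(structure_score: float,
                 sequence_identity: float,
                 architecture: str) -> str:
    """Return the deepest level two domains of one architecture would share."""
    cutoffs = cutoffs_for(architecture)
    if sequence_identity >= cutoffs.homology_min_sequence_identity:
        return "homologous superfamily"
    if structure_score >= cutoffs.topology_min_structure_score:
        return "topology"
    return "architecture only"
```

With these placeholder numbers, a pair scoring 75 on structure with 10% sequence identity would share a topology in most architectures, but not in the 3-layer αβα sandwich architecture, where the stricter threshold splits the pair into separate topologies.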