Concluding remarks

How will the work presented in this thesis further our understanding of biological systems? In this chapter, some broad conclusions and possible future applications of these studies are discussed.

Perhaps the most imminent and widespread use of computer modelling and prediction in molecular biology will be in the screening of potential therapeutics to disrupt protein function. Although 'wet' experimental screening protocols with high throughput are currently possible with the aid of robotics, computational methods will inevitably be cheaper and more productive. The search for effective therapeutics is very much a 'needle in a haystack' problem, and as such requires high levels of specificity. In nature, the production of antibodies in vertebrates follows a similar paradigm: vast numbers of potentially useful molecules are produced, of which only a few are used to fight infection. The survey of antibody structures presented in Chapter 2 highlights the diversity of molecules which antibodies can recognise. Advances have already been made in the in vitro screening of specific antibody-like molecules made by filamentous phage [Winter et al., 1994].

The engineering and de novo design of antibodies to suit a number of purposes, including drug delivery, catalysis and nanotechnology, is not yet widely successful. One of the main hindrances to progress in this area was identified in Chapter 2. Although antibodies have a conserved global structure, we showed that the CDR residues involved in antigen contact cannot be predicted with great certainty and that the shape of the combining site does not strictly follow the 'rules' we proposed. Similar problems arise in protein structure analysis. In Chapter 3, features in global sequence composition were shown to correlate with secondary structural class and certain architectures; however, none of the correlations was perfect.

Antibodies and other proteins have not evolved in order to be understood. In most cases, evolutionary processes operate at the level of the organism, and their downstream effects on the cellular protein machinery are more difficult to rationalise than gross phenotypic features. For example, convergent evolution clearly explains the similarity between the wings of bats and birds, but the abundance of the TIM barrel topology is less easily explained. Despite these conceptual difficulties, the power of evolution can be harnessed in many areas of research. Using artificial implementations of random mutation, crossover and selection, genetic algorithms (GAs) can be more effective than conventional techniques (for example, Monte Carlo methods) in searching the solution space of large problems. This has been shown for a simplified representation of the protein folding problem, using a conformational search for a global energy minimum [Unger & Moult, 1993]. Our attempts to find optimal sequence patterns for class prediction using GAs were not successful, but this does not imply that GAs cannot be useful. It may be possible to incorporate long-range interactions into structure prediction using self-organised rules or principles generated by GAs.
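
The three operators named above (random mutation, crossover and selection) can be illustrated with a minimal sketch. This is not the GA used in this thesis or in the cited folding study: the bit-string encoding, the toy `one_max` fitness function, tournament selection, single-point crossover, elitism and all parameter values are assumptions chosen only to keep the example short.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for illustration): the number of 1s."""
    return sum(bits)

def tournament(population, fitness, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    """Random mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(fitness=one_max, length=20, pop_size=30,
           generations=60, rate=0.02, seed=0):
    """Evolve a population of bit strings and return the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [
            mutate(crossover(tournament(population, fitness, rng),
                             tournament(population, fitness, rng), rng),
                   rng, rate)
            for _ in range(pop_size - 1)
        ]
        # Elitism: carry the best individual so far into the next generation,
        # so the best fitness never decreases.
        population = children + [best]
        best = max(population, key=fitness)
    return best

if __name__ == "__main__":
    solution = run_ga()
    print(one_max(solution), solution)
```

In a real application the bit string would be replaced by an encoding of the problem (for example a sequence pattern or a chain conformation) and `one_max` by the corresponding scoring or energy function; the choice of encoding and operators usually matters more than the loop itself.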

Another self-organising competitive algorithm, Kohonen's self-organising map [Kohonen & Makisara, 1989; Kohonen, 1990], was investigated in Chapter 5 in an attempt to capture the essence of protein folds and sequences. The sequence information filtered with this technique did not improve fold recognition by the SIVA algorithm presented in Chapter 4. However, a possible link between the self-organising process and protein stability was uncovered in the course of this work. Unfortunately, time limitations did not permit a full investigation of this phenomenon.

Advances in the prediction and analysis of protein structure and interactions arise either through incremental improvements in techniques or from quantum leaps in understanding. The latter are rare in this field, whilst the former are frustrated by the huge volume of data and literature. Particular difficulties result from the ever-increasing size of the structure and sequence databases. For example, analyses from the 1980s may have been inconclusive owing to a lack of data, but the pressures of the research environment make it unlikely that this work will be repeated using modern data.

Incremental advances also require the integration of current and past knowledge into single tools. This integration requires significant investment in software design and collaboration. Black-box methods such as GAs and neural networks may help to combine information from multiple sources and to make decisions based on it. As methods become more complex, they require more training data and more careful testing. Fortunately, integrating the class and architecture predictions with the fold recognition methods presented in this thesis would be a relatively simple step forward in this field.
