
Bob MacCallum has moved

Division of Cell & Molecular Biology
Imperial College London
South Kensington Campus
London SW7 2AZ

I still get mail on my SBC address and continue to support my web services.

My Imperial web page is here.


Group members

An up-to-date list can be obtained from the SBC personnel database (also: same list plus ex-members).

Research at SBC

The challenge I set myself when moving to Sweden was to make significant advances in protein structure prediction using a fairly new kind of evolutionary machine learning algorithm called genetic programming (see below). The group as a whole has performed research in a number of other areas.

Research highlights

A major contribution to the community has been the release of my Perl genetic programming system PerlGP under the GPL, importantly allowing others to replicate and scrutinise my research. As an example of an "off-the-shelf" application of PerlGP, we have evolved simple Perl expressions to predict the nuclear localisation of proteins from amino acid sequence. This project, codenamed NucPred, is being prepared for publication, and its web service is intended to provide novel clues for experimental biologists.

We also have an interest in other biologically inspired computational techniques with emergent properties, such as Kohonen's self-organising map (SOM). We showed that the SOM has a novel use in finding optimal views for 3D protein structures (project OVOP, [pdf]). SOMs and genetic programming are currently being used in contact prediction and prediction of ordered/disordered regions in proteins.

Research lowlights

I have worked on secondary structure prediction using evolved regular expressions, with undramatic results [pdf]. Early on, we looked at the relationship between the "non-localness" of protein structures (measured by contact order) and secondary structure patterns and prediction accuracy, but unfortunately no firm conclusions could be drawn. Self-organised mate selection was introduced into the genetic programming algorithm, but it was very difficult to show whether it helped in the search for better solutions.

Genetic programming

Like other evolutionary algorithms, genetic programming (GP) simulates the processes of fitness-based selection, reproduction and mutation seen in natural populations. In GP we evolve populations of computer programs, expressions or subroutines that should solve some particular task. Typically, the size and structure of these programs are not limited, so that complex features and substructures may evolve and be swapped around in processes similar to biological recombination. Other biologically inspired machine learning techniques, such as artificial neural networks, have been very useful for finding patterns and trends in data. However, they typically have a fixed architecture and repertoire of operations, and so cannot be expected to solve all problems. GP-derived solutions can be, in theory at least, infinitely expressive: they can contain conditional statements, loops and memory, which should be sufficient to compute anything. In practice, however, GP is not yet routinely providing solutions to the world's problems. One issue is, of course, the size of the search space that is often encountered with real-world problems. A deeper challenge is to overcome the crudeness of our algorithms and problem definitions (e.g. fitness measures), which becomes apparent when they are compared with real biological systems.
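To make the loop of selection, recombination and mutation concrete, here is a minimal sketch in Python. It is not PerlGP and says nothing about how PerlGP is implemented. Everything in it is an assumption chosen for illustration: the tuple-based expression trees, the function names, the tiny arithmetic function set, tournament selection, elitism, the parameter values and the toy curve-fitting task.

```python
import operator
import random

# Illustrative function and terminal sets (assumptions, not taken from PerlGP).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, depth):
    """Grow a random expression tree: a terminal or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree for a given value of x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def node_paths(tree, path=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i in (1, 2):
            paths += node_paths(tree[i], path + (i,))
    return paths


def get_subtree(tree, path):
    """Follow a path of child indices down to a subtree."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)


def crossover(rng, mother, father):
    """Recombination: graft a random subtree of father into mother."""
    donor = get_subtree(father, rng.choice(node_paths(father)))
    return replace_subtree(mother, rng.choice(node_paths(mother)), donor)


def mutate(rng, tree):
    """Mutation: replace a random node with a freshly grown subtree."""
    return replace_subtree(tree, rng.choice(node_paths(tree)), random_tree(rng, 2))


def fitness(tree, cases):
    """Sum of squared errors over (x, y) cases; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in cases)


def tournament(rng, population, cases, k=3):
    """Fitness-based selection: best of k randomly chosen individuals."""
    return min(rng.sample(population, k), key=lambda t: fitness(t, cases))


def evolve(cases, pop_size=100, generations=20, max_nodes=60, seed=0):
    """Run a small generational GP and return the best tree found."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best = min(population, key=lambda t: fitness(t, cases))
    for _ in range(generations):
        offspring = [best]  # elitism: the best program so far always survives
        while len(offspring) < pop_size:
            child = crossover(rng, tournament(rng, population, cases),
                              tournament(rng, population, cases))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if len(node_paths(child)) <= max_nodes:  # crude size cap
                offspring.append(child)
        population = offspring
        best = min(population, key=lambda t: fitness(t, cases))
    return best


# Toy task: rediscover y = x*x + x + 1 from nine sample points.
CASES = [(float(x), float(x * x + x + 1)) for x in range(-4, 5)]

if __name__ == "__main__":
    winner = evolve(CASES)
    print(winner, fitness(winner, CASES))
```

The size cap is a practical shortcut for this sketch: without some pressure against growth, evolved trees tend to bloat. The function set here has no conditionals, loops or memory, so it is far less expressive than the general case described above.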


Structure prediction servers

  1. Disordered (non-structured) regions in proteins (more...)
  2. Residue-residue contact predictions (more...)


NucPred

A new nuclear localisation prediction tool, and analysis of eukaryotic proteomes.

PerlGP

I am the author of PerlGP - an open source, fully featured, Perl-based genetic programming system.

OVOP - Optimal Views Of Proteins

Check out my ex-student Oscar Sverud's project OVOP to find the best automatic views of proteins. [get the colour preprint here]


compare-stuff

A fun toy I made a few years ago. You can compare things based on web search or PubMed document totals. There were similar tools out there before compare-stuff (many are now dead), but this one is a little more sophisticated. Here are some examples of things you can try.


Automatic Discovery of Cross-Family Sequence Features Associated with Protein Function; Markus Brameier, Josien Haan, Andrea Krings and Robert M. MacCallum; BMC Bioinformatics 2006, 7:16; [supplementary information]

Striped sheets and protein contact prediction; Robert M. MacCallum; Accepted for oral presentation at ISMB/ECCB 2004 (peer reviewed, to be published in a special issue of Bioinformatics); [preprint] [slides from talk]

Evolving Regular Expression-based Sequence Classifiers for Protein Nuclear Localisation; Amine Heddad, Markus Brameier and Robert M. MacCallum; Accepted (by peer-review) at the 2nd European Workshop on Evolutionary Bioinformatics (EvoBIO2004); [preprint]

Evolved Matrix Operations for Post-Processing Protein Secondary Structure Predictions; Varun Aggarwal and Robert M. MacCallum; Accepted (by peer-review) at the 7th European Conference on Genetic Programming (EuroGP2004); [preprint]

Towards optimal views of proteins; Oscar Sverud and Robert M. MacCallum; Bioinformatics Vol. 19 no. 7 2003 Pages 882-888; [abstract] [colour preprint].

Introducing a Perl Genetic Programming System -- and Can Meta-evolution Solve the Bloat Problem?; Robert M. MacCallum; In Proceedings of the 6th European Conference on Genetic Programming (EuroGP2003); Lecture Notes in Computer Science 2610: 364-373 2003; [preprint].

Evolving Perl code for protein secondary structure prediction; Presented at the PPSN 2002 workshop: Evolutionary and Neural Computation in the BioSciences; [extended abstract] [slides from talk]

You can also see a list of all my publications in PubMed, including some from my post-doc in London.

Teaching resources

Various lecture handouts of variable depth and quality.

Previous research

During my post-doc with Mike Sternberg at the Imperial Cancer Research Fund in London (now at Imperial College, London), I worked on remote homology detection methods and protein sequence annotation. In particular, I was involved with the 3D-PSSM fold recognition algorithm and server, which has always done quite well in CASP and LiveBench (and is now feeding hungry meta-servers). I also did some relatively pioneering work on automatic functional annotation comparison in a project called SAWTED, and I was originally a co-architect of the 3D-GENOMICS annotation pipeline.

My PhD thesis, supervised by Janet Thornton (then at UCL in London), is also available on the web.

Publication policy

Wherever possible, I communicate the fruits of my research through open-access publishing channels. At the moment, the exceptions are certain unavoidable conference proceedings (e.g. EuroGP and, fingers crossed..., CASP). Group members with opposing views will not be restrained from publishing where they wish.


Miscellaneous links and non-work stuff...
