Authors: Bob MacCallum, Markus Brameier, Andrea Krings and Amine Heddad, Stockholm Bioinformatics Center, Stockholm University, Sweden.


NucPred - Predicting Nuclear Localization of Proteins

NucPred (pronounced newk-pred) analyses a eukaryotic protein sequence and predicts whether the protein:

  • spends at least some time in the nucleus
    or
  • spends no time in the nucleus
Don't forget that proteins can have multiple functions and/or multiple subcellular locations. However, if a protein is already known to be secreted or to be an integral membrane protein, a second role as a nuclear protein is unlikely. NucPred will nevertheless make a small number of confident predictions that contradict such knowledge, so please use all sources of biological information (both experimental and predicted) when interpreting the results.

Services available

Single protein: Submit the sequence of a single protein to our server and get an immediate result showing the probable location of important subsequences, with a full explanation of the scoring system.
 
Protein family: Submit up to 15 related sequences and receive a ClustalW multiple alignment coloured according to the NucPred scores.
 
Eukaryotic proteomes: Browse and query the results of NucPred, PredictNLS (Murat Cokol, Rajesh Nair and Burkhard Rost) and PSORT II (Paul Horton and Kenta Nakai) predictions on the proteomes of most fully sequenced eukaryotes and on all eukaryotic proteins in the UniProt Knowledgebase.
Example query: Human disease proteins that are not currently known to be nuclear but are confidently predicted (by three independent methods) to be nuclear.
 
3D structure: Sorry, no longer available. Perform a NucPred prediction on the sequence of a protein structure in the PDB. View the results in 3D using Rasmol or your favourite structure viewer.
 
Batch mode: Obtain the NucPred score for up to 1000 sequences (FASTA format input). Note that per-residue colouring is not performed in this mode (use single protein or family mode for this).
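If you have more than 1000 sequences, one way to prepare batch-mode input is to split your FASTA file locally before submitting. The following is only a minimal sketch: the function names `read_fasta` and `batches` are hypothetical helpers written for this illustration and are not part of NucPred, and only the limit of 1000 sequences comes from the description above.

```python
from typing import Iterator, List, Tuple

MAX_BATCH = 1000  # batch-mode limit stated in the service description


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Tuple[str, str]],
            size: int = MAX_BATCH) -> Iterator[List[Tuple[str, str]]]:
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each group yielded by `batches` can then be written back out in FASTA format and submitted separately.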
 

How it works

NucPred is an ensemble (or jury) of 100 sequence-based predictors. Each is given the sequence of interest and provides a "yes" or "no" answer to the question "does the protein spend some time in the nucleus?". If the fraction of predictors giving a "yes" answer (also known as the NucPred score) exceeds a predefined threshold, then the protein is predicted to have a nuclear role.

The individual predictors are produced by an evolutionary machine learning approach called genetic programming, trained on a set of known nuclear and non-nuclear proteins. The predictors use regular expression pattern matching to make the yes/no decision, and the regular expressions are themselves evolved (using the open source PerlGP system). The Perl source code of the evolved predictors and a demo script that uses them are provided free of charge to all.
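The jury idea can be illustrated with a small sketch. This is not the NucPred code: the real predictors are evolved Perl programs and there are 100 of them, whereas the four regular expressions below are hand-written stand-ins invented for this example, and the default threshold of 0.5 is likewise an assumption. Only the scoring rule (fraction of "yes" votes compared against a threshold) follows the description above.

```python
import re

# Hypothetical stand-in predictors. Each one votes "yes" if its
# pattern matches anywhere in the sequence. These patterns are
# made up for illustration and are NOT the evolved NucPred ones.
PREDICTORS = [
    re.compile(r"[KR]{4,}"),                # run of four or more basic residues
    re.compile(r"K[KR].[KR]"),              # short basic motif
    re.compile(r"[KR]{2}.{10,12}[KR]{3}"),  # two basic clusters with a spacer
    re.compile(r"P.{0,2}[KR]{3}"),          # proline followed by a basic run
]


def jury_score(sequence: str, predictors=PREDICTORS) -> float:
    """Fraction of predictors that answer "yes" for this sequence."""
    votes = sum(1 for p in predictors if p.search(sequence))
    return votes / len(predictors)


def predict_nuclear(sequence: str, threshold: float = 0.5) -> bool:
    """Predict a nuclear role if the score exceeds the threshold.

    The default threshold is an assumption made for this sketch.
    """
    return jury_score(sequence) > threshold
```

For the toy sequence `MPKKKRKVAAA`, three of the four stand-in patterns match, so `jury_score` returns 0.75 and `predict_nuclear` returns `True` at the assumed threshold.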

Genetic programming in a nutshell

Genetic programming is an artificial evolutionary algorithm where computer program code is generated automatically - usually in order to perform some predefined task. As with other evolutionary algorithms, a population (of computer programs) undergoes repeated cycles of selection (according to the fitness/suitability of the computer program), mutation and recombination. One particular feature of genetic programming is that the evolving individuals are optimised in terms of both their "shape" and parameters, while most other optimisation methods assume a fixed shape and optimise only the parameters. This freedom to explore the search space is particularly useful when evolving regular expressions as we have done here.
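The cycle of selection, mutation and recombination can be sketched as a toy example. Everything here is an assumption made for illustration: the token alphabet, the six labelled sequences (which are invented, not real annotations), the fitness function and all parameter values. It is far simpler than PerlGP, which evolves whole programs, but it shows how both the "shape" (the number and order of tokens) and the parameters (which tokens are used) of a regular expression can change during evolution.

```python
import random
import re

# Building blocks for evolved patterns; any concatenation is a valid regex.
TOKENS = ["K", "R", "[KR]", ".", "P", "[DE]"]

# Invented toy data, for illustration only.
POSITIVES = ["MPKKKRKVAA", "AARKRRGGSS", "GGPKRKRLLA"]
NEGATIVES = ["MAKGGLLSSA", "GDERDEELLPA", "SSTTGGAAVV"]


def fitness(genome):
    """Number of toy sequences classified correctly by the pattern."""
    pattern = re.compile("".join(genome))
    hits = sum(1 for s in POSITIVES if pattern.search(s))
    rejects = sum(1 for s in NEGATIVES if not pattern.search(s))
    return hits + rejects


def mutate(genome, rng):
    """Insert, delete or replace one token (genome never becomes empty)."""
    child = list(genome)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "delete" and len(child) > 1:
        del child[rng.randrange(len(child))]
    elif op == "insert":
        child.insert(rng.randrange(len(child) + 1), rng.choice(TOKENS))
    else:
        child[rng.randrange(len(child))] = rng.choice(TOKENS)
    return child


def crossover(a, b, rng):
    """Join a prefix of one parent to a suffix of the other."""
    child = a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]
    return child[:8] or [rng.choice(TOKENS)]  # cap length, avoid empty


def evolve(generations=30, pop_size=20, seed=0):
    """Return (best pattern, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.choice(TOKENS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return "".join(best), fitness(best), history
```

Because the fitter half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.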

Authors

Amine Heddad: Original method development, benchmarking
Andrea Krings: Analysing the predictors and predictions, eukaryotic proteome processing and web interface
Markus Brameier: Method development, web interface & outreach
Bob MacCallum: Project leader

Related services

Find nuclear localisation signals with PredictNLS (our server gives a handy link to this service too). Predict subcellular location with TargetP or PSORT II. Use TMHMM to predict transmembrane helices (and potentially rule out a nuclear location for your protein).

Further information

If you find NucPred useful, please cite this paper:
NucPred - Predicting Nuclear Localization of Proteins. Brameier M, Krings A, MacCallum RM. Bioinformatics, 2007. PubMed id: 17332022

Follow these links for the source code for NucPred and a preprint of the original NucPred methods paper. An analysis of orthologues from March 2004 may be of interest. For all enquiries, please contact Bob MacCallum.