Prediction of Protein Secondary Structure Using GP


June - July, 2003
Summer Internship, Stockholm Bioinformatics Center, Sweden
Under Dr. R. M. MacCallum

In this project we attempted to use Genetic Programming (GP) to predict the secondary structure of protein. Genetic Programming is basically a search/optimization algorithm, which attracted global attention through the book “Genetic Programming: On the Programming of Computers by Means of Natural Selection”, by Prof. John Koza. GP is a class of algorithms that try to ‘evolve’ programs to solve a given problem. It primarily requires the ideal input and output dataset to calculate the performance of the evolving program. A typical GP would begin with a random population of programs and try to search for a solution using principles of natural evolution, such as ‘survival of the fittest’, crossovers, mutations, etc. GP is generally sensitive to parameter variation and there is no guarantee that GP would be able to solve a given problem. (However, there have been numerous publications, where GP has been able to solve NP-Complete problem, regression problems, circuit design problems, etc.)

We used the multi-aligned amino acid sequence as the input data to the GP and wanted as output the correct secondary structure of the protein. Since the input data set is highly dimensional, it becomes difficult for the GP to use it effectively. Therefore, we decided to use Self Organizing Maps (SOM) to transform the initial higher dimensional data (the multi-aligned amino acid sequence) to lower dimensional data (positions in the SOM). SOM is a neural network tool, which uses unsupervised (competitive) learning to classify data into positions in a map. One can choose the dimensions of the map (analogous to number of classification parameters) and size of the map (analogous to total number of different classes). We used GP and some simpler statistical techniques to find the optimal mapping between SOM co-ordinates and the three
secondary structure states.

A major part of the study is completed, which gave encouraging and interesting results. The problem is still under investigation for further improvements.


[Back to Griha]