ABSTRACT
Bradley, P., Cowen, L., Menke, M., King, J. and Berger, B. (2001) PNAS, in press.
BetaWrap: Successful prediction of parallel β-helices from primary sequence reveals an association with many microbial pathogens
The amino acid sequence rules that specify β-sheet structure in proteins remain obscure. A subclass of β-sheet proteins, the parallel β-helices, represents a processive folding of the chain into an elongated fold that is topologically simpler than globular β-sheets. In this paper, we present a computational approach that predicts the right-handed parallel β-helix super-secondary structural motif in primary amino acid sequences by using β-strand interactions learned from non-β-helix structures. A program called BetaWrap (http://theory.lcs.mit.edu/betawrap) implements this method and recognizes each of the seven known SCOP parallel β-helix families when trained on the known parallel β-helices from outside the family. BetaWrap identifies 2448 sequences among 595,890 screened from the NCBI nonredundant protein database as likely parallel β-helices. It identifies surprisingly many bacterial and fungal protein sequences that play a role in human infectious disease; these include toxins, virulence factors, adhesins, and surface proteins of Chlamydia, Helicobacter, Bordetella, Leishmania, Borrelia, Rickettsia, and Neisseria. Also unexpected was the rarity of the parallel β-helix fold and its predicted sequences among higher eukaryotes. The computational method introduced here can be called a 3D dynamic profile method because it generates inter-strand pairwise correlations from a processive sequence wrap. Such methods may be applicable to recognizing other β structures for which strand topology and profiles of residue accessibility are well conserved.
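
The idea of scoring inter-strand pairwise correlations along a processive wrap can be illustrated with a toy sketch. This is not the BetaWrap algorithm: the pair scores, the strand width, the allowed gap range between successive rungs, and the greedy search are all hypothetical choices made only to show how a sequence might be "wrapped" one rung at a time while accumulating pairwise strand scores.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pairwise residue scores for stacked strand positions.
# These are illustrative numbers, NOT values learned by BetaWrap.
PAIR_SCORE: Dict[Tuple[str, str], float] = {
    ("V", "V"): 1.0, ("V", "I"): 0.8, ("I", "I"): 0.9,
    ("L", "V"): 0.5, ("L", "L"): 0.6, ("I", "L"): 0.5,
}
DEFAULT_SCORE = -0.2  # assumed penalty for any pair not in the table


def pair_score(a: str, b: str) -> float:
    """Symmetric lookup of the score for two stacked residues."""
    return PAIR_SCORE.get((a, b), PAIR_SCORE.get((b, a), DEFAULT_SCORE))


def rung_score(seq: str, i: int, j: int, width: int = 5) -> float:
    """Score a strand starting at i stacked on a strand starting at j."""
    return sum(pair_score(seq[i + k], seq[j + k]) for k in range(width))


def best_next_rung(seq: str, i: int, min_gap: int = 18, max_gap: int = 25,
                   width: int = 5) -> Optional[Tuple[int, float]]:
    """Find the best-scoring strand start j = i + gap within the gap range."""
    best: Optional[Tuple[int, float]] = None
    for gap in range(min_gap, max_gap + 1):
        j = i + gap
        if j + width > len(seq):
            break  # candidate strand would run past the end of the sequence
        score = rung_score(seq, i, j, width)
        if best is None or score > best[1]:
            best = (j, score)
    return best


def wrap(seq: str, start: int, rungs: int = 2) -> Tuple[List[int], float]:
    """Greedily extend a wrap from `start`, one rung at a time."""
    positions = [start]
    total = 0.0
    for _ in range(rungs):
        nxt = best_next_rung(seq, positions[-1])
        if nxt is None:
            break
        positions.append(nxt[0])
        total += nxt[1]
    return positions, total


# Toy sequence with three hydrophobic strands separated by glycine spacers.
toy = "VIVIV" + "G" * 15 + "VIVIV" + "G" * 17 + "VIVIV"
positions, total = wrap(toy, start=0, rungs=2)
print(positions, round(total, 2))
```

In this toy sequence the greedy wrap aligns the three hydrophobic strands with one another, since any other offset within the assumed gap range pairs at least one residue against a glycine spacer and scores lower.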