ABSTRACT
Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues.
Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20-22 amino acids each are associated with transmembrane helices and have been used to identify such sequences (von Heijne, 1994). Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation (Istrail et al., 1999), we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even when controlling for overall hydrophobic content. Here we report an analysis of the frequencies of contiguous blocks of hydrophobic residues of different lengths in a database of amino acid sequences of proteins of known structure. Blocks of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. This result may reflect selection against long blocks of hydrophobic residues within globular proteins.
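
The comparison described above, observed block counts against what independent selection of residues would predict, can be illustrated with a minimal sketch. It is not the analysis pipeline used in this work. The hydrophobic residue set `HYDROPHOBIC` and the function names `observed_runs` and `expected_runs` are assumptions made for illustration, and the null model shown is a simple one in which each residue is hydrophobic independently with a fixed probability `p`.

```python
from collections import Counter
from itertools import groupby

# Assumed hydrophobic alphabet (one-letter codes). The classification
# actually used in the analysis is not stated here, so treat it as a placeholder.
HYDROPHOBIC = set("AVLIMFWC")


def observed_runs(seq, hydrophobic=HYDROPHOBIC):
    """Count maximal blocks of consecutive hydrophobic residues, keyed by length."""
    counts = Counter()
    for is_hydrophobic, group in groupby(seq, key=lambda r: r in hydrophobic):
        if is_hydrophobic:
            counts[len(list(group))] += 1
    return counts


def expected_runs(n, p, k):
    """Expected number of maximal hydrophobic blocks of exactly length k in a
    sequence of n residues, each hydrophobic independently with probability p."""
    if k < 1 or k > n:
        return 0.0
    if k == n:
        return p ** n  # the whole sequence is a single block
    q = 1.0 - p
    # Interior blocks need a non-hydrophobic residue on both sides.
    interior = max(n - k - 1, 0) * q * q * p ** k
    # Blocks touching either end of the sequence need only one such neighbour.
    ends = 2 * q * p ** k
    return interior + ends


if __name__ == "__main__":
    seq = "AAKLLLGV"  # toy sequence, not real data
    p = sum(r in HYDROPHOBIC for r in seq) / len(seq)
    obs = observed_runs(seq)
    for k in range(1, 5):
        print(k, obs.get(k, 0), round(expected_runs(len(seq), p, k), 3))
```

Under this sketch, suppression of long blocks would show up as observed counts for lengths of three or more falling below the expected values when summed over a large set of sequences. Estimating `p` from each sequence's own composition is one design choice among several; a database-wide frequency would be another.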