Yang Shen


Research Interests

    Optimization and machine learning algorithms for structural bioinformatics, structural systems biology, and biomedical and health informatics. Some applications:

  • Protein or ligand design for desired binding affinity and specificity
  • Structural prediction of protein interactions
  • Systems biology, in particular, systems pharmacology
  • Synthetic biology, with applications in energy and therapeutics
  • Bioinformatics and big data
I am actively seeking experimental and computational collaborations. My main motivation is to unravel molecular mechanisms and to modulate the emergent behavior of biomolecular networks by developing and applying computational tools (including molecular modeling, network simulation, optimization, machine learning, graph theory, and systems and control theory). To that end, I aim for an iterative process in which models and experiments inform each other.

Current Projects

Dimension Reduction and Optimization Methods for Flexible Refinement of Protein Docking (NSF CCF-1546278)

Past Projects

Design for Promiscuous Inhibitors

Keywords: Structure-based drug design, binding affinity and specificity, drug resistance, combinatorial optimization, machine learning

In this study we aim to understand the molecular mechanisms by which small molecules can exhibit binding promiscuity and to develop design strategies to implement such promiscuity. We chose to study inhibition of HIV-1 protease because robust HIV-1 protease inhibition in the face of an evolving viral population remains a tremendous challenge, yet significant structural, inhibitory, and evolutionary data exist upon which to base our investigation. Using computational design, we constructed small-molecule inhibitors targeting a set of wild-type and drug-resistant mutant HIV-1 proteases. The resulting inhibitor library was richly diverse in terms of chemical structures and binding specificity profiles. Subsequent statistical analysis of this large library revealed significant trends for promiscuous inhibitors targeting multiple variants of HIV-1 protease. Based on these trends, we proposed potential design strategies for promiscuous HIV-1 protease inhibitors.

Charge Optimization Theory for Flexible Ligands

Keywords: lead optimization, electrostatic optimization, conformational flexibility, continuum electrostatics, linear response theory, linearized Poisson–Boltzmann equation

The design of ligands with high affinity and specificity remains a fundamental challenge in understanding molecular recognition and developing therapeutic interventions. Charge optimization theory addresses this problem by determining the ligand charge distribution that produces the most favorable electrostatic contribution to the binding free energy. The theory has also been applied to the design of binding specificity. However, the formulations described so far treat only a rigid ligand. Here we extend the theory to treat flexible ligands. We develop a thermodynamic pathway analysis for the binding contributions relevant to the theory, and we illustrate its application using HIV-1 protease with our previously designed and validated subnanomolar inhibitor. The results show that flexible ligand binding is not an adaptation designed to enhance binding affinity, and they lead to new insights into the sources of the conformational changes that accompany binding.
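As a brief illustration of the rigid-ligand starting point, the following is a sketch of the quadratic form that charge optimization is commonly built on under linear-response continuum electrostatics. The symbols $q$ (vector of ligand partial charges), $L$ (a symmetric matrix of ligand desolvation terms, assumed here to be positive definite), and $c$ (a vector collecting ligand-receptor interaction terms) are notation chosen for this sketch, not taken from the text above.

```latex
\Delta G_{\mathrm{es}}(q) = q^{\top} L\, q + c^{\top} q + \text{const},
\qquad
\nabla_q \Delta G_{\mathrm{es}} = 2 L q + c = 0
\;\Longrightarrow\;
q^{*} = -\tfrac{1}{2}\, L^{-1} c .
```

Under these assumptions the optimum is unique because the objective is strictly convex in $q$. The flexible-ligand extension described above is not captured by this single quadratic form.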

Protein Docking by Exploring and Exploiting the Binding Free Energy Landscape

Keywords: Structure prediction, binding free energy landscape, binding funnel, global optimization, semi-definite programming, special Euclidean group, Riemannian manifold, exponential coordinates

Protein interactions play a central role in various aspects of the structural and functional organization of the cell. Protein docking, the computational prediction of such interactions at the atomic level, is crucial for better understanding cellular processes and provides valuable information for rational drug design. As in protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We develop a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3), a Riemannian manifold) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling.
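The underestimation step can be illustrated with a minimal one-dimensional sketch. In one dimension the positive-semidefiniteness condition on the quadratic term reduces to a single sign constraint, so the fit becomes a small linear program; the code below solves it by brute-force vertex enumeration rather than with a semi-definite programming solver. The function names and the choice to minimize the summed gap between the minima and the underestimator are assumptions of this sketch, not the SDU implementation.

```python
from itertools import combinations

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve A z = b by Cramer's rule; return None if A is (near) singular."""
    d = det3(A)
    if abs(d) < 1e-12:
        return None
    z = []
    for k in range(3):
        Ak = [row[:] for row in A]      # copy, then swap column k for b
        for i in range(3):
            Ak[i][k] = b[i]
        z.append(det3(Ak) / d)
    return z

def quadratic_underestimator(xs, fs, tol=1e-9):
    """Fit U(x) = p*x^2 + q*x + r with p >= 0 and U(x_i) <= f_i for all i,
    minimizing sum_i (f_i - U(x_i)). Needs at least three distinct xs.
    Returns the coefficients (p, q, r)."""
    constraints = [([x * x, x, 1.0], f) for x, f in zip(xs, fs)]
    constraints.append(([-1.0, 0.0, 0.0], 0.0))   # encodes p >= 0
    best, best_total = None, float("-inf")
    # An optimum of this small LP sits at a vertex, i.e. three active constraints.
    for trio in combinations(constraints, 3):
        z = solve3([a for a, _ in trio], [rhs for _, rhs in trio])
        if z is None:
            continue
        feasible = all(sum(ai * zi for ai, zi in zip(a, z)) <= rhs + tol
                       for a, rhs in constraints)
        if feasible:
            # Minimizing the summed gap equals maximizing the summed U(x_i).
            total = sum(z[0] * x * x + z[1] * x + z[2] for x in xs)
            if total > best_total:
                best, best_total = tuple(z), total
    return best
```

When the fitted `p` is positive, the underestimator's minimizer `-q / (2 * p)` is a natural point around which to concentrate the next round of sampling, which is the sense in which the underestimator biases the search.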

While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward because that space is a manifold rather than a Euclidean space. We introduce a strategy that uses separate independent variables for side-chain optimization, for the center-to-center distance of the two proteins, and for five angular descriptors of the relative orientation of the molecules, which approximate $S^2 \times SO(3)$ with a Euclidean space. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima, and it resembles the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state, and that the SDU strategy can generate good docking predictions (interface RMSD ≤ 5 Å) while achieving an at least 20-fold gain in efficiency over Monte Carlo methods.

I have continued to develop and apply SDU in CAPRI, a community-wide assessment of protein docking methods, with considerable success.

Metabolic Engineering

Keywords: Microbial fuel cell design, systems and synthetic biology, flux balance analysis, bi-level optimization, relaxations

Biological networks are abstract representations of biological systems that capture many essential characteristics of other networks. Molecules in a living cell are the nodes of such networks, and their interactions form undirected edges or directed arcs. Biological systems have been shown to share structural principles with engineered networks. The engineering analysis of biological networks helps us understand a cell's functional organization and manipulate biotransformations in these networks. Such biotransformations target various biochemical by-products, ranging from simple chemicals to even microbes. In this collaboration with synthetic biologists at BU, we develop computational methods, based on flux balance analysis, to predict the environmental conditions and genetic modulations that optimize the production efficiency or yield of metabolic by-products (electrons).
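For readers unfamiliar with flux balance analysis, its standard textbook form is the linear program sketched below. The symbols are notation assumed for this sketch: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $v^{\min}$ and $v^{\max}$ the flux bounds, and $c$ a weight vector that selects the flux to be maximized (here it would select the electron-producing flux).

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\, v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The constraint $S v = 0$ expresses the steady-state assumption that internal metabolites do not accumulate. Environmental conditions and genetic modulations enter through the bounds, which is one way the bi-level formulations named in the keywords can arise: an outer problem chooses the bounds and the inner problem is the program above.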

Binding Site Similarity of Analogous Enzymes

Keywords: Structure-function analysis, convergent evolution, binding site, fast Fourier transform

Most enzymes catalyze specific biochemical reactions of well-defined substrates.  Both the catalysis and recognition usually occur in localized cavities or binding sites.  The general goal of this study is to characterize the similarities and differences in local sequence and structure of the binding sites of enzymes that perform the same or similar chemical reactions.  The comparisons are based on the optimal structural superimpositions of the binding sites rather than the global sequence or structure similarity of the enzymes.

We study analogous enzymes that have the same enzyme commission (EC) number (or whose EC numbers differ only in the last digit) but are evolutionarily unrelated, i.e., they lack both sequence and structural similarity. Analogous enzyme pairs are relatively rare but occur in all major classes, and they are assumed to be the result of convergent evolution. Research on analogous enzymes is very limited: it consists of searches for non-homologous enzymes with the same EC number and studies of specific cases of convergent evolution. It is known that, at least in a number of cases, the spatial arrangement of the catalytic residues is conserved, but very little is known about the similarity of binding sites that occur on different protein scaffolds. In this work we use a new method, developed to assess molecular similarity, for the structural superimposition of enzyme binding sites. The physicochemical properties of the cavity-flanking residues are represented by pseudocenters. Given two sets of such pseudocenters, our goal is to find the largest subset of pseudocenters in both clefts that correspond directly to each other, both geometrically and chemically. The new method performs an exhaustive evaluation of the correlation function in the discretized 6D space of mutual orientations of the two point sets, using a very efficient algorithm involving fast Fourier transforms. The method is applied to a number of analogous enzyme pairs and provides new insights into the molecular mechanisms of their functions.
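The reason Fourier transforms make an exhaustive scan affordable can be shown with a one-dimensional toy: by the correlation theorem, the overlap score for every cyclic shift of one gridded profile against another comes out of a few transforms instead of a separate sum per shift. The sketch below covers only this translational idea on a 1D grid; the function names and the toy profiles are assumptions of the illustration, and the method described above works in the full 6D space of mutual orientations.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def correlation_fft(a, b):
    """Scores c[s] = sum_j a[(j + s) % n] * b[j] for every cyclic shift s,
    computed via the correlation theorem for real-valued inputs."""
    n = len(a)
    fa, fb = fft(a), fft(b)
    product = [u * v.conjugate() for u, v in zip(fa, fb)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [value.real / n for value in fft(product, inverse=True)]

def correlation_direct(a, b):
    """Same scores by the direct double loop, for checking the FFT route."""
    n = len(a)
    return [sum(a[(j + s) % n] * b[j] for j in range(n)) for s in range(n)]

# Toy profiles on an 8-point grid: b is the pattern in a, displaced by two cells.
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [1, 2, 1, 0, 0, 0, 0, 0]
scores = correlation_fft(a, b)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
```

The direct loop costs on the order of n² operations for n grid points, while the transform route costs on the order of n log n, and the gap widens quickly on three-dimensional grids.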

I was also involved in a small-molecule solvent mapping study and its applications to hot spot prediction and fragment-based drug design.