Determining enzyme function by predicting substrate specificity
Examples of our research and methods include
Applying homology modeling and docking to determine substrates
Department scientists lead the computation core that is central to the Enzyme Function Initiative (EFI), an NIH-funded nine-institution effort to develop a large-scale, multidisciplinary sequence/structure-based strategy to determine the functions of unknown enzymes discovered in bacterial genome projects, partly via high-throughput prediction of their substrate specificity.
The EFI strategy is tested by selecting enzymes of unknown function that belong to one of several large and complex protein superfamilies. Superfamilies are groups of evolutionarily related enzymes that share a specific conserved chemical capability performed by conserved active site features. That capability may be, for example, a partial reaction (a single mechanistic step in catalysis) or the stabilization of the same type of reaction intermediate.

Elaine C. Meng
A depiction of mandelate racemase from Pseudomonas putida (1MDR)
Mandelate racemases, a family of enzymes within the functionally diverse enolase superfamily, catalyze interconversion of the (R-) right-handed and (S-) left-handed mirror image molecules (enantiomers) of mandelate, in a pathway for the latter’s metabolic breakdown (catabolism).
The protein backbone is shown as a ribbon colored by secondary structure (alpha helices turquoise, beta-strands purple, the rest gray). Also displayed are the side chains of functional residues (gold), a metal ion (bright green), and a ligand similar to mandelate (pink).
Yet enzymes in the same superfamily can catalyze very different overall reactions, making them “functionally diverse” and requiring further characterization. For example, enzymes in the enolase superfamily (about 25,000 known sequences, per the SFLD as of late 2013) share a metal ion in their active sites and the chemical step of abstracting alpha-protons from carboxylic acids. But a superfamily member such as muconate lactonizing enzyme breaks down aromatic compounds for soil bacteria while its close relative, glucarate dehydratase, breaks down sugars for metabolism.
Selected enzymes are cloned and expressed, and the structures of up to 100 per year are determined via x-ray crystallography. Department-led computational efforts leverage and expand upon that structural information for high-throughput determination of function in two ways:
- Applying comparative structure (homology) modeling, using databases and software developed by scientists in the School’s Department of Bioengineering and Therapeutic Sciences, to effectively extend the number of determined enzyme structures.
- Employing in silico screening (docking) to rank virtual metabolite libraries, thus greatly winnowing the number of substrate candidates for in vitro testing. Such computational screening is much faster and cheaper than physical assays, casts a wider net beyond commercially available or readily synthesized compounds, and also provides details about even negative results (i.e., non-binding interactions) to further guide substrate selection.
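The winnowing described in the second point can be illustrated with a minimal sketch. It assumes each metabolite in a virtual library already has a docking score, and that a lower (more negative) score means a more favorable predicted fit. The metabolite names, the scores, and the `top_candidates` helper are hypothetical and are not taken from the EFI pipeline.

```python
from typing import Dict, List, Tuple


def top_candidates(scores: Dict[str, float], keep: int) -> List[Tuple[str, float]]:
    """Return the `keep` best-scoring metabolites, best first.

    Assumption: lower (more negative) scores indicate a more favorable fit.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:keep]


# Hypothetical docking scores (arbitrary units) for a tiny virtual library.
library_scores = {
    "metabolite_A": -7.2,
    "metabolite_B": -3.1,
    "metabolite_C": -9.4,
    "metabolite_D": -5.0,
}

# Keep only the two most promising candidates for in vitro testing.
shortlist = top_candidates(library_scores, keep=2)
for name, score in shortlist:
    print(f"{name}\t{score:.1f}")
```

In practice the library would hold many thousands of compounds, and the cutoff would be a judgment call informed by assay capacity rather than a fixed number.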
Such in silico screening represents a new application of docking and a different challenge from seeking ligands as potential drug leads. Drug-lead molecules may merely bind to an enzyme’s active site such that they compete with and block endogenous ligands to serve as therapeutic inhibitors. Substrates, by contrast, must precisely orient and align their reactive, specificity-determining groups with enzyme catalytic residues.
Such substrate docking incorporates department modeling work that accounts for the interactive conformational flexibility of both ligand and active site (induced fit) and that predicts and accounts for the role of hydrogen bonds in stabilizing enzyme-substrate complexes.
New computational tools being developed here to guide the selection and/or synthesis of candidate substrates are continuously tested and refined by comparing their results with biochemical assays and, where possible, ligand-bound crystal structures. When possible, in vivo function is further tested using knockout or overexpression mutant bacteria, transcriptomics, and metabolomics.
Specific studies in this area have included
Metabolite docking to homology models
Department scientists applied virtual metabolite docking to homology models of active sites to guide the discovery of substrate specificities and biochemical function of a subset of enzymes from the enolase superfamily.
The study’s 65 target enzymes were representative of a group (more than 2,600 sequences per SFLD as of June 2014) that share key conserved active site residues and motifs indicating they epimerize dipeptides; that is, they invert the spatial orientation of atoms around the substrates’ asymmetric carbons.

Elaine C. Meng, PhD
A dipeptide epimerase from Enterococcus faecalis in complex with dipeptide L-isoleucine-L-tyrosine substrate. The protein backbone of the enzyme is shown as a tan ribbon and the active site metal ion as a yellow ball. The dipeptide and the side chains of selected active site residues are shown as sticks color-coded by element: carbons tan (enzyme) or brown (dipeptide), oxygens red, and nitrogens blue.

Elaine C. Meng, PhD
Closeup of the active site, with the dipeptide’s alpha-carbons shown as balls. Parts of the ribbon that would somewhat obscure the view have been made translucent.
Initially, only two such dipeptide epimerases had been characterized, from E. coli and B. subtilis, and both were found to be specific for L-Ala-D/L-Glu (AEEs). These are believed to be involved in the recycling of cell wall polymers, of which that substrate is a component.
The analysis—screening all possible dipeptides against models for dozens of related proteins—predicted an unexpected and notable diversity, including enzymes specific for hydrophobic dipeptides and a small group with specificity for positively charged dipeptides.
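To give a sense of the scale of such a screen, the sketch below enumerates a dipeptide library and pairs it with a set of models. It is an illustration under stated assumptions, not the study's protocol: it considers only the 20 standard amino acids, ignores D/L stereochemistry, and uses a hypothetical count of three models. The function name `enumerate_dipeptides` is likewise illustrative.

```python
from itertools import product

# Three-letter codes for the 20 standard amino acids.
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]


def enumerate_dipeptides(residues=AMINO_ACIDS):
    """List every ordered (N-terminal, C-terminal) residue pair as a dipeptide."""
    return [f"{n_term}-{c_term}" for n_term, c_term in product(residues, repeat=2)]


dipeptides = enumerate_dipeptides()

# Hypothetical number of enzyme models; each model is screened against
# the whole library, so the job count is models x dipeptides.
n_models = 3
docking_jobs = [(model_id, dp) for model_id in range(n_models) for dp in dipeptides]

print(len(dipeptides), "dipeptides;", len(docking_jobs), "docking jobs")
```

Order matters here (Ala-Glu and Glu-Ala are distinct), which is why the library is built from ordered pairs rather than combinations.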
The findings underscored the synergistic benefit of combining computational modeling and bioinformatics sequence analysis. Predictions were investigated for some enzymes in vitro and in crystal structures, including substrate-ligand complexes, to confirm their accuracy and further detail the structural bases of the specificities.
Predicting chain building of unknown enzymes
Researchers here computationally predicted the chain-length specificity of a subgroup of the isoprenoid synthase (IS) superfamily: the more than 9,000 trans-polyprenyl transferases (E-PTS; per SFLD as of late 2013). These enzymes catalyze the elongation of linear chains of varied length from 5-carbon building blocks (isoprenes: isopentenyl diphosphate, or IPP, and dimethylallyl diphosphate, or DMAPP). The chains serve, in turn, as “trunk” substrates for enzymes biosynthesizing the more than 55,000 known branching isoprenoid metabolites that play key roles in cells from all domains of life.
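The arithmetic behind chain length is simple enough to sketch. Assuming a 5-carbon DMAPP starter and one 5-carbon IPP added per condensation, a chain after n condensations has 5(n + 1) carbons. The function name below is illustrative, and the lookup table lists only three well-known short products.

```python
def chain_carbons(condensations: int) -> int:
    """Carbon count of a linear isoprenoid chain built from one DMAPP
    starter plus `condensations` IPP units (5 carbons each)."""
    if condensations < 0:
        raise ValueError("condensations must be non-negative")
    return 5 * (condensations + 1)


# Common short-chain products, keyed by carbon count.
COMMON_PRODUCTS = {
    10: "geranyl diphosphate (GPP)",
    15: "farnesyl diphosphate (FPP)",
    20: "geranylgeranyl diphosphate (GGPP)",
}

for n in range(1, 4):
    carbons = chain_carbons(n)
    print(f"{n} condensation(s): C{carbons} {COMMON_PRODUCTS[carbons]}")
```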

Proc Natl Acad Sci U S A. Mar 26, 2013; 110(13): E1196–E1202.
A depiction of the structure of an E-PTS enzyme (GGPP synthase, a drug target in certain malarial parasites) determined by x-ray crystallography with aspartic acid-rich motifs. In its elongation cavity, active sites S1 and S2 bind 5-carbon building blocks (IPP, DMAPP), which link in an isoprenoid chain after cleavage of DMAPP’s diphosphate.
Department scientists and EFI colleagues used bioinformatics analysis to choose E-PTSs maximally distant in sequence from those with solved structures and functional annotations. Researchers generated new structures and homology models of 74 enzymes for docking that evaluated the steric complementarity of polyprenyl products and the elongation cavities.
Upon biochemical verification, the predicted chain-length specificities proved accurate to within one isoprene unit 94 percent of the time, a margin comparable to the variation often seen among the enzymes’ own products. Such modeling might thus predict the function of the complete E-PTS subgroup, and it also seems able to correct substantial numbers of automated misannotations that relied on sequence similarity.
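A "within one isoprene unit" accuracy figure can be computed as in the sketch below, which counts a prediction as correct when it differs from the measured product by at most 5 carbons. The chain lengths are invented for illustration and do not reproduce the study's data or its 94 percent result; the function name is also hypothetical.

```python
ISOPRENE_CARBONS = 5  # one isoprene unit contributes five carbons


def fraction_within_one_unit(predicted, observed):
    """Fraction of predictions within one isoprene unit of the observed
    product chain length (both given in carbon atoms)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - o) <= ISOPRENE_CARBONS for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical predicted vs. measured chain lengths for four enzymes.
predicted = [15, 20, 40, 50]
observed = [15, 25, 45, 35]

accuracy = fraction_within_one_unit(predicted, observed)
print(f"{accuracy:.0%} within one isoprene unit")
```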