Determining enzyme function by predicting substrate specificity
Examples of our research and methods include
Applying homology modeling and docking to determine substrates
Department scientists lead the computation core that is central to the Enzyme Function Initiative (EFI), an NIH-funded nine-institution effort to develop a large-scale, multidisciplinary sequence/structure-based strategy to determine the functions of unknown enzymes discovered in bacterial genome projects, partly via high-throughput prediction of their substrate specificity.
The EFI strategy is tested by selecting enzymes of unknown function that belong to one of several large and complex protein superfamilies. Superfamilies are groups of evolutionarily related enzymes that share a specific conserved chemical capability (for example, a partial reaction; that is, a single mechanistic step in catalysis, or stabilization of the same type of reaction intermediate) performed by conserved active site features.
Yet enzymes in the same superfamily can catalyze very different overall reactions, making them “functionally diverse” and requiring further characterization. For example, enzymes in the enolase superfamily (about 25,000 known sequences, per the SFLD as of late 2013) share a metal ion in their active sites and the chemical step of abstracting alpha-protons from carboxylic acids. But a superfamily member such as muconate lactonizing enzyme breaks down aromatic compounds for soil bacteria while its close relative, glucarate dehydratase, breaks down sugars for metabolism.
Selected enzymes are cloned and expressed, and up to 100 per year have their structures detailed via X-ray crystallography. Department-led computational efforts leverage and expand upon that structural information for high-throughput determination of function in two ways:
- Applying comparative structure (homology) modeling, using databases and software developed by scientists in the School’s Department of Bioengineering and Therapeutic Sciences, to effectively extend the number of determined enzyme structures.
- Employing in silico screening (docking) to rank virtual metabolite libraries, thus greatly winnowing the number of substrate candidates for in vitro testing. Such computational screening is much faster and cheaper than physical assays, casts a wider net beyond commercially available or readily synthesized compounds, and also provides detailed information about even negative results (i.e., non-binding interactions) to further guide substrate selection.
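The ranking step described above can be illustrated with a minimal sketch. It assumes each docked pose carries a single numeric score where lower means better predicted binding; the `DockedPose` type, the `shortlist` helper, the metabolite names, and the scores are all hypothetical placeholders, not the output or interface of any particular docking program.

```python
from typing import Iterable, List, NamedTuple


class DockedPose(NamedTuple):
    metabolite: str
    score: float  # hypothetical docking score; lower (more negative) = better


def shortlist(poses: Iterable[DockedPose], top_n: int) -> List[DockedPose]:
    """Keep the best pose per metabolite, then return the top_n best-scoring ones."""
    best = {}
    for pose in poses:
        current = best.get(pose.metabolite)
        if current is None or pose.score < current.score:
            best[pose.metabolite] = pose
    # Sort ascending so the most favorable (lowest) score comes first.
    ranked = sorted(best.values(), key=lambda p: p.score)
    return ranked[:top_n]


# Made-up scores for illustration only.
library = [
    DockedPose("metabolite_A", -42.1),
    DockedPose("metabolite_B", -17.5),
    DockedPose("metabolite_A", -39.0),
    DockedPose("metabolite_C", -55.3),
]
candidates = shortlist(library, top_n=2)
for pose in candidates:
    print(pose.metabolite, pose.score)
```

In practice the shortlist would feed in vitro testing, and the lower-ranked poses would still be available for inspection, which is the sense in which even negative results remain informative.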
Such in silico screening represents a new application of docking and a different challenge from seeking ligands as potential drug leads. Drug-like ligands may merely need to bind to an enzyme’s active site so that they compete with and block endogenous ligands, serving as therapeutic inhibitors. Substrates, by contrast, must precisely orient and align their reactive, specificity-determining groups with the enzyme’s catalytic residues.
Such substrate docking incorporates department modeling work that accounts for the interactive conformational flexibility of both ligand and active site (induced fit) and that predicts and accounts for the role of hydrogen bonds in stabilizing enzyme-substrate complexes.
New computational tools being developed here to guide the selection and/or synthesis of candidate substrates are continuously tested and refined by comparing their predictions with biochemical assays and, where available, ligand-bound crystal structures. When possible, in vivo function is further tested via bacterial knockout and overexpression mutants, transcriptomics, and metabolomics.
Specific studies in this area have included
Metabolite docking to homology models
Department scientists applied virtual metabolite docking to homology models of active sites to guide the discovery of substrate specificities and biochemical function of a subset of enzymes from the enolase superfamily.
The study’s 65 target enzymes were representative of a group (more than 2,600 sequences per SFLD as of June 2014) that share key conserved active site residues and motifs indicating they epimerize dipeptides; that is, they invert the spatial orientation of atoms around the substrates’ asymmetric carbons.
Initially, only two such dipeptide epimerases, from E. coli and B. subtilis, had been characterized, and both were found to be specific for L-Ala-D/L-Glu (AEEs). These are believed to be involved in the recycling of cell wall polymers, of which that substrate is a component.
The analysis—screening all possible dipeptides against models for dozens of related proteins—predicted an unexpected and notable diversity, including enzymes specific for hydrophobic dipeptides and a small group with specificity for positively charged dipeptides.
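To give a sense of the size of such a virtual library, here is a minimal sketch that enumerates dipeptide sequences. It assumes only the 20 standard amino acids and ignores stereochemistry (D versus L forms), so the actual library used in the study may well have been larger. The residue groupings in `HYDROPHOBIC` and `POSITIVE` are a deliberately simplistic, hypothetical classification chosen for illustration.

```python
from collections import Counter
from itertools import product

# One-letter codes for the 20 standard amino acids (assumption: no
# non-standard residues and no D/L stereoisomers are enumerated).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified, hypothetical residue classes for illustration only.
HYDROPHOBIC = set("AVILMFW")
POSITIVE = set("KR")


def dipeptide_library():
    """Return every ordered pair of residues as a two-letter string."""
    return [n_term + c_term for n_term, c_term in product(AMINO_ACIDS, repeat=2)]


def classify(dipeptide):
    """Assign a coarse physicochemical label to a dipeptide."""
    residues = set(dipeptide)
    if residues & POSITIVE:
        return "positively charged"
    if residues <= HYDROPHOBIC:
        return "hydrophobic"
    return "other"


library = dipeptide_library()
counts = Counter(classify(d) for d in library)
print(len(library), dict(counts))
```

Each sequence in `library` would then be docked against each enzyme model, so the number of docking runs grows as the library size times the number of models.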
The findings underscored the synergistic benefit of combining computational modeling and bioinformatics sequence analysis. Predictions were investigated for some enzymes in vitro and in crystal structures, including substrate-ligand complexes, to confirm their accuracy and further detail the structural bases of the specificities.
Predicting chain building of unknown enzymes
Researchers here computationally predicted the chain-length specificity of a subgroup of the isoprenoid synthase (IS) superfamily: the more than 9,000 trans-polyprenyl transferases (E-PTS; per SFLD as of late 2013). These enzymes catalyze elongation of linear chains of varied length from 5-carbon building blocks (isoprenes: isopentenyl diphosphate, or IPP, and dimethylallyl diphosphate, or DMAPP). The chains serve, in turn, as “trunk” substrates for enzymes biosynthesizing the more than 55,000 known branching isoprenoid metabolites that play key roles in cells from all domains of life.
Department scientists and EFI colleagues used bioinformatics analysis to choose E-PTSs maximally distant in sequence from those with solved structures and functional annotations. Researchers generated new structures and homology models of 74 enzymes for docking that evaluated the steric complementarity of polyprenyl products and the elongation cavities.
Upon biochemical verification, the predicted chain-length specificities proved accurate to within one isoprene unit 94 percent of the time, and the enzymes’ own products often vary by as much. Such modeling might thus predict function across the complete E-PTS subgroup and could potentially correct substantial numbers of automated misannotations that relied on sequence similarity.
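The "within one isoprene unit" criterion can be made concrete with a short sketch. It assumes chain lengths are expressed as carbon counts (C10, C15, C20, and so on) so that one isoprene unit corresponds to 5 carbons; the function name and the example values are hypothetical and are not data from the study.

```python
ISOPRENE_UNIT_CARBONS = 5  # each building block contributes 5 carbons


def fraction_within_one_unit(predicted, observed):
    """Fraction of enzymes whose predicted chain length (in carbons)
    is within one isoprene unit of the experimentally observed length."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        if abs(p - o) <= ISOPRENE_UNIT_CARBONS
    )
    return hits / len(predicted)


# Made-up chain lengths for four hypothetical enzymes.
predicted = [15, 20, 40, 50]
observed = [15, 25, 50, 50]
accuracy = fraction_within_one_unit(predicted, observed)
print(f"{accuracy:.0%} of predictions within one isoprene unit")
```

Applying a tolerance of one unit rather than requiring an exact match reflects the point made above that the enzymes' own products often vary by about that much.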