Visualizing and integrating bioinformatics and biomolecular data
Examples of our research, methods, and resources include
UCSF Resource for Biocomputing, Visualization and Informatics
The department is home to the UCSF Resource for Biocomputing, Visualization and Informatics (RBVI), which develops software and web-based resources for the visualization of molecular structures—from atomic-level details to large interacting complexes of molecules—drawn from multiple data sources. The Resource also creates computational tools to help visually map molecular interactions in biological pathways and systems, as well as to organize and analyze biological data to find meaningful similarities and correlations between protein sequences, structures, and functions.

The Chimera app ViewDock aids the interactive screening of compounds from the outputs of molecular docking programs. UCSF DOCK was the first such program, developed by department scientists in the early 1980s and revised many times since. Docking programs virtually screen small molecules (ligands) for the relative strength (affinity) with which they bind to protein active sites, potentially altering the proteins’ activity therapeutically.
The target structure shown here is H-Ras, a protein often mutated in human cancers. Selecting a potential ligand name in the dialog displays the molecule. The docked molecule pictured is ribose monophosphate. (Potential hydrogen bonds are shown as yellow lines.)
Established in 1970, the RBVI pioneered the field of molecular graphics and is the nation’s oldest Biomedical Technology Research Resource (BTRR). The National Institutes of Health (NIH)-funded BTRC program supports the development of broadly applicable, dynamically evolving enabling technologies (in this case, biomolecular/bioinformatics visualization), as compared to more typical NIH grants that fund more narrowly defined research projects. The program requires numerous collaborative test-bed projects, as well as user training and wide dissemination of the advanced methodologies to the scientific community.
UCSF Chimera
Integrating multi-source/scale data to visualize molecular complexes
The RBVI continually develops, updates, and provides support for UCSF Chimera, a widely used, highly extensible program for interactive 3-D visualization of macromolecular structures and related data. The package provides more than 110 tools for the interactive analysis of atomic-level models, density maps, and protein sequences.
Chimera can fetch molecular structures, sequences, and density maps from web-linked databases and then lets users perform structural analyses such as measuring distances and angles, identifying hydrogen bonds and contacts, and using coloring or other stylizations to highlight properties such as sequence conservation and electrostatic potential. It also provides for modifying or building atomic-level structures, comparative (homology) modeling of protein structures, and fitting of atomic-level structures into lower-resolution data such as electron microscopy density maps of large assemblies.
Some calculations are performed directly within the program, whereas others make use of web services provided by the Resource. The graphical scenes can be manipulated interactively in 3-D, with many options for labeling, display style, and visual effects (e.g., shadows) to enhance clarity.

Chimera allows properties of molecules such as electrostatic potential to be visualized with coloring. A highly negatively charged small molecule, inositol hexakisphosphate (depicted in stick mode), binds to a protease enzyme domain, which has its molecular surface colored to indicate electrostatic potential: red for negative charge, white for neutral, and blue for positive.
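The red-white-blue scheme described in the caption can be illustrated with a short sketch. This is not Chimera's own code: the function name `potential_to_rgb` and the saturation value `limit` are assumptions chosen for illustration, and the potential is treated as a plain number in arbitrary units.

```python
def potential_to_rgb(value, limit=10.0):
    """Map an electrostatic potential value to a red-white-blue RGB triple.

    `limit` is a hypothetical saturation magnitude: values are clamped
    to the range [-limit, +limit] before being converted to a color.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Clamp the value, then normalize it to the range [-1, 1].
    t = max(-limit, min(limit, value)) / limit
    if t < 0:
        # Negative potential: fade from white toward pure red.
        return (1.0, 1.0 + t, 1.0 + t)
    # Zero or positive potential: fade from white toward pure blue.
    return (1.0 - t, 1.0 - t, 1.0)
```

With these assumptions, a value at or below `-limit` maps to pure red, zero maps to white, and a value at or above `+limit` maps to pure blue, with linear blending in between.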
This animation shows the binding interface between porcine pancreatic trypsin (left) and a trypsin inhibitor from soybean (right). In the open position, each protein is rotated 90 degrees outward from the closed (bound) position. This video does not include audio.
The Resource-developed software also provides simple graphical interfaces for animating molecular dynamics simulations of atomic-level interactions within and between molecules over time based on different input trajectories, plus interactive morphing of proteins between conformations.
Most proteins operate as part of larger molecular assemblies. Thus Chimera can generate interactive depictions of complexes and cellular machinery composed of dozens of interacting molecules, such as ribosomes, microtubules, proteasomes, transmembrane channels, and virus capsids, by importing and fitting multi-scale data (including experimental and theoretical models).

As depicted using Chimera, the shell (or capsid) of a hepatitis B virus, about 420 angstroms in diameter, is an assembly of many separate proteins. Each colored blob is a separate protein, with multiple copies of the same protein shown in the same color. A copy of each type of protein is shown in ribbon format near the center.
Chimera is distinguished by its interface design and by detailed documentation that make it practical to provide, or for end users to code, extension features tailored to specific needs.
Chimera has gone through 21 versions since its initial release in 2004. As of early 2014, it had been referenced in more than 6,000 journal articles and downloaded by more than 370,000 users.
Examples of key features developed for UCSF Chimera by RBVI staff and collaborators include:
- Reading more molecular data formats than any other program. Users can readily visualize atomic-level structure data from dozens of databases, ranging from the nearly 100,000 experimentally determined structures (as of early 2014) in the Protein Data Bank to the nearly five million unique sequences homology modeled in ModBase, a database of comparative protein structure models determined by the program Modeller, developed by colleagues in the Department of Bioengineering and Therapeutic Sciences.
- Importing and visualizing electron and other 3D microscopy data (density maps) from multiple formats, with a broad selection of interactive tools for their analysis. Such electron micrographs capture larger biological complexes such as virus particles, ribosomes, and microtubules in which large numbers of proteins interact, albeit at coarser resolutions. Chimera tools enable interactive segmentation to sort out substructures, fitting of atomic-level components into density maps, measurement of lengths and volumes to suggest potential molecular components for fitting, and automated coloring of surfaces to indicate probable locations of various types of macromolecules.

Chimera visualization shows electron tomography of a human immune system T cell attacking another cell. Vesicles (blue) containing serine protease enzymes (which destroy cells by lysis, rupturing their cell membranes) are shown being transported to the adjacent cell membranes (in orange) along microtubules (red) to kill the target cell. The experimental data is from a 2006 Nature study by University of Oxford researchers testing a theory that killer T cells use centriole organelles (yellow) to organize microtubules that are used to tow lytic vesicles to the interface with the target cell. The data did not provide clear support for this theory, and the study instead reported a different mechanism for the delivery of such secreted vesicles to the cell membrane.
- Direct interfacing with web services (some hosted by RBVI) that allow sequence-structure data retrieval and computational services (e.g., BLAST searches of sequence similarity, sequence alignment, homology modeling, docking calculations, etc.).
- Performing combinatorial multi-scale modeling and visualization of large molecular assemblies, fitting molecules into density maps of larger complexes either directly in Chimera or via web services. The latter’s calculations are launched automatically, with the results returned to Chimera for interactive adjustment and analysis. This work may tap a diverse set of the software’s features and web services including:
  - multiple simultaneous fitting of atomic-level structures into EM density maps by the MultiFit module of the IMP integrative modeling platform
  - small angle X-ray scattering (SAXS) profiles of atomic-level structures calculated by FoXS, another IMP module, to model flexible conformations and those in solution (i.e., native conditions)
  - graphical user interfaces to simplify setup of input data/parameters, evaluate results, and perform iterative refinements
  - evaluation of side chain conformations from backbone-dependent and backbone-independent rotamer libraries
  - “peel-back” animation to explore and convey interrelations between layers in multi-scale models of larger complexes such as muscle fibers
  - any of the tools for general molecular analysis, including identification of hydrogen bonds and contacts, structure comparison by superposition and morphing, and coloring to show properties like sequence conservation
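One of the density-map measurements mentioned in the list above, the volume of a map region, can be roughly approximated by counting grid points above a contour level. The sketch below is a simplified illustration and not Chimera's measurement code. The function name, the nested-list grid layout, and the assumption of cubic voxels are all hypothetical.

```python
def enclosed_volume(density, threshold, voxel_size):
    """Estimate the volume enclosed by a contour level in a 3-D density grid.

    density    -- nested lists indexed as density[z][y][x] (assumed layout)
    threshold  -- contour level; voxels at or above it count as inside
    voxel_size -- edge length of one cubic voxel (e.g., in angstroms)
    """
    # Count every grid value that reaches the contour level.
    inside = sum(
        1
        for plane in density
        for row in plane
        for value in row
        if value >= threshold
    )
    # Each counted voxel contributes one cube of volume.
    return inside * voxel_size ** 3
```

Voxel counting overestimates or underestimates the volume near the contour surface, so the result is best treated as a coarse estimate, for example when comparing a map region to the expected size of a candidate protein.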
An animation made with UCSF Chimera provides structural analysis of an electron microscopy density map of the thick filaments that perform muscle contractions. These filaments are made up of complexes of myosin protein molecules. The animation shows that (in order):
- The structure has rotational symmetry: it looks the same when turned 90 degrees.
- It also has translational symmetry in 43.5 nanometer sections.
- It is a four-strand helix, with the myosin molecules’ J-shaped “heads” on the surface and “tails” comprising the muscle filament body.
- Atomic-level details of myosin molecule heads can be hand-placed (by computer mouse) in the larger assembly, then optimized and replicated by the software.
- The assembly’s 12 subfilaments can be dissected and rotated by the software.
This video does not include audio.
Source: Data from John Woodhead, PhD, and Roger Craig, PhD, both of University of Massachusetts Medical School.
Cytoscape
Using network visualizations to merge systems and structural biology
Cytoscape is the most commonly used open source network visualization program. It is routinely applied in proteomics to map biological systems and protein interactions in metabolic, signaling, and regulatory pathways within and between cells. The RBVI, along with UCSF colleagues in the Bioinformatics Core of the Gladstone Institutes, is one of seven institutions providing core Cytoscape development for that purpose.
An RBVI focus is using Cytoscape apps to bridge the complementary data sets of systems and structural/molecular biology. This approach reflects the increasing overlap between the two: Systems approaches are becoming more granular, posing hypotheses about the interactions of individual proteins or the roles of specific metabolites in a pathway. Meanwhile, molecular and structural biologists increasingly investigate the impact of regulatory pathways on the transcription of proteins as well as how larger complexes of proteins work together to perform biological functions.
Examples of the 22 Cytoscape apps that RBVI has developed and supports include:
- structureViz, which allows Cytoscape users to select nodes representing proteins in a given biological network and interactively display and analyze the detailed 3-D structures associated with those proteins in UCSF Chimera.
Cytoscape users can also use Chimera sequence-structure analysis tools, such as Matchmaker (which superimposes structures) and Multalign Viewer (which displays sequence alignments and automatically associates them with structures). Thus functional residues and positions of conservation or divergence in the sequence alignment are easily mapped onto structures for further analyses. This also allows researchers to explore the possible structural implications of neighboring proteins in a pathway.

A screen capture of structureViz in action. A sequence similarity network (SSN) of the phosphotriesterase family of enzymes is shown in Cytoscape. Once the user has selected some protein nodes of interest in Cytoscape, structureViz is used to open the corresponding structures in Chimera and spatially align them. The three structures in the image are colored white, cyan, and magenta, with selected parts outlined in green.
To demonstrate the potential application of structureViz and associated tools, RBVI researchers examined a metabolic enzyme called isocitrate dehydrogenase (IDH1) that is mutated in glioblastoma multiforme, the most common and aggressive brain tumor in humans.
The Chimera MatchMaker tool superimposed the structures of the IDH1 wild-type and mutant forms to visually assess changes in its active site. The Chimera FindHBond tool (below) revealed that the mutant structure lost a hydrogen bond to its usual substrate. This structural change yields an altered product, 2-hydroxyglutarate (2HG), thought to be a cancer-promoting metabolite (oncometabolite).

The wild-type IDH1 enzyme with the substrate isocitrate and cofactor NADP (both dark green) and a calcium ion (purple) bound in the active site. Dashed red lines show H-bonds, identified with the Chimera FindHBond tool, between its arginine 132 residue and isocitrate. The inset shows the mutated protein, in which residue 132 is a histidine, with ligands modeled into the structure via MatchMaker superposition. As assessed by FindHBond, the histidine side chain is too far from the substrate to H-bond with it, resulting in an altered product.
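The kind of geometric test that underlies a "too far to H-bond" conclusion can be sketched with a simple distance-and-angle check. This is a generic illustration, not the FindHBond algorithm: the function names and the cutoffs (3.5 angstroms donor-to-acceptor distance, 120 degrees donor-hydrogen-acceptor angle) are assumptions chosen only for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = distance(a, b) * distance(c, b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def could_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Return True if donor-H...acceptor passes both illustrative cutoffs."""
    return (distance(donor, acceptor) <= max_dist
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)
```

Under these assumed cutoffs, a donor and acceptor 2.9 angstroms apart in a straight line with the hydrogen would pass, while the same pair moved 5 angstroms apart would fail on distance alone, which mirrors the reasoning applied to the histidine side chain in the mutant.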
By using Cytoscape to view a network of proteins organized by their ligand-binding specificity, the researchers found five proteins known to bind to small molecules similar to the 2HG oncometabolite. (Another RBVI-developed app, chemViz, was used to inspect the chemical properties of the ligands.) Those proteins included glutamyl aminopeptidase, an enzyme previously implicated in regulating brain tumor-associated blood vessels.
- clusterMaker, which unifies a wide variety of clustering techniques (algorithms) and visualization styles in a single interface for recognizing biologically meaningful patterns in large data sets and for confirming or generating hypotheses about biological function.
For example, the app facilitates combined analyses of potentially complementary data sets from different types of experiments (e.g., yeast two-hybrid screening, high-throughput mass spectrometry protein complex identification) in order to more accurately identify stable complexes from clustered protein-protein interactions.
It also notably interconnects cluster and network analysis of multiple types of biological data, including expression, genetic interaction, and physical interaction. For example, combining purported protein complex findings with data on gene expression in response to particular stimuli can suggest regulatory roles for particular proteins in a complex.
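The idea of combining complementary interaction data sets before clustering can be illustrated with a deliberately simple sketch. It keeps only protein pairs reported by both experiments and then groups proteins into connected components. This is not one of clusterMaker's algorithms. The function names and the pair-list input format are assumptions made for the example.

```python
def shared_interactions(edges_a, edges_b):
    """Keep undirected protein pairs reported by both experiments."""
    def normalize(edges):
        # Treat (A, B) and (B, A) as the same pair; ignore self-pairs.
        return {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    return normalize(edges_a) & normalize(edges_b)


def connected_components(edges):
    """Group proteins into clusters (connected components of the graph)."""
    neighbors = {}
    for edge in edges:
        a, b = sorted(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited protein.
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbors[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters
```

Requiring agreement between both data sets is the strictest possible way to combine them. Real analyses would more likely weight each source of evidence, but the sketch shows why combining data can separate stable complexes from interactions seen in only one type of experiment.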
SFLD: Using bioinformatics and visualization to classify enzyme function
The RBVI hosts and develops infrastructure for the Structure-Function Linkage Database (SFLD) in collaboration with colleagues in the Department of Bioengineering and Therapeutic Sciences (BTS).
The SFLD addresses the exponentially growing gap between the tens of millions of protein amino acid sequences discovered via genomics and the accurate knowledge of these biomolecules’ functions. The database focuses on enzymes, which catalyze the chemical reactions essential to life and are thus key therapeutic targets.
There is no simple way of correlating enzymes’ primary structures (chains of hundreds or thousands of amino acids) with their functions. Indeed, many enzymes are assigned the wrong function (misannotated) based solely on their overall sequence similarity to others with known functions. However, certain similarities in their composition can reveal vital clues toward functional classification. These similarities reveal evolutionary relatedness as well as amino acid configurations (sequence motifs) experimentally shown to carry out specific chemical functions.
The SFLD is a hierarchical classification resource that describes sequence-structure-function relationships within functionally diverse enzyme superfamilies. The members of such a superfamily can catalyze very different overall reactions, but share a common ancestor and an aspect of chemical function, such as a partial reaction, carried out by a conserved set of active site residues. Superfamilies are further subdivided into families, sets of enzymes that catalyze the same overall reaction.

Web page for the dipeptide epimerase family in the Structure-Function Linkage Database (SFLD).
The SFLD’s core focus is a dozen functionally diverse superfamilies (comprising about 365,000 enzymes as of early 2014) manually curated by BTS scientists such that they are reliably annotated (with functional evidence coding) and can serve as a “gold standard” for developing and evaluating the more automated methods that are ultimately needed.
The RBVI implements and maintains the SFLD’s search functions, which cover lookup by superfamily, reaction, or enzyme, along with its hierarchy of sequence similarity and associated functional aspects (superfamily, then subgroups with more shared features, then families that catalyze the same overall reaction). The Resource also provides crucial tools and web services for comparing unknown enzyme sequences with those in the SFLD core dataset, such as:
- Searching for SFLD sequences similar to the query using BLAST and/or comparison to Hidden Markov Models (HMMs) representing SFLD families, subgroups, and superfamilies. The unknown sequence can be added to the pre-existing alignments for the HMM hits of interest.
- Chimera mapping of sequence motifs (via the Multalign Viewer tool) in alignments from the SFLD onto structures, and calculation of measures of sequence conservation (entropy, variability, etc.) that can be shown as histograms above the alignment.
- Comparative (homology) modeling of unknown structures using the Modeller program developed by BTS colleagues, launched from a Multalign Viewer window containing an alignment of the template (known structure) sequence with the target (unknown structure) sequence.
- Chimera visualization of active site properties including shape, size, hydrophobicity, and electrostatic potential to help infer possible ligands and thus guide the selection of virtual substrates for docking.
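One of the conservation measures named in the list above, entropy, can be computed per alignment column as in the sketch below. This is a generic Shannon entropy calculation, not Chimera's implementation. The function names and the choice to ignore gap characters are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    # Gaps are skipped here; other gap treatments are equally plausible.
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*sequences)]
```

A fully conserved column scores zero, and a column with four equally frequent residues scores two bits, so low values mark positions that are candidates for functionally important residues.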
Case study: Combining SFLD data and tools with Chimera to correct enzyme misannotation

In this Cytoscape-visualized sequence similarity network (SSN) the unknown protein sequence from M. capsulatus (yellow rectangle) clusters more with the dipeptide epimerases (light green) than with the chloromuconate cycloisomerases (pink), or several other family subsets of the enolase superfamily.
Color | Indicates |
---|---|
yellow | unknown |
light green | dipeptide epimerase |
pink | chloromuconate cycloisomerase |
 | muconate cycloisomerase (syn) |
 | muconate cycloisomerase (anti) |
 | N-succinylamino acid racemase 2 |
 | o-succinylbenzoate synthase |
 | unclassified |
The sequence was annotated in two major protein databases as a supposed chloromuconate cycloisomerase. That family is a subset of the SFLD’s enolase superfamily, all of which carry out the partial reaction of removing a proton from a carbon adjacent to a carboxylic acid. In addition, they all contain specific conserved active site residues that bind a divalent metal ion which, in turn, stabilizes the reaction intermediate.
However, incorporation of this sequence into a sequence similarity network (SSN) for analysis in Cytoscape revealed that it clusters more closely with the dipeptide epimerases, a different family within the enolase superfamily. (In the SFLD, families are synonymous with the catalyzed reaction. Thus, beyond their shared partial reaction, dipeptide epimerases catalyze the structural inversion of dual amino acid peptides, while chloromuconate cycloisomerases open or close the ring structures of chlorinated muconates.)
While the SSNs give a broad visual perspective on how sequences compare, further analysis of the M. capsulatus sequence using the SFLD’s Hidden Markov Models (HMMs) confirmed its most statistically significant sequence similarity to be with the dipeptide epimerase family.
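The network part of this analysis can be sketched in a few lines: connect sequences whose pairwise similarity meets a threshold, then see which annotated families the query's neighbors belong to. The sketch assumes a generic score where larger means more similar. The function names, sequence labels, and input format are hypothetical, and no real SFLD data is used.

```python
from collections import Counter


def build_ssn(scores, threshold):
    """Return edges for pairs whose similarity score meets the threshold.

    scores -- dict mapping (seq_a, seq_b) pairs to a similarity score,
              where larger means more similar (hypothetical input format)
    """
    return [pair for pair, score in scores.items() if score >= threshold]


def neighbor_families(edges, query, family_of):
    """Count the annotated families of the query's direct neighbors."""
    counts = Counter()
    for a, b in edges:
        if query in (a, b):
            other = b if a == query else a
            # Only neighbors with a known family annotation are counted.
            if other in family_of:
                counts[family_of[other]] += 1
    return counts
```

For instance, if a query has above-threshold edges to two dipeptide epimerases and none to chloromuconate cycloisomerases, `neighbor_families` returns a count of two for the dipeptide epimerase family. Such a tally is only a first indication, which is why the case study follows it with HMM-based comparison.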
As expected, the query sequence shares certain conserved sequence patterns characteristic of the enolase superfamily. However, aligning the query to SFLD sequences and displaying the results in Chimera’s Multalign Viewer reveals that the query sequence also contains a DXD motif characteristic of the dipeptide epimerase family.

Chimera depiction of the active site of a representative dipeptide epimerase from the SFLD, with the Ala-Glu substrate and the side chains of several active site residues shown as sticks. The family-conserved Asp-X-Asp (DXD) residues are selected in both the structure (green outlines) and sequence alignment. These Asp residues interact with the substrate. Three superfamily-conserved residues and the substrate bind the metal ion (light green sphere). Conservation is shown as a histogram above the sequences in the alignment.
Chimera includes an interface to comparative (homology) modeling with the Modeller program, developed by BTS colleagues and run on an RBVI web service. The homology modeling can be launched given at least one known structure to use as a template and a sequence alignment containing the sequences of both the target (query) and the template. (The structure and alignment in the figure above are from the SFLD.)
The active sites of the resulting homology models can be examined to infer substrate specificity. For example, Chimera interactive tools can be used to visualize active site pocket volumes by representing the van der Waals radii of atoms as solid surfaces, and to color the surfaces to indicate electrostatic potential, hydrophobicity, and other properties.

Chimera 3-D visualization of Modeller results. The experimentally determined dipeptide epimerase structure used as a template (beige) and the five models of the target (query) sequence (blue, magenta, etc.) have been superimposed with MatchMaker.

Active site pocket surfaces show that the query enzyme has about a 50 percent greater pocket volume and is more predominantly negatively charged (red), thus suggesting dipeptide substrates of larger size with positively charged side chains, as was experimentally confirmed.