Proteomics, post-translational modifications, and epigenetics

There are four research areas in the Department of Pharmaceutical Chemistry. Proteomics, post-translational modifications, and epigenetics is a research challenge within physical biology.

The challenge

The human genome has about 20,000 protein-encoding genes. Yet our bodies can make hundreds of thousands of different proteins. This is partly thanks to the splicing and rearrangement of the messenger RNAs that carry genetically expressed messages to the ribosome protein factories, but especially to subsequent chemical modifications that form new physiologically active isoforms.

While our inherited genome is fixed and present in all our cells, the types and quantities of proteins expressed will vary greatly depending on cell type (a neuron vs. blood cell), tissue (lung vs. eye), particular stages of development or cell replication, and environmental inputs and stresses.

In addition, after RNA messages are translated, proteins are routinely modified (post-translational modifications or PTMs) in specific places with small molecular moieties by enzymes. For example, proteins are cleaved, or joined to chemical groups or carbohydrates (sometimes at multiple locations). These PTMs can activate, inactivate, or otherwise alter function, allowing cells to rapidly respond to stresses and stimuli. Multiple modifications to the same protein can yield crosstalk—modulated effects—while modifications to DNA and to histone proteins that package DNA can alter gene expression itself.

Protein molecules carry out the activities of life and, when they harbor defects, can fail to function properly and cause disease. Determining the proteome (that is, the complete set of proteins in a given cell, tissue, or our bodies under particular conditions, along with their locations, relative quantities, and the type and extent of their modifications) would therefore yield new ways to detect and decipher vital biological processes.

Proteomics provides the dynamic molecular underpinnings of health and disease. Thus it can potentially reveal and detect biomarkers—indicators to diagnose and determine sub-types of disease and stage at the molecular level, as well as to more rapidly detect if a given therapy is proving effective in an experimental or clinical setting.

Examples of our research, methods, and resources include

National bio-organic biomedical mass spectrometry resource at UCSF

Mass spectrometry (MS) has been developed into a primary technology for detecting, characterizing, quantifying, and deciphering the complex biomolecular stew of the cellular, tissue, or organism proteome. It identifies different proteins that range greatly in concentration (by many orders of magnitude), including those of extremely low abundance but potentially great biological significance, and determines the extent and location of post-translational modifications (PTMs), including those on histones and DNA (epigenetics).

The department is home to the National Bio-Organic Biomedical Mass Spectrometry Resource, one of five such facilities nationwide funded by the National Institutes of Health (NIH). The resource here is a world leader in proteomic analysis, with a particular focus on post-translational modifications, including regulatory cross-talk between different PTMs and epigenetics such as histone modifications.

Mass spectrometry facility at UCSF’s Mission Bay campus.

Located at UCSF since 1978, the facility is now home to a dozen mass spectrometers and the applied expertise of a staff of 20. As part of the NIH’s Biomedical Technology Research Resource (BTRR) program, the facility seeks to extend the capabilities of its enabling technology by applying it to the needs of leading edge biomedical test bed projects—as many as 75 collaborations at any given time, at least half of them outside of UCSF.

Put simply, MS vaporizes and adds electrical charges to (ionizes) molecules in order to identify their chemical compositions and quantities based on spectra of their mass-to-charge ratios. State-of-the-art MS technology, combined with bioinformatics, can thus rapidly identify and differentiate among the spectra of myriad proteins.
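The mass-to-charge relationship can be made concrete with a minimal sketch. It assumes positive-mode ionization in which each charge comes from one added proton (as in electrospray); the function names are illustrative and not part of any tool named in this article.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (Da)


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """Return m/z for a molecule that carries `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relationship: recover the neutral mass from an observed m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)
```

For example, under these assumptions a 1000 Da peptide carrying two protons would appear at an m/z of about 501.007, so the same molecule can show up at several positions in a spectrum depending on its charge state.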

Fractionation and enrichment

Before undergoing such analysis, proteins in complex mixtures might undergo a first-order sorting from cells and tissues by techniques such as 2-D gel electrophoresis, which places proteins in a gel and applies an electrical field in two directions to separate them by size and charge. However, usually separations are more effectively carried out at the peptide level, and complexity is reduced by a variety of separation techniques. For example, particular proteins or peptides may be separated and concentrated (enriched) for analysis by passing the sample through a fixed medium containing antigens, metal ions, affinity tags such as biotin (which bind to avidin for extraction), or lectins (proteins that bind sugars). Final analyses are achieved through high performance liquid chromatography (HPLC) on-line to various mass spectrometers. This process pumps samples through thin columns (microns in diameter) of porous polymer (monolithic) to separate molecular constituents for sequential mass analysis.

Quantitative MS

To determine the relative quantities of proteins, a label-free approach compares spectra from different samples (e.g., contents of healthy, diseased, and/or drug-treated cells). Alternatively, isotope labels of known mass or other affinity tags are used to determine relative or absolute quantities between samples, or of a targeted peptide in a particular sample.
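As a rough illustration of relative quantification, the sketch below summarizes a protein's change between two samples from the intensities of its peptides. Taking the median of per-peptide log2 ratios is an assumption made here for robustness to outliers; it is not described in this article as the resource's method, and the function name is hypothetical.

```python
from math import log2
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Summarize relative abundance from (reference, test) peptide intensity pairs.

    Returns the median log2(test / reference) over peptides with positive
    intensities in both samples. A value of 1.0 means a two-fold increase.
    """
    ratios = [log2(test / ref) for ref, test in peptide_pairs if ref > 0 and test > 0]
    if not ratios:
        raise ValueError("no peptide was measured in both samples")
    return median(ratios)
```

The same calculation applies whether the paired intensities come from a label-free comparison of separate runs or from light and heavy isotope-labeled forms of a peptide measured together.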

Top-Down & Bottom-Up MS

The resource applies both bottom-up and top-down proteomics: In the former, proteins are first cleaved into pieces by enzymes, and broken into still smaller peptides by colliding them with an inert gas (collision-induced dissociation). Successive (tandem) MS analysis generates so-called peptide sequence ion series, which are compared with large databases of known protein sequences (using algorithms such as the resource’s Protein Prospector), akin to identifying a poem from its compiled syllables. If necessary, rare proteins or modifications can be sequenced de novo.

An illustrated example of a mass spectrometry protocol in which proteins from cells or tissue are sorted by gel electrophoresis (1DE), digested by enzymes into their constituent peptides, further sorted and enriched by chromatography, then charged and rendered airborne via a process called electrospray ionization. Here, the peptides are further broken down by collision with an inert gas. The mass spectra represent the results of this analysis: the distribution of ions in the sample by their mass-to-charge ratio.
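The database-matching idea behind bottom-up proteomics can be sketched in a few lines. This is a deliberately simplified stand-in: it matches intact peptide masses against an in-silico digest, whereas the tandem MS searches described above compare fragment (sequence) ion series. It assumes trypsin as the enzyme (cleaving after lysine or arginine unless the next residue is proline), uses standard monoisotopic residue masses, and all function names are illustrative rather than taken from Protein Prospector.

```python
# Monoisotopic amino acid residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def match_peptides(observed_masses, database, tol_ppm=10.0):
    """Return (protein, peptide, observed mass) for every match within tol_ppm."""
    hits = []
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            for observed in observed_masses:
                if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                    hits.append((name, peptide, observed))
    return hits
```

Calling `match_peptides([373.2325], {"toy": "GKGGKGLGKGGAKR"})` on this toy one-protein database would be expected to report the peptide GLGK. Real searches must also allow for missed cleavages, modifications, and far larger databases.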

In top-down proteomics, MS instrumentation able to analyze at higher mass ranges with high accuracy mass-to-charge measurement (resolution) is employed on intact proteins or large fragments of proteins (10,000 to 100,000 daltons). Sequence ion series are generated by electron energy deposition. As compared to collision-induced dissociation, such electron-transfer dissociation preserves smaller and labile post-translational modifications and their locations for identification, akin to disassembling a car to see what options were added to the basic model.

Improving MS methodology: sample preparation, bioinformatics, and cross-linking reagents

Examples of how the resource continuously advances sample preparation techniques, web-accessible bioinformatics software, and reagents include:

  • Optimizing chromatography so detection and peptide sequencing can be done routinely at the low femtomole level (quadrillionths of a mole) for some components. This is especially vital for the identification and analysis of biologically active molecules in rare and minute samples, from exotic venoms to stem cells. Such detection can be done at the sub-picomole level (less than a trillionth of a mole) for phosphopeptides, the products of one of the most common and important post-translational modifications: the addition of phosphate groups by kinase enzymes, often used to switch proteins on and off in biological pathways.
  • Iteratively improving software automating the assignment of peptide mass spectra for faster and more accurate identification of proteins in experimental samples.
  • Synthesizing new reactive compounds (reagents) and pioneering new methods to extend the use of the chemical crosslinking of proteins, that is, covalently fixing intra- or inter-molecular non-covalent or transient interactions to allow for low-resolution structure analysis of molecules or complexes via mass spectrometry.
  • Maintaining and developing Protein Prospector, the resource’s web-based search engine and suite of bioinformatics software for analyzing mass spectrometry data from a wide range of instruments. (The site is used for over a million searches a year.) Protein Prospector tools allow for the comparison and combination of multiple proteomic datasets and analyses by integrating search functionality with information about experimental design. For example, the tools can compare quantities of protein components across samples using isotopic labeling or label-free strategies. The resource is also developing improved search strategies to allow identification of chemically cross-linked peptides in complex mixtures.

Scientists view an MS spectrum at the UCSF mass spectrometry facility.

Applications of department mass spectrometry include:

Neurological proteomics: assessing protein populations and PTMs in the brain

One goal of proteomics is to move beyond the reductive analysis of individually acting protein isoforms to examine how the composition of protein populations changes in response to natural stimuli and stressors, disease states, and drug therapies. Thus the aim is to conduct broadly encompassing (global) measurement of the relative quantities of the hundreds of different proteins that interact to generate biological outcomes, moving beyond the activities of individual neurotransmitter and receptor species.

The synapses of the brain and central nervous system (CNS)—dynamic junctions where neurons exchange chemical signals that are the molecular underpinnings of phenomena such as learning and memory—are the subject of such quantitative and qualitative (PTM) proteomic analysis by department researchers and their neuroscientist collaborators. For example:

Tracking coordinated changes in synaptic proteome during CNS stimulation

MS combined with isotopic tagging of peptides was used to analyze variations in the relative abundance of nearly 900 proteins in mouse post-synaptic densities (PSD) at distinct time intervals in response to broad CNS drug stimulation. The density is a protein-rich structure on the receiving end (dendrite terminal) of inter-neuronal chemical communications across the synaptic cleft. PSDs contain concentrations of receptors that bind neurotransmitters as well as numerous associated synaptic signaling and regulatory proteins, assembled by a variety of scaffold proteins.

By analyzing the changes in relative quantities of the post-synaptic proteins, the study found evidence of the co-regulated activation of certain groups. The researchers were thus able to identify core functional complexes in which proteins displayed coordinated activity even when not known to physically interact.

Quantifying synaptic phosphorylation and protein expression by brain region

Global MS quantification and isotopic labeling were applied to examine differences in protein expression and phosphorylation at post-synaptic densities by brain region. The analysis of more than 2,100 proteins and 1,500 phosphorylation sites in a mouse model suggested roles for previously unannotated proteins whose greater expression clustered with known functional complexes.

This study also revealed relatively more phosphorylation sites on NMDA than AMPA receptors, both of which are involved in synaptic plasticity: changes in the efficacy of synaptic transmission that underlie learning, memory, and other neurological phenomena. Indeed, the study found higher average levels of enzymes associated with reversible phosphorylation (kinases, phosphatases) in the hippocampus, a region associated with memory formation and spatial orientation that is among the first to suffer damage in Alzheimer’s disease.

Graph of relative abundance of a specific protein, chapsyn-110 (a membrane-associated kinase) in post-synaptic densities by brain region, showing its greatest prevalence in the cortex, mid-brain, cerebellum, and hippocampus. (Individual peptides are represented by dots, protein averages by horizontal bars.) This research was originally published in Molecular & Cellular Proteomics. Trinidad et al., Quantitative Analysis of Synaptic Phosphorylation and Protein Expression. Molecular & Cellular Proteomics. 2008; Vol 7:684-696

Graph shows higher relative expression of enzymes associated with reversible phosphorylation in the hippocampus. This research was originally published in Molecular & Cellular Proteomics. Trinidad et al., Quantitative Analysis of Synaptic Phosphorylation and Protein Expression. Molecular & Cellular Proteomics. 2008; Vol 7:684-696

First large-scale mapping of challenging carbohydrate modifications in synaptic proteome

Department scientists applied liquid chromatography and electron-transfer dissociation MS to conduct the first large-scale study characterizing the specific sites and modifier structures of more than 2,500 unique N- and O-linked glycopeptides from 453 proteins in mouse synaptosomes, which are membrane-bound sacs of vesicles from neuronal axon terminals.

Extra-cellular glycosylation is a common, complex suite of structurally related PTMs in which one or more carbohydrates (glycans) are attached to a membrane or secreted protein. The N- or O- designation refers to the nitrogen or oxygen in amino acid (residue) side chains where the glycan is attached. These PTMs aid and stabilize protein folding among other functions.

Such analyses are challenging because glycosidic bonds are more readily broken in collision-induced dissociative processes than the peptide bonds of other covalent PTMs. Also, there is limited software for automating identification of glycan-modification spectra.

Researchers here used two forms of affinity chromatography (lectin, TiO2) to sort and enrich the concentration of glycopeptides for identification. Then, having previously identified thousands of synaptosome proteins, they focused on annotated transmembrane/secreted proteins and used the resource’s Protein Prospector to search those proteins, allowing for modifications with specific mass values corresponding to potential glycan components. They thus identified the most common carbohydrate modifications and matched their masses to sugar (oligosaccharide) structures.

Such MS analyses also identified the number of unique glycans for each N-linked glycosylation site, including a single site with 19 glycan modifications.
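Matching a modification mass to candidate sugar compositions, as described above, can be sketched as a small brute-force search. The monosaccharide residue masses are standard monoisotopic values; the search limits, the tolerance, and the function name are assumptions for illustration and do not describe how Protein Prospector performs this step.

```python
from itertools import product

# Monoisotopic masses (Da) that each sugar residue adds to a glycopeptide
SUGAR_MASS = {
    "HexNAc": 203.07937,  # N-acetylhexosamine, e.g. GlcNAc
    "Hex": 162.05282,     # hexose, e.g. mannose or galactose
    "Fuc": 146.05791,     # fucose (deoxyhexose)
    "NeuAc": 291.09542,   # N-acetylneuraminic (sialic) acid
}


def glycan_compositions(delta_mass, max_counts=None, tol_da=0.02):
    """List sugar compositions whose summed mass matches delta_mass within tol_da."""
    max_counts = max_counts or {"HexNAc": 6, "Hex": 10, "Fuc": 2, "NeuAc": 4}
    names = list(SUGAR_MASS)
    matches = []
    for counts in product(*(range(max_counts[name] + 1) for name in names)):
        total = sum(n * SUGAR_MASS[name] for n, name in zip(counts, names))
        if any(counts) and abs(total - delta_mass) <= tol_da:
            matches.append({name: n for name, n in zip(names, counts) if n})
    return matches
```

Under these assumptions, a mass addition of about 1216.42 Da corresponds to two HexNAc plus five hexose residues, the composition of a common high-mannose N-glycan. A composition alone does not fix the branching structure, which is why the masses are then matched to known oligosaccharide structures.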

Pioneering the identification of sites and roles of a common, important sugar PTM: O-GlcNAcylation

O-GlcNAcylation is a common, vital, and reversible post-translational modification that occurs in all animals (metazoans) and plants, in both cellular nuclei and cytosol. It entails the modification of serine and threonine residues of nuclear and cytosolic proteins with a single oxygen-linked N-acetylglucosamine (GlcNAc). It is implicated in the modulation of:

  • gene regulation (including via modification of transcription factors)
  • responses to cellular stress and nutrient levels, including the effects of the latter on cellular circadian clocks
  • the targeting of proteins for destruction by the proteasome
  • intra-cellular signaling pathways regulated by phosphorylation (the most-studied PTM and commonly an on-off switch of proteins)—via cross-talk on co-modified residues or competition for modification sites
  • diseases such as diabetes and Alzheimer’s (enzymes that add/remove O-GlcNAc are highly expressed in the brain)

While this PTM was discovered three decades ago, detailed determination of its biological functions had been limited by the difficulty of identifying precisely which serine or threonine residues on a protein were modified. The use of collision-induced dissociation to fragment proteins into peptides for analysis typically broke the weaker sugar-oxygen glycosidic bonds before the modified protein’s peptide backbone, thus losing modification site data.

At a time when fewer than 80 exact residues of O-GlcNAcylation were known on all proteins, mostly via chemical dissection (Edman degradation), scientists here pioneered new methodology, combining lectin (wheat germ agglutinin) affinity chromatography with electron-transfer dissociation (ETD) tandem MS. This was applied to the proteomes of mouse brain cells, where high rates of O-GlcNAcylation had been noted. Specifically, researchers analyzed post-synaptic densities, the protein-rich structures at the receiving end (dendrite terminal) of inter-neuronal chemical communications across the synaptic cleft.

Upon initially applying the new method, department researchers determined 58 modification sites from a single experiment, including 28 on Bassoon, a protein in the synaptic active zone. That matched the number of then-known phosphorylation sites on the protein and suggested that O-GlcNAcylation might play an equivalent role in regulating its function. Indeed, the study also found further evidence that the sugar modification might interact (crosstalk) with phosphorylation in the synapse, as several O-GlcNAc-modified sites on Bassoon were previously reported as phosphate modified.

Post-translational modifications on the synaptic protein Bassoon, which comprises 3940 amino acids from its N to C terminus. Positions of the 28 O-GlcNAc modifications (top) are indicated relative to phosphorylation sites (bottom). O-GlcNAc sites also reported as phosphorylation sites are indicated in red. From Proceedings of the National Academy of Sciences, Identification of protein O-GlcNAcylation sites using electron transfer dissociation mass spectrometry on native peptides, Chalkley et al., Vol 106 no. 22, 2009, 8894–8899

Combined global characterization of two major modifications

The potential for modulating effects, reciprocal regulation, and other interplay (crosstalk) between O-GlcNAcylation and phosphorylation is suggested by factors that include:

  • Both reversibly modify serine and threonine residue side chains
  • Both are common (most cellular proteins may be phosphorylated, while thousands of O-GlcNAcylation sites are now known)
  • In vitro modulation of global phosphorylation levels changes the O-GlcNAcylation levels of many proteins and vice versa.

To examine the interplay of these two common PTMs, researchers here developed an approach allowing the combined detection and site determination of O-GlcNAcylation and phosphorylation in the same biological sample. This combines affinity chromatography (a weak interaction between O-GlcNAc and the lectin wheat germ agglutinin, as well as between phosphates and titanium dioxide) for the enrichment of O-GlcNAc- and phosphate-modified peptides with electron capture/transfer dissociation MS. Examples of this dual PTM characterization include:

Global identification, location of O-GlcNAcylation and phosphorylation in same synapse proteome samples

Department scientists and their collaborators applied this approach to detect both modifications and their locations on proteins in the same biological samples from mouse synaptosomes, which are membrane-bound sacs of vesicles from the delivering side (axon terminals) of inter-neuronal chemical communications across the synaptic cleft.

The study identified more than 6000 proteins with either or both modifications and yielded estimates of 19% and 63% of the synaptosome proteins being O-GlcNAcylated or phosphorylated, respectively. It also found that proteins extensively modified by O-GlcNAc were almost always phosphorylated to a similar or greater extent, indicating that O-GlcNAc-transferase (OGT, which catalyzes the PTM) is targeting certain phosphorylated proteins. In addition, kinases (which phosphorylate proteins) were the class of proteins most extensively modified by O-GlcNAc, suggesting a form of crosstalk in which O-GlcNAcylation regulates the enzymes’ activity.

O-GlcNAcylation and phosphorylation interplay in circadian clock regulation

The resource provided MS analysis of the interplay between these modifications in the regulation of 24-hour cellular circadian clocks, which coordinate biological processes in organisms from bacteria to humans. The clocks’ rhythms can be affected by nutrient levels as well as by light.

Prior studies had found that O-GlcNAcylation of circadian-related transcription factors and proteins functions as a nutrient sensor and affects the clocks (altering period length in mice and fruit flies). MS analysis of proteins from mouse brains and livers by department scientists showed that OGT, the enzyme that catalyzes O-GlcNAcylation, is phosphorylated and thus upregulated by a kinase (glycogen synthase kinase 3β) but can also be O-GlcNAc modified at the same locations such that the two enzymes compete with and “fine tune” one another’s effects.

A] MS/MS spectrum of a doubly O-GlcNAc-modified region of the brain protein PER2, which regulates circadian clock speed, or time of sleep onset. B] MS/MS identified sites of O-GlcNAc modification of a kinase (CK1) binding domain (amino acids 557 to 771). The identified serines (numbers 662, 668, and 671) are also CK1 phosphorylation sites, suggesting competing modifications. From Cell Metabolism, Glucose Sensor O-GlcNAcylation Coordinates with Phosphorylation to Regulate Circadian Clock, Kaasik et al., February 2013, pp. 291-302.

The same study used tandem MS analysis to find that O-GlcNAcylation and phosphorylation are also competing modifications of a specific region of a brain protein (PER2) critical for regulating clock “speed”—a mutation reducing phosphorylation of the region leads to a syndrome of early evening sleep onset.

Deciphering epigenetics: detecting and characterizing modifications to histones and chromatin proteins (DNA)

There are roughly 20,000 protein-encoding genes in every cell in our bodies. Yet gene expression differs not only by cell type and developmental stage, but also during healthy response to stimuli and stresses and via dysfunction and disease, converting genetic make-up (genotype) to observable traits (phenotype).

One way that gene expression is regularly modified is by the attachment of chemical groups and carbohydrates to the combined complex of DNA and its associated proteins (chromatin) in the cell nucleus, leading to a physical remodeling that alters gene accessibility.

Determining how and where such modifications occur is an example of epigenetics—the study of such non-sequence-related changes in gene expression that can be inherited by subsequent generations. A key subset of epigenetic modifications is made to histones—several protein isoforms that package and stabilize DNA in a spool-like manner (a repeating basic unit called a nucleosome is a DNA segment wound around eight histone cores). Upon histones’ modification by multiple chemical groups (e.g., acetyl, methyl, ubiquitin, phosphate), typically on their unstructured N-terminals (dubbed “tails”), they remodel part of the chromatin, regulating the binding affinities of transcriptional proteins and ultimately altering gene expression.

MS/MS spectrum of a histone (H4) peptide (GKGGKGLGKGGAKR) from maize (aka corn) showing its post-translational modifications by acetyl (ac) groups (acetylation). M/Z indicates mass-to-charge ratios.

Department scientists and their collaborators have pioneered the application of mass spectrometry to histone modifications as well as epigenetic changes such as DNA methylation. Such research could help decipher the hypothetical histone and epigenetic codes, which propose that complex combinations of multiple modifications and their interactions (cross-talk) are correlated with changes in gene expression in modulated responses to endogenous and environmental stimuli and, when disrupted, with dysfunctional expression in disease.

Histone modifications represent a challenging model for the application of mass spectrometry to analyzing intact proteins, given their complex variety of modifications. More than 60 residues on the four core nucleosome histones have been found to be subject to one or more modifications. Top-down electron transfer/capture dissociation techniques allow the analysis of larger peptides, but modifications to different sites and in different quantities (e.g., trimethylation) must be precisely detected despite varying in frequency and abundance by orders of magnitude.
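One well-known reason mass accuracy matters for histones is that acetylation and trimethylation add nearly the same mass (about 42.011 versus 42.047 daltons). The sketch below is an illustration only, using standard monoisotopic mass shifts; the function name and the tolerance handling are assumptions and do not describe the resource's software.

```python
# Monoisotopic mass shifts (Da) for some common histone modifications
PTM_SHIFT = {
    "methyl": 14.015650,
    "dimethyl": 28.031300,
    "acetyl": 42.010565,
    "trimethyl": 42.046950,
    "phospho": 79.966331,
}


def candidate_ptms(unmodified_mass, observed_mass, tol_ppm):
    """Return the modifications consistent with the observed mass shift.

    The tolerance is expressed in parts per million of the observed mass,
    which is how instrument mass accuracy is usually quoted.
    """
    tol_da = observed_mass * tol_ppm / 1e6
    delta = observed_mass - unmodified_mass
    return sorted(
        name for name, shift in PTM_SHIFT.items() if abs(delta - shift) <= tol_da
    )
```

For a hypothetical 1500 Da peptide that has gained 42.0106 Da, a 50 ppm tolerance leaves both acetyl and trimethyl as candidates, while a 5 ppm tolerance leaves only acetyl.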

Examples of department research and applications of MS methodologies into epigenetic modifications include:

Quantifying effects of carcinogenic arsenic exposure on histone modifications

Scientists here applied quantitative MS analysis of comparative histone modifications to a cell culture model for the development of bladder cancer due to arsenic exposure. While arsenic exposure, especially through contaminated groundwater, is known to be associated with several types of cancer in humans, the underlying mechanisms have been elusive. Working with collaborators, scientists here analyzed histone modifications in human uroepithelium cells (which line the urinary tract) exposed to doses of arsenic over time. The study detected a reduction in acetylation levels on H3 and H4 histone isoforms at specific lysine residues. Noting related findings of arsenic-altered histone acetylation and malignant transformation, the study authors suggested that such epigenetic disturbances could silence key growth-controlling genes, leading to tumor development.

Assessing gene regulatory demethylation of key nucleosome sites

Using liquid chromatography-tandem MS to monitor the removal of methyl groups (demethylation) from a histone substrate over time, researchers here were able to develop a broadly applicable method to assess demethylation of specific nucleosome sites. By reconstituting the process in vitro, the study looked specifically at reaction rates (kinetics) of a histone demethylase JMJD2A catalytic domain which, among its various histone substrates, removes a third methyl group from a trimethylated H3 histone’s ninth lysine residue (H3 K9). Trimethylation of this site represses transcription by altering chromatin structure to inactivate genes. While trimethylation of that histone site has been seen at genes silenced in cancer, overexpression of the JMJD2 enzyme family is also implicated in disease, thus indicating the importance of precise demethylation regulation.