Systems biology and proteomics

David J. States Research Group

Sarofim Research Building


Aberrations in alternative splicing occur in many cancers, and aberrant processing of transcripts can lead to the inactivation of tumor suppressors. For example, the critical regulatory molecule and tumor suppressor P53 has multiple transcript isoforms, leading to P47 and P53 proteins with differential binding to MDM2. Further, tumor suppressor genes such as RBM5 may act through the splicing process. Recently, Cantley et al. demonstrated that the Warburg effect, the shift of cancer cells to aerobic glycolysis, is controlled by a change in the splicing of the PKM2 gene in cancer cells.

Aberrant transcript splicing can lead to novel translation products by introducing alternative exons or by altering the reading frame and/or boundaries of existing exons in transcript coding sequences. These novel translation products are new to the body and may trigger autoimmune phenomena and contribute to the paraneoplastic syndromes seen in many cancers.

Transcript splicing is a complex and highly regulated process. Cells contain both ubiquitous and cell-type-specific splicing factors and splice enhancer proteins. With the mutations and genomic rearrangements that occur in cancer, normal splicing processes can be disrupted in many ways. To understand the implications of aberrant splicing in cancer, we need to characterize the protein products found in cancer cells. This is one of the key goals of this project.

The novel proteins produced by translation of aberrant transcripts are also a new and potentially valuable source of biomarkers for cancer. Because these proteins are not normally produced in the body, they may be both sensitive and specific as biomarkers. Surveying cancers to identify candidate biomarkers produced by aberrant transcript processing is another goal of this project.


We apply a combination of top-down and bottom-up proteomics. Bottom-up, or shotgun, proteomics has proven to be a powerful approach for protein identification, but identifying alternative splice products presents some unique challenges. In bottom-up proteomics, the protein mixture is digested, typically with trypsin, the peptides are fractionated, and individual peptides are identified by matching their tandem mass spectra against a target protein sequence database. There are several difficulties to overcome in applying this approach to alternative splicing translation products. First, exons are often shared between transcript isoforms. Unless a peptide happens to span a unique splice junction, identifying the peptide does not tell you which splice isoform it was derived from, and the same peptide may be present in several different splice isoform translation products. To address this, we fractionate the intact proteins using a 2-dimensional fractionation approach based on charge and hydrophobicity to resolve the translation products as much as possible. Each fraction is then subjected to tryptic digestion and tandem mass spectral analysis.
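The shared-exon problem can be illustrated with a small in-silico digest. The following Python sketch is only an illustration under stated assumptions: the exon sequences are made-up toy data, the digest uses the common simplified trypsin rule (cleave after K or R unless followed by P) with no missed cleavages, and the function names are hypothetical. It does not model our fractionation or spectral matching.

```python
import re
from collections import defaultdict


def tryptic_digest(protein, min_len=6):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P.

    Simplifications: no missed cleavages; peptides shorter than min_len are dropped.
    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def isoform_specific_peptides(isoforms):
    """Map each isoform name to the set of peptides found in no other isoform."""
    owners = defaultdict(set)
    for name, sequence in isoforms.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(name)
    unique = {name: set() for name in isoforms}
    for peptide, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(peptide)
    return unique


# Hypothetical toy exons (made-up amino acid sequences, not real proteins).
EXON1 = "MASTEELLKGDVAFSRTNQ"
EXON2 = "WYEPLHGKSIVDAAMREF"
EXON3 = "CHLTPNKVGQDYSAR"

ISOFORMS = {
    "full_length": EXON1 + EXON2 + EXON3,
    "exon2_skipped": EXON1 + EXON3,
}

if __name__ == "__main__":
    for name, peptides in isoform_specific_peptides(ISOFORMS).items():
        print(name, sorted(peptides))
```

In this toy case the only peptide that identifies the exon-skipped isoform is the one spanning the new exon 1/exon 3 junction. Every other peptide from that isoform also occurs in the full-length product.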

The second challenge is that our knowledge of alternative splicing is currently limited, and many transcript isoforms are not annotated in the molecular sequence databases. Spectral matching will fail to identify translation products if the sequence is missing from the target database. We overcome this problem using two approaches. The first is to construct a target database containing both known and hypothetical transcript isoforms. With modern mass spectrometry, the quality of the data is sufficient to distinguish bona fide identifications from false positives even when the target database is many times larger than the standard protein sequence collections. The second is the use of de novo and partial de novo sequence determination, in which a high-resolution MS/MS spectrum is analyzed without reference to a target database to determine what sequences or partial sequences could account for the observed spectral data. Like large database searches, these are computationally demanding calculations.
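One simple way to picture a target database of known and hypothetical isoforms is to enumerate exon-skipping combinations for a gene and translate each one. The Python sketch below is a minimal illustration, not a description of our actual pipeline. It assumes that the first and last exons are always retained, that translation starts at the first base, and that only exon skipping is considered (no alternative splice sites or retained introns). The exon sequences, the gene name TOY, and the header format are all hypothetical.

```python
from itertools import combinations, product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first base to the first stop codon or the last full codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def exon_skipping_isoforms(exons):
    """Yield (kept internal exon indices, transcript) for every exon-skipping variant."""
    first, *internal, last = exons  # needs at least two exons
    n = len(internal)
    for k in range(n, -1, -1):
        for kept in combinations(range(n), k):
            yield kept, first + "".join(internal[i] for i in kept) + last


def build_target_database(gene, exons, annotated):
    """Return {header: protein}; variants not listed in `annotated` are hypothetical."""
    database, seen = {}, set()
    for kept, transcript in exon_skipping_isoforms(exons):
        protein = translate(transcript)
        if not protein or protein in seen:
            continue  # skip empty or duplicate translation products
        seen.add(protein)
        status = "known" if kept in annotated else "hypothetical"
        label = "-".join(str(i + 2) for i in kept) or "none"
        database[f"{gene}|internal_exons={label}|{status}"] = protein
    return database


# Hypothetical toy gene; exon 3 has 4 bases, so including it shifts the reading frame.
TOY_EXONS = ["ATGGCT", "GAAAAA", "CCGT", "TGGTAA"]

if __name__ == "__main__":
    db = build_target_database("TOY", TOY_EXONS, annotated={(0, 1)})
    for header, protein in db.items():
        print(f">{header}\n{protein}")
```

The number of variants doubles with each internal exon, which is one reason a database of hypothetical isoforms can grow to many times the size of a standard protein sequence collection.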

The combination of intact protein fractionation and searching large target databases makes the search task very computationally intensive. We overcome this by using high performance parallel computing resources and distributed computing techniques.
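Because each database entry can be scored independently, this kind of search divides naturally across processors. The Python sketch below shows the general idea on a single machine using the standard library: the peptide list is split into shards, each shard is searched by a separate worker, and the hits are merged. It is an assumption-laden toy rather than our production setup. Matching is by peptide mass within a ppm tolerance instead of full MS/MS spectral scoring, and the function names, shard count, and tolerance are hypothetical choices.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Monoisotopic residue masses in daltons, rounded to five decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def search_shard(task):
    """Worker: match every query mass against one shard of the peptide database."""
    shard, query_masses, tol_ppm = task
    hits = []
    for peptide in shard:
        mass = peptide_mass(peptide)
        for query in query_masses:
            if abs(mass - query) <= query * tol_ppm * 1e-6:
                hits.append((query, peptide))
    return hits


def parallel_search(peptides, query_masses, n_shards=4, tol_ppm=10.0,
                    executor_cls=ProcessPoolExecutor):
    """Split the database into shards, search them concurrently, merge the hits.

    Pass executor_cls=ThreadPoolExecutor for small in-process tests.
    """
    shards = [peptides[i::n_shards] for i in range(n_shards)]
    tasks = [(shard, query_masses, tol_ppm) for shard in shards if shard]
    if not tasks:
        return []
    with executor_cls(max_workers=len(tasks)) as pool:
        results = pool.map(search_shard, tasks)
        return sorted(hit for shard_hits in results for hit in shard_hits)


if __name__ == "__main__":
    database = ["MASTEELLK", "GDVAFSR", "TNQWYEPLHGK", "SIVDAAMR", "TNQCHLTPNK"]
    observed = [peptide_mass("TNQCHLTPNK")]  # stand-in for a measured precursor mass
    print(parallel_search(database, observed))
```

Sharding the database rather than the spectra is only one possible design. On a cluster the same map-and-merge pattern would typically be run through a job scheduler or a message-passing library rather than a local process pool.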

Recent Results

We have identified translation products for the mouse PKM2 gene for transcript isoforms homologous to both of the isoforms identified by Cantley et al. as being responsible for the Warburg shift to aerobic glycolysis in human cancers. Alternative splicing had not previously been characterized in the mouse PKM2 gene. Our work makes possible the development of mouse models for the Warburg effect.

Opportunities for Collaboration

We are interested in working with investigators who are profiling the proteomes of cancer cells. If you already have deep tandem mass spectral data sets, we are interested in collaborations to apply our database and search techniques to identify novel translation products that would not be identified by standard database search approaches.