Systems biology and proteomics

David J. States Research Group

Sarofim Research Building

Top Down Bottom Up Analysis

Mass spectrometry has emerged as a powerful new technology for the analysis of complex protein mixtures. Key recent advances include the development of electrospray ionization and high-resolution nanoscale chromatography; ion trap mass spectrometers; Orbitrap and Fourier transform mass analyzers capable of extremely high resolution and non-destructive detection of large ions; and improvements in computing power, statistics, and algorithms for the analysis of MS data. As powerful as modern mass spectrometry is, we still face many challenges. Mass spectrometers are limited in dynamic range, and the presence of one ion may suppress the detection of other ions ("ion suppression"). MS analysis of intact proteins is extremely difficult, so most work is done on peptide digests, but it is often difficult to decide which peptides came from which proteins. Further, many high-quality mass spectra cannot be identified using current techniques.

Bottom Up or shotgun proteomics is a powerful approach for protein identification in which a protein mixture is digested to peptides, typically using trypsin. The peptide mixture is fractionated in one or two dimensions or, in a clever approach, in two dimensions on a single column (MudPIT). The peptide-containing fractions are introduced directly into the mass spectrometer using electrospray ionization, and a first-dimension or parent ion spectrum is acquired, usually using an ion trap. The ion trap allows the analysis to be repeated, selecting parent ions of interest for further analysis by fragmenting them and analyzing the mass spectra of the fragment ions (tandem MS or MS/MS). Spectra are assigned peptide identifications by matching their tandem mass spectra against a target protein sequence database.
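The digestion and matching steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the search engine used by the group: it assumes the common rule that trypsin cleaves after K or R unless the next residue is P, ignores missed cleavages and modifications, and uses standard monoisotopic residue masses. The function names, the example sequence, and the 10 ppm tolerance are hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to a free peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Singly charged b and y ion m/z values for each backbone cleavage.

    The b list runs b1, b2, ...; the y list holds the complementary
    y ion for the same cleavage site, so it runs y(n-1), ..., y1.
    """
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        y.append(sum(MONO[aa] for aa in peptide[i:]) + WATER + PROTON)
    return {"b": b, "y": y}


def candidates(observed_mass, peptides, tol_ppm=10.0):
    """Database peptides whose mass matches the parent ion mass
    within a tolerance in parts per million (assumed value)."""
    return [
        p for p in peptides
        if abs(peptide_mass(p) - observed_mass) / observed_mass * 1e6 <= tol_ppm
    ]


if __name__ == "__main__":
    protein = "MKWVTFISLLLLFSSAYSR"  # arbitrary example sequence
    peptides = tryptic_digest(protein)
    for pep in peptides:
        print(pep, round(peptide_mass(pep), 4))
    print(candidates(peptide_mass(peptides[0]), peptides))
```

A real search would go on to score the observed fragment spectrum against the predicted b and y ions of each candidate. Filtering by parent mass only narrows the list.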

There are several difficulties to overcome in applying this approach to complex mixtures of proteins such as the cellular proteome. Many spectra (half or more in many experiments) cannot be confidently assigned a peptide identification. Spectra derived from peptides not present in the database used as the target for the search will be missed. There may not be time to acquire spectra from all of the parent ions of interest. Even when peptides are confidently identified, alternative splicing and duplicated genes (paralogs) may make it difficult to know which protein a peptide was derived from. Finally, even with deep sampling, it is very difficult to achieve complete coverage of a protein so that the complete covalent structure can be determined.
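The ambiguity introduced by splice variants and paralogs can be made concrete with a small sketch. The database below is entirely hypothetical (invented names and sequences), and peptides are mapped by simple substring matching, which ignores cleavage context. The point is only that an identified peptide may be consistent with several database entries.

```python
# Hypothetical database: two splice isoforms of one gene and a paralog.
DATABASE = {
    "GENE1_isoA": "MAAKLLGERTTPK",
    "GENE1_isoB": "MAAKTTPK",
    "PARALOG2": "MSSKLLGERVVK",
}


def map_peptides(database, peptides):
    """Return, for each peptide, the sorted list of proteins containing it."""
    return {
        pep: sorted(name for name, seq in database.items() if pep in seq)
        for pep in peptides
    }


def split_unique_shared(mapping):
    """Separate peptides that point to one protein from shared peptides."""
    unique = {p: hits[0] for p, hits in mapping.items() if len(hits) == 1}
    shared = {p: hits for p, hits in mapping.items() if len(hits) > 1}
    return unique, shared


if __name__ == "__main__":
    identified = ["LLGER", "TTPK", "VVK"]
    unique, shared = split_unique_shared(map_peptides(DATABASE, identified))
    print("unique:", unique)
    print("shared:", shared)
```

In this toy case only VVK identifies a single protein. LLGER cannot distinguish the isoform from the paralog, and TTPK cannot distinguish the two isoforms.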

Top Down proteomics is an alternative strategy in which analysis begins at the level of the intact protein. While it is possible to ionize intact proteins and to fragment them in the mass spectrometer, this remains extremely challenging technically and has not proven to be a robust strategy for the analysis of complex biological protein mixtures. Instead, we apply a chromatographic or electrophoretic fractionation of the intact proteins. There are many technologies available for the fractionation of intact proteins. Ion exchange chromatography and isoelectric focusing electrophoresis separate proteins by charge. Reverse phase chromatography separates proteins by hydrophobicity. SDS electrophoresis and gel filtration separate proteins by molecular weight. These techniques can be applied serially, offering a rich selection of multi-dimensional fractionation strategies.

An advantage of multi-dimensional strategies is that the resolution of the different dimensions is effectively multiplied. While it is increasingly difficult to improve the resolution of a single fractionation step, combining two or three dimensions of protein fractionation can achieve enormous resolving power.
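The multiplication of resolution can be illustrated with a short calculation. This is an idealized sketch: it assumes the dimensions are fully orthogonal (each separates on an independent property), so the combined peak capacity is the product of the individual peak capacities. Real separations are partly correlated and achieve less. The per-dimension values are hypothetical.

```python
def combined_peak_capacity(capacities):
    """Idealized peak capacity of serial, orthogonal separations:
    the product of the peak capacities of the individual dimensions."""
    total = 1
    for n in capacities:
        total *= n
    return total


if __name__ == "__main__":
    # Hypothetical peak capacities for a charge-based and a
    # hydrophobicity-based separation, then a third dimension.
    print(combined_peak_capacity([30, 50]))
    print(combined_peak_capacity([30, 50, 20]))
```

Under these assumptions, two modest separations that resolve 30 and 50 components on their own could in principle resolve 1500 in combination, which would be very hard to reach by refining either step alone.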

Integrated Top Down Bottom Up proteomics couples the Bottom Up approach for protein identification with Top Down approaches to resolve protein mixtures. Top Down analysis fractionates a single starting sample into hundreds of fractions, each of which must be subjected to Bottom Up analysis. Robotics and laboratory automation are used to make this process feasible and reproducible. We use a 2-dimensional fractionation approach based on charge and hydrophobicity to resolve complex cellular protein mixtures, followed by tryptic digestion and MS/MS analysis. The resulting data sets are very large (hundreds of gigabytes), and analyzing them presents a substantial high-performance computing challenge.

Integrated Computational and Experimental Strategies

To gain maximal insight from the complex data sets produced by integrated Top Down Bottom Up proteomics analysis, we need to develop database systems to manage these large data sets, better statistics to increase the power of protein identification and quantification, faster algorithms to make the computations feasible, and integrated knowledge repositories to assemble and interpret the results.