Post-translational modification (PTM) is a key step in protein biosynthesis and is critical for correct protein trafficking and function. However, high-throughput identification of the additional biochemical functional groups involved in PTMs, such as phosphate groups and glycans, remains a challenge in proteomics. The widely used protein sequence database search strategy is time-consuming and prone to false positives because of its exponentially enlarged search space and its incomplete theoretical fragmentation models. Owing to its efficiency and sensitivity, spectral library searching is a promising alternative to conventional sequence database searching. Our work aims to facilitate PTM identification with the spectral library search approach. Specifically, we first applied the approach to two important and challenging PTMs, phosphorylation and glycosylation, and then extended the method to other modifications. For phosphorylated peptide identification, the largest collision-induced dissociation (CID) tandem mass (MS2) spectral libraries of phosphorylated peptides in human and other model organisms to date were built on an automated platform that combines multiple state-of-the-art search engines (e.g. X!Tandem and MSGF+) and site-localization tools (e.g. PhosphoRS and PTMProphet) with strict quality control. Spectral library searching using these libraries significantly outperforms existing methods for detecting phosphosites in a variety of datasets. For glycopeptide identification, a spectral library searching method was developed to identify intact N-linked glycopeptides from MS2 spectra; it builds on an existing spectrum prediction tool, MassAnalyzer (Zhang, Z., Anal. Chem. 2010), to account for the special fragmentation patterns of glycopeptides. We evaluated the scoring functions, developed methods to analyze ambiguous candidates, and clustered the predicted spectral library to reduce the search cost.
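To make the scoring step concrete, the sketch below computes a normalized dot product (cosine similarity) between a query MS2 spectrum and a library spectrum, the general kind of spectral similarity measure used by spectral library search engines such as SpectraST. It is not the scoring function evaluated in this thesis: the 1.0 m/z bin width, the square-root intensity scaling, and the function names `bin_spectrum` and `dot_product` are illustrative assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return bins


def dot_product(query_peaks, library_peaks, bin_width=1.0):
    """Normalized dot product of square-root-scaled, binned spectra.

    Peaks are (m/z, intensity) pairs. Returns a value in [0, 1];
    1 means the two spectra have identical relative peak patterns.
    """
    # Square-root scaling damps the influence of a few dominant peaks (assumed choice).
    q = {b: math.sqrt(i) for b, i in bin_spectrum(query_peaks, bin_width).items()}
    lib = {b: math.sqrt(i) for b, i in bin_spectrum(library_peaks, bin_width).items()}
    # Only bins present in both spectra contribute to the numerator.
    numerator = sum(q[b] * lib[b] for b in q.keys() & lib.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return numerator / norm if norm else 0.0


query = [(175.12, 400.0), (262.15, 900.0), (361.22, 100.0)]
library_entry = [(175.10, 380.0), (262.16, 1000.0), (500.30, 50.0)]
score = dot_product(query, library_entry)
```

Identical spectra score 1.0 and spectra with no shared bins score 0.0. A practical search would first restrict library candidates by precursor m/z and only then score them.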
A novel query decoy strategy was further applied to estimate the false discovery rate (FDR) of glycopeptide identifications. The spectral library searching strategy was successfully validated by searching data from standard N-linked glycoproteins. For multiple-PTM identification, we extended the spectral library searching method to use the known modification sites in UniProtKB, so that multiple PTMs can be searched at once. A predicted spectral library was built with the software MassAnalyzer; it contained all possible modified tryptic peptides generated from all reviewed proteins in UniProtKB and the PTMs reported in their MOD_RES fields. Searching data from four human tissue samples against this library showed that it enables multiple-PTM profiling, but several challenges remain on both the experimental and computational sides, such as the enrichment of multiple PTMs and prediction models for novel PTMs.
Keywords: Tandem Mass Spectrometry, Shotgun Proteomics, Protein Post-translational Modification (PTM), Phosphorylation, Glycosylation, PTM Profiling, Spectral Library Searching, SpectraST, UniProtKB
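The enumeration step behind the multiple-PTM predicted library (tryptic peptides carrying every combination of annotated modification sites) can be sketched as follows. This is a simplified illustration, not the thesis pipeline: the protein sequence, the site dictionary, the "Phospho" label, the bracket notation, and the parameters `max_missed` and `max_mods` are all hypothetical. It also ignores effects of modifications on cleavage (for example, acetylated lysine), and it stops at peptide enumeration; in the thesis, spectra for such peptides were predicted with MassAnalyzer.

```python
import re
from itertools import combinations


def tryptic_digest(protein, max_missed=0):
    """Cleave after K or R unless followed by P; return (0-based start, peptide) pairs."""
    cuts = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    if cuts[-1] != len(protein):
        cuts.append(len(protein))
    peptides = []
    for i in range(len(cuts) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            peptides.append((cuts[i], protein[cuts[i]:cuts[j]]))
    return peptides


def modified_forms(start, peptide, known_sites, max_mods=3):
    """Yield the peptide with every combination of up to max_mods known sites modified.

    known_sites maps 1-based protein positions to a modification label
    (a hypothetical stand-in for UniProtKB MOD_RES annotations).
    """
    local = [
        (pos - 1 - start, mod)
        for pos, mod in sorted(known_sites.items())
        if start <= pos - 1 < start + len(peptide)
    ]
    for k in range(min(max_mods, len(local)) + 1):
        for combo in combinations(local, k):
            residues = list(peptide)
            for offset, mod in combo:
                residues[offset] += f"[{mod}]"
            yield "".join(residues)


def build_library(protein, known_sites, max_missed=0, max_mods=3):
    """Collect all modified tryptic peptide forms for one protein."""
    entries = []
    for start, peptide in tryptic_digest(protein, max_missed):
        entries.extend(modified_forms(start, peptide, known_sites, max_mods))
    return entries


# Hypothetical protein with annotated sites at S2 and T5.
library = build_library("MSKPTRAGK", {2: "Phospho", 5: "Phospho"})
```

Here `library` contains MSKPTR, MS[Phospho]KPTR, MSKPT[Phospho]R, MS[Phospho]KPT[Phospho]R and AGK. A peptide with n annotated sites yields up to 2^n forms, which is why a cap such as the hypothetical `max_mods` keeps the library size manageable.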
| Date of Award | 2015 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
Building and searching predicted spectral libraries for identification of protein post-translational modifications
HU, Y. (Author). 2015
Student thesis: Doctoral thesis