Beyond conventional peptide identification-cross-linked peptides identification and unlimited post-translational modification identification

  • Fengchao Yu

Student thesis: Doctoral thesis

Abstract

Liquid chromatography–mass spectrometry (LC-MS) based proteomics has achieved great success in recent years and is becoming one of the major approaches to studying biological problems. In the dry lab workflow, peptide identification is the first step. Its outcome feeds many downstream analyses, including post-translational modification (PTM) analysis, protein expression analysis, biological signaling pathway analysis, and protein-protein interaction analysis. Researchers have proposed various methods to identify peptides. Depending on the target, there are two types of peptide identification tasks: cross-linked peptide identification and linear peptide identification. Cross-linked peptide identification is a new topic that appeared a few years ago. Its targets are pairs of peptides linked by certain chemical compounds. Thus, its search space is quadratic with respect to the number of peptides in a database. Identifying cross-linked peptides by searching all peptide-peptide pairs is still an open problem. Linear peptide identification has been well studied and widely used in biological research. However, most proposed methods only support a limited number of PTMs due to the large computational complexity. Identifying peptides without limiting PTMs is also an open problem. In this thesis, we try to solve these two open problems by proposing computational methods. First, we address the cross-linked peptide identification problem by proposing two methods. The first method, called ECL, can exhaustively search all peptide-peptide pairs from a database. To our knowledge, no existing tool can search all peptide-peptide pairs, due to the large computational complexity. Existing methods for cross-linked peptide identification use heuristic filtering procedures to reduce the search space. However, a non-exhaustive search can miss a considerable number of identifications.
Experiments show that ECL identifies more nonredundant cross-linked peptides than non-exhaustive search methods, including xQuest, pLink, and ProteinProspector. The running speed comparison shows that ECL is much faster than xQuest, pLink, and ProteinProspector even though it searches many more peptide-peptide pairs than these tools. We show that ECL has quadratic time complexity, which results in a long running time when the database is large. Thus, we propose another method, called ECL 2.0, that achieves linear time complexity. This method takes advantage of the score function's additive property to convert the score of a peptide pair into the sum of two chain scores. It couples this conversion with a digitization-based approach to achieve linear time complexity. Experiments show that ECL 2.0 has the highest sensitivity among state-of-the-art tools, including pLink, StavroX, ProteinProspector, Kojak, and ECL. It is also much faster than pLink, StavroX, ProteinProspector, and ECL. Second, we propose a method, called PIPI, that can identify peptides with an unlimited number of PTMs. This method encodes peptide sequences and tandem mass spectra as vectors. The encoding ensures that the resulting vectors are invariant to PTMs. Then, it searches the encoded spectra against the encoded peptide sequences. Since the encoded spectra and peptide sequences are invariant to PTMs, the search procedure can find peptide-spectrum matches (PSMs) with unspecified PTMs. Finally, it infers PTMs, calculates a fine score, and estimates the false discovery rate for each PSM. Experiments show that PIPI has a higher sensitivity than Mascot, Comet, MS-GF+, MS-Alignment, and MODa. It is also much faster than most of these tools.
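The idea of combining an additive score with digitization can be illustrated with a small sketch. This is not the ECL 2.0 implementation; it only shows, under simplifying assumptions, why an additive pair score lets the best pair be found without enumerating all pairs. The function name `best_pair_linear`, the precomputed per-chain scores, the bin width, and the example masses are all hypothetical. The sketch assumes each chain already has a score, that the pair score is the sum of the two chain scores, and that the two chain masses plus the linker mass must equal the precursor mass after binning.

```python
from typing import Dict, List, Optional, Tuple

Chain = Tuple[str, float, float]  # (name, mass, chain score): hypothetical inputs


def best_pair_linear(
    chains: List[Chain],
    precursor_mass: float,
    linker_mass: float,
    bin_width: float = 1.0,
) -> Optional[Tuple[float, str, str]]:
    """Return (total score, chain a, chain b) for the best mass-compatible pair.

    Runs in time linear in the number of chains instead of checking
    every pair, because the pair score is assumed to be additive.
    """
    # Digitize the mass that the two chains must add up to.
    target_bin = round((precursor_mass - linker_mass) / bin_width)

    # Pass 1: keep only the best-scoring chain in each mass bin.
    best_in_bin: Dict[int, Tuple[float, str]] = {}
    for name, mass, score in chains:
        b = round(mass / bin_width)
        if b not in best_in_bin or score > best_in_bin[b][0]:
            best_in_bin[b] = (score, name)

    # Pass 2: for each occupied bin, look up the complementary bin directly.
    best: Optional[Tuple[float, str, str]] = None
    for b, (score_a, name_a) in best_in_bin.items():
        partner = best_in_bin.get(target_bin - b)
        if partner is None:
            continue
        total = score_a + partner[0]  # additive pair score
        if best is None or total > best[0]:
            best = (total, name_a, partner[1])
    return best


# Toy data: masses and scores are made up for illustration.
example_chains = [
    ("A", 500.0, 3.0),
    ("B", 700.0, 5.0),
    ("C", 800.0, 4.0),
    ("D", 1000.0, 1.0),
    ("E", 700.0, 2.0),
]
result = best_pair_linear(example_chains, precursor_mass=1638.0, linker_mass=138.0)
```

A real search would also need a mass tolerance (for example, checking neighboring bins) and chain scores computed against the spectrum; both are omitted here to keep the sketch short. A chain may pair with itself in this sketch when its bin is exactly half the target.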
Date of Award: 2017
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
