Abstract
Conventional computational proteomic methods face an exponentially growing search space when identifying peptides with multiple post-translational modifications (PTMs) from mass spectrometry data. This task is crucial for advancing our understanding of PTM crosstalk in proteomics. Because of the high computational complexity, existing methods often resort to sub-optimal solutions. To overcome this challenge, we adopt a combinatorial formulation that significantly improves both precision and recall in PTM identification. This is achieved in three successive pieces of work. First, we developed PIPI2, a search engine that uses a greedy algorithm to simplify the PTM characterization problem into a linear formulation, demonstrating that the problem can be approached from a combinatorial perspective. This method outperforms existing techniques and remains accurate even on lower-quality data. Second, we propose a mixed integer linear programming (MILP) feasibility model that efficiently restricts the search space to feasible PTM combinations, thus improving performance. Third, we propose PIPI3, which formulates PTM identification as a combinatorial optimization task within a single MILP model. This model is the first theoretical formulation of the problem of identifying peptides with multiple PTMs.
Notably, we apply PIPI3 to lung squamous cell carcinoma data, identifying numerous potential instances of PTM crosstalk and offering insights into their roles in cancer biology. Overall, this thesis contributes to proteomics by providing robust and scalable tools for PTM characterization. These methodologies can potentially deepen our understanding of PTM crosstalk and offer a framework for future research in disease-related PTM studies.
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Weichuan YU (Supervisor) & Ning LI (Supervisor) |