PST: parallel simulation tool to study open search methods for the identification of peptides with post-translational modifications

About PST

Identifying the peptides in a sample from tandem mass spectrometry data is the fundamental task in computational proteomics. Traditional peptide identification algorithms perform well on unmodified peptides but do not give satisfactory results when peptides carry post-translational modifications (PTMs). Recently, Chick et al. and Yu et al. proposed spectrum-based and tag-based open search methods, respectively, to identify peptides with PTMs. Although both methods are promising, their identification results vary greatly with the quality of the tandem mass spectra and the number of PTMs per peptide. This motivates a systematic study of how the performance of open search methods depends on the quality parameters of tandem mass spectrum data and on the number of PTMs in peptides.

Through large-scale simulations, we obtain performance trends across simulated tandem mass spectra of varying quality. We propose an analytical model that relates the probability of obtaining a correct identification to the spectrum quality and the number of PTMs. Based on this model, we quantitatively describe the conditions necessary to apply open search methods effectively.


Related Publication
J. Dai*, F. Yu*, and W. Yu. *Contributed equally to this work.
"Understanding the limit of open search in the identification of peptides with post-translational modifications — A simulation-based study",
Under review.
Part of the preliminary results appeared as a poster at the 21st Annual International Conference on Research in Computational Molecular Biology (RECOMB 2017).

Where to download PST

Source code (updated 22 Mar 2018): PST_src.zip


Environment configuration

Please install Python 3 and NumPy to use this tool.
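
As a quick sanity check before running any simulation, the sketch below verifies that the interpreter is Python 3 and that NumPy can be located. The function name check_environment is hypothetical and not part of the PST package. No minimum Python 3 or NumPy version is stated for PST, so none is enforced here.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of problems; an empty list means both requirements are met.

    Hypothetical helper for illustration, not part of the PST package.
    """
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    # find_spec returns None when the module cannot be located.
    if importlib.util.find_spec("numpy") is None:
        problems.append("NumPy is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "; ".join(issues))
```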


How to use it?

Please change the directory to the root of the package before starting simulations.
To start the simulations using tag-based open search, please use the following commands.

python3 scripts/run.py config/sample.json
python3 scripts/local_control.py output/Sample.Output.dat

To start the simulations using spectrum-based open search, please use the following commands.

python3 scripts/run_spec.py config/sample.spec.json
python3 scripts/local_control.py output/Sample.Spectrum.Output.dat
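
The two-step command sequences above can also be driven from a small wrapper, sketched here. It assumes the second step should run only if the first one exits successfully, which the package does not state explicitly. The names PIPELINES, pipeline_commands, and run_pipeline are hypothetical; the script, configuration, and output paths are the sample ones listed above.

```python
import subprocess

# Commands copied from the usage instructions; run them from the package root.
PIPELINES = {
    "tag": [
        ["python3", "scripts/run.py", "config/sample.json"],
        ["python3", "scripts/local_control.py", "output/Sample.Output.dat"],
    ],
    "spectrum": [
        ["python3", "scripts/run_spec.py", "config/sample.spec.json"],
        ["python3", "scripts/local_control.py", "output/Sample.Spectrum.Output.dat"],
    ],
}


def pipeline_commands(mode):
    """Return the command list for 'tag' or 'spectrum' (hypothetical helper)."""
    if mode not in PIPELINES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(PIPELINES)}")
    return [list(cmd) for cmd in PIPELINES[mode]]


def run_pipeline(mode, dry_run=False):
    """Run each step in order, stopping at the first failure."""
    for cmd in pipeline_commands(mode):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so the second step never runs if the first one fails.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline("tag") from the package root is meant to be equivalent to typing the two tag-based commands by hand. Calling run_pipeline("spectrum", dry_run=True) only prints the commands without executing them.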

To view the contents of a result file (.dat), please use the following command.

python3 scripts/echo.py output/Sample.Output.dat