Peptide re-ranking with protein-peptide correspondence and precursor peak intensity information

Chao Yang (yorkey@ust.hk)

 

Source code:

Our packages use configuration files to store the paths of resources and the parameters of our programs. The configuration files are human-readable text files. Please edit them before running the programs. The configuration files used in our experiments are included in the supplementary documents described in the following section.

The source code can be downloaded at: PeptideReranking.

 

Supplementary Files:

The configuration files used by our programs in the experiments, together with UtilityTool, are available. UtilityTool provides Ruby libraries that can be used to extract MS1/MS2 spectra, parse X!Tandem results to generate input for our programs, and build the protein-peptide map matrix.

The supplementary documents can be downloaded at: Supplementary.

The PeptideProphet results (i.e. interact.pep.xml) and ProteinProphet results (interact.prot.xml) obtained by TPP (v4.4) can be downloaded at: PeptideProphetAndProteinProphet.

 

How to try the program:

This section shows how to run MIRanker.

(1) Prerequisite

Matlab, TPP, X!Tandem, SLEP and Ruby should be installed on your computer. Our experiments were conducted on a computer running the 32-bit Enterprise edition of Windows 7. We used:

Matlab: 2009a 32bit version

TPP: v4.4 http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP

X!Tandem: v2008.12.01.1 http://www.thegpm.org/TANDEM/

SLEP: http://www.public.asu.edu/~jye02/Software/SLEP/

Ruby: http://www.ruby-lang.org/en/

(2) Preparation

a) Run X!Tandem and TPP to get PSMs and peptide probabilities.

In the above figure, "4.mzXML" is the raw data file; "pep_4.2011_01_13_17_33_21.t.xml" and "interact.pep.xml" are the X!Tandem identification result and the PeptideProphet result, respectively. The figure also shows how my experimental directory is organized.

b) Create a configuration file for the program. The following file "initFile_18proteinMixture.txt" is created for MIRanker. The configuration files of other programs can be found in the supplementary documents.

# Init file content: each item should be placed in one single row.
# The current implementation of the model is developed and tested
# based on label free ESI-LC-MS data. Lines with "#" at the beginning
# are comments and they will not be parsed by the program.

# Note: the following keywords are case sensitive. Please specify the path
# of each resource below.

# the xtandem database search result
xtandemXML=E:\18mixExp\data\pep_4.2011_01_13_17_33_21.t.xml

# The file can be generated by the Trans-Proteomic Pipeline (TPP)
prophetFile=E:\18mixExp\data\interact.pep.xml

# Save the xtandem search xml parsing result
tandemSaveName=E:\18mixExp\data\xtandem_identified_list.txt

# EvaluationResult obtained from the xtandem search result
evaXtandemSaveName=E:\18mixExp\data\xtandemroc.txt

# the raw mzXML file
mzXMLFileName=E:\18mixExp\data\4.mzXML

# the MS1 spectra save dir
outputDir=E:\18mixExp\data\raw_ms1\

# the model data directory: Save the regression model built by the program. If
# only a regression parameter (e.g. lambda) changes, the model does not need to be
# rebuilt, which saves time.
modelSaveDir=E:\18mixExp\data\model_data\

# the program will save the protein peptide map here
ProteinPeptideMapSave=E:\18mixExp\data\pp_map.txt

# the protein peptide map excel file that contains the proteins. This file is generated for debugging.
# Use "ProteinPeptideLinker.setcsvstate("no")" to turn this function off.
PPMapProteinCSV=E:\18mixExp\data\proteins.csv

# the protein peptide map excel file that contains pp_map. This file is generated for debugging.
# Use "ProteinPeptideLinker.setcsvstate("no")" to turn this function off.
PPMapContentCSV=E:\18mixExp\data\pp_map.csv

# output the re-ranking result
rerankedSeqList=E:\18mixExp\data\re_ranked_seq.txt

# output the q-value vs num_of_hit curve
rerankedEvaSaveName=E:\18mixExp\data\18_mix_curve.txt

# ------------- parameters of the program ---------------
# define the decoy key word used in the decoy database construction. This is just used for
# the performance evaluation purpose.
decoyKeyWord=decoy

# ------------- parameters used to build the model ------
# regularization parameter: its maximal value could be "max(abs(protein_basis'*target_data))".
# The ratio below should be within [0, 1]. Generally, this parameter can be set to around
# 0.05
regularizedParameterRatio=0.06

# data mz range
mzRangesLow=300
mzRangesHigh=1340

# when the following parameter is true and there exists "model.mat" in the directory
# modelSaveDir, then the program will not try to generate a new model by loading ms
# spectra, estimating isotopic distribution and preparing bases for the model. This
# choice could be a way to speed up your program when you only have to change
# "scoreCombineWeight" and "regularizedParameter"
regenerateModel=false

# low resolution or not.
lowResolutionData=false

# enable log transform on raw intensity.
enableLogTranform=true

# the number of isotopes considered. For high resolution data, this could be 3 or 4;
# for low resolution data, this could be 1
numOfIsotopes=4

# mass error in Da. For high resolution data, this could be 0.1Da
massError=0.1

# charges to be considered, each charge is separated by ",". When you only want to consider
# charge state "1, 2, 3", then "chargeList=1,2,3". Note: "chargeList=,1,2,3" and
# "chargeList=1,2,3," are invalid and will produce "NaN" in the program. "," can only be
# placed between numbers.
chargeList=2,3

# -------------- parameters used to recompute scores -------
# Typical values can be 0.8 ~ 0.99. The default value is 0.99 (please do not use 1).
# Generally, you do not have to change this parameter.
sigValue=0.99
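
The format described above (one "key=value" item per row, comment rows starting with "#", case-sensitive keywords) can be read with a few lines of Ruby. The following is a minimal sketch, not the parser shipped with the package; the helper names "parse_init_text" and "parse_charge_list" are hypothetical, and the strict comma check is an assumption based on the "chargeList" comment.

```ruby
# Hypothetical helper: parse init-file text into a Hash of string keys and values.
# Assumes the format described in the init file comments: one key=value per row,
# rows starting with "#" are comments, keywords are case sensitive.
def parse_init_text(text)
  config = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2) # split on the first "=" only
    raise ArgumentError, "not a key=value row: #{line}" if value.nil?
    config[key.strip] = value.strip
  end
  config
end

# Hypothetical helper: turn "2,3" into [2, 3] and reject leading or trailing
# commas, which the init file comments describe as invalid.
def parse_charge_list(value)
  unless value.match?(/\A\d+(,\d+)*\z/)
    raise ArgumentError, "invalid chargeList: #{value.inspect}"
  end
  value.split(",").map(&:to_i)
end

# A short excerpt of the init file shown above (single-quoted heredoc keeps
# the Windows backslashes literal).
sample = <<~'INIT'
  # charges to be considered
  chargeList=2,3
  regenerateModel=false
  mzXMLFileName=E:\18mixExp\data\4.mzXML
INIT

config = parse_init_text(sample)
charges = parse_charge_list(config["chargeList"])
```

Checking "chargeList" before the model is built fails fast on entries such as "chargeList=1,2,3," instead of letting "NaN" appear later in the program.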

(3) Run MIRanker

a) Run "AnalyzeMS.rb" to extract the X!Tandem result, analyze the ProteinProphet result and prepare the input of MIRanker:

Extract MS1 data:

AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type ms

Analyze X!Tandem and ProteinProphet results:

AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type xtandem

Create matrix L:

AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type create_pp_map
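
The three preparation steps above can also be chained from a small Ruby driver. This is a sketch only: the helper names are hypothetical, it assumes that "AnalyzeMS.rb" is in the current directory and that "ruby" is on the PATH, and it uses only the "--init_file" and "--run_type" options shown above.

```ruby
# Run types in the order used above: MS1 extraction, X!Tandem result analysis,
# then creation of matrix L.
RUN_TYPES = %w[ms xtandem create_pp_map].freeze

# Hypothetical helper: build one command (as an argument array) per run type.
def analyze_ms_commands(init_file, script: "AnalyzeMS.rb")
  RUN_TYPES.map do |run_type|
    ["ruby", script, "--init_file", init_file, "--run_type", run_type]
  end
end

# Hypothetical helper: run the commands in order and stop at the first failure.
def run_analyze_ms(init_file)
  analyze_ms_commands(init_file).each do |cmd|
    puts "Running: #{cmd.join(' ')}"
    system(*cmd) or raise "step failed: #{cmd.join(' ')}"
  end
end

# Usage (not executed here):
#   run_analyze_ms("initFile_18proteinMixture.txt")
```

Stopping at the first failed step avoids building matrix L from an incomplete X!Tandem parsing result.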

b) Edit and run "pep_reranking.m".

Make sure that the following two lines are correctly assigned:

re_ranking_method = 'MIRanker';
init_file_name = 'E:\PeptideReranking\initFile_18proteinMixture.txt'; % specify the full path of the init file

Note:

(1) You can try other programs such as PPMRanker by using a different configuration file and changing the value of "re_ranking_method". If you encounter any problem in running the program, please send me an email: yorkey@ust.hk.

(2) In MIRanker, there is a parameter "lambda" (i.e. regularizedParameterRatio). If only this value is changed, you can specify "regenerateModel=false" to avoid regenerating the model. In this case, "pep_reranking.m" finishes quickly. If values such as "numOfIsotopes", "massError" and "chargeList" are changed, please specify "regenerateModel=true" to rebuild the model. Here, "regenerateModel" is introduced to make the parameter tuning process efficient.
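
The rule in note (2) can be expressed as a small check that compares two parsed configurations. This is a sketch under assumptions: the helper name is hypothetical, configurations are plain Hashes of strings, and only the three parameters named in the note are treated as requiring a rebuild. Since the note says "values such as", the list may need to be extended.

```ruby
# Keys that note (2) names as requiring a model rebuild. The note says
# "values such as", so this list may be incomplete (assumption).
REBUILD_KEYS = %w[numOfIsotopes massError chargeList].freeze

# Hypothetical helper: value to write for "regenerateModel", given the
# previous and the new configuration (Hashes of string keys and values).
def regenerate_model_setting(old_config, new_config)
  changed = REBUILD_KEYS.any? { |key| old_config[key] != new_config[key] }
  changed ? "true" : "false"
end

old_config = { "regularizedParameterRatio" => "0.06", "numOfIsotopes" => "4",
               "massError" => "0.1", "chargeList" => "2,3" }

only_lambda = old_config.merge("regularizedParameterRatio" => "0.05")
new_charges = old_config.merge("chargeList" => "1,2,3")

puts regenerate_model_setting(old_config, only_lambda) # prints "false"
puts regenerate_model_setting(old_config, new_charges) # prints "true"
```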