Chao Yang (yorkey@ust.hk)
Source code:
Our packages use configuration files to store resource paths and program parameters. The configuration files are human-readable text files; please edit them before running the programs. The configuration files used in our experiments are included in the supplementary documents described in the following section.
The source code can be downloaded at: PeptideReranking.
Supplementary Files:
The configuration files used in our experiments and UtilityTool are available. UtilityTool provides Ruby libraries that extract MS1/MS2 spectra, parse X!Tandem results to generate input for our programs, and build the protein-peptide map matrix.
The supplementary documents can be downloaded at: Supplementary.
The PeptideProphet results (i.e. interact.pep.xml) and ProteinProphet results (interact.prot.xml) obtained by TPP (v4.4) can be downloaded at: PeptideProphetAndProteinProphet.
How to try the program:
The following steps show how to run MIRanker.
(1) Prerequisite
Matlab, TPP, X!Tandem, SLEP and Ruby should be installed on your computer. Our experiments were conducted on a computer running the 32-bit Enterprise edition of Windows 7. We used:
Matlab: 2009a 32bit version
TPP: v4.4 http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP
X!Tandem: v2008.12.01.1 http://www.thegpm.org/TANDEM/
SLEP: http://www.public.asu.edu/~jye02/Software/SLEP/
Ruby: http://www.ruby-lang.org/en/
(2) Preparation
a) Run X!Tandem and TPP to get PSMs and peptide probabilities.
In the above figure, "4.mzXML" is the raw data file; "pep_4.2011_01_13_17_33_21.t.xml" and "interact.pep.xml" are the X!Tandem identification result and the PeptideProphet result, respectively. The figure also shows the organization of my experimental directory.
b) Create a configuration file for the program. The following file, "initFile_18proteinMixture.txt", was created for MIRanker. The configuration files of the other programs can be found in the supplementary documents.
# Init file content: each item should be placed in one single row.
# The current implementation of the model is developed and tested
# based on label free ESI-LC-MS data. Lines with "#" at the beginning
# are comments and they will not be parsed by the program.
# Note: the following keywords are case sensitive. Please specify the path
# of each resource below.
# the xtandem database search result
xtandemXML=E:\18mixExp\data\pep_4.2011_01_13_17_33_21.t.xml
# The file can be generated by the Trans-Proteomic Pipeline (TPP)
prophetFile=E:\18mixExp\data\interact.pep.xml
# Save the xtandem search xml parsing result
tandemSaveName=E:\18mixExp\data\xtandem_identified_list.txt
# EvaluationResult obtained from the xtandem search result
evaXtandemSaveName=E:\18mixExp\data\xtandemroc.txt
# the raw mzXML file
mzXMLFileName=E:\18mixExp\data\4.mzXML
# the MS1 spectra save dir
outputDir=E:\18mixExp\data\raw_ms1\
# the model data directory: saves the regression model built by the program.
# If only the regression parameter (e.g. lambda) changes, the model does not
# need to be rebuilt, which saves time.
modelSaveDir=E:\18mixExp\data\model_data\
# the program will save the protein peptide map here
ProteinPeptideMapSave=E:\18mixExp\data\pp_map.txt
# protein peptide map excel file contains proteins. This file is generated for debugging.
# Use "ProteinPeptideLinker.setcsvstate("no")" to turn this function off.
PPMapProteinCSV=E:\18mixExp\data\proteins.csv
# the protein peptide map excel file contains pp_map. This file is generated for debugging.
# Use "ProteinPeptideLinker.setcsvstate("no")" to turn this function off.
PPMapContentCSV=E:\18mixExp\data\pp_map.csv
# output the re-ranking result
rerankedSeqList=E:\18mixExp\data\re_ranked_seq.txt
# output the q-value vs num_of_hit curve
rerankedEvaSaveName=E:\18mixExp\data\18_mix_curve.txt
# ------------- parameters of the program ---------------
# define the decoy key word used in the decoy database construction. This is
# used only for performance evaluation.
decoyKeyWord=decoy
# ------------- parameters used to build the model ------
# regularized parameter, the maximal value could be "max(abs(protein_basis'*target_data))"
# the ratio below should be within [0, 1]. Generally, this parameter can be set around 0.05
regularizedParameterRatio=0.06
# data mz range
mzRangesLow=300
mzRangesHigh=1340
# when the following parameter is false and there exists "model.mat" in the
# directory modelSaveDir, then the program will not try to generate a new model
# by loading ms spectra, estimating isotopic distribution and preparing bases
# for the model. This choice could be a way to speed up your program when you
# only have to change "scoreCombineWeight" and "regularizedParameter"
regenerateModel=false
# low resolution or not.
lowResolutionData=false
# enable log transform on raw intensity.
enableLogTranform=true
# the number of isotopes considered. For high resolution data, this could be 3 or 4;
# for low resolution data, this could be 1
numOfIsotopes=4
# mass error in Da. For high resolution data, this could be 0.1Da
massError=0.1
# charges to be considered, each charge is separated by ",". When you only want
# to consider charge state "1, 2, 3", then "chargeList=1,2,3". Note:
# "chargeList=,1,2,3" and "chargeList=1,2,3," are invalid and will produce "NaN"
# in the program. "," can only be placed between numbers.
chargeList=2,3
# -------------- parameters used to recompute scores -------
# Typical values can be 0.8 ~ 0.99. The default value is 0.99 (please do not use 1).
# Generally, you do not have to change this parameter.
sigValue=0.99
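The init file format described above (one "key=value" item per row, "#" comment lines, case-sensitive keys) can be checked before running the programs. The following is a minimal Ruby sketch of such a check, not the parser shipped with MIRanker or UtilityTool. The helper names parse_init_text and parse_charge_list are hypothetical, and the sketch assumes that the value is everything after the first "=" and that chargeList contains no spaces.

```ruby
# Minimal sketch of an init-file reader (NOT the parser shipped with MIRanker).
# Assumptions: one "key=value" item per line, lines starting with "#" are
# comments, keys are case sensitive, and the value is everything after the
# first "=".
def parse_init_text(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2)
    raise ArgumentError, "not a key=value line: #{line}" if value.nil?
    settings[key.strip] = value.strip
  end
  settings
end

# Hypothetical helper: turn a chargeList value such as "2,3" into [2, 3].
# Leading or trailing commas are rejected, because "," may only be placed
# between numbers.
def parse_charge_list(value)
  unless value =~ /\A\d+(,\d+)*\z/
    raise ArgumentError, "invalid chargeList: #{value.inspect}"
  end
  value.split(",").map(&:to_i)
end

# A few items taken from the init file above.
sample = <<~'INIT'
  # the raw mzXML file
  mzXMLFileName=E:\18mixExp\data\4.mzXML
  chargeList=2,3
  sigValue=0.99
INIT

settings = parse_init_text(sample)
charges = parse_charge_list(settings["chargeList"])
puts settings["mzXMLFileName"]
puts charges.inspect
```

With this check, a value such as "chargeList=1,2,3," fails early with an error message instead of producing "NaN" later in the program.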
(3) Run MIRanker
a) Run "AnalyzeMS.rb" to extract the X!Tandem result, analyze the ProteinProphet result and prepare the input of MIRanker:
Extract MS1 data:
AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type ms
Analyze X!Tandem and ProteinProphet results:
AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type xtandem
Create matrix L:
AnalyzeMS.rb --init_file initFile_18proteinMixture.txt --run_type create_pp_map
b) Edit and run "pep_reranking.m".
Make sure that the following two lines are correctly assigned:
re_ranking_method = 'MIRanker';
init_file_name = 'E:\PeptideReranking\initFile_18proteinMixture.txt'; % specify the full path of the init file
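The three "AnalyzeMS.rb" calls in step a) can also be run from one small driver script. The sketch below is an illustration only: it assumes that "ruby" is on the PATH, that "AnalyzeMS.rb" is in the current directory, and that the steps are run in the order listed in step a). The names RUN_TYPES, analyze_ms_commands and run_preparation are hypothetical and are not part of the package.

```ruby
# Sketch of a driver that runs the three AnalyzeMS.rb steps in order.
# Assumptions: "ruby" is on the PATH and AnalyzeMS.rb is in the current
# directory; RUN_TYPES and the helper names are illustrative only.
RUN_TYPES = %w[ms xtandem create_pp_map].freeze

# Build one argument list per run type, using the options shown in step a).
def analyze_ms_commands(init_file)
  RUN_TYPES.map do |run_type|
    ["ruby", "AnalyzeMS.rb", "--init_file", init_file, "--run_type", run_type]
  end
end

# Run each step and stop at the first failure.
def run_preparation(init_file)
  analyze_ms_commands(init_file).each do |cmd|
    puts "Running: #{cmd.join(' ')}"
    # Kernel#system returns true only when the command exits with status 0.
    ok = system(*cmd)
    abort("Step failed: #{cmd.join(' ')}") unless ok
  end
end

# Only run the steps when this file is executed directly with one argument,
# e.g. ruby run_preparation.rb initFile_18proteinMixture.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  run_preparation(ARGV[0])
end
```

Stopping at the first failed step keeps "pep_reranking.m" from being run on incomplete input.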
Note:
(1) You can try other programs such as PPMRanker by using a different configuration file and changing the value of "re_ranking_method". If you encounter any problems running the program, please send me an email: yorkey@ust.hk.
(2) MIRanker has a parameter "lambda" (i.e. regularizedParameterRatio). If only this value is changed, you can specify "regenerateModel=false" to avoid regenerating the model, in which case "pep_reranking.m" finishes quickly. If values such as "numOfIsotopes", "massError" and "chargeList" are changed, please specify "regenerateModel=true" to rebuild the model. The "regenerateModel" option exists to make parameter tuning efficient.
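The rule in note (2) can be written down as a small check that compares two sets of init-file values and suggests a setting for "regenerateModel". This is a sketch under assumptions, not part of the package: the name regenerate_model? and the constant REUSE_SAFE_KEYS are hypothetical, only "regularizedParameterRatio" is treated as safe for reusing the model, and any other changed key is conservatively treated as requiring a rebuild. A saved "model.mat" must still exist in modelSaveDir for the model to be reused.

```ruby
# Sketch: suggest the regenerateModel setting from which keys changed.
# Assumption: only regularizedParameterRatio (lambda) is known to be safe
# for reusing a saved model; every other changed key triggers a rebuild.
REUSE_SAFE_KEYS = %w[regularizedParameterRatio].freeze

def regenerate_model?(old_settings, new_settings)
  keys = old_settings.keys | new_settings.keys
  changed = keys.reject { |k| k == "regenerateModel" }
                .select { |k| old_settings[k] != new_settings[k] }
  # Rebuild when at least one changed key is not in the safe list.
  !(changed - REUSE_SAFE_KEYS).empty?
end

base = {
  "regularizedParameterRatio" => "0.06",
  "numOfIsotopes" => "4",
  "massError" => "0.1",
  "chargeList" => "2,3"
}
only_lambda = base.merge("regularizedParameterRatio" => "0.05")
new_charges = base.merge("chargeList" => "1,2,3")

puts regenerate_model?(base, only_lambda)  # prints false: the model can be reused
puts regenerate_model?(base, new_charges)  # prints true: the model must be rebuilt
```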