
CSV result

bitJoy edited this page Jan 30, 2019 · 6 revisions

A typical reports folder contains the following CSV files:

All file names share the prefix database_name_search_date. For example, the prefix uniprot-ecoli-20171023_2017.12.22 means that the data was searched against the uniprot-ecoli-20171023 database and that the search date is 2017.12.22.

There are mainly four types of result files:

  • uniprot-ecoli-20171023_2017.12.22.csv, the file with the shortest name, contains all unfiltered PSMs, one PSM per line. A PSM may be cross-linked, loop-linked, mono-linked, or regular.
  • uniprot-ecoli-20171023_2017.12.22.filtered_X_Y.csv contains filtered results for different peptide types (X) at different levels (Y). X can be cross-linked, loop-linked, mono-linked, or regular; Y can be spectra, peptides, or sites.
  • uniprot-ecoli-20171023_2017.12.22.precursor_error_distribution.csv and uniprot-ecoli-20171023_2017.12.22.filtered_precursor_error_distribution.csv contain precursor errors from unfiltered and filtered PSMs respectively. They are visualized on the web page result, so they can be skipped when reading this page.
  • uniprot-ecoli-20171023_2017.12.22.summary.txt contains summary information about the search, such as the number of identified PSMs, the search time, etc.
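
The naming scheme above can be sketched as a small classifier. This is only an illustration that assumes the file names follow exactly the patterns listed on this page; the function name classify_report_file and the returned dictionary keys are hypothetical and are not part of pLink2.

```python
import re

# Assumed layout: <database>_<YYYY.MM.DD>[.<kind>].<csv|txt>
_REPORT_NAME = re.compile(
    r"^(?P<database>.+)_(?P<date>\d{4}\.\d{2}\.\d{2})"
    r"(?:\.(?P<kind>.+))?\.(?P<ext>csv|txt)$"
)


def classify_report_file(name):
    """Split a report file name into database, search date, and file type."""
    m = _REPORT_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognized report file name: {name!r}")
    info = {
        "database": m["database"],
        "date": m["date"],
        "kind": None,
        "peptide_type": None,  # X in filtered_X_Y
        "level": None,         # Y in filtered_X_Y
    }
    kind = m["kind"]
    if kind is None:
        # Shortest name: the unfiltered PSM table.
        info["kind"] = "unfiltered"
    elif (kind.startswith("filtered_")
          and kind != "filtered_precursor_error_distribution"):
        # filtered_X_Y: X may contain a hyphen, Y is the last "_" field.
        peptide_type, _, level = kind[len("filtered_"):].rpartition("_")
        info.update(kind="filtered", peptide_type=peptide_type, level=level)
    else:
        # precursor_error_distribution, summary, ...
        info["kind"] = kind
    return info


info = classify_report_file(
    "uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv"
)
print(info["database"], info["date"], info["peptide_type"], info["level"])
```

The date is matched as four, two, and two digits separated by dots, which is how it appears in the example prefix; other date formats would need a different pattern.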

uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv contains all cross-linked PSMs that pass the TDA-FDR filter, with decoy results removed, one PSM per line. uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_peptides.csv and uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_sites.csv are inferred directly from uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv.

There are 21 columns in uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv:

  1. Order: the index of the PSM, starting from 1.
  2. Title: the title of the spectrum. If a RAW file is used, the title follows the scheme RAWName.Scan.Scan.Charge.pParseID.dta. For example, RD_pH_8point3_step2.7566.7566.3.0.dta means MS2 scan 7566 from the RAW file RD_pH_8point3_step2, with charge 3 and pParseID 0. pParseID is the rank of the precursor extracted from MS1 by pParse: the lower the value, the higher the credibility, and 0 is the best. For more details about pParse, please see pParse.
  3. Charge: the charge of this spectrum.
  4. Precursor_Mass: the experimental [MH+] of the precursor.
  5. Peptide: the identified peptide sequence. AKLESLVEDLVNR(2)-HMNIKVTR(5) means that peptide AKLESLVEDLVNR is cross-linked to HMNIKVTR at sites 2 and 5 respectively. Mono-linked and loop-linked peptides have one or two linked sites, respectively, on a single peptide.
  6. Peptide_Type: the peptide type of the identification: Cross-Linked, Loop-Linked, Mono-Linked, or Regular/Common.
  7. Linker: the cross-linker name identified. For regular results, it is null.
  8. Peptide_Mass: the theoretical [MH+] of the peptide.
  9. Modifications: the modifications identified on this peptide. For example, Carbamidomethyl[C](6) means that Carbamidomethyl occurs at the 6th site, which is a cysteine. Multiple modifications are separated by semicolons. null means no modifications.
  10. Evalue: the E-value for the entire peptide (or peptide pair); the smaller, the more confident.
  11. Score: the SVM score of this peptide; the smaller, the more confident. It is the primary measure for FDR estimation.
  12. Precursor_Mass_Error(Da): precursor mass error in Da.
  13. Precursor_Mass_Error(ppm): precursor mass error in ppm.
  14. Proteins: the proteins inferred from this peptide. For example, sp|P0A6Y8|DNAK_ECOLI (304)-sp|P0A6Y8|DNAK_ECOLI (299)/ means that sp|P0A6Y8|DNAK_ECOLI is cross-linked to sp|P0A6Y8|DNAK_ECOLI at sites 304 and 299 respectively. If more than one protein pair is inferred, the pairs are separated by slashes.
  15. Protein_Type: the protein type of this identification, that is, whether it is an Intra-protein or Inter-protein cross-link. For mono-linked, loop-linked, and regular results, it is None.
  16. FileID: the RAW file from which this PSM was identified, starting from 1. The ID of a RAW file is determined by the order in which the files were added. The mapping between RAW files and FileIDs is shown in the parameter file.
  17. LabelID: the ID of the labeling used in quantitation, starting from 1. The mapping between labelings and LabelIDs is shown in the parameter file.
  18. Alpha_Matched: the number of matched fragment ions for the alpha peptide.
  19. Beta_Matched: the number of matched fragment ions for the beta peptide. Let Alpha_Num and Beta_Num be the numbers of peaks matched to the alpha and beta peptides respectively. Some peaks may match both peptides; let Share_Num be the number of such shared peaks. Then Alpha_Matched = Alpha_Num - 0.5*Share_Num and Beta_Matched = Beta_Num - 0.5*Share_Num. As a result, fractional values such as 1.5 or 0.5 may appear.
  20. Alpha_Evalue: the E-value for alpha peptide only, the smaller the more confident.
  21. Beta_Evalue: the E-value for beta peptide only, the smaller the more confident.
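
Several of the columns above pack structured values into one string. The following is a minimal sketch of how they might be unpacked, based only on the examples given on this page (cross-linked Peptide values, Proteins, Modifications, Title) and on the shared-peak formula for Alpha_Matched and Beta_Matched. All function names are hypothetical helpers, not part of pLink2, and the exact string layout of loop-linked or mono-linked Peptide values is not covered.

```python
import re

# "NAME(12)" or "NAME (12)": a name followed by a site in parentheses.
_SITE = re.compile(r"^(?P<name>.+?)\s*\((?P<site>\d+)\)$")
# One modification such as "Carbamidomethyl[C](6)".
_MOD = re.compile(r"^(?P<name>.+)\[(?P<residue>[^\]]+)\]\((?P<site>\d+)\)$")


def _parse_site(text):
    m = _SITE.match(text.strip())
    if m is None:
        raise ValueError(f"unexpected site format: {text!r}")
    return m["name"], int(m["site"])


def parse_title(title):
    """Split RAWName.Scan.Scan.Charge.pParseID.dta from the right,
    so that dots inside the RAW name do not break the parse."""
    raw, scan, _scan_again, charge, pparse_id, _ext = title.rsplit(".", 5)
    return {"raw": raw, "scan": int(scan),
            "charge": int(charge), "pparse_id": int(pparse_id)}


def parse_cross_linked_peptide(peptide):
    """'AAA(2)-BBB(5)' -> [('AAA', 2), ('BBB', 5)] (cross-linked format only)."""
    return [_parse_site(part) for part in peptide.split("-")]


def parse_proteins(proteins):
    """Return one list of (protein, site) tuples per slash-separated pair."""
    pairs = []
    for pair in proteins.split("/"):
        if not pair.strip():
            continue  # the example value ends with a trailing slash
        # Split only on a hyphen that follows ")", since names may contain "-".
        sides = re.split(r"(?<=\))-", pair)
        pairs.append([_parse_site(side) for side in sides])
    return pairs


def parse_modifications(mods):
    """'Carbamidomethyl[C](6)' -> [('Carbamidomethyl', 'C', 6)]; 'null' -> []."""
    if mods.strip() == "null":
        return []
    result = []
    for item in mods.split(";"):
        if not item.strip():
            continue
        m = _MOD.match(item.strip())
        if m is None:
            raise ValueError(f"unexpected modification format: {item!r}")
        result.append((m["name"], m["residue"], int(m["site"])))
    return result


def adjusted_matched(alpha_num, beta_num, share_num):
    """Each shared peak counts as half a match for alpha and half for beta."""
    return alpha_num - 0.5 * share_num, beta_num - 0.5 * share_num


print(parse_title("RD_pH_8point3_step2.7566.7566.3.0.dta"))
print(parse_cross_linked_peptide("AKLESLVEDLVNR(2)-HMNIKVTR(5)"))
print(parse_proteins("sp|P0A6Y8|DNAK_ECOLI (304)-sp|P0A6Y8|DNAK_ECOLI (299)/"))
print(adjusted_matched(10, 7, 3))  # illustrative peak counts, not real data
```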

By default, pLink2 does not calculate the three E-values (Evalue, Alpha_Evalue, and Beta_Evalue); in this case, all E-values are 1. If the Compute E-value checkbox in the Identification panel is selected, pLink2 calculates the three E-values, but only for PSMs that pass the FDR threshold. For an E-value, the smaller, the more confident; it is similar to the score in pLink1.

The columns at the other two levels (peptides, sites) have the same meaning as at the spectra level described above.

From experience with pLink1, a PSM with an E-value below 1E-2 or 1E-3 is good. pLink2 uses SVM scores to estimate FDR; because SVM scores vary between datasets, there is no comparable fixed threshold for SVM scores. A Spectrum_Number of at least 2 or 3 might be a good indicator of a confident cross-linked site. The Spectrum_Number of a cross-linked site is the number of PSMs that support the site. It can be found in the *.filtered_cross-linked_sites.csv file.
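
These rules of thumb can be applied as a simple post-filter. The sketch below assumes the rows have been read with csv.DictReader, so every value is a string, and that the column names are Evalue and Spectrum_Number as described on this page. The function names and default thresholds are illustrative choices, not pLink2 settings, and the sample rows are made-up values used only to show the calls.

```python
def evalues_were_computed(rows):
    """Heuristic guard: if every E-value is exactly 1, E-values were
    probably not calculated, and an E-value filter would be meaningless."""
    return any(float(row["Evalue"]) != 1.0 for row in rows)


def confident_psms(rows, max_evalue=1e-2):
    """Keep PSMs whose overall E-value is below the chosen threshold."""
    return [row for row in rows if float(row["Evalue"]) < max_evalue]


def confident_sites(rows, min_spectra=2):
    """Keep cross-linked sites supported by at least min_spectra PSMs."""
    return [row for row in rows if int(row["Spectrum_Number"]) >= min_spectra]


# Made-up rows standing in for csv.DictReader output.
psm_rows = [{"Order": "1", "Evalue": "1e-5"}, {"Order": "2", "Evalue": "0.5"}]
site_rows = [{"Order": "1", "Spectrum_Number": "1"},
             {"Order": "2", "Spectrum_Number": "3"}]

if evalues_were_computed(psm_rows):
    print(len(confident_psms(psm_rows)), "PSM(s) below the E-value threshold")
print(len(confident_sites(site_rows)), "site(s) with enough supporting PSMs")
```

The PSM filter is meant for the spectra-level file and the site filter for the *.filtered_cross-linked_sites.csv file; tighten max_evalue to 1e-3 or raise min_spectra to 3 for a stricter selection.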

Because the unfiltered CSV contains unfiltered PSMs, it has some additional columns:

  • Peptide_Type: the same as Peptide_Type at the spectra level described above, but coded as 0 for Regular/Common, 1 for Mono-Linked, 2 for Loop-Linked, and 3 for Cross-Linked.
  • Refined_Score: the refined score calculated by the KSDP algorithm.
  • SVM_Score: the same as Score at the spectra level described above.
  • Target_Decoy: whether the identification is a target or a decoy. 0 for Decoy-Decoy, 1 for Target-Decoy (or Decoy-Target), and 2 for Target-Target.
  • Q-value: the smoothed FDR value.
  • Protein_Type: the same as Protein_Type at the spectra level described above, but coded as 0 for Regular/Common, 1 for Intra-protein, and 2 for Inter-protein.
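
The numeric codes in the unfiltered CSV can be mapped back to the labels used at the spectra level. A minimal sketch, assuming each row is a dictionary keyed by the column names above (for example from csv.DictReader); decode_row is a hypothetical helper, not part of pLink2.

```python
# Code tables taken from the column descriptions above.
PEPTIDE_TYPE = {0: "Regular/Common", 1: "Mono-Linked",
                2: "Loop-Linked", 3: "Cross-Linked"}
TARGET_DECOY = {0: "Decoy-Decoy", 1: "Target-Decoy", 2: "Target-Target"}
PROTEIN_TYPE = {0: "Regular/Common", 1: "Intra-protein", 2: "Inter-protein"}

_CODE_TABLES = {
    "Peptide_Type": PEPTIDE_TYPE,
    "Target_Decoy": TARGET_DECOY,
    "Protein_Type": PROTEIN_TYPE,
}


def decode_row(row):
    """Return a copy of the row with coded columns replaced by labels."""
    decoded = dict(row)
    for column, table in _CODE_TABLES.items():
        if column in decoded:
            decoded[column] = table[int(decoded[column])]
    return decoded


# Made-up row for illustration only.
print(decode_row({"Peptide_Type": "3", "Target_Decoy": "2", "Protein_Type": "1"}))
```

Code 1 of Target_Decoy covers both Target-Decoy and Decoy-Target, so the label does not say which peptide of the pair is the decoy.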
