
Auto.MS.MS.match.instructions

Will Edmands edited this page Jan 6, 2014 · 3 revisions

Auto.MS.MS.match.R

Automated function for matching potential biomarkers to experimental MS/MS fragmentation spectra, based upon retention time and mass accuracy tolerance windows. The function also annotates possible common neutral losses and fragments using high mass accuracy MS/MS fragmentation data, and outputs figures and results tables. It was designed primarily for data-dependent MS/MS fragmentation data, although in principle it should work equally well for targeted MS/MS (.mzXML) data files.

> Auto.MS.MS.match(MSfeatures="Features_Above_threshold.csv", mode="negative", wd="D:\\R_data_processing\\STUDY NAME\\", mzXML.dir="D:\\R_data_processing\\STUDY NAME\\MS_MS_mzXML\\", TICfilter=5000, Precursor.ppm=10, Frag.ppm=20, ret=5, Parent.tol=0.1, Fragment.tol=0.5)

Arguments: -

  1. MSfeatures – the significant features identified by the previous Auto.MV.Regress.R function or the output of the DBAnnotate.R function. It is flexible enough to work with other data files (.csv) originating from XCMS (default = “Features_Above_threshold.csv”).
  2. mode – mass spectrometer ionisation mode, either “positive” or “negative” (NB. must be lower case).
  3. wd – the address of your parent study directory (“STUDY NAME”). The function automatically goes to the Auto.MV.Regress.results folder created previously, so the user does not need to state it explicitly; therefore do not move the results from this directory.
  4. mzXML.dir – the address of the directory containing the MS/MS .mzXML files to match against the potential biomarkers identified by the Auto.MV.Regress.R function. If the standardised subdirectory layout has been followed then the address would take the text string form “D:\\R_data_processing\\STUDY NAME\\MS_MS_mzXML\\”.
  5. TICfilter – total ion current filter. Only MS/MS spectra with a total ion current above this value are matched to the unknown biomarkers, which ensures that high quality spectra are used (default = 5000).
  6. Precursor.ppm – mass accuracy in parts per million with which to match potential biomarkers to precursor ions of MS/MS data (default = 10).
  7. Frag.ppm – mass accuracy in parts per million with which to match fragments and neutral losses in matched MS/MS data to common fragment masses, adducts and conjugates. Frag.ppm should be set higher than Precursor.ppm because data-dependent MS/MS acquisition can lose mass accuracy compared to MS mode (default = 20).
  8. ret – retention time window ± in seconds with which to match potential biomarker mass spectral features to MS/MS data (default = ±2 seconds).
  9. Parent.tol – delta mass accuracy window for parent/precursor ion for matching of unknown fragmentation spectra to the Human Metabolome Database (HMDB) experimental MS/MS database. (default = ±0.1 m/z).
  10. Fragment.tol – delta mass accuracy window for the top six fragment ions (ranked by relative intensity) for matching of unknown fragmentation spectra to the Human Metabolome Database (HMDB) experimental MS/MS database (default = ±0.5 m/z). A broad tolerance is necessary as experimental MS/MS spectra are mainly acquired at nominal mass with lower mass accuracy.
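
The arguments above mix two kinds of tolerance: relative (Precursor.ppm, Frag.ppm, in parts per million) and absolute (Parent.tol, Fragment.tol, in m/z units). The following is a minimal sketch of how the two compare. The helper names `ppm_to_da` and `ppm_window` and the m/z value of 261 are illustrative assumptions and are not part of Auto.MS.MS.match.R.

```r
# Hypothetical helpers, not part of Auto.MS.MS.match.R.

# Convert a ppm tolerance at a given m/z into an absolute m/z half-width.
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

# Lower and upper m/z bounds of a +/- ppm window around mz.
ppm_window <- function(mz, ppm) {
  delta <- ppm_to_da(mz, ppm)
  c(lower = mz - delta, upper = mz + delta)
}

# At an illustrative m/z of 261, a 10 ppm window is much narrower
# than an absolute window of 0.1 m/z.
half_width <- ppm_to_da(261, 10)
win <- ppm_window(261, 10)
```

Because a ppm window scales with m/z while an absolute window does not, the same ppm setting is tighter for low-mass ions than for high-mass ions.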

USAGE INSTRUCTIONS:

Automated unknown matching and fragment annotation

This part of the MetMSline pipeline concerns the substantial bottleneck of metabolite identification. The Auto.MS.MS.match function was created to rapidly and automatically match potential biomarkers identified within the MS experimental dataset to fragmentation spectra, to facilitate identification.

The function takes the output of the functions Auto.MV.Regress or DBAnnotate (or feasibly any XCMS output file or subset of features of interest) and matches these potential biomarker mass spectral features against experimental MS/MS spectra. We found that data-dependent MS/MS methods proved an excellent starting point for unknown annotation and resulted in broad coverage of unknown features. The function cycles through each MS/MS file in turn, in order of acquisition, and matches MS/MS spectra above a user-defined total ion current threshold (e.g. TICfilter=5000) to potential biomarkers based on mass accuracy (e.g. Precursor.ppm=10, i.e. ±10 ppm) and retention time (e.g. ret=5, i.e. ±5 seconds).
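
The matching criteria described above can be sketched as follows. This is only an illustration of the three filters (total ion current, precursor ppm error and retention time window), not the code of Auto.MS.MS.match.R. The function `match_features`, the column names and the example values are all assumptions.

```r
# Hypothetical sketch; function name, column names and data are assumptions.
match_features <- function(features, scans, precursor_ppm = 10, ret = 5,
                           tic_filter = 5000) {
  # Keep only MS/MS scans whose total ion current exceeds the filter
  scans <- scans[scans$tic > tic_filter, , drop = FALSE]
  matches <- list()
  for (i in seq_len(nrow(features))) {
    # Mass error of each scan's precursor relative to the feature, in ppm
    ppm_err <- abs(scans$precursor_mz - features$mz[i]) / features$mz[i] * 1e6
    # Absolute retention time difference in seconds
    rt_diff <- abs(scans$rt - features$rt[i])
    hit <- which(ppm_err <= precursor_ppm & rt_diff <= ret)
    if (length(hit) > 0) {
      matches[[length(matches) + 1]] <- data.frame(
        feature = features$name[i],
        scan = scans$scan[hit],
        ppm_error = ppm_err[hit],
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(matches) == 0) return(NULL)
  do.call(rbind, matches)
}

features <- data.frame(name = c("M261T304", "M500T100"),
                       mz = c(261.0000, 500.0000),
                       rt = c(304, 100),
                       stringsAsFactors = FALSE)
scans <- data.frame(scan = c(2866, 2900, 3000),
                    precursor_mz = c(261.0010, 261.0010, 500.0100),
                    rt = c(302, 320, 101),
                    tic = c(8000, 9000, 12000))
matched <- match_features(features, scans)
```

In this made-up data only scan 2866 satisfies both windows for M261T304. Scan 2900 falls outside the retention time window, and scan 3000 is about 20 ppm away from M500T100.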

(NB. The run order of each MS/MS file should be included at the start of each file name, e.g. 1_Auto_MS_MS_QC.Ur_NEG_02_08_13.mzXML, for clarity and for the function to operate properly.)
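
How the function reads the run order prefix is not described here. As a hedged illustration, the sketch below shows one way to order file names by a leading integer, which avoids the alphabetical trap of "10_" sorting before "2_". The file names are hypothetical.

```r
# Hypothetical file names; the leading integer is assumed to be the run order.
files <- c("10_Auto_MS_MS_QC.mzXML",
           "2_Auto_MS_MS_QC.mzXML",
           "1_Auto_MS_MS_QC.mzXML")

# Extract the digits before the first underscore and sort numerically
run_order <- as.integer(sub("^([0-9]+)_.*$", "\\1", basename(files)))
files_sorted <- files[order(run_order)]
```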

To aid user interpretation, when a match is made between a potential biomarker and an MS/MS spectrum the function takes the 6 fragments of highest relative intensity and attempts to annotate them against a list of common mass losses (n=66), considering fragment masses, precursor to fragment differences and inter-fragment mass losses. The results of this matching process are then saved in a results table (.csv) in the directory containing the MS/MS .mzXML files.
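
The annotation step described above can be sketched as follows, under stated assumptions. The three-entry `common_losses` table stands in for the function's own list of 66 entries, which is not reproduced here. The functions `annotate_mass` and `annotate_spectrum`, the choice to derive the absolute tolerance from the precursor m/z, and the example spectrum are all illustrative.

```r
# Hypothetical three-entry table; the function's own list has 66 entries.
# Masses are monoisotopic values for H2O, SO3 and C6H8O6.
common_losses <- data.frame(
  name = c("H2O", "SO3 (sulfate)", "C6H8O6 (glucuronide)"),
  mass = c(18.010565, 79.956815, 176.032088),
  stringsAsFactors = FALSE
)

# Name of the closest table entry within tol (in m/z units), or NA
annotate_mass <- function(mass, table, tol) {
  if (is.na(mass)) return(NA_character_)
  err <- abs(table$mass - mass)
  best <- which.min(err)
  if (err[best] <= tol) table$name[best] else NA_character_
}

annotate_spectrum <- function(precursor_mz, mz, intensity, table,
                              frag_ppm = 20, n_top = 6) {
  # Assumption: absolute tolerance is the ppm window at the precursor m/z
  tol <- precursor_mz * frag_ppm / 1e6
  # Top fragments by intensity, then ordered from high to low m/z
  top <- order(intensity, decreasing = TRUE)[seq_len(min(n_top, length(mz)))]
  frag <- sort(mz[top], decreasing = TRUE)
  precursor_loss <- precursor_mz - frag
  # Difference to the next lowest m/z fragment; NA for the lowest one
  inter_loss <- c(frag[-length(frag)] - frag[-1], NA)
  id <- function(x) vapply(x, annotate_mass, character(1),
                           table = table, tol = tol)
  data.frame(fragment = frag,
             precursor_loss = precursor_loss,
             inter_loss = inter_loss,
             fragment_id = id(frag),
             precursor_loss_id = id(precursor_loss),
             inter_loss_id = id(inter_loss),
             stringsAsFactors = FALSE)
}

# Made-up spectrum with 8 peaks; only the 6 most intense are annotated
ann <- annotate_spectrum(
  precursor_mz = 437.1089,
  mz = c(85.03, 113.02, 175.02, 243.0662, 261.0768, 300.5, 350.2, 400.1),
  intensity = c(200, 150, 50, 900, 1000, 10, 20, 300),
  table = common_losses
)
```

In this invented example the fragment at m/z 261.0768 lies one glucuronide loss below the precursor and one water loss above the fragment at m/z 243.0662, so both differences receive a tentative label.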

To complement this and aid visual analysis of the fragmentation patterns, a figure is automatically saved for each MS/MS spectrum match in the MS/MS .mzXML file directory, and a hyperlink with which to locate the figure is provided in the results table.

The figures are annotated (an example is shown below) with the top 6 fragments indicated with a red diamond and the precursor ion indicated with a blue diamond. At each collision energy, the top 6 fragments by intensity are furthermore annotated with 3 differently coloured numbers: black for the fragment mass, blue for the precursor ion to fragment mass difference, and red for the inter-fragment mass difference, i.e. the mass difference between the fragment and the next lowest m/z fragment (of the top 6). Where a mass loss or fragment is found to correspond to a common mass loss, the peak is annotated with a potential identity.

All of this fragmentation data for the top 6 fragments of each matched spectrum is also incorporated in the results table (MS_MS_Matched_sign_features.csv) for inspection by the experimenter, which is particularly useful if the figure is crowded or unclear. The results table also contains columns recording the presence or absence within the MS/MS spectra of mass losses associated with common Phase II conjugates found in human biofluids (glucuronides and sulfates), for rapid confirmation of conjugates. This could be expanded to further Phase II conjugates and modifications of interest in future.

We found this automated annotation method efficient and helpful for rapid visual interpretation and tentative identity confirmation. Also included is a hyperlink within the results table to the experimental MS/MS spectra database of HMDB to aid identification particularly if there is no known biologically plausible annotation of a potentially novel biomarker.

As Auto.MS.MS.match nears completion, the function also adds a dummy matrix to the MSfeatures input table supplied by the user, indicating which unknown biomarker features have been matched to MS/MS spectra and which have not. The function also returns a further results table of all potential biomarker features that were not matched by Auto.MS.MS.match.R (unmatched_sign_features.csv), saved in the directory containing the MS/MS .mzXML files.
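
The idea of flagging matched and unmatched features can be illustrated with a single indicator column. This is a simplified sketch rather than the function's actual dummy matrix. The column name `MSMS_matched` and the feature names are assumptions.

```r
# Hypothetical feature table and list of matched feature names
features <- data.frame(name = c("M261T304", "M500T100", "M180T60"),
                       stringsAsFactors = FALSE)
matched_names <- c("M261T304")

# 1 if the feature was matched to at least one MS/MS spectrum, else 0
features$MSMS_matched <- as.integer(features$name %in% matched_names)

# Features without any MS/MS match, as would go in the unmatched table
unmatched <- features[features$MSMS_matched == 0, , drop = FALSE]
```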

MS MS example

An example of the fragmentation spectra figures output by Auto.MS.MS.match.R. At each fixed collision energy (in this example 10 eV, 20 eV and 40 eV) the precursor ion is indicated with a blue diamond and the top 6 fragments (by relative intensity) are indicated with a red diamond. The precursor to fragment mass losses (in blue), inter-fragment mass losses (in red) and fragment masses (in black) are labelled on the figure, and when a common mass shift from the list of 66 is identified based on high mass accuracy it is labelled by the function. Where visualisation is difficult or images are crowded, the annotation and mass differences are saved within the results table for reference by the experimenter.


Results are saved as .csv files and figures within the MS/MS mzXML file directory:-

  1. “MS_MS_Matched_sign_features.csv”: all potential biomarkers with MS/MS spectra matches within the user-defined parameters are recorded in this results table, along with the fragmentation data for each spectrum, a hyperlink to the experimental spectra on HMDB and a hyperlink to the corresponding MS/MS fragmentation spectrum figure produced by Auto.MS.MS.match.R. This table can then either be submitted to DBAnnotate for monoisotopic mass based annotation or be visually examined to provide identification clues for novel biomarkers (an example table is shown below).

Example MS MS match results

  2. “unmatched_sign_matched.csv”: the remaining potential biomarker features from the MSfeatures input file are saved in a separate results table. The absence of matches for these mass spectral features is usually due to insufficient ion statistics/concentration for MS/MS fragmentation.

  3. “MS_MS_spectrum_match.png”: for each potential biomarker with an MS/MS spectrum match, an annotated MS/MS spectrum is saved in the directory containing the .mzXML files for visual examination. The file names are in the format “MS/MS file name”_“MS/MS scan number”_XCMS_EIC_“XCMS EIC no.”_“XCMS feature name”_eV_“collision energy”.png (e.g. 1_Auto_MS_MS_QC.Ur_NEG_02_08_13.mzXML_2866_XCMS_EIC_3113_M261T304_eV_20.png).
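
The figure file name layout can be reproduced and parsed with a short sketch. The layout used here is inferred from the example file name above. The helper `figure_name` and the regular expression are assumptions, not code from Auto.MS.MS.match.R.

```r
# Hypothetical helper; the layout is inferred from the example file name.
figure_name <- function(msms_file, scan, eic, feature, ev) {
  paste0(msms_file, "_", scan, "_XCMS_EIC_", eic, "_", feature,
         "_eV_", ev, ".png")
}

fig <- figure_name("1_Auto_MS_MS_QC.Ur_NEG_02_08_13.mzXML",
                   2866, 3113, "M261T304", 20)

# Split a figure file name back into its components
pattern <- "^(.*\\.mzXML)_([0-9]+)_XCMS_EIC_([0-9]+)_(.+)_eV_([0-9]+)\\.png$"
parts <- regmatches(fig, regexec(pattern, fig))[[1]]
fields <- setNames(parts[-1],
                   c("msms_file", "scan", "eic", "feature", "ev"))
```

Parsing the name this way can help when linking a saved figure back to the corresponding row of the results table.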