Aug 12, 2022

Image Visualization and Proteoform Assignment of MALDI-MSI from LCMS Experimental Databases

  • Pacific Northwest National Laboratory
Protocol Citation: David J Degnan, Kevin J Zemaitis, Dusan Velickovic, Mowei Zhou, Ljiljana Pasa-Tolic 2022. Image Visualization and Proteoform Assignment of MALDI-MSI from LCMS Experimental Databases. protocols.io https://dx.doi.org/10.17504/protocols.io.4r3l2ode3v1y/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: August 05, 2022
Last Modified: August 12, 2022
Protocol Integer ID: 68267
Funders Acknowledgement:
National Institutes of Health (NIH) Common Fund, Human Biomolecular Atlas Program (HuBMAP)
Grant ID: UG3CA256959-01
Abstract
Scope:

The protocol describes a workflow for processing high-resolution MALDI mass spectrometry imaging (MSI) data of intact proteins/proteoforms. It covers visualization and image generation using commercial software, and peak annotation using open-source code.

Expected Outcomes:

Tabular output of matched proteoforms, as well as a trelliscope display of overlaid isotopic distributions allowing for the annotation of high-resolution accurate mass distributions of proteoforms.
Before start
Both MALDI imaging and top-down proteomics (TDP) by LCMS/MS need to be completed and processed per other outlined protocols. The required software packages also need to be installed and ready to use on a desktop or laptop.
Data pre-processing
The .xml file from the MALDI source, which registers pixel coordinates, is renamed to match the .RAW file generated by the instrument. Both files are then backed up to an internal database and transferred to a separate imaging workstation.
These files are then imported into SCiLS Lab Pro (v.2021c) for preliminary visualization.
Binning parameters in other versions have been noted to alter isotopic distributions and to over- or under-bin peaks across the broad mass range. The automatic parameters in 2021c were found to be ideal in most cases, because the peak width in these analyses changes over the broad range of mass-to-charge values.
For consistency, peak lists are imported into SCiLS Lab to export a .imzML for ingestion.
Proteoform image visualization
Proteoforms are visualized from the most abundant isotopologue. The monoisotopic mass from the centroid of this peak is used to determine the mass error of the annotated proteoform.
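The protocol does not state how the mass error is expressed. A minimal sketch, assuming the conventional parts-per-million (ppm) definition, is shown below. The function name and the m/z values are hypothetical and serve only as an illustration.

```python
def mass_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Return the signed mass error in parts per million (ppm).

    Assumes the conventional definition:
    (observed - theoretical) / theoretical * 1e6
    """
    if theoretical_mz <= 0:
        raise ValueError("theoretical_mz must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical values, for illustration only (not taken from the protocol).
observed = 11306.352      # centroid of the peak picked in the image
theoretical = 11306.318   # value calculated for the annotated proteoform
print(f"Mass error: {mass_error_ppm(observed, theoretical):.2f} ppm")
```

A positive value means the observed peak lies above the calculated value, and a negative value means it lies below.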
Proteoform assignment via ProteoMatch
Next, the ProteoMatch tool is used to calculate isotopic profiles and match them to spectra. Below is a general description of the pipeline:

  1. Required: Calculate molecular formulas and mass shifts with calculate_molform()
  2. Optional: Filter noisy peaks and the peak MZ range with filter_peaks()
  3. Required: Match reference isotope profiles to experimental data with match_proteoform_to_ms1(). Protein sequences without PTMs are accepted as well.
  4. Optional: Visualize results with plot_Ms1Match() or proteomatch_trelliscope().

Software: ProteoMatch (developer: David Degnan)
Note: This step uses spatially resolved LCM-TDP data obtained via LCMS/MS. Input files for experimental databases are outlined in other protocols and need to be processed prior to completing this annotation.

A main pipeline function called run_proteomatch() can be used, and requires three files, which are all described below:

  • A proteoform .csv file (see section 4.1)
  • A .mzML file (see section 4.2)
  • A .xlsx settings file (see section 4.3)
Prepare a .csv proteoform file with a “Proteoform” column and a “Protein” column (any string). Proteoform annotations generally follow the ProForma convention.
Citation: LeDuc RD, Schwämmle V, Shortreed MR, Cesnik AJ, Solntsev SK, Shaw JB, Martin MJ, Vizcaino JA, Alpi E, Danis P, Kelleher NL, Smith LM, Ge Y, Agar JN, Chamot-Rooke J, Loo JA, Pasa-Tolic L, Tsybin YO (2018). ProForma: A Standard Proteoform Notation. Journal of Proteome Research.

Post-translational modifications (PTMs) can be annotated either by name (UniMod definition) or by mass shift. An example proteoform annotation:

"M.(S)[Acetyl]GRGKGGKGLGKGGAKRHRK(VLRDNIQGITKPAIRRLAR)[28.0315]RGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG."

The periods mark the starting and ending residues of the proteoform, a convention used by TopPIC. The residues to the left of the first period and to the right of the second period (if any) are removed during formula generation. Custom definitions can be created by updating the backend glossary on GitHub (ProteoMatch/inst/extdata/Unimod.csv) or by submitting an issue request on the GitHub page.
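The annotation rules above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the ProteoMatch implementation: all function names are hypothetical, and the sketch assumes that periods inside square brackets (as in a mass shift such as 28.0315) are not boundary markers.

```python
import re


def trim_flanking_residues(annotation: str) -> str:
    """Drop residues left of the first period and right of the second.

    Periods inside square brackets (e.g. a mass shift) are ignored.
    """
    depth = 0
    periods = []
    for index, char in enumerate(annotation):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        elif char == "." and depth == 0:
            periods.append(index)
    start = periods[0] + 1 if periods else 0
    end = periods[1] if len(periods) >= 2 else len(annotation)
    return annotation[start:end]


def extract_modifications(annotation: str) -> list:
    """Return bracketed tags as ("mass_shift", float) or ("name", str)."""
    modifications = []
    for tag in re.findall(r"\[([^\]]+)\]", annotation):
        try:
            modifications.append(("mass_shift", float(tag)))
        except ValueError:
            modifications.append(("name", tag))
    return modifications


def bare_sequence(annotation: str) -> str:
    """Remove bracketed tags and the parentheses that scope them."""
    without_tags = re.sub(r"\[[^\]]*\]", "", annotation)
    return without_tags.replace("(", "").replace(")", "")


# The example annotation from this protocol.
example = (
    "M.(S)[Acetyl]GRGKGGKGLGKGGAKRHRK(VLRDNIQGITKPAIRRLAR)[28.0315]"
    "RGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG."
)
trimmed = trim_flanking_residues(example)
print(extract_modifications(trimmed))
print(bare_sequence(trimmed))
```

For the example annotation, the leading M is dropped, one modification is recognized by name and one by mass shift, and the remaining residues form the sequence used for formula generation.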


Prepare a .mzML file by converting the .RAW file with MSConvert, with the `peak picking -1` flag enabled.
A .xlsx settings file will also need to be provided. For a template, see GitHub (ProteoMatch/inst/extdata/Defaults.xlsx).
The last parameter of the run_proteomatch() function specifies the output directory. Note that the main pipeline function run_proteomatch() will execute only if the correct files are provided.
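Because the pipeline only runs when the correct files are supplied, it can help to check the three inputs beforehand. The sketch below is a hypothetical pre-flight check and is not part of ProteoMatch: the function name, the file names, and the checks themselves (file extension, file existence, and the two column names described for the proteoform file) are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Column names described for the proteoform .csv file.
REQUIRED_COLUMNS = {"Proteoform", "Protein"}


def check_inputs(proteoform_csv, mzml_file, settings_xlsx) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    expected = [
        (proteoform_csv, ".csv"),
        (mzml_file, ".mzml"),
        (settings_xlsx, ".xlsx"),
    ]
    for raw_path, suffix in expected:
        path = Path(raw_path)
        if path.suffix.lower() != suffix:
            problems.append(f"{path.name}: expected a {suffix} file")
        elif not path.is_file():
            problems.append(f"{path.name}: file not found")

    csv_path = Path(proteoform_csv)
    if csv_path.suffix.lower() == ".csv" and csv_path.is_file():
        # utf-8-sig tolerates a byte-order mark from spreadsheet exports.
        with csv_path.open(newline="", encoding="utf-8-sig") as handle:
            header = next(csv.reader(handle), [])
        missing = REQUIRED_COLUMNS - set(header)
        if missing:
            problems.append(f"{csv_path.name}: missing column(s) {sorted(missing)}")
    return problems


# Hypothetical file names, for illustration only.
for problem in check_inputs("proteoforms.csv", "image.mzML", "settings.xlsx"):
    print(problem)
```

An empty result only indicates that the files look plausible. It does not validate the contents of the .mzML or settings files.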
As the pipeline runs, the following files will be generated: a .csv with molecular formulas, a .csv with filtered peaks, a .csv with matched peaks, and the trelliscope display of best matches (Pearson correlation >= 0.7). This threshold can be modified in the “CorrelationMinimum” row of the settings .xlsx file. Further details about each function can be found in the R documentation or in the ProteoMatch vignettes (for example, via R's vignette() or browseVignettes() functions).
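To illustrate what the correlation threshold means, the sketch below computes a Pearson correlation between a theoretical and an experimental isotopic profile and applies the 0.7 cutoff. It is a standalone illustration, not the ProteoMatch code: the function names and intensity values are hypothetical, and it assumes the two profiles have already been aligned peak by peak.

```python
from math import sqrt

# Default cutoff described in the protocol ("CorrelationMinimum").
CORRELATION_MINIMUM = 0.7


def pearson_correlation(x, y) -> float:
    """Pearson correlation of two equal-length intensity lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / sqrt(variance_x * variance_y)


def passes_threshold(theoretical, experimental, minimum=CORRELATION_MINIMUM) -> bool:
    """True when the profiles correlate at or above the minimum."""
    return pearson_correlation(theoretical, experimental) >= minimum


# Hypothetical relative intensities for five aligned isotopic peaks.
theoretical_profile = [0.20, 0.60, 1.00, 0.70, 0.30]
experimental_profile = [0.25, 0.55, 1.00, 0.65, 0.35]
print(passes_threshold(theoretical_profile, experimental_profile))
```

Raising the minimum keeps only matches whose experimental isotopic envelope closely follows the calculated one, while lowering it admits noisier matches.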
Citations
Step 4.1
LeDuc RD, Schwämmle V, Shortreed MR, Cesnik AJ, Solntsev SK, Shaw JB, Martin MJ, Vizcaino JA, Alpi E, Danis P, Kelleher NL, Smith LM, Ge Y, Agar JN, Chamot-Rooke J, Loo JA, Pasa-Tolic L, Tsybin YO (2018). ProForma: A Standard Proteoform Notation. Journal of Proteome Research.
https://doi.org/10.1021/acs.jproteome.7b00851