What is IsoForma?

IsoForma is a package for quantifying positional isomers (QPI) in MS2 spectral data. Currently, analyzing this type of data requires several separate tools, which is inconvenient and time-consuming. The goal of this software is to offer all the functionality needed for this analysis in a single streamlined package.

Much of the back-end functionality is drawn from the pspecterlib package; see the pspecterlib documentation for more information about it.

IsoForma was built to ingest two main types of data: 1) an MS file (XML-based or ThermoFisher raw) or 2) a list of peak_data objects, which can be generated with pspecterlib. If an MS file is provided, IsoForma offers automatic MS2 peak detection; otherwise, the provided peak data are simply summed together.

Here are the general steps of the IsoForma algorithm and their respective functions:

  1. Select scan numbers: Either manually or with pull_scan_numbers()

  2. Sum peaks: sum_ms2_spectra()

  3. Match experimental and literature fragments for every proteoform: fragments_per_ptm()

  4. Sum isotopes and charge states per fragment per proteoform: sum_isotopes()

  5. Calculate an abundance matrix: abundance_matrix()

  6. Calculate proteoform relative proportions: calculate_proportions()

Steps 3-6 can be run together with the main pipeline function, isoforma_pipeline().

We will first walk through an example with the Pasavento histone dataset and then the Brunner valine dataset.

Installation Instructions

How to install

devtools::install_github("EMSL-Computing/isoforma-lib")

To get started, read our vignette

You may need to install pspecterlib separately, though it should be installed automatically by the command above. Check that both packages load with library(isoforma) and library(pspecterlib). If pspecterlib did not install, try the command below:

devtools::install_github("EMSL-Computing/pspecterlib")

Usage (Windows)

  1. Download and install R: https://ftp.osuosl.org/pub/cran/
  2. Download, install, and open RStudio (Free version): https://www.rstudio.com/products/rstudio/download/
  3. If re-installing, remove any previous version of pspecterlib: remove.packages("pspecterlib")
  4. Clone or download the IsoForma-paper repository

Pasavento Histone Dataset

1. Select scan numbers

In this dataset, the MS2 scans whose peak data need to be summed together are not known in advance. Here, use the pull_scan_numbers() function to automatically detect and suggest MS2 scans.

# Load raw mzML data 
xml_data <- pspecterlib::get_scan_metadata(MSPath = system.file("extdata", "Example.mzML", package = "isoforma"))

# Pull scan numbers
Sequence <- "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRTVLKTFLENVIRDSVTYTEHARRKTVTAMDVVYALKRQGRTLYGFGG"
Modifications <- "Acetyl,(1^,5,8,12,16)[2];Methyl,(79)[1];Oxidation,(84)[1]"
Modified_Sequences <- pspecterlib::multiple_modifications(Sequence, Modifications, ReturnUnmodified = TRUE)
Scan_Numbers <- pull_scan_numbers(Sequence = Modified_Sequences[5], ScanMetadata = xml_data, RTStart = 100, RTEnd = 110)

head(Scan_Numbers)
## [1] 3932 3936 3951 3961
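pull_scan_numbers() takes a sequence, the scan metadata, and a retention-time window. The sketch below shows one plausible way such a selection could work; it is an assumption, not the isoforma implementation. It keeps MS2 scans inside the window whose precursor m/z matches the proteoform's neutral mass M at some charge z, using m/z = (M + z × 1.007276) / z, where 1.007276 Da is the proton mass. The helpers select_scans() and precursor_mz(), the toy table's column names, and the 10 ppm tolerance are all hypothetical.

```r
# Hypothetical sketch, not pull_scan_numbers(): select MS2 scans in a retention-time
# window whose precursor m/z matches a neutral mass at one of several charge states.
proton_mass <- 1.007276  # Da

# m/z of a neutral mass `mass` carrying `charge` protons
precursor_mz <- function(mass, charge) (mass + charge * proton_mass) / charge

select_scans <- function(scans, mass, charges, rt_start, rt_end, ppm = 10) {
  targets <- precursor_mz(mass, charges)
  in_window <- scans$ms_level == 2 & scans$rt >= rt_start & scans$rt <= rt_end
  # TRUE when a scan's precursor lies within `ppm` of any target m/z
  matches_target <- vapply(scans$precursor_mz, function(mz) {
    any(abs(mz - targets) / targets * 1e6 <= ppm)
  }, logical(1))
  scans$scan[in_window & matches_target]
}

# Toy scan table (made-up values for illustration only)
toy_scans <- data.frame(
  scan = c(1, 2, 3, 4, 5),
  rt = c(99, 101, 103, 105, 112),
  ms_level = c(2, 1, 2, 2, 2),
  precursor_mz = c(precursor_mz(10000, 10), precursor_mz(10000, 10),
                   precursor_mz(10000, 10), 750.1234, precursor_mz(10000, 11))
)
select_scans(toy_scans, mass = 10000, charges = 9:12, rt_start = 100, rt_end = 110)
```

With this toy table only scan 3 is returned: scan 1 falls before the window, scan 2 is an MS1 scan, scan 4 matches no charge state, and scan 5 falls after the window.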

2. Sum peaks

Now sum the peaks with sum_ms2_spectra(), using the scan metadata object from pspecterlib and the scan numbers selected by pull_scan_numbers().

# Sum Peaks
Summed_Peaks <- sum_ms2_spectra(ScanMetadata = xml_data, ScanNumbers = Scan_Numbers)

head(Summed_Peaks)
##         M/Z Intensity Abundance
## 1: 156.6573  360.0577    1.5583
## 2: 159.1234  561.1546    2.4286
## 3: 160.3468  343.6376    1.4872
## 4: 160.3579  373.4652    1.6163
## 5: 161.9439  351.6964    1.5221
## 6: 162.9848  340.1909    1.4723

3-6: Main Pipeline Function

Because steps 3-6 are the same whether scans were pre-selected or chosen with pull_scan_numbers(), the isoforma_pipeline() function runs them all together. Here, we use this main pipeline function. For more details about each substep, see the "Brunner Valine Dataset" section below.

The pipeline returns a list of 5 objects:

  1. A list of matched peak objects from pspecterlib showing all the identified fragments

  2. Summed fragment intensities from isoforma

  3. The abundance matrix where rows are ions and columns are isomers. Each element is the summed intensity.

  4. The calculated proportions with confidence intervals

  5. A plot of the calculated proportions

IsoForma_Example <- isoforma_pipeline(
     Sequences = Modified_Sequences,
     SummedSpectra = Summed_Peaks,
     PrecursorCharge = 16, 
     ActivationMethod = "ECD",
     IonGroup = "c",
     IsotopeAlgorithm = "isopat", # Rdisop is preferred, is faster, and is more accurate, but it tends to crash on Windows
     Message = TRUE
)

Brunner Valine Dataset

1. Select scan numbers

Here, peaks were selected with different software. To generate peak data objects, see ?pspecterlib::make_peak_data or ?pspecterlib::get_peak_data. Below, we load a list of peak data objects:

# Make a list of pspecterlib peak_data objects
PeakDataList <- list(
  readRDS(system.file("extdata", "PeakData_1to1to1_1.RDS", package = "isoforma")),
  readRDS(system.file("extdata", "PeakData_1to1to1_2.RDS", package = "isoforma")),
  readRDS(system.file("extdata", "PeakData_1to1to1_3.RDS", package = "isoforma"))
)

head(PeakDataList[[1]])
##         M/Z Intensity Abundance
## 1: 151.5681  483.0363    0.0313
## 2: 151.5682  930.2599    0.0603
## 3: 151.5683 1144.0471    0.0742
## 4: 151.5684 1003.7305    0.0651
## 5: 151.5686  631.3395    0.0409
## 6: 151.5922  461.2453    0.0299

2. Sum peaks

Now use the list of peak data objects to generate a summed peak_data object.

# Sum selected peaks together 
PeaksSum <- sum_ms2_spectra(
  PeakDataList = PeakDataList,
  PPMRound = 5,
  MinimumAbundance = 0.01
)
head(PeaksSum)
##         M/Z Intensity Abundance
## 1: 150.2732  693.4443    0.0449
## 2: 150.2733 1062.9706    0.0689
## 3: 150.2734 1190.8636    0.0772
## 4: 150.2736  962.2302    0.0624
## 5: 150.2737  505.4271    0.0328
## 6: 150.9515  597.4518    0.0387
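In both examples, the Abundance column appears consistent with intensity expressed as a percentage of the most intense peak (for example, 360.0577 / 1.5583 and 561.1546 / 2.4286 both give about 231.06). The sketch below illustrates ppm-based peak summing under stated assumptions: it is not sum_ms2_spectra(), and it assumes PPMRound behaves like a ppm merging tolerance and MinimumAbundance like a percent-abundance cutoff. Peaks are pooled, sorted by m/z, and greedily grouped while they stay within the tolerance of the first peak in the group; each group reports an intensity-weighted m/z and a summed intensity. The function sum_spectra_sketch() and its column names are hypothetical.

```r
# Hypothetical sketch, not isoforma's sum_ms2_spectra(): pool several peak lists,
# merge peaks within `ppm` of the first m/z in a group, and sum their intensities.
sum_spectra_sketch <- function(peak_list, ppm = 5, min_abundance = 0.01) {
  # Pool all peaks (data frames with columns mz and intensity) and sort by m/z
  peaks <- do.call(rbind, peak_list)
  peaks <- peaks[order(peaks$mz), ]

  # Greedy grouping: start a new group once a peak is more than `ppm` away
  # from the first (anchor) peak of the current group
  group <- integer(nrow(peaks))
  current <- 1L
  anchor <- peaks$mz[1]
  for (i in seq_len(nrow(peaks))) {
    if ((peaks$mz[i] - anchor) / anchor * 1e6 > ppm) {
      current <- current + 1L
      anchor <- peaks$mz[i]
    }
    group[i] <- current
  }

  # Intensity-weighted m/z and summed intensity per group
  total <- as.vector(tapply(peaks$intensity, group, sum))
  mz <- as.vector(tapply(peaks$mz * peaks$intensity, group, sum)) / total

  # Abundance as a percentage of the most intense summed peak, then filter
  summed <- data.frame(mz = mz, intensity = total, abundance = total / max(total) * 100)
  summed <- summed[summed$abundance >= min_abundance, ]
  rownames(summed) <- NULL
  summed
}

# Toy spectra (made-up values)
s1 <- data.frame(mz = c(100.0000, 200.0000), intensity = c(10, 5))
s2 <- data.frame(mz = c(100.0003, 300.0000), intensity = c(30, 0.001))
sum_spectra_sketch(list(s1, s2), ppm = 5, min_abundance = 0.01)
```

The peaks at 100.0000 and 100.0003 (3 ppm apart) merge into one peak with intensity 40, while the peak at 300 has an abundance of 0.0025% and is dropped by the 0.01 cutoff. Anchoring each group to its first peak keeps a chain of closely spaced peaks from drifting into one wide group.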

3. Match experimental and literature fragments for every proteoform

To generate all proteoforms to test, use the pspecterlib::multiple_modifications() function. Then, pass that list of sequences to fragments_per_ptm(). If the isotope algorithm crashes, consider switching to IsotopeAlgorithm = "isopat". fragments_per_ptm() returns a list of matched_peak objects from pspecterlib, one per proteoform.

# Generate a list of PTMs to test
MultipleMods <- pspecterlib::multiple_modifications(
  Sequence = "LQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG",
  Modification = "6.018427,(17,26,70)[1]",
  ReturnUnmodified = TRUE
)

# Calculate fragments per proteoform
AllFragments <- fragments_per_ptm(
   Sequences = MultipleMods,
   SummedSpectra = PeaksSum,
   PrecursorCharge = 11, 
   ActivationMethod = "ETD", 
   Messages = FALSE
)

head(AllFragments[[2]])
##       PPM Error Ion Z Isotope      M/Z M/Z Experimental M/Z Tolerance
## 1: -0.141368299  c2 1       M 259.1765         259.1764   0.002591765
## 2: -7.802339134  c2 1     M+1 260.1851         260.1831   0.002601851
## 3:  0.237254416 z23 9       M 295.6129         295.6130   0.002956129
## 4:  6.125573907 c21 7       M 334.7684         334.7705   0.003347684
## 5: -0.008800326  c3 1       M 372.2605         372.2605   0.003722605
## 6: -7.215488468  c3 1     M+1 373.2692         373.2665   0.003732692
##    Isotopic Percentage Intensity Experimental Correlation Score Type
## 1:           100.00000            446879.9062                NA    c
## 2:            13.80402               870.3063                NA    c
## 3:            68.07372               683.7947                NA    z
## 4:            76.52361              1072.7388                NA    c
## 5:           100.00000            301661.3438                NA    c
## 6:            21.04789               818.8123                NA    c
##    General Type         Modifications Molecular Formula Position N Position
## 1:            c                              C11H21N3O4        2          2
## 2:            c                              C11H21N3O4        2          2
## 3:            z                          C116H197N37O35       23         54
## 4:            c 6.018427=6.018427@V17    C106H178N24O34       21         21
## 5:            c                              C17H32N4O5        3          3
## 6:            c                              C17H32N4O5        3          3
##    Residue                Sequence
## 1:      Q2                      LQ
## 2:      Q2                      LQ
## 3:     R54 RTLSDYNIQKESTLHLVLRLRGG
## 4:     D21   LQIFVKTLTGKTITLEVEPSD
## 5:      I3                     LQI
## 6:      I3                     LQI
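The M/Z Tolerance column above equals 10 ppm of the theoretical m/z (259.1765 × 10 / 10^6 = 0.002591765), and PPM Error is negative when the experimental m/z is below the theoretical value. A minimal sketch of that kind of matching, assuming a nearest-peak search within a 10 ppm window (this is not pspecterlib's actual matching code), with ppm error = (experimental - theoretical) / theoretical × 10^6. match_fragments() is hypothetical, and the ion x1 is a made-up value with no matching peak.

```r
# Hypothetical sketch, not pspecterlib's matching code: match each theoretical
# fragment m/z to its nearest experimental peak and keep it if within `ppm_tol`.
match_fragments <- function(theoretical, experimental, ppm_tol = 10) {
  rows <- lapply(seq_along(theoretical), function(i) {
    theo <- theoretical[[i]]
    exp_mz <- experimental[which.min(abs(experimental - theo))]
    ppm_error <- (exp_mz - theo) / theo * 1e6
    if (abs(ppm_error) > ppm_tol) return(NULL)  # no peak close enough
    data.frame(ion = names(theoretical)[i], theo_mz = theo, exp_mz = exp_mz,
               ppm_error = ppm_error, tolerance = theo * ppm_tol / 1e6)
  })
  do.call(rbind, rows)  # NULL entries (unmatched ions) are dropped by rbind
}

# c2 and c3 m/z values taken from the table above; x1 is a made-up ion
theoretical <- c(c2 = 259.1765, c3 = 372.2605, x1 = 400.0000)
experimental <- c(259.1764, 372.2605, 450.0000)
match_fragments(theoretical, experimental)
```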

4. Sum isotopes and charge states per fragment per proteoform

sum_isotopes() returns a table of fragment intensities summed across isotopes and charge states, with one row per fragment ion and proteoform.

IsotopesSum <- sum_isotopes(IsoformaFragments = AllFragments)

head(IsotopesSum)
##    Ion Summed Intensity         Proteoform
## 1: c10       690158.971 UnmodifiedSequence
## 2: c11       268760.378 UnmodifiedSequence
## 3: c12         9067.441 UnmodifiedSequence
## 4: c13        40932.644 UnmodifiedSequence
## 5: c14       220293.070 UnmodifiedSequence
## 6: c15        66658.895 UnmodifiedSequence
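Summing isotopes is essentially a group-by-and-sum. For instance, adding the two c2 rows shown in step 3 (M and M+1) gives 446879.9062 + 870.3063 = 447750.2125, which is consistent with the c2 value reported for 6.018427@V17 in the abundance matrix in step 5. A base R sketch of the same aggregation on those rows; the simplified data frame layout is an assumption, not the internals of sum_isotopes().

```r
# Rows adapted from the fragments_per_ptm() output in step 3 (the proteoform
# labeled at V17); a real result has many more columns and rows.
matched <- data.frame(
  Ion = c("c2", "c2", "c3", "c3"),
  Isotope = c("M", "M+1", "M", "M+1"),
  Intensity = c(446879.9062, 870.3063, 301661.3438, 818.8123),
  Proteoform = "6.018427@V17"
)

# One summed intensity per ion and proteoform: every isotope row (and, in
# general, every charge-state row) for that ion is collapsed into one value
summed <- aggregate(Intensity ~ Ion + Proteoform, data = matched, FUN = sum)
summed
```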

5. Calculate an abundance matrix

abundance_matrix() returns an abundance matrix for a selected ion group, where each row is a fragment ion and each column is a proteoform. The values are summed intensities.

# Select your ion group of choice when calculating the abundance matrix
AbunMat <- abundance_matrix(
  SummedIsotopes = IsotopesSum,
  IonGroup = "c"
)

head(AbunMat)
##    Ion 6.018427@V17 6.018427@V26 6.018427@V70
## 1:  c2     447750.2     447750.2     447750.2
## 2:  c3     302480.2     302480.2     302480.2
## 3:  c4     296315.5     296315.5     296315.5
## 4:  c5     213922.3     213922.3     213922.3
## 5:  c6     138634.0     138634.0     138634.0
## 6:  c7     370152.2     370152.2     370152.2
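Building the abundance matrix amounts to pivoting the long table from step 4 into ions × proteoforms. A base R sketch using xtabs() on a small toy input (the c2 and c3 values come from the matrix above; the c21 row is illustrative). It assumes an ion that was not observed for a proteoform contributes zero intensity, which may differ from how abundance_matrix() handles missing fragments.

```r
# Toy long-format input: c2/c3 values from the matrix above, c21 illustrative
summed <- data.frame(
  Ion = c("c2", "c2", "c21", "c3", "c3"),
  Proteoform = c("6.018427@V17", "6.018427@V26", "6.018427@V17",
                 "6.018427@V17", "6.018427@V26"),
  Intensity = c(447750.2, 447750.2, 1072.7, 302480.2, 302480.2)
)

# Order ions by fragment number (c2, c3, c21) instead of alphabetically
ion_number <- as.integer(sub("^[a-z]+", "", summed$Ion))
summed$Ion <- factor(summed$Ion, levels = unique(summed$Ion[order(ion_number)]))

# Rows = ions, columns = proteoforms; combinations never observed become 0
abun <- xtabs(Intensity ~ Ion + Proteoform, data = summed)
abun
```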

6. Calculate proteoform relative proportions

calculate_proportions() returns a list containing both a table of proportions with confidence intervals and a plot.

Proportions <- calculate_proportions(AbundanceMatrix = AbunMat)
## Profiling...
## Profiling...
Proportions[[1]]
## # A tibble: 3 × 4
##   Modification Proportion LowerCI UpperCI
##   <fct>             <dbl>   <dbl>   <dbl>
## 1 6.018427@V17      0.260   0.197   0.338
## 2 6.018427@V26      0.312   0.274   0.348
## 3 6.018427@V70      0.429   0.392   0.466
Proportions[[2]]

Accessory Functions

annotated_spectrum_ptms_plot()

Visualize fragment identifications for multiple PTMs on a single large, interactive plotly display with annotated_spectrum_plots.

ptm_heatmap()

PPM errors for each fragment and PTM combination can easily be visualized with this heatmap function.

ptm_heatmap(IsoformaFragments = AllFragments)

write_mgf_simple()

This simple MGF writer function can be used to generate MGF files of peak data for use with external tools.
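For reference, an MGF file stores each spectrum as a BEGIN IONS / END IONS block with header fields such as TITLE, PEPMASS, and CHARGE, followed by one "m/z intensity" pair per line. A minimal base R sketch of such a writer; write_mgf_sketch() and its arguments are hypothetical, and write_mgf_simple() may write different fields, so see ?write_mgf_simple for its actual interface.

```r
# Hypothetical sketch, not isoforma's write_mgf_simple(): write one peak list
# (data frame with mz and intensity columns) as a single MGF spectrum block.
write_mgf_sketch <- function(peaks, file, title, pepmass, charge) {
  lines <- c(
    "BEGIN IONS",
    paste0("TITLE=", title),
    paste0("PEPMASS=", pepmass),
    paste0("CHARGE=", charge, "+"),
    paste(peaks$mz, peaks$intensity),  # one "m/z intensity" pair per line
    "END IONS"
  )
  writeLines(lines, file)
  invisible(lines)
}

# Toy usage: two peaks from the summed histone spectrum; PEPMASS is made up
peaks <- data.frame(mz = c(156.6573, 159.1234), intensity = c(360.0577, 561.1546))
out_file <- tempfile(fileext = ".mgf")
write_mgf_sketch(peaks, out_file, title = "summed_scans", pepmass = 700.5, charge = 16)
readLines(out_file)
```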

Additional Notes

Modification Formats

Below are examples of different modification specifications and their results.

# Single modification of a single PTM
multiple_modifications("TRICITIES", "Methyl,(3)[1]")
## [1] "TRI[Methyl]CITIES"
# Single modification of a single PTM specified by mass, at one of several positions
multiple_modifications("TRICITIES", "1.00727,(3,5,7)[1]")
## [1] "TRI[1.00727]CITIES" "TRICI[1.00727]TIES" "TRICITI[1.00727]ES"
# Multiple modifications of a single PTM example
multiple_modifications("TRICITIES", "Methyl,(3,5,7)[2]") 
## [1] "TRI[Methyl]CI[Methyl]TIES" "TRI[Methyl]CITI[Methyl]ES"
## [3] "TRICI[Methyl]TI[Methyl]ES"
# Multiple modifications with a fixed position
multiple_modifications("TRICITIES", "Methyl,(3^,5,7)[2]") 
## [1] "TRI[Methyl]CI[Methyl]TIES" "TRI[Methyl]CITI[Methyl]ES"
# Multiple modifications with two fixed positions 
multiple_modifications("TRICITIES", "Methyl,(3^,5,7^)[2]")
## [1] "TRI[Methyl]CITI[Methyl]ES"
# Multiple modifications of a single PTM at any residue position
multiple_modifications("TRICITIES", "Methyl,(1,2,3,4,5,6,7,8,9)[2]")
##  [1] "T[Methyl]R[Methyl]ICITIES" "T[Methyl]RI[Methyl]CITIES"
##  [3] "T[Methyl]RIC[Methyl]ITIES" "T[Methyl]RICI[Methyl]TIES"
##  [5] "T[Methyl]RICIT[Methyl]IES" "T[Methyl]RICITI[Methyl]ES"
##  [7] "T[Methyl]RICITIE[Methyl]S" "T[Methyl]RICITIES[Methyl]"
##  [9] "TR[Methyl]I[Methyl]CITIES" "TR[Methyl]IC[Methyl]ITIES"
## [11] "TR[Methyl]ICI[Methyl]TIES" "TR[Methyl]ICIT[Methyl]IES"
## [13] "TR[Methyl]ICITI[Methyl]ES" "TR[Methyl]ICITIE[Methyl]S"
## [15] "TR[Methyl]ICITIES[Methyl]" "TRI[Methyl]C[Methyl]ITIES"
## [17] "TRI[Methyl]CI[Methyl]TIES" "TRI[Methyl]CIT[Methyl]IES"
## [19] "TRI[Methyl]CITI[Methyl]ES" "TRI[Methyl]CITIE[Methyl]S"
## [21] "TRI[Methyl]CITIES[Methyl]" "TRIC[Methyl]I[Methyl]TIES"
## [23] "TRIC[Methyl]IT[Methyl]IES" "TRIC[Methyl]ITI[Methyl]ES"
## [25] "TRIC[Methyl]ITIE[Methyl]S" "TRIC[Methyl]ITIES[Methyl]"
## [27] "TRICI[Methyl]T[Methyl]IES" "TRICI[Methyl]TI[Methyl]ES"
## [29] "TRICI[Methyl]TIE[Methyl]S" "TRICI[Methyl]TIES[Methyl]"
## [31] "TRICIT[Methyl]I[Methyl]ES" "TRICIT[Methyl]IE[Methyl]S"
## [33] "TRICIT[Methyl]IES[Methyl]" "TRICITI[Methyl]E[Methyl]S"
## [35] "TRICITI[Methyl]ES[Methyl]" "TRICITIE[Methyl]S[Methyl]"
# Multiple PTMs, with the unmodified base sequence also returned
multiple_modifications("TRICITIES", "Methyl,(1)[1];Acetyl,(2,4,9)[1]", ReturnUnmodified = TRUE)
## [1] "TRICITIES"                 "T[Methyl]R[Acetyl]ICITIES"
## [3] "T[Methyl]RIC[Acetyl]ITIES" "T[Methyl]RICITIES[Acetyl]"
# Multiple PTMs over many candidate positions, with the unmodified base sequence also returned
multiple_modifications("TRICITIES", "Methyl,(1,2,3,4,5,6,7,8,9)[1];Acetyl,(2,4,9)[1]", ReturnUnmodified = TRUE)
##  [1] "TRICITIES"                 "T[Methyl]R[Acetyl]ICITIES"
##  [3] "TR[Acetyl]ICITIES"         "TR[Acetyl]I[Methyl]CITIES"
##  [5] "TR[Acetyl]IC[Methyl]ITIES" "TR[Acetyl]ICI[Methyl]TIES"
##  [7] "TR[Acetyl]ICIT[Methyl]IES" "TR[Acetyl]ICITI[Methyl]ES"
##  [9] "TR[Acetyl]ICITIE[Methyl]S" "TR[Acetyl]ICITIES[Methyl]"
## [11] "T[Methyl]RIC[Acetyl]ITIES" "TR[Methyl]IC[Acetyl]ITIES"
## [13] "TRI[Methyl]C[Acetyl]ITIES" "TRIC[Acetyl]ITIES"        
## [15] "TRIC[Acetyl]I[Methyl]TIES" "TRIC[Acetyl]IT[Methyl]IES"
## [17] "TRIC[Acetyl]ITI[Methyl]ES" "TRIC[Acetyl]ITIE[Methyl]S"
## [19] "TRIC[Acetyl]ITIES[Methyl]" "T[Methyl]RICITIES[Acetyl]"
## [21] "TR[Methyl]ICITIES[Acetyl]" "TRI[Methyl]CITIES[Acetyl]"
## [23] "TRIC[Methyl]ITIES[Acetyl]" "TRICI[Methyl]TIES[Acetyl]"
## [25] "TRICIT[Methyl]IES[Acetyl]" "TRICITI[Methyl]ES[Acetyl]"
## [27] "TRICITIE[Methyl]S[Acetyl]" "TRICITIES[Acetyl]"
# Multiple PTMs with fixed positions and a mass-based PTM, with the unmodified base sequence also returned
multiple_modifications("TRICITIES", "Methyl,(1^,2,3,4,5,6,7^,8,9)[3];1.00727,(2,4,9)[1]", ReturnUnmodified = TRUE)
##  [1] "TRICITIES"                                 
##  [2] "T[Methyl]R[1.00727]ICITI[Methyl]ES"        
##  [3] "T[Methyl]R[1.00727]I[Methyl]CITI[Methyl]ES"
##  [4] "T[Methyl]R[1.00727]IC[Methyl]ITI[Methyl]ES"
##  [5] "T[Methyl]R[1.00727]ICI[Methyl]TI[Methyl]ES"
##  [6] "T[Methyl]R[1.00727]ICIT[Methyl]I[Methyl]ES"
##  [7] "T[Methyl]R[1.00727]ICITI[Methyl]E[Methyl]S"
##  [8] "T[Methyl]R[1.00727]ICITI[Methyl]ES[Methyl]"
##  [9] "T[Methyl]R[Methyl]IC[1.00727]ITI[Methyl]ES"
## [10] "T[Methyl]RI[Methyl]C[1.00727]ITI[Methyl]ES"
## [11] "T[Methyl]RIC[1.00727]ITI[Methyl]ES"        
## [12] "T[Methyl]RIC[1.00727]I[Methyl]TI[Methyl]ES"
## [13] "T[Methyl]RIC[1.00727]IT[Methyl]I[Methyl]ES"
## [14] "T[Methyl]RIC[1.00727]ITI[Methyl]E[Methyl]S"
## [15] "T[Methyl]RIC[1.00727]ITI[Methyl]ES[Methyl]"
## [16] "T[Methyl]R[Methyl]ICITI[Methyl]ES[1.00727]"
## [17] "T[Methyl]RI[Methyl]CITI[Methyl]ES[1.00727]"
## [18] "T[Methyl]RIC[Methyl]ITI[Methyl]ES[1.00727]"
## [19] "T[Methyl]RICI[Methyl]TI[Methyl]ES[1.00727]"
## [20] "T[Methyl]RICIT[Methyl]I[Methyl]ES[1.00727]"
## [21] "T[Methyl]RICITI[Methyl]E[Methyl]S[1.00727]"
## [22] "T[Methyl]RICITI[Methyl]ES[1.00727]"
# Combine single and multiple modification counts
multiple_modifications("TRICITIES", "Methyl,(1,2,3,4,5)[1,2]")
##  [1] "T[Methyl]RICITIES"         "TR[Methyl]ICITIES"        
##  [3] "TRI[Methyl]CITIES"         "TRIC[Methyl]ITIES"        
##  [5] "TRICI[Methyl]TIES"         "T[Methyl]R[Methyl]ICITIES"
##  [7] "T[Methyl]RI[Methyl]CITIES" "T[Methyl]RIC[Methyl]ITIES"
##  [9] "T[Methyl]RICI[Methyl]TIES" "TR[Methyl]I[Methyl]CITIES"
## [11] "TR[Methyl]IC[Methyl]ITIES" "TR[Methyl]ICI[Methyl]TIES"
## [13] "TRI[Methyl]C[Methyl]ITIES" "TRI[Methyl]CI[Methyl]TIES"
## [15] "TRIC[Methyl]I[Methyl]TIES"
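Reading the examples above, a specification has the form Name,(positions)[counts]: the name is a PTM name such as Methyl or a mass such as 1.00727, positions are 1-based residue indices, a trailing ^ marks a position that is always modified, counts lists how many sites to modify (several counts separated by commas), and several PTMs are separated by semicolons. For a single PTM the number of sequences is a sum of binomial coefficients, e.g. Methyl,(1,2,3,4,5)[1,2] gives C(5,1) + C(5,2) = 5 + 10 = 15. The base R sketch below reproduces the single-PTM outputs above, including their order. parse_ptm_spec() and enumerate_ptm() are hypothetical helpers, not part of pspecterlib, and they do not handle multi-PTM specifications.

```r
# Hypothetical helpers (not part of pspecterlib): parse a single-PTM spec
# "Name,(positions)[counts]" and enumerate every placement it describes.
parse_ptm_spec <- function(spec) {
  name <- sub(",.*$", "", spec)
  pos_txt <- strsplit(sub("^.*\\((.*)\\).*$", "\\1", spec), ",")[[1]]
  counts <- as.integer(strsplit(sub("^.*\\[(.*)\\]$", "\\1", spec), ",")[[1]])
  list(
    name = name,
    positions = as.integer(sub("\\^", "", pos_txt)),
    fixed = as.integer(sub("\\^", "", pos_txt[grepl("\\^", pos_txt)])),  # "^" sites
    counts = counts
  )
}

enumerate_ptm <- function(sequence, spec) {
  ptm <- parse_ptm_spec(spec)
  residues <- strsplit(sequence, "")[[1]]
  free <- setdiff(ptm$positions, ptm$fixed)
  out <- character(0)
  for (k in ptm$counts) {
    n_free <- k - length(ptm$fixed)  # sites left to choose after fixed ones
    if (n_free < 0 || n_free > length(free)) next
    # Choose indices into `free` (combn() on a single number would expand it
    # to a sequence); the "all sites fixed" case is handled directly
    combos <- if (n_free == 0) list(integer(0)) else
      combn(length(free), n_free, simplify = FALSE)
    for (idx in combos) {
      sites <- sort(c(ptm$fixed, free[idx]))
      marked <- residues
      marked[sites] <- paste0(marked[sites], "[", ptm$name, "]")
      out <- c(out, paste(marked, collapse = ""))
    }
  }
  out
}

enumerate_ptm("TRICITIES", "Methyl,(3^,5,7)[2]")
```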