Introduction

FT-MS Analysis

Fourier-transform mass spectrometry (FT-MS) is a type of mass spectrometry that determines the mass-to-charge ratio (m/z) of ions from their cyclotron frequency in a fixed magnetic field. Molecular formulae (chemical composition) can be assigned to a portion of the observed peaks (m/z values). FT-MS instrument data can be interpreted as an intensity for each observed peak. FT-MS analysis has been used to examine a wide range of complex mixtures, including soils, plants, aquatic samples, petroleum, and various beverages.
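The m/z measurement rests on the cyclotron relation f = zeB / (2πm): an ion of mass m and charge ze in a field of B tesla circles at frequency f, so m/z can be recovered as eB / (2πf). The R sketch below applies this ideal relation with CODATA constants. The function names are hypothetical, and the sketch ignores trapping-field and space-charge effects, which is why real instruments apply calibrated versions of this relation rather than the bare formula.

```r
# Physical constants (SI units, CODATA values)
e_charge <- 1.602176634e-19    # elementary charge, C
dalton   <- 1.66053906660e-27  # unified atomic mass unit, kg

# Hypothetical helper: ideal (unperturbed) cyclotron frequency in Hz for an
# ion with the given m/z (mass in Da, charge in elementary charges) at B tesla.
cyclotron_freq <- function(mz, B) {
  e_charge * B / (2 * pi * mz * dalton)
}

# Hypothetical helper: the inverse relation, recovering m/z from a frequency.
mz_from_freq <- function(freq, B) {
  e_charge * B / (2 * pi * freq * dalton)
}

f <- cyclotron_freq(500, B = 12)  # a 12 T magnet, as for the example data
print(f)                          # about 3.7e5 Hz
print(mz_from_freq(f, B = 12))    # recovers 500
```

Because frequency is inversely proportional to m/z, heavier ions circle more slowly in the same field.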

The ftmsRanalysis package was designed to help with various steps of processing FT-MS data, including:

  • data formatting and manipulation
  • reproducible analysis pipeline
  • filtering data based on various properties
  • calculating meta information for each peak (e.g. nominal oxidation state of carbon)
  • data visualization and summary
    • one sample
    • multiple samples
    • group comparisons

Example data

An example dataset has been included with the ftmsRanalysis package. This dataset is a subset of an experiment to assess differences in soil organic matter between multiple locations and crop types. The data were collected from two locations (M and W) for two crop flora (S and C). The data were analyzed with a 12T FTICR (Fourier-transform ion cyclotron resonance) mass spectrometer.

Data loading

Experimental data

The ftmsRanalysis package requires three data tables:

  • Expression Data - observed data for each peak (rows) and sample (columns)
    • values of each cell represent the peak intensity observed
  • Sample Data - data capturing relevant experimental factors (columns) for each sample (rows)
    • e.g. samples and their sampling locations, treatment applied, etc.
  • Molecular Identification Data - other characteristics or quantified values (columns) for each peak (rows)
    • e.g. molecular formulae
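To show how the three tables fit together, here is a minimal sketch with hypothetical toy tables in base R. All values are made up for illustration; the column names mirror the example data described below, but nothing here is read from the package.

```r
# Hypothetical toy versions of the three input tables; every value here is
# made up for illustration and is not taken from the example dataset.
e_data <- data.frame(
  Mass     = c(200.0501, 250.1234, 300.0987),  # peak ID column
  Sample_A = c(0, 1.2e6, 3.4e6),               # peak intensities, 0 = not observed
  Sample_B = c(5.0e5, 0, 2.1e6)
)
f_data <- data.frame(
  SampleID = c("Sample_A", "Sample_B"),        # matches e_data sample columns
  Location = c("M", "W")
)
e_meta <- data.frame(
  Mass = c(200.0501, 250.1234, 300.0987),      # same IDs as e_data$Mass
  C = c(8L, 0L, 15L),
  H = c(8L, 0L, 16L),
  O = c(6L, 0L, 6L)
)

# The tables are linked by the peak IDs (e_data <-> e_meta) and by the
# sample names (e_data columns <-> f_data rows).
stopifnot(identical(e_data$Mass, e_meta$Mass))
stopifnot(setequal(setdiff(names(e_data), "Mass"), f_data$SampleID))
```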

e_data (Expression Data)

The e_data object is a data frame with one row per peak and one column per sample. It must also contain a column of unique peak IDs (e.g. Mass).

library(ftmsRanalysis)
## 
## Attaching package: 'ftmsRanalysis'
## The following object is masked from 'package:stats':
## 
##     heatmap
data("ftms12T_edata")
str(ftms12T_edata)
## 'data.frame':    24442 obs. of  21 variables:
##  $ Mass         : num  98.5 98.8 98.8 101.7 103.3 ...
##  $ EM0011_sample: num  0 0 5524739 0 0 ...
##  $ EM0013_sample: num  0 13070372 0 0 0 ...
##  $ EM0015_sample: num  0.0 0.0 2.4e+07 0.0 0.0 ...
##  $ EM0017_sample: num  0 16120890 0 0 0 ...
##  $ EM0019_sample: num  0 21228496 0 0 0 ...
##  $ EM0061_sample: num  1197974 0 30656158 0 0 ...
##  $ EM0063_sample: num  0 12305626 0 0 0 ...
##  $ EM0065_sample: num  0.0 1.1e+07 0.0 0.0 0.0 ...
##  $ EM0067_sample: num  0 0 12664590 0 0 ...
##  $ EM0069_sample: num  2535836 38329628 0 0 0 ...
##  $ EW0111_sample: num  0 0 21416774 0 0 ...
##  $ EW0113_sample: num  0 8070914 0 0 0 ...
##  $ EW0115_sample: num  3636046 0 38608164 0 0 ...
##  $ EW0117_sample: num  0 3965230 0 0 0 ...
##  $ EW0119_sample: num  0 0 2439325 0 1153547 ...
##  $ EW0161_sample: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EW0163_sample: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EW0165_sample: num  0 0 0 16443347 0 ...
##  $ EW0167_sample: num  0 1598118 0 0 0 ...
##  $ EW0169_sample: num  0 0 0 0 0 0 0 0 0 0 ...
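Because e_data must contain a unique ID column, a quick check before building the object can catch duplicated or missing IDs. The helper check_unique_id below is hypothetical (not an ftmsRanalysis function) and only tests that requirement.

```r
# Hypothetical helper (not part of ftmsRanalysis): TRUE when id_col exists
# and contains no missing or duplicated values.
check_unique_id <- function(df, id_col) {
  if (!id_col %in% names(df)) stop("ID column '", id_col, "' not found")
  ids <- df[[id_col]]
  !anyNA(ids) && anyDuplicated(ids) == 0
}

toy <- data.frame(Mass = c(98.5, 98.8, 101.7), S1 = c(0, 5.5e6, 0))
check_unique_id(toy, "Mass")                   # TRUE
check_unique_id(rbind(toy, toy[1, ]), "Mass")  # FALSE: Mass 98.5 appears twice
```

On the example data the same check would be called as `check_unique_id(ftms12T_edata, "Mass")`.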

f_data (Sample Data)

The f_data object is a data frame with one row per sample, containing information about the experimental conditions. It must have a column whose values match the sample column names in e_data.

data("ftms12T_fdata")
str(ftms12T_fdata)
## 'data.frame':    20 obs. of  4 variables:
##  $ SampleID  : Factor w/ 20 levels "EM0011_sample",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Location  : Factor w/ 2 levels "M","W": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Block     : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ Crop.Flora: Factor w/ 2 levels "C","S": 2 2 2 2 2 1 1 1 1 1 ...
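The link between f_data and e_data can be checked in the same spirit. The sketch below uses a hypothetical helper, samples_match, which converts the sample ID column to character first, since SampleID is stored as a factor in the example data.

```r
# Hypothetical helper: TRUE when the sample columns of e_data (all columns
# except the ID column) match the sample IDs in f_data one-to-one.
samples_match <- function(e_data, f_data, edata_cname, fdata_cname) {
  sample_cols <- setdiff(names(e_data), edata_cname)
  ids <- as.character(f_data[[fdata_cname]])  # SampleID may be a factor
  anyDuplicated(ids) == 0 && setequal(sample_cols, ids)
}

e_toy <- data.frame(Mass = c(98.5, 101.7), S1 = c(0, 1e6), S2 = c(2e6, 0))
f_toy <- data.frame(SampleID = c("S1", "S2"), Location = c("M", "W"))
samples_match(e_toy, f_toy, "Mass", "SampleID")       # TRUE
samples_match(e_toy, f_toy[1, ], "Mass", "SampleID")  # FALSE: S2 has no f_data row
```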

e_meta (Molecular Identification Data)

The e_meta object is a data frame with one row per peak and columns containing other metadata. Either a column giving the molecular formula or a set of elemental count columns (currently the elements C, H, O, S, N, and P are supported) is required. It must also have an ID column corresponding to the ID column in e_data. If information about isotopic peaks is available and specified, these peaks are currently filtered out of the data when the peakData object is created.

data("ftms12T_emeta")
str(ftms12T_emeta)
## 'data.frame':    24442 obs. of  10 variables:
##  $ Mass       : num  98.5 98.8 98.8 101.7 103.3 ...
##  $ C          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ H          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ O          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ N          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ C13        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ S          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ P          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Error      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ NeutralMass: num  99.5 99.8 99.8 102.7 104.4 ...
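Only a portion of the peaks carry a formula assignment. A minimal sketch, assuming that an all-zero row of element counts (like the first rows shown above) means no formula was assigned, flags the assigned peaks. C13 is left out because it is the isotope indicator column rather than an element count. has_formula is a hypothetical helper.

```r
# Hypothetical helper: TRUE for peaks with at least one nonzero element count.
# Assumes all-zero counts mean "no formula assigned".
has_formula <- function(e_meta, element_cols) {
  unname(rowSums(e_meta[, element_cols, drop = FALSE]) > 0)
}

toy_meta <- data.frame(
  Mass = c(98.5, 897.4),
  C = c(0L, 36L), H = c(0L, 69L), O = c(0L, 22L),
  N = c(0L, 1L),  S = c(0L, 0L),  P = c(0L, 1L),
  C13 = c(0L, 0L)                 # isotope flag, not an element count
)
elements <- c("C", "H", "O", "N", "S", "P")
assigned <- has_formula(toy_meta, elements)
assigned               # FALSE TRUE
100 * mean(assigned)   # percent of peaks with an assigned formula: 50
```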

Constructing a peakData object

peakObj <- as.peakData(ftms12T_edata, ftms12T_fdata, ftms12T_emeta,
                       edata_cname="Mass", fdata_cname="SampleID",
                       mass_cname="Mass", c_cname="C", h_cname="H",
                       o_cname="O", n_cname="N", s_cname="S",
                       p_cname="P", isotopic_cname = "C13",
                       isotopic_notation = "1")
peakObj
## peakData object
## # Peaks: 23060
## # Samples: 20
## Meta data columns: [Mass, C, H, O, N, C13, S, P, Error, NeutralMass, MolForm]
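The peakData object holds 23060 peaks while e_meta has 24442 rows, which is consistent with peaks flagged by isotopic_cname = "C13" and isotopic_notation = "1" being dropped during construction. The base R sketch below illustrates that idea, assuming a peak is isotopic when its isotopic column, read as character, equals the notation string. drop_isotopic is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch of isotopic-peak removal; it mirrors the idea behind
# isotopic_cname / isotopic_notation but is not ftmsRanalysis's own code.
drop_isotopic <- function(e_data, e_meta, id_col, iso_col, iso_notation) {
  is_iso <- as.character(e_meta[[iso_col]]) %in% iso_notation  # NA -> FALSE
  keep_ids <- e_meta[[id_col]][!is_iso]
  list(
    e_data = e_data[e_data[[id_col]] %in% keep_ids, , drop = FALSE],
    e_meta = e_meta[!is_iso, , drop = FALSE]
  )
}

e_meta_toy <- data.frame(Mass = c(100.1, 101.1, 200.2),
                         C = c(5L, 4L, 9L), C13 = c(0L, 1L, 0L))
e_data_toy <- data.frame(Mass = c(100.1, 101.1, 200.2), S1 = c(1e6, 1e5, 0))
res <- drop_isotopic(e_data_toy, e_meta_toy, "Mass", "C13", "1")
nrow(res$e_meta)  # 2: the peak at 101.1 is flagged as isotopic
```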

The as.peakData function also accepts the following optional parameters:

  • data_scale - the scale of the data, assumed by default to be 'abundance' (peak intensity). Other options are log2, log10, log, and presence/absence (0/1)
  • instrument_type - assumed by default to be 12T/15T. The alternative is 21T, for which data are displayed differently in visualizations because of the higher resolution of the data
  • extraction_cname - name of the column in f_data specifying the extraction (e.g. water)
  • isotopic_cname - name of the column in e_meta which indicates whether a peak is isotopic
  • isotopic_notation - character string in isotopic_cname which indicates that a peak is isotopic. Isotopic peaks are currently filtered out of the data
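To make the data_scale options concrete, the sketch below converts a few intensities taken from the str() output above to log2, log10, and presence/absence form. to_log and to_pres are hypothetical helpers, and treating zero intensities as missing before taking logs is an assumption made here because log(0) is -Inf.

```r
# A few peak intensities taken from the str(ftms12T_edata) output above
intensity <- c(0, 5524739, 13070372, 0, 1197974)

# Log scales: zeros are set to NA first (assumption: an unobserved peak is
# treated as missing), since log(0) would be -Inf.
to_log <- function(x, base = 2) {
  x[x == 0] <- NA
  log(x, base = base)
}

# Presence/absence: 1 if the peak was observed (intensity > 0), else 0.
to_pres <- function(x) as.integer(!is.na(x) & x > 0)

to_log(intensity, base = 2)    # NA for the zero entries
to_log(intensity, base = 10)
to_pres(intensity)             # 0 1 1 0 1
```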

The resulting peakData object contains three elements, named e_data, f_data, and e_meta:

names(peakObj)
## [1] "e_data" "f_data" "e_meta"

During object construction, a molecular formula column (MolForm) is calculated from the elemental count columns (conversely, if molecular formulae are provided, the elemental count columns are created from them):

tail(peakObj$e_meta)
##              Mass  C  H  O N C13 S P     Error NeutralMass     MolForm
## 24437 897.1796269  0  0  0 0   0 0 0 0.0000000    898.1869        <NA>
## 24438 897.2209292  0  0  0 0   0 0 0 0.0000000    898.2282        <NA>
## 24439 897.3973977 36 69 22 1   0 0 1 0.2345417    898.4047 C36H69O22NP
## 24440  898.812526  0  0  0 0   0 0 0 0.0000000    899.8198        <NA>
## 24441 899.0458907  0  0  0 0   0 0 0 0.0000000    900.0532        <NA>
## 24442 899.3370941  0  0  0 0   0 0 0 0.0000000    900.3444        <NA>
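The MolForm strings above follow a simple pattern: elements in the order C, H, O, N, S, P, a count of 1 written without a number, zero counts omitted, and NA when every count is zero. The sketch below reproduces that pattern with a hypothetical helper, make_formula; it is inferred from the output shown, not taken from the package source.

```r
# Hypothetical helper: build a formula string from named elemental counts,
# using the element order seen in the MolForm output above.
make_formula <- function(counts) {
  counts <- counts[counts > 0]                # drop absent elements
  if (length(counts) == 0) return(NA_character_)
  # a count of 1 is written as the bare element symbol
  paste0(names(counts), ifelse(counts == 1, "", counts), collapse = "")
}

make_formula(c(C = 36, H = 69, O = 22, N = 1, S = 0, P = 1))  # "C36H69O22NP"
make_formula(c(C = 0, H = 0, O = 0, N = 0, S = 0, P = 0))     # NA
```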

There is a summary method:

summary(peakObj)
## Samples: 20
## Molecules: 23060
## Percent Missing: 81.739%
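The summary reports the share of missing values. Assuming that "missing" means a peak was not observed in a sample (zero or NA intensity), the percentage can be sketched as the fraction of such cells among all peak-by-sample cells. percent_missing is a hypothetical helper and may not match the package's exact definition.

```r
# Hypothetical sketch: percent of peak-by-sample cells with no observed
# intensity, treating zeros and NAs as missing (an assumption).
percent_missing <- function(e_data, id_col) {
  vals <- as.matrix(e_data[, setdiff(names(e_data), id_col), drop = FALSE])
  100 * mean(is.na(vals) | vals == 0)
}

toy <- data.frame(Mass = c(98.5, 98.8), S1 = c(0, 5e6), S2 = c(NA, 0))
percent_missing(toy, "Mass")  # 75: three of the four cells are unobserved
```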

... and a default plot method:

plot(peakObj)