ftmsRanalysis.Rmd
Fourier-transform mass spectrometry (FT-MS) is a type of mass spectrometry that determines the mass-to-charge ratio (m/z) of ions from their cyclotron frequency in a fixed magnetic field. A chemical composition can be assigned to a portion of the observed peaks (mass-to-charge ratios). FT-MS instrument data can be interpreted as an intensity for each observed peak. FT-MS analysis has been used to examine a wide range of complex mixtures, including soils, plants, aquatic samples, petroleum, and various beverages.
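As a rough illustration of the link between cyclotron frequency and m/z, the sketch below uses the textbook relation for an ideal ion, f = zeB / (2πm). The helper name `cyclotron_freq_hz` is made up for this example and is not part of ftmsRanalysis; real instruments apply calibration corrections on top of this idealized formula.

```r
# Physical constants (CODATA 2018 values)
elementary_charge <- 1.602176634e-19    # coulombs
dalton_kg         <- 1.66053906660e-27  # kilograms per dalton

# Hypothetical helper: ideal cyclotron frequency (Hz) of an ion with the
# given m/z (daltons per elementary charge) in a field of b_tesla.
cyclotron_freq_hz <- function(mz, b_tesla) {
  (elementary_charge * b_tesla) / (2 * pi * mz * dalton_kg)
}

# Frequencies for a few m/z values in a 12 T magnet
mz_values <- c(100, 500, 900)
freqs <- cyclotron_freq_hz(mz_values, b_tesla = 12)
print(data.frame(mz = mz_values, freq_khz = freqs / 1000))
```

Heavier ions (larger m/z) orbit at lower frequencies, which is why a measured frequency can be converted back to an m/z value.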
The ftmsRanalysis package was designed to help with various steps of processing FT-MS data, including:
An example dataset has been included with the ftmsRanalysis package. This dataset is a subset of an experiment to assess differences in soil organic matter between multiple locations and crop types. The data were collected from two locations (M and W) for two crop flora (S and C). The data were analyzed with a 12T FTICR (Fourier-transform ion cyclotron resonance) mass spectrometer.
The ftmsRanalysis package requires three data tables: edata, fdata, and emeta.
The edata object is a data frame with one row per peak and one column per sample. It must have one column that is a unique ID (e.g. Mass).
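To make the expected shape concrete, here is a minimal sketch of a toy edata table with a quick sanity check in base R. The sample names and intensity values are invented for illustration, and the checks are only one plausible way to validate such a table, not the package's own validation.

```r
# Toy edata: one row per peak, one ID column plus one column per sample.
# All names and values here are invented for illustration.
toy_edata <- data.frame(
  Mass    = c(98.5, 98.8, 101.7),
  SampleA = c(0, 13070372, 0),
  SampleB = c(5524739, 0, 0)
)

# The ID column should uniquely identify each peak
id_is_unique <- anyDuplicated(toy_edata$Mass) == 0

# Every non-ID column should hold numeric intensities
sample_cols <- setdiff(names(toy_edata), "Mass")
all_numeric <- all(vapply(toy_edata[sample_cols], is.numeric, logical(1)))

print(c(id_is_unique = id_is_unique, all_numeric = all_numeric))
```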
library(ftmsRanalysis)
##
## Attaching package: 'ftmsRanalysis'
## The following object is masked from 'package:stats':
##
## heatmap
str(ftms12T_edata)
## 'data.frame': 24442 obs. of 21 variables:
## $ Mass : num 98.5 98.8 98.8 101.7 103.3 ...
## $ EM0011_sample: num 0 0 5524739 0 0 ...
## $ EM0013_sample: num 0 13070372 0 0 0 ...
## $ EM0015_sample: num 0.0 0.0 2.4e+07 0.0 0.0 ...
## $ EM0017_sample: num 0 16120890 0 0 0 ...
## $ EM0019_sample: num 0 21228496 0 0 0 ...
## $ EM0061_sample: num 1197974 0 30656158 0 0 ...
## $ EM0063_sample: num 0 12305626 0 0 0 ...
## $ EM0065_sample: num 0.0 1.1e+07 0.0 0.0 0.0 ...
## $ EM0067_sample: num 0 0 12664590 0 0 ...
## $ EM0069_sample: num 2535836 38329628 0 0 0 ...
## $ EW0111_sample: num 0 0 21416774 0 0 ...
## $ EW0113_sample: num 0 8070914 0 0 0 ...
## $ EW0115_sample: num 3636046 0 38608164 0 0 ...
## $ EW0117_sample: num 0 3965230 0 0 0 ...
## $ EW0119_sample: num 0 0 2439325 0 1153547 ...
## $ EW0161_sample: num 0 0 0 0 0 0 0 0 0 0 ...
## $ EW0163_sample: num 0 0 0 0 0 0 0 0 0 0 ...
## $ EW0165_sample: num 0 0 0 16443347 0 ...
## $ EW0167_sample: num 0 1598118 0 0 0 ...
## $ EW0169_sample: num 0 0 0 0 0 0 0 0 0 0 ...
The fdata object is a data frame with one row per sample with information about experimental conditions. It must have a column that matches the sample column names in edata.
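The requirement that fdata identifies the same samples as the edata columns can be checked with a few lines of base R. This is a sketch on invented toy tables; the column names `Mass` and `SampleID` mirror the example data, and the check itself is an assumption about what a useful validation looks like rather than the package's implementation.

```r
# Toy tables, invented for illustration
toy_edata <- data.frame(Mass = c(98.5, 98.8), S1 = c(0, 10), S2 = c(5, 0))
toy_fdata <- data.frame(SampleID = c("S1", "S2"), Location = c("M", "W"))

# Sample columns are all edata columns other than the ID column
sample_cols <- setdiff(names(toy_edata), "Mass")
sample_ids  <- as.character(toy_fdata$SampleID)

# Samples present in one table but not the other
missing_in_fdata <- setdiff(sample_cols, sample_ids)
missing_in_edata <- setdiff(sample_ids, sample_cols)

if (length(missing_in_fdata) > 0 || length(missing_in_edata) > 0) {
  stop("Sample names in edata and fdata do not match")
}
```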
str(ftms12T_fdata)
## 'data.frame': 20 obs. of 4 variables:
## $ SampleID : Factor w/ 20 levels "EM0011_sample",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Location : Factor w/ 2 levels "M","W": 1 1 1 1 1 1 1 1 1 1 ...
## $ Block : int 1 2 3 4 5 1 2 3 4 5 ...
## $ Crop.Flora: Factor w/ 2 levels "C","S": 2 2 2 2 2 1 1 1 1 1 ...
The emeta object is a data frame with one row per peak and columns containing other metadata. Either a column giving the molecular formula or a set of elemental count columns is required (currently the elements C, H, O, S, N, and P are supported). It must have an ID column corresponding to the ID column in edata. If information about isotopic peaks is available and specified, these peaks are currently filtered from the data when the peakData object is created.
str(ftms12T_emeta)
## 'data.frame': 24442 obs. of 10 variables:
## $ Mass : num 98.5 98.8 98.8 101.7 103.3 ...
## $ C : int 0 0 0 0 0 0 0 0 0 0 ...
## $ H : int 0 0 0 0 0 0 0 0 0 0 ...
## $ O : int 0 0 0 0 0 0 0 0 0 0 ...
## $ N : int 0 0 0 0 0 0 0 0 0 0 ...
## $ C13 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ S : int 0 0 0 0 0 0 0 0 0 0 ...
## $ P : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Error : num 0 0 0 0 0 0 0 0 0 0 ...
## $ NeutralMass: num 99.5 99.8 99.8 102.7 104.4 ...
peakObj <- as.peakData(ftms12T_edata, ftms12T_fdata, ftms12T_emeta,
                       edata_cname="Mass", fdata_cname="SampleID",
                       mass_cname="Mass", c_cname="C", h_cname="H",
                       o_cname="O", n_cname="N", s_cname="S",
                       p_cname="P", isotopic_cname = "C13",
                       isotopic_notation = "1")
peakObj
## peakData object
## # Peaks: 23060
## # Samples: 20
## Meta data columns: [Mass, C, H, O, N, C13, S, P, Error, NeutralMass, MolForm]
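The printed object reports 23060 peaks, whereas the input tables have 24442 rows. The difference of 1382 rows is presumably due to the isotopic peak filtering described earlier. The sketch below shows what such a filter could look like in base R on an invented toy table; it is an assumption about the behavior, not the package's actual code.

```r
# Toy emeta, invented for illustration; C13 flags isotopic peaks
toy_emeta <- data.frame(
  Mass = c(98.5, 98.8, 101.7, 103.3),
  C    = c(5L, 5L, 0L, 6L),
  C13  = c(0L, 1L, 0L, 0L)
)

# Same roles as the isotopic_cname / isotopic_notation arguments above
isotopic_cname    <- "C13"
isotopic_notation <- "1"

# A peak counts as isotopic when its flag column matches the notation
is_isotopic <- as.character(toy_emeta[[isotopic_cname]]) == isotopic_notation

# Keep only the non-isotopic peaks
filtered_emeta <- toy_emeta[!is_isotopic, , drop = FALSE]
print(nrow(filtered_emeta))
```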
The as.peakData function also allows for the following (optional) parameters:
The resulting peakData object contains three elements, named e_data, f_data, and e_meta:
names(peakObj)
## [1] "e_data" "f_data" "e_meta"
During object construction, the molecular formula is calculated from the elemental count columns (conversely, if molecular formulae had been provided instead, the elemental count columns would be created from them):
tail(peakObj$e_meta)
##              Mass  C  H  O N C13 S P     Error NeutralMass     MolForm
## 24437 897.1796269  0  0  0 0   0 0 0 0.0000000    898.1869        <NA>
## 24438 897.2209292  0  0  0 0   0 0 0 0.0000000    898.2282        <NA>
## 24439 897.3973977 36 69 22 1   0 0 1 0.2345417    898.4047 C36H69O22NP
## 24440  898.812526  0  0  0 0   0 0 0 0.0000000    899.8198        <NA>
## 24441 899.0458907  0  0  0 0   0 0 0 0.0000000    900.0532        <NA>
## 24442 899.3370941  0  0  0 0   0 0 0 0.0000000    900.3444        <NA>
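The MolForm values above suggest a simple convention: elements appear in column order, a count of 1 is written without a number, and peaks with no assigned elements get NA. The following sketch reproduces that convention in base R. The helper `counts_to_formula` is hypothetical and written only for this illustration; the package's own formula construction may differ in details.

```r
# Hypothetical helper: build a formula string from named element counts,
# e.g. c(C = 36, H = 69, ...). Returns NA when every count is zero.
counts_to_formula <- function(counts) {
  present <- counts[counts > 0]
  if (length(present) == 0) return(NA_character_)
  # Omit the number when the count is 1 ("N" rather than "N1")
  suffix <- ifelse(present == 1, "", as.character(present))
  paste0(names(present), suffix, collapse = "")
}

# Counts taken from row 24439 of the output above
counts_to_formula(c(C = 36, H = 69, O = 22, N = 1, S = 0, P = 1))
# "C36H69O22NP"

# A peak with no assigned elements
counts_to_formula(c(C = 0, H = 0, O = 0, N = 0, S = 0, P = 0))
# NA
```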
There is a summary method:
summary(peakObj)
## Samples: 20
## Molecules: 23060
## Percent Missing: 81.739%
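One plausible reading of "Percent Missing" is the share of peak-by-sample cells with no observed intensity. The sketch below computes that quantity on an invented toy table, assuming that a zero or NA intensity means the peak was not observed in that sample; this assumption is not confirmed by the output above.

```r
# Toy edata, invented for illustration
toy_edata <- data.frame(
  Mass = c(98.5, 98.8, 101.7, 103.3),
  S1   = c(0, 13070372, 0, 0),
  S2   = c(5524739, 0, 0, 0)
)

# Intensity matrix without the ID column
intensities <- as.matrix(toy_edata[setdiff(names(toy_edata), "Mass")])

# Assumption: a cell is "missing" when it is NA or zero
is_missing <- is.na(intensities) | intensities == 0

# Percentage of missing cells across all peaks and samples
percent_missing <- 100 * mean(is_missing)
print(percent_missing)
```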
There is also a default plot method:
plot(peakObj)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.