Introduction

This vignette will show examples of how to develop interactive visualizations of FT-MS data with the ftmsRanalysis and Trelliscope packages.

Trelliscope is a Shiny-based visualization package that allows a developer to generate multiple plots of subsets of a dataset (e.g. a plot for each sample or for each molecule) and for a user to sort and filter plots to find items of interest. More information about Trelliscope may be found here: http://deltarho.org/.

Setup

First, we need to install the necessary supporting packages: datadr and Trelliscope.

library(devtools)
install.packages("datadr")
install_github("delta-rho/trelliscope")

Samples

For the first example. we'll construct VanKrevelen and Kendrick plots for each sample. The first step is to construct an object (a ddo or distributed data object) where each subset contains the data for one sample.

library(ftmsRanalysis)
library(trelliscope)

data("exampleProcessedPeakData")

bySample <- divideBySample(exampleProcessedPeakData)
bySample
#> 
#> Distributed data object backed by 'kvMemory' connection
#> 
#>  attribute      | value
#> ----------------+----------------------------------------------------------------
#>  size (stored)  | 80.17 MB
#>  size (object)  | 80.17 MB
#>  # subsets      | 20
#> 
#> * Other attributes: getKeys()
#> * Missing attributes: splitSizeDistn

Each element of bySample is a key/value pair. They key names correspond to the sample names, and may be used to index bySample to pull out a single element.

getKeys(bySample)[1:5]
#> [[1]]
#> [1] "SampleID=EM0011_sample"
#> 
#> [[2]]
#> [1] "SampleID=EM0013_sample"
#> 
#> [[3]]
#> [1] "SampleID=EM0015_sample"
#> 
#> [[4]]
#> [1] "SampleID=EM0017_sample"
#> 
#> [[5]]
#> [1] "SampleID=EM0019_sample"
bySample[["SampleID=EM0011_sample"]]
#> $key
#> [1] "SampleID=EM0011_sample"
#> 
#> $value
#> peakData object
#> # Peaks: 8427
#> # Samples: 1
#> Meta data columns: [Mass, C, H, O, N, C13, S, P, Error, NeutralMass, MolForm, AI, AI_Mod, DBE, DBE_O, DBE_AI, GFE, kmass, kdefect, NOSC, OtoC_ratio, HtoC_ratio, NtoC_ratio, PtoC_ratio, NtoP_ratio]
#> Group info: main effects=[Location, Crop.Flora]
#> A mass filter was applied to the data, removing Masses that had a mass less than 200 or a mass greater than 900. A total of 3733 Masses were filtered out of the dataset by this filter.
#> A molecule filter was applied to the data, removing Masses that were present in fewer than 2 samples. A total of 10900 Masses were filtered out of the dataset by this filter.

Trelliscope relies on the user defining a panel function and a cognostics function that are applied to each subset of data. A panel function is simply a function that takes a data subset and constructs a plot (or panel) from it. Panel functions may construct plots in any plotting package used in R, including base R graphics, ggplot2, and plots that extend htmlwidgets such as plotly. Most of the plotting methods in ftmsRanalysis produce plotly plots.

The ftmsRanalysis package provides a wrapper function (panelFunctionGenerator) which can be used with the packages plotting methods to make panel functions for Trelliscope.

A cognostics function is a function that calculates summary statistics on each subset of data. These statistics are then provided in the user interface for sorting and filtering. Example cognostics could include data quantiles, related meta-data or even links to external web resources if desired.

There are a few cognostics functions in ftmsRanalysis designed to provide default cognostics relevant to Van Krevelen plots, Kendrick plots and density plots. These are called, respectively, vanKrevelenCognostics, kendrickCognostics and densityCognostics. Each of these functions is designed to accept parameters similar to their respective plotting functions (e.g. Van Krevelen boundary set, or which variable is used for density plots), and they each return a function which may be passed to Trelliscope. This function will be applied to each data subset individually. Cognostics function examples are shown below.

In order to create a Trelliscope display, we need to define an output directory to store it. (The output will be a Shiny app, and may be then transferred to a Shiny server if desired.) For this vignette we'll just create a directory under R's temporary directory

vdbDir <- vdbConn(file.path(tempdir(), "trelliscope_vignette"), autoYes = TRUE)

To produce a Van Krevelen plot of each sample, construct a panel function using panelFunctionGenerator. The output of panelFunctionGenerator is a function that will produce a plot when applied to a single value from bySample's list of key-value pairs:

panelFn1 <- panelFunctionGenerator("vanKrevelenPlot", vkBoundarySet="bs1")
panelFn1(bySample[[1]]$value)

To apply the panel function to each sample and generate a Trelliscope display, use the makeDisplay command.

makeDisplay(bySample,
            panelFn=panelFn1,
            cogFn=vanKrevelenCognostics(vkBoundarySet="bs1"),
            name = "Van_Krevelen_plots_for_each_sample",
            group = "Sample")

Use the view() command to open the Trelliscope app in a browser and browse through the plots. Important: when returning to the R console after viewing a Trelliscope app, press Ctrl+C or Esc to return focus to the console.

You may have noticed in the call to panelFunctionGenerator we provided a parameter (vkBoundarySet) that is a parameter to the vanKrevelenPlot function. This is how additional parameters beyond the ftmsData object may be provided. For example, we could choose to color the points by a column giving a molecular property such as NOSC. Trelliscope is designed to allow multiple displays (or sets of plots) in one session.

In the makeDisplay call above, also note the use of vanKrevelenCognostics with the same vkBoundarySet parameter used above to define panelFn1.

panelFn2 <- panelFunctionGenerator("vanKrevelenPlot", colorCName="NOSC", vkBoundarySet="bs2", showVKBounds=TRUE)

makeDisplay(bySample,
            panelFn=panelFn2,
            cogFn=vanKrevelenCognostics(vkBoundarySet="bs2"),
            name = "Van_Krevelen_plots_colored_by_NOSC",
            group = "Sample")
view()

Next we will construct a Kendrick plot for each sample. We can also augment the cognostics function wtih additional values if desired. For example, kendrickCognostics already calculates the mean of the Kendrick mass and Kendrick defect but we could add the medians of both fields.

panelFn3 <- panelFunctionGenerator("kendrickPlot")

customCogFn <- function(vkBoundarySet="bs1", uniquenessColName=NA) {
  defKendrick <- kendrickCognostics(vkBoundarySet, uniquenessColName)
  fn <- function(ftmsObj) {
    cogs <- defKendrick(ftmsObj)

    sample_colnames <- as.character(ftmsObj$f_data[, getFDataColName(ftmsObj)])
    sample_colnames <- sample_colnames[sample_colnames %in% colnames(ftmsObj$e_data)]
    presInd <- ftmsRanalysis:::n_present(ftmsObj$e_data[, sample_colnames],
                                         ftmsRanalysis:::getDataScale(ftmsObj)) > 0

    massColname <- ftmsRanalysis:::getKendrickMassColName(ftmsObj)
    defectColname <- ftmsRanalysis:::getKendrickDefectColName(ftmsObj)

    cogs <- c(cogs, list(
      median_kendrick_mass = trelliscope::cog(val=median(ftmsObj$e_meta[presInd, massColname], na.rm=TRUE),
                                            desc="Median observed Kendrick mass"),
      median_kendrick_defect = trelliscope::cog(val=median(ftmsObj$e_meta[presInd, defectColname], na.rm=TRUE),
                                            desc="Median observed Kendrick defect")
    ))
    return(cogs)
  }
}

makeDisplay(bySample,
            panelFn=panelFn3,
            cogFn=customCogFn(),
            name = "Kendrick_plots_for_each_sample",
            group = "Sample")
view()

Groups

We can also divide by and construct plots for treatment groups. The example dataset exampleProcessedPeakData has 4 groups, which are defined according to the Location and Crop.Flora columns of f_data.

getGroupDF(exampleProcessedPeakData)
#>         SampleID Group Location Crop.Flora
#> 1  EM0011_sample   M_S        M          S
#> 2  EM0013_sample   M_S        M          S
#> 3  EM0015_sample   M_S        M          S
#> 4  EM0017_sample   M_S        M          S
#> 5  EM0019_sample   M_S        M          S
#> 6  EM0061_sample   M_C        M          C
#> 7  EM0063_sample   M_C        M          C
#> 8  EM0065_sample   M_C        M          C
#> 9  EM0067_sample   M_C        M          C
#> 10 EM0069_sample   M_C        M          C
#> 11 EW0111_sample   W_C        W          C
#> 12 EW0113_sample   W_C        W          C
#> 13 EW0115_sample   W_C        W          C
#> 14 EW0117_sample   W_C        W          C
#> 15 EW0119_sample   W_C        W          C
#> 16 EW0161_sample   W_S        W          S
#> 17 EW0163_sample   W_S        W          S
#> 18 EW0165_sample   W_S        W          S
#> 19 EW0167_sample   W_S        W          S
#> 20 EW0169_sample   W_S        W          S

We will divide by group and construct a density plot for NOSC, comparing each sample distribution to the group distribution. (Note that group=NA tells densityPlot to use all groups found its input data object.)

byGroup <- divideByGroup(exampleProcessedPeakData)

panelFn4 <- panelFunctionGenerator("densityPlot", variable="NOSC", groups=NA)

makeDisplay(byGroup,
            panelFn=panelFn4,
            cogFn=densityCognostics(variable="NOSC"),
            name = "NOSC_density_for_each_group",
            group = "Group")
view()

We could also generate a custom panel function for each group. Let's say we want to see a barplot of how many peaks were observed for each sample in a group. With Trelliscope, the input to the panel function is the value of each key-value pair, so in this case an ftmsData object containing samples from one group. The panel function must return the plot object.

customPanelFn <- function(v) {
  v2 <- edata_transform(v, "pres")
  peaks_obs <- colSums(dplyr::select(v2$e_data, -!!getEDataColName(v)))


  require(plotly)
  p <- plot_ly(x=names(peaks_obs), y=peaks_obs, type="bar")
  return(p)
}

makeDisplay(byGroup,
            panelFn=customPanelFn,
            name = "Peaks_observed",
            group = "Group")
view()

Group Comparisons

For group conparisons, we're going to redefine our groups so we can directly compare the two locations regardless of crop type, and the two crop types regardless of location. To do this we need to redefine the group designation and then use divideByGroupComparisons to construct two ddos with the comparison information. The concat function will let us join together these two comparison objects so the resulting plots are one Trelliscope display.

exampleProcessedPeakData <- group_designation(exampleProcessedPeakData, main_effects = "Location")
grpComp1 <- divideByGroupComparisons(exampleProcessedPeakData, comparisons = "all")

exampleProcessedPeakData <- group_designation(exampleProcessedPeakData, main_effects = "Crop.Flora")
grpComp2 <- divideByGroupComparisons(exampleProcessedPeakData, comparisons = "all")

allGroupComp <- concat(grpComp1, grpComp2)

Now we'll create a panel function that compares NOSC distribution between each pair of groups.

panelFn5 <- panelFunctionGenerator("densityPlot", variable="NOSC", groups=NA, samples=FALSE)
panelFn5(allGroupComp[[1]]$value)

Next, create a Trelliscope display and view it.

makeDisplay(allGroupComp,
            panelFn=panelFn5,
            cogFn=densityCognostics(variable="NOSC"),
            name = "NOSC_distribution_comparisons",
            group = "Group_Comparison")
view()

Group Comparison Summaries

The last type of display is of group comparison summaries. We'll use the group comparison object created above and apply a summary function to it, then construct a Van Krevelen plot for each comparison showing which peaks are unique to each group and which are shared according to a G-test statistic.

grpCompSummary <- summarizeGroupComparisons(allGroupComp,
                      summary_functions = "uniqueness_gtest",
                      summary_function_params = list(uniqueness_gtest=list(
                        pres_fn="prop",
                        pres_thres=0.5,
                        pvalue_thresh=0.05
                      )))
summary(grpCompSummary[[1]]$value$e_data)
#>      Mass                   uniqueness_gtest
#>  Length:8427        Unique to M     : 625   
#>  Class :character   Unique to W     : 207   
#>  Mode  :character   Observed in Both:2295   
#>                     NA's            :5300

Create a panel function and test on one subset:

panelFn6 <- panelFunctionGenerator("vanKrevelenPlot", colorCName="uniqueness_gtest")
panelFn6(grpCompSummary[[1]]$value)

Now construct a display in Trelliscope:

makeDisplay(grpCompSummary,
            panelFn=panelFn6,
            cogFn=vanKrevelenCognostics(uniquenessColName="uniqueness_gtest"),
            name = "Van_Krevelen_group_comparisons",
            group = "Group_Comparison_Summary")
view()

For group comparison summaries, the vanKrevelenCognostics and kendrickCognostics accept a parameter called uniquenessColName which is the column used to determine if a peak is observed in each group. Generally that should be the same as the column used to color the points.