vignettes/mapping_to_metacyc.Rmd
The ftmsRanalysis package facilitates mapping observed peaks with
assigned molecular formulas to compounds found in MetaCyc
(https://metacyc.org), a curated database of primary and secondary
metabolism. Mapping the observed compounds to metabolites and their
associated metabolic pathways provides insight into which pathways are
active and can suggest new hypotheses about the biochemical processes
occurring in a biological or environmental system.
PNNL has created an R package of data from MetaCyc to support this functionality, which is available here: https://github.com/EMSL-Computing/MetaCycData. It can be installed with the command:
devtools::install_github("EMSL-Computing/MetaCycData")
The first step is to map the peak data to compounds in MetaCyc. This is done by comparing the molecular formula assignments for each peak to compound molecular formulas from the database. Not all peaks will have molecular formulas, and many molecular formulas will map to multiple compounds.
library(ftmsRanalysis)
#>
#> Attaching package: 'ftmsRanalysis'
#> The following object is masked from 'package:stats':
#>
#> heatmap
library(MetaCycData)
data("examplePeakData")
compoundData <- mapPeaksToCompounds(examplePeakData, db="MetaCyc")
#> Warning: `mutate_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `mutate()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: `filter_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `filter()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning in mapPeaksToCompounds(examplePeakData, db = "MetaCyc"): Multiple
#> masses map to the same molecular formula/compound. Constructing a unique ID for
#> e_data.
The db parameter specifies which database to use. Currently “MetaCyc”
is the only valid option, though other databases may be added in the
future. Because “MetaCyc” is the default, it does not have to be
specified.
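To make the formula-matching step concrete, here is a minimal base R sketch on toy data. It is not how mapPeaksToCompounds is implemented. The data frames peaks and compounds, their column names, the mass values, and the TOY-CPD-A identifier are all invented for illustration. It shows the two behaviors described above: peaks without a formula drop out, and one formula can map to several compounds.

```r
# Toy peak table: one row per observed peak; NA means no formula was assigned.
peaks <- data.frame(
  Mass = c(129.0557, 151.0401, 163.0612),
  MolForm = c("C6H10O3", NA, "C6H12O5"),
  stringsAsFactors = FALSE
)

# Toy lookup table standing in for a database of compound formulas
# (hypothetical; the real MetaCycData tables have more columns).
compounds <- data.frame(
  MF = c("C6H10O3", "C6H10O3", "C6H12O5"),
  COMPOUND = c("PANTOYL-LACTONE", "ETHYL-ACETOACETATE", "TOY-CPD-A"),
  stringsAsFactors = FALSE
)

# Keep only peaks with a formula, then inner-join on the formula.
has_formula <- !is.na(peaks$MolForm)
mapped <- merge(peaks[has_formula, ], compounds, by.x = "MolForm", by.y = "MF")

print(mapped)
```

The peak without a formula is absent from mapped, while the peak with formula C6H10O3 appears once per matching compound.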
Notice that a warning message was printed above: multiple peaks in the
original data are assigned the same molecular formula. This is not
uncommon, but it can create problems for downstream analysis unless it
is resolved. The function combinePeaksWithSameFormula combines rows with
the same molecular formula by summing their abundance values (or, for
presence/absence data, by ‘or’-ing the rows).
peak2 <- combinePeaksWithSameFormula(examplePeakData)
#> 770 mass formulas with duplicate values found
summary(peak2)
#> Samples: 20
#> Molecules: 22290
#> Percent Missing: 81.108%
compoundData <- mapPeaksToCompounds(peak2, db="MetaCyc")
summary(compoundData)
#> Samples: 20
#> Molecules: 747
#> Percent Missing: 42.751%
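The combining rule described above (sum abundances, or take a logical ‘or’ for presence/absence data) can be sketched in base R on toy data. This is only an illustration of the rule, not the package's implementation; the abund table and its column names are invented.

```r
# Toy abundance table: two peaks share the formula C6H10O3.
abund <- data.frame(
  MolForm = c("C6H10O3", "C6H10O3", "C7H6O2"),
  Sample1 = c(10, 5, 0),
  Sample2 = c(0, 3, 7),
  stringsAsFactors = FALSE
)
samples <- c("Sample1", "Sample2")

# Abundance data: add the rows that share a formula.
summed <- rowsum(abund[, samples], group = abund$MolForm)

# Presence/absence data: a formula is present in a sample if any of its
# peaks is present there (coded here as 1 = present, 0 = absent).
present <- (abund[, samples] > 0) * 1L
ored <- (rowsum(present, group = abund$MolForm) > 0) * 1L

print(summed)
print(ored)
```

Both results have one row per distinct formula, so a later mapping step sees each formula only once.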
There are far fewer rows in compoundData than in peak2 because so few
peaks have molecular formulas assigned. When this mapping is performed,
the compound ID is added to the e_meta table (Compound column):
head(compoundData$e_meta[, c("Mass", "Compound")])
#> Mass Compound
#> 1 129.0557399 PANTOYL-LACTONE
#> 2 129.0557399 ETHYL-ACETOACETATE
#> 3 129.0557399 PROPIONIC-ANHYDRIDE
#> 4 129.0557399 CPD-15810
#> 5 129.0557399 CPD-15381
#> 6 129.0557399 6-HYDROXYHEXAN-6-OLIDE
These IDs can be used to look up compounds in the MetaCycData package:
the mc_compounds data frame contains compound information, including the
URL and common name, among other fields.
dplyr::filter(mc_compounds, COMPOUND=="CPD-16467")
#> COMPOUND URL
#> 1 CPD-16467 http://metacyc.ai.sri.com/compound?orgid=META&id=CPD-16467
#> COMMON-NAME MONOISOTOPIC-MW MOLECULAR-WEIGHT MF
#> 1 3-[(1-carboxyvinyl)oxy]benzoate 208.0372 206.154 C10H6O5
#> CHEMICAL-FORMULA
#> 1 (C 10) (H 6) (O 5)
#> DBLINKS
#> 1 (PUBCHEM "25767865" NIL |kothari| 3619399262 NIL NIL) (CHEBI "76981" NIL |kothari| 3608052289 NIL NIL)
MetaCyc contains information to map compounds to reactions and
biological pathways. MetaCyc’s pathways database includes what it calls
super-pathways, which are linked sets of smaller pathways. For
biological purposes, we wanted to distinguish between base pathways and
super-pathways, so the ftmsRanalysis and MetaCycData packages refer to
base pathways as ‘modules’. Accordingly, the MetaCycData package
provides an mc_modules data frame, and the associated functions in
ftmsRanalysis refer to mapping to ‘modules’.
The functions mapCompoundsToReactions and mapCompoundsToModules perform
these mappings. The objects they return indicate which compounds were
observed per reaction or module. First we will explore mapping to
reactions.
rxnData <- mapCompoundsToReactions(compoundData)
#> Warning: `cols` is now required when using `unnest()`.
#> ℹ Please use `cols = c(Reaction)`.
#> Warning: `select_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `select()` instead.
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: `rename_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `rename()` instead.
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: `funs()` was deprecated in dplyr 0.8.0.
#> ℹ Please use a list of either functions or lambdas:
#>
#> # Simple named list: list(mean = mean, median = median)
#>
#> # Auto named with `tibble::lst()`: tibble::lst(mean, median)
#>
#> # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
rxnData
#> reactionData object
#> # Reactions: 1923
#> # Samples: 20
#> Database: MetaCyc
#> Meta data columns: [Reaction, EC_Number, Compounds_in_Dataset, N_Observable_Compounds]
Since compounds and reactions have a many-to-many relationship, the
number of rows in e_data is different from that of compoundData. The
columns of e_meta have also changed: the previous metadata applied to
peaks and compounds and is not applicable to reactions. Instead, there
are four columns:
Reaction: database ID
EC_Number: Enzyme Commission number
Compounds_in_Dataset: semicolon-delimited string of the compounds in the source dataset that were mapped to the reaction
N_Observable_Compounds: number of compounds in the database that could have been observed, subject to any mass filters previously applied to the source dataset
The proportion of compounds observed per reaction can be calculated
by parsing the Compounds_in_Dataset field and dividing its element-wise
length by the value in N_Observable_Compounds.
n_cmp_observed <- unlist(lapply(strsplit(rxnData$e_meta$Compounds_in_Dataset, ";"), length))
prop_cmp_observed <- n_cmp_observed/rxnData$e_meta$N_Observable_Compounds
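As a quick check of the arithmetic, the same calculation can be run on two rows copied from the e_meta output shown later in this vignette. The small data frame toy_meta is built by hand for illustration, and lengths() is used as a base R shortcut for unlist(lapply(..., length)).

```r
# Two rows copied by hand from the joined reaction output in this vignette.
toy_meta <- data.frame(
  Reaction = c("1.2.1.55-RXN", "2.3.1.103-RXN"),
  Compounds_in_Dataset = c(
    "ETHYL-3-OXOHEXANOATE;ETHYL-R-3-HYDROXYHEXANOATE",
    "1-O-SINAPOYL-BETA-D-GLUCOSE"
  ),
  N_Observable_Compounds = c(5, 2),
  stringsAsFactors = FALSE
)

# Count the semicolon-separated compound IDs in each row.
# Note: an NA entry would count as one element, so rows with no mapped
# compounds may need separate handling.
n_obs <- lengths(strsplit(toy_meta$Compounds_in_Dataset, ";", fixed = TRUE))

# Proportion of observable compounds that were actually observed.
prop_obs <- n_obs / toy_meta$N_Observable_Compounds

print(prop_obs)
```

For these two rows the proportions are 2/5 and 1/2.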
We could also look at which reactions are observed in different
treatment groups, using divideByGroupComparisons and
summarizeGroupComparisons. First, we define treatment groups using
group_designation. For this example we’ll compare crop floras (Corn vs.
Switchgrass).
rxnData2 <- group_designation(rxnData, main_effects="Crop.Flora")
rxnGroupCompData <- divideByGroupComparisons(rxnData2, comparisons="all")
rxnCompSummary <- summarizeGroupComparisons(
  rxnGroupCompData,
  summary_functions = "uniqueness_gtest",
  summary_function_params = list(
    uniqueness_gtest = list(pres_fn = "nsamps", pres_thresh = 2, pvalue_thresh = 0.05)
  )
)
getKeys(rxnCompSummary)
#> [[1]]
#> [1] "Group_Comparison=C vs S"
The rxnCompSummary object is a distributed data object (ddo); see the
datadr package for more information about ddos. A ddo is a list of
key-value pairs, where each key defines the groups under comparison and
each value is a reactionData object. In this case we have one comparison
(C vs S), but with more than two groups the ddo could contain many
comparison objects.
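The key-value structure can be pictured with a plain R list. The sketch below is only an analogy: it does not use datadr's classes, and the names comparison_pairs, RXN-A, and RXN-B are placeholders invented for illustration.

```r
# A plain-list stand-in for a ddo: each element pairs a key with a value.
comparison_pairs <- list(
  list(
    key = "Group_Comparison=C vs S",
    value = data.frame(
      Reaction = c("RXN-A", "RXN-B"),  # placeholder reaction IDs
      uniqueness_gtest = c("Unique to S", "Observed in Both"),
      stringsAsFactors = FALSE
    )
  )
)

# Collect the keys, then pull out the value stored under one of them.
keys <- vapply(comparison_pairs, function(p) p$key, character(1))
value <- comparison_pairs[[match("Group_Comparison=C vs S", keys)]]$value

print(keys)
print(value)
```

With more than two groups, the list would simply hold one key-value pair per comparison.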
Suppose we want to examine the reactions that were observed in only one
of the S or C groups. We would take the reactionData value and filter
its e_data element to rows whose uniqueness_gtest value contains the
word ‘Unique’.
x <- rxnCompSummary[["Group_Comparison=C vs S"]]$value
summary(x$e_data)
#> Reaction uniqueness_gtest
#> Length:1923 Unique to S : 52
#> Class :character Unique to C : 26
#> Mode :character Observed in Both:1449
#> NA's : 396
ind <- grep("Unique", x$e_data$uniqueness_gtest)
x$e_data[ind, ] %>% dplyr::arrange(uniqueness_gtest)
#> Reaction uniqueness_gtest
#> 1 1.2.1.55-RXN Unique to S
#> 2 1.2.1.56-RXN Unique to S
#> 3 2.3.1.103-RXN Unique to S
#> 4 2.3.1.151-RXN Unique to S
#> 5 2.3.1.91-RXN Unique to S
#> 6 RXN-10896 Unique to S
#> 7 RXN-10897 Unique to S
#> 8 RXN-12070 Unique to S
#> 9 RXN-12073 Unique to S
#> 10 RXN-12270 Unique to S
#> 11 RXN-12802 Unique to S
#> 12 RXN-12803 Unique to S
#> 13 RXN-12899 Unique to S
#> 14 RXN-12933 Unique to S
#> 15 RXN-13788 Unique to S
#> 16 RXN-13830 Unique to S
#> 17 RXN-13967 Unique to S
#> 18 RXN-14008 Unique to S
#> 19 RXN-14355 Unique to S
#> 20 RXN-14435 Unique to S
#> 21 RXN-14436 Unique to S
#> 22 RXN-15322 Unique to S
#> 23 RXN-15332 Unique to S
#> 24 RXN-15333 Unique to S
#> 25 RXN-15350 Unique to S
#> 26 RXN-15691 Unique to S
#> 27 RXN-16828 Unique to S
#> 28 RXN-16829 Unique to S
#> 29 RXN-18192 Unique to S
#> 30 RXN-18193 Unique to S
#> 31 RXN-18200 Unique to S
#> 32 RXN-18317 Unique to S
#> 33 RXN-19597 Unique to S
#> 34 RXN-19598 Unique to S
#> 35 RXN-7482 Unique to S
#> 36 RXN-7483 Unique to S
#> 37 RXN-7484 Unique to S
#> 38 RXN-8006 Unique to S
#> 39 RXN-8169 Unique to S
#> 40 RXN-8170 Unique to S
#> 41 RXN-8176 Unique to S
#> 42 RXN-8268 Unique to S
#> 43 RXN-8270 Unique to S
#> 44 RXN-8449 Unique to S
#> 45 RXN-9691 Unique to S
#> 46 RXN-9692 Unique to S
#> 47 RXN0-6677 Unique to S
#> 48 RXNQT-4161 Unique to S
#> 49 RXNQT-4175 Unique to S
#> 50 RXNQT-4176 Unique to S
#> 51 RXNQT-4178 Unique to S
#> 52 SINAPATE-1-GLUCOSYLTRANSFERASE-RXN Unique to S
#> 53 1.13.11.10-RXN Unique to C
#> 54 2.4.1.195-RXN Unique to C
#> 55 2.4.1.220-RXN Unique to C
#> 56 AMYGDALIN-BETA-GLUCOSIDASE-RXN Unique to C
#> 57 DEOXYLIMONATE-A-RING-LACTONASE-RXN Unique to C
#> 58 PRUNASIN-BETA-GLUCOSIDASE-RXN Unique to C
#> 59 RXN-15219 Unique to C
#> 60 RXN-15220 Unique to C
#> 61 RXN-15221 Unique to C
#> 62 RXN-15222 Unique to C
#> 63 RXN-15223 Unique to C
#> 64 RXN-17367 Unique to C
#> 65 RXN-17371 Unique to C
#> 66 RXN-18094 Unique to C
#> 67 RXN-18095 Unique to C
#> 68 RXN-18829 Unique to C
#> 69 RXN-18834 Unique to C
#> 70 RXN-20077 Unique to C
#> 71 RXN-2947 Unique to C
#> 72 RXN-4606 Unique to C
#> 73 RXN-7022 Unique to C
#> 74 RXN-7082 Unique to C
#> 75 RXN-9075 Unique to C
#> 76 RXN-9077 Unique to C
#> 77 RXN-9102 Unique to C
#> 78 RXN-9498 Unique to C
We could then join this table to the e_meta component to see the
compounds observed, and join it to the mc_reactions data frame in
MetaCycData to see other information (including URLs) about these
reactions.
unique_rxn_info <- x$e_data[ind, ] %>%
  dplyr::left_join(x$e_meta) %>%
  dplyr::left_join(mc_reactions, by=c(Reaction='REACTION')) %>%
  dplyr::arrange(uniqueness_gtest)
#> Joining with `by = join_by(Reaction)`
head(unique_rxn_info)
#> Reaction uniqueness_gtest EC_Number
#> 1 1.2.1.55-RXN Unique to S 1.1.1.279
#> 2 1.2.1.56-RXN Unique to S 1.1.1.280
#> 3 2.3.1.103-RXN Unique to S 2.3.1.103
#> 4 2.3.1.151-RXN Unique to S 2.3.1.151
#> 5 2.3.1.91-RXN Unique to S 2.3.1.91
#> 6 RXN-10896 Unique to S <NA>
#> Compounds_in_Dataset N_Observable_Compounds
#> 1 ETHYL-3-OXOHEXANOATE;ETHYL-R-3-HYDROXYHEXANOATE 5
#> 2 ETHYL-S-3-HYDROXYHEXANOATE;ETHYL-3-OXOHEXANOATE 5
#> 3 1-O-SINAPOYL-BETA-D-GLUCOSE 2
#> 4 2346-TETRAHYDROXYBENZOPHENONE 6
#> 5 1-O-SINAPOYL-BETA-D-GLUCOSE 3
#> 6 CPD-11853 5
#> URL
#> 1 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=1.2.1.55-RXN
#> 2 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=1.2.1.56-RXN
#> 3 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=2.3.1.103-RXN
#> 4 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=2.3.1.151-RXN
#> 5 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=2.3.1.91-RXN
#> 6 http://metacyc.ai.sri.com/META/new-image?type=REACTION&object=RXN-10896
#> SYSTEMATIC-NAME EC-NUMBER IN-PATHWAY REACTION-LIST
#> 1 <NA> 1.1.1.279 <NA> <NA>
#> 2 <NA> 1.1.1.280 <NA> <NA>
#> 3 <NA> 2.3.1.103 PWY-3301 <NA>
#> 4 <NA> 2.3.1.151 PWY-5002 <NA>
#> 5 <NA> 2.3.1.91 PWY-3301 <NA>
#> 6 <NA> <NA> PWY-6340 <NA>
#> LEFT
#> 1 ETHYL-R-3-HYDROXYHEXANOATE + NADP
#> 2 ETHYL-S-3-HYDROXYHEXANOATE + NADP
#> 3 (2 1-O-SINAPOYL-BETA-D-GLUCOSE)
#> 4 (3 MALONYL-COA) + CPD-264 + (3 PROTON)
#> 5 CHOLINE + 1-O-SINAPOYL-BETA-D-GLUCOSE
#> 6 WATER + CPD-11861
#> RIGHT
#> 1 ETHYL-3-OXOHEXANOATE + NADPH + PROTON
#> 2 ETHYL-3-OXOHEXANOATE + NADPH + PROTON
#> 3 Glucopyranose + 12-BIS-O-SINAPOYL-BETA-D-GLUCOSIDE
#> 4 (4 CO-A) + 2346-TETRAHYDROXYBENZOPHENONE + (3 CARBON-DIOXIDE)
#> 5 Glucopyranose + O-SINAPOYLCHOLINE
#> 6 CPD-11863 + CPD-11853 + PROTON
#> REACTION-DIRECTION
#> 1 <NA>
#> 2 <NA>
#> 3 LEFT-TO-RIGHT
#> 4 LEFT-TO-RIGHT
#> 5 LEFT-TO-RIGHT
#> 6 LEFT-TO-RIGHT
#> COMPOUNDS
#> 1 NADP;ETHYL-R-3-HYDROXYHEXANOATE;PROTON;NADPH;ETHYL-3-OXOHEXANOATE
#> 2 NADP;ETHYL-S-3-HYDROXYHEXANOATE;PROTON;NADPH;ETHYL-3-OXOHEXANOATE
#> 3 1-O-SINAPOYL-BETA-D-GLUCOSE;12-BIS-O-SINAPOYL-BETA-D-GLUCOSIDE;Glucopyranose
#> 4 PROTON;CPD-264;MALONYL-COA;CARBON-DIOXIDE;2346-TETRAHYDROXYBENZOPHENONE;CO-A
#> 5 1-O-SINAPOYL-BETA-D-GLUCOSE;CHOLINE;O-SINAPOYLCHOLINE;Glucopyranose
#> 6 CPD-11861;WATER;PROTON;CPD-11853;CPD-11863
#> DBLINKS
#> 1 (RHEA "24352" NIL |kothari| 3571758892 NIL NIL) (LIGAND-RXN "R04105" NIL |taltman| 3459474589 NIL NIL)
#> 2 (RHEA "18269" NIL |kothari| 3571758793 NIL NIL) (LIGAND-RXN "R04106" NIL |taltman| 3459474589 NIL NIL)
#> 3 (RHEA "22665" NIL |kothari| 3709310380 NIL NIL) (LIGAND-RXN "R00063" NIL |taltman| 3459474592 NIL NIL)
#> 4 (RHEA "19306" NIL |kothari| 3709310380 NIL NIL) (LIGAND-RXN "R04709" NIL |taltman| 3459474596 NIL NIL)
#> 5 (RHEA "12025" NIL |kothari| 3709310378 NIL NIL) (LIGAND-RXN "R03075" NIL |taltman| 3459474596 NIL NIL)
#> 6 <NA>
The process for mapping from compounds to modules (base pathways) is very similar to that for reactions.
modData <- mapCompoundsToModules(compoundData)
#> Warning: `cols` is now required when using `unnest()`.
#> ℹ Please use `cols = c(Reaction)`.
#> Warning: `group_by_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `group_by()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the ftmsRanalysis package.
#> Please report the issue at
#> <https://github.com/EMSL-Computing/ftmsRanalysis/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
modData
#> moduleData object
#> # Reactions: 1231
#> # Samples: 20
#> Database: MetaCyc
#> Meta data columns: [Module_Node, Module, Module_Node_Comb, Compounds_in_Dataset, N_Observable_Compounds]
In order to facilitate plotting the module reaction graph (coming
soon!), mapCompoundsToModules actually maps each compound to the node of
the module graph it corresponds to, where a node is one or more
reactions. In the future ftmsRanalysis will include a function to plot a
module graph and color the nodes by the number (or proportion) of
compounds observed.
For now, let’s investigate the modules that are observed in only one of the two treatment groups. This is done in much the same way as for reactions above.
modData2 <- group_designation(modData, main_effects="Crop.Flora")
modGroupCompData <- divideByGroupComparisons(modData2, comparisons="all")
modCompSummary <- summarizeGroupComparisons(
  modGroupCompData,
  summary_functions = "uniqueness_gtest",
  summary_function_params = list(
    uniqueness_gtest = list(pres_fn = "nsamps", pres_thresh = 2, pvalue_thresh = 0.05)
  )
)
#> *** finding global variables used in 'fn'...
#> found: summary_function_params, summary_functions
#> *** testing 'fn' on a subset... ok
#> * Applying recombination...
y <- modCompSummary[["Group_Comparison=C vs S"]]$value
summary(y$e_data)
#> Module_Node_Comb uniqueness_gtest
#> Length:1231 Unique to S : 32
#> Class :character Unique to C : 14
#> Mode :character Observed in Both:929
#> NA's :256
ind <- grep("Unique", y$e_data$uniqueness_gtest)
y$e_data[ind, ] %>% dplyr::arrange(uniqueness_gtest)
#> Module_Node_Comb uniqueness_gtest
#> 1 PWY-5001: RXN-7482 Unique to S
#> 2 PWY-5001: RXN-7483 Unique to S
#> 3 PWY-5002: 2.3.1.151-RXN Unique to S
#> 4 PWY-5002: RXN-7484 Unique to S
#> 5 PWY-5160: RXN-8006 Unique to S
#> 6 PWY-5284: RXN-8169 Unique to S
#> 7 PWY-5284: RXN-8170 Unique to S
#> 8 PWY-5286: RXN-8176 Unique to S
#> 9 PWY-5321: RXN-8268 Unique to S
#> 10 PWY-5321: RXNQT-4161 Unique to S
#> 11 PWY-6340: RXN-10896 Unique to S
#> 12 PWY-6340: RXN-10897 Unique to S
#> 13 PWY-6690: RXN-12070 Unique to S
#> 14 PWY-6690: RXN-12073 Unique to S
#> 15 PWY-6827: RXN-12270 Unique to S
#> 16 PWY-6971: RXN-12802 Unique to S
#> 17 PWY-6971: RXN-12803 Unique to S
#> 18 PWY-7106: RXN-12933 Unique to S
#> 19 PWY-7134: RXN-13788 Unique to S
#> 20 PWY-7256: RXN-13967 Unique to S
#> 21 PWY-7256: RXN-14435 Unique to S
#> 22 PWY-7256: RXN-14436 Unique to S
#> 23 PWY-7262: RXN-8169 Unique to S
#> 24 PWY-7449: RXN-15322 Unique to S
#> 25 PWY-7450: RXN-15332 Unique to S
#> 26 PWY-7450: RXN-15333 Unique to S
#> 27 PWY-7458: RXN-15350 Unique to S
#> 28 PWY0-1527: RXN0-6677 Unique to S
#> 29 PWYQT-4450: RXN-18192 Unique to S
#> 30 PWYQT-4450: RXN-18193 Unique to S
#> 31 PWYQT-4450: RXN-18200 Unique to S
#> 32 PWYQT-4450: RXNQT-4175 Unique to S
#> 33 PWY-2821: 2.4.1.195-RXN Unique to C
#> 34 PWY-2821: RXN-4606 Unique to C
#> 35 PWY-5784: RXN-2947 Unique to C
#> 36 PWY-5784: RXN-9075 Unique to C
#> 37 PWY-5784: RXN-9077 Unique to C
#> 38 PWY-5797: RXN-9102 Unique to C
#> 39 PWY-5959: RXN-9498 Unique to C
#> 40 PWY-6011: AMYGDALIN-BETA-GLUCOSIDASE-RXN Unique to C
#> 41 PWY-6011: PRUNASIN-BETA-GLUCOSIDASE-RXN Unique to C
#> 42 PWY-6068: 2.4.1.220-RXN Unique to C
#> 43 PWY-6219: RXN-2947 Unique to C
#> 44 PWY-7824: RXN-18094 Unique to C
#> 45 PWY-7824: RXN-18095 Unique to C
#> 46 PWY-7911: RXN-18829 Unique to C
Then join this result to y$e_meta and mc_modules to get more
information about the modules observed.
unique_mod_info <- y$e_data[ind, ] %>%
  dplyr::left_join(y$e_meta) %>%
  dplyr::left_join(dplyr::select(mc_modules, MODULE, URL), by=c(Module='MODULE')) %>%
  dplyr::arrange(uniqueness_gtest)
#> Joining with `by = join_by(Module_Node_Comb)`
head(unique_mod_info)
#> Module_Node_Comb uniqueness_gtest Module_Node Module
#> 1 PWY-5001: RXN-7482 Unique to S RXN-7482 PWY-5001
#> 2 PWY-5001: RXN-7482 Unique to S RXN-7482 PWY-5001
#> 3 PWY-5001: RXN-7483 Unique to S RXN-7483 PWY-5001
#> 4 PWY-5002: 2.3.1.151-RXN Unique to S 2.3.1.151-RXN PWY-5002
#> 5 PWY-5002: RXN-7484 Unique to S RXN-7484 PWY-5002
#> 6 PWY-5160: RXN-8006 Unique to S RXN-8006 PWY-5160
#> Compounds_in_Dataset N_Observable_Compounds
#> 1 2346-TETRAHYDROXYBENZOPHENONE;CPD-6881 7
#> 2 2346-TETRAHYDROXYBENZOPHENONE;CPD-6881 7
#> 3 2346-TETRAHYDROXYBENZOPHENONE 6
#> 4 2346-TETRAHYDROXYBENZOPHENONE 6
#> 5 2346-TETRAHYDROXYBENZOPHENONE 6
#> 6 CPD-7138 4
#> URL
#> 1 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5001
#> 2 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5001
#> 3 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5001
#> 4 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5002
#> 5 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5002
#> 6 http://metacyc.ai.sri.com/META/new-image?type=PATHWAY&object=PWY-5160
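Rows 1 and 2 of the output above are identical. One possible explanation, which is an assumption rather than something confirmed by the package documentation, is that a join key appears more than once in one of the joined tables, since a left join returns one row per matching pair. The base R sketch below reproduces that behavior on toy data (the left and right data frames and their contents are invented) and shows how unique() collapses exact duplicates.

```r
# Toy left-hand table: one row per module node.
left <- data.frame(
  Module_Node_Comb = c("PWY-X: RXN-1", "PWY-X: RXN-2"),
  uniqueness_gtest = c("Unique to S", "Unique to C"),
  stringsAsFactors = FALSE
)

# Toy right-hand table in which the first key occurs twice.
right <- data.frame(
  Module_Node_Comb = c("PWY-X: RXN-1", "PWY-X: RXN-1", "PWY-X: RXN-2"),
  Module = c("PWY-X", "PWY-X", "PWY-X"),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes merge() behave like a left join.
joined <- merge(left, right, by = "Module_Node_Comb", all.x = TRUE)

# Drop rows that are exact duplicates of an earlier row.
deduped <- unique(joined)

print(nrow(joined))
print(nrow(deduped))
```

Whether deduplicating is appropriate depends on why the key repeats, so it is worth inspecting the duplicated rows before dropping them.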