Adding chromPeaks metadata to the Spectra output of chromPeakSpectra() #779

Merged: 19 commits merged on Dec 15, 2024. Changes from 10 commits are shown.

3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -158,4 +158,5 @@ Collate:
'writemzdata.R'
'writemztab.R'
'xcmsSource.R'
-'zzz.R'
+'zzz.R'

6 changes: 5 additions & 1 deletion R/XcmsExperiment-functions.R
@@ -793,7 +793,8 @@
"largest_bpi"),
msLevel = 2L, expandRt = 0, expandMz = 0,
ppm = 0, skipFilled = FALSE,
-peaks = integer(), BPPARAM = bpparam()) {
+peaks = integer(), peaksInfo = c("rt", "mz"),
+BPPARAM = bpparam()) {
method <- match.arg(method)
pks <- .chromPeaks(x)[, c("mz", "mzmin", "mzmax", "rt",
"rtmin", "rtmax", "maxo", "sample")]
@@ -830,6 +831,9 @@
ids <- rep(rownames(pk), lengths(idx))
res <- sp[unlist(idx)]
res$peak_id <- ids
+info <- pk[res$peak_id, peaksInfo]
+colnames(info) <- paste("peak_", peaksInfo, sep = "")
+res@backend@spectraData <- cbind(res@backend@spectraData, info)
res
},
MoreArgs = list(msLevel = msLevel, method = method),
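The three lines added to `.mse_spectra_for_peaks()` look up, for every returned spectrum, the row of the peak matrix `pk` named by `res$peak_id`, prefix the selected column names with `peak_`, and bind the result to the spectra variables. Below is a minimal base-R sketch of that lookup, under the assumption that a plain `data.frame` stands in for the `Spectra` backend's `spectraData` and a small hand-made matrix stands in for `chromPeaks()`; `add_peak_info()` and all values are hypothetical. The sketch also uses `drop = FALSE`, which the PR line does not: assuming `pk` is a plain matrix (as the `.chromPeaks(x)[, c(...)]` subset at the top of the function suggests), requesting a single `peaksInfo` column would otherwise return a vector, and the following `colnames<-` call would fail.

```r
# Hypothetical stand-in for the chromPeaks() matrix: one row per peak,
# row names are the peak IDs.
pk <- cbind(mz = c(200.1, 305.2), rt = c(120.5, 340.0), maxo = c(1e5, 3e4))
rownames(pk) <- c("CP1", "CP2")

# Hypothetical stand-in for the spectra variables: three MS2 spectra,
# each already assigned to a chromatographic peak via `peak_id`.
spd <- data.frame(scanIndex = c(11L, 12L, 40L),
                  peak_id = c("CP1", "CP1", "CP2"),
                  stringsAsFactors = FALSE)

add_peak_info <- function(spd, pk, peaksInfo = c("rt", "mz")) {
    # Index the peak matrix by peak ID (row names) so that each spectrum
    # gets the values of its own peak; drop = FALSE keeps a matrix even
    # when only one column is requested.
    info <- pk[as.character(spd$peak_id), peaksInfo, drop = FALSE]
    colnames(info) <- paste0("peak_", peaksInfo)
    # Peak IDs repeat across spectra, so do not use them as row names.
    rownames(info) <- NULL
    cbind(spd, info)
}

res <- add_peak_info(spd, pk)
res
```

Both spectra of `CP1` receive the same `peak_rt` and `peak_mz` values, which is the behaviour the PR's row-name indexing relies on.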
8 changes: 6 additions & 2 deletions R/XcmsExperiment.R
@@ -515,6 +515,10 @@
#' indicating the identified chromatographic peaks. Only a single color
#' is supported. Defaults to `peakCol = "#ff000060".
#'
+#' @param peaksInfo For `chromPeakSpectra`: `character` vector with the names
+#' of additional `chromPeaks()` columns to be added to the spectra object. The
+#' column names will be prefixed with "peak_".
+#'
#' @param ppm For `chromPeaks` and `featureDefinitions`: optional `numeric(1)`
#' specifying the ppm by which the m/z range (defined by `mz` should be
#' extended. For a value of `ppm = 10`, all peaks within `mz[1] - ppm / 1e6`
@@ -1228,7 +1232,7 @@ setMethod(
function(object, method = c("all", "closest_rt", "closest_mz",
"largest_tic", "largest_bpi"),
msLevel = 2L, expandRt = 0, expandMz = 0, ppm = 0,
-skipFilled = FALSE, peaks = character(),
+skipFilled = FALSE, peaks = character(), peaksInfo = c("rt", "mz"),
return.type = c("Spectra", "List"), BPPARAM = bpparam()) {
if (hasAdjustedRtime(object))
object <- applyAdjustedRtime(object)
@@ -1244,7 +1248,7 @@ setMethod(
else pkidx <- integer()
res <- .mse_spectra_for_peaks(object, method, msLevel, expandRt,
expandMz, ppm, skipFilled, pkidx,
-BPPARAM)
+peaksInfo, BPPARAM)
if (!length(pkidx))
peaks <- rownames(.chromPeaks(object))
else peaks <- rownames(.chromPeaks(object))[pkidx]
2 changes: 1 addition & 1 deletion R/do_adjustRtime-functions.R
@@ -874,7 +874,7 @@ NULL
resid_ratio = 3,
zero_weight = 10,
bs = "tp"){
-rt_map <- rt_map[order(rt_map$obs), ]
+rt_map <- rt_map[order(rt_map$obs), c("ref", "obs")]
# add first row of c(0,0) to set a fix timepoint.
rt_map <- rbind(c(0,0), rt_map)
weights <- rep(1, nrow(rt_map))
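The only change in `R/do_adjustRtime-functions.R` keeps just the `ref` and `obs` columns of `rt_map` while ordering it by observed retention time, before the `c(0, 0)` anchor row is prepended. The diff does not state the motivation; presumably `rt_map` can carry extra columns that should not reach the fitted model. A small base-R illustration with a made-up anchor table (the `intensity` column and all values are hypothetical) showing the resulting two-column, origin-anchored table:

```r
# Hypothetical anchor table: retention times in the reference (`ref`) and
# in the observed sample (`obs`), plus an extra column the model does not use.
rt_map <- data.frame(ref = c(300, 100, 200),
                     obs = c(305, 98, 210),
                     intensity = c(5e4, 2e5, 8e4))

# Order by observed retention time and keep only the two columns used later.
rt_map <- rt_map[order(rt_map$obs), c("ref", "obs")]

# Prepend a c(0, 0) row, as in the xcms code, to fix the origin.
rt_map <- rbind(c(0, 0), rt_map)
rt_map
```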
4 changes: 4 additions & 0 deletions man/XcmsExperiment.Rd

1 change: 1 addition & 0 deletions man/chromPeakSpectra.Rd

113 changes: 50 additions & 63 deletions vignettes/xcms.Rmd
@@ -62,7 +62,10 @@ be applied to the older *MSnbase*-based workflows (xcms version 3). Additional
documents and tutorials covering also other topics of untargeted metabolomics
analysis are listed at the end of this document. There is also a [xcms
tutorial](https://jorainer.github.io/xcmsTutorials) available with more examples
-and details.
+and details.
+For a complete overview of LC-MS/MS data analysis, an end-to-end workflow
+that integrates the *xcms* preprocessing steps with the downstream analysis is
+available on the [Metabonaut website](https://rformassspectrometry.github.io/metabonaut/).


# Preprocessing of LC-MS data
@@ -1180,6 +1183,48 @@ defined above. The `filter` argument can accommodate various types of input,
each determining the specific type of quality assessment and filtering to be
performed.

+The `PercentMissingFilter` allows filtering features based on the percentage of
+missing values for each feature. This function takes as input the parameter
+`f`, which should be a vector of length equal to the length of the object
+(i.e. the number of samples) giving the sample type of each sample. It then
+computes the percentage of missing values per sample group and filters
+features based on this. Features with a percentage of missing values larger than
+the threshold in all sample groups are removed. Another option is to base
+this quality assessment and filtering only on QC samples.
+
+Both examples are shown below:
+
+```{r}
+# To set up parameter `f` to filter only based on QC samples
+f <- sampleData(faahko)$sample_type
+f[f != "QC"] <- NA
+
+# To set up parameter `f` to filter per sample type excluding QC samples
+f <- sampleData(faahko)$sample_type
+f[f == "QC"] <- NA
+
+missing_filter <- PercentMissingFilter(threshold = 30, f = f)
+# Apply the filter to faahko object
+filtered_faahko <- filterFeatures(object = faahko, filter = missing_filter)
+
+# Apply the filter to res object
+missing_filter <- PercentMissingFilter(threshold = 30, f = f)
+filtered_res <- filterFeatures(object = res, filter = missing_filter)
+```
+
+Here, no feature was removed, meaning that all the features had less than 30%
+of `NA` values in at least one of the sample types.
+
+Although not directly relevant to this experiment, the `BlankFlag` filter can be
+used to flag features based on the intensity relationship between blank and QC
+samples. More information can be found in the documentation of the filter:
+
+```{r}
+# Retrieve documentation for the main function and the specific filter.
+?filterFeatures
+?BlankFlag
+```
+
The `RsdFilter` enable users to filter features based on their relative
standard deviation (coefficient of variation) for a specified `threshold`. It
is recommended to base the computation on quality control (QC) samples,
@@ -1188,14 +1233,14 @@ as demonstrated below:
```{r}
# Set up parameters for RsdFilter
rsd_filter <- RsdFilter(threshold = 0.3,
-qcIndex = sampleData(faahko)$sample_type == "QC")
+qcIndex = sampleData(filtered_faahko)$sample_type == "QC")

# Apply the filter to faakho object
-filtered_faahko <- filterFeatures(object = faahko, filter = rsd_filter)
+filtered_faahko <- filterFeatures(object = filtered_faahko, filter = rsd_filter)

# Now apply the same strategy to the res object
-rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = res$sample_type == "QC")
-filtered_res <- filterFeatures(object = res, filter = rsd_filter, assay = "raw")
+rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = filtered_res$sample_type == "QC")
+filtered_res <- filterFeatures(object = filtered_res, filter = rsd_filter, assay = "raw")
```

All features with an RSD (CV) strictly larger than 0.3 in QC samples were thus
@@ -1229,64 +1274,6 @@ filtered_res <- filterFeatures(object = filtered_res,
All features with an D-ratio strictly larger than 0.5 were thus removed from
the data set.

-The `PercentMissingFilter` allows to filter features based on the percentage of
-missing values for each feature. This function takes as an input the parameter
-`f` which is supposed to be a vector of length equal to the length of the object
-(i.e. number of samples) with the sample type for each. The function then
-computes the percentage of missing values per sample groups and filters
-features based on this. Features with a percent of missing values larger than
-the threshold in all sample groups will be removed. Another option is to base
-this quality assessment and filtering only on QC samples.
-
-Both examples are shown below:
-
-```{r}
-# To set up parameter `f` to filter only based on QC samples
-f <- sampleData(filtered_faakho)$sample_type
-f[f != "QC"] <- NA
-
-# To set up parameter `f` to filter per sample type excluding QC samples
-f <- sampleData(filtered_faakho)$sample_type
-f[f == "QC"] <- NA
-
-missing_filter <- PercentMissingFilter(threshold = 30,
-f = f)
-
-# Apply the filter to faakho object
-filtered_faakho <- filterFeatures(object = filtered_faakho,
-filter = missing_filter)
-
-# Apply the filter to res object
-missing_filter <- PercentMissingFilter(threshold = 30,
-f = f)
-filtered_res <- filterFeatures(object = filtered_res,
-filter = missing_filter)
-```
-
-Here, no feature was removed, meaning that all the features had less than 30%
-of `NA` values in at least one of the sample type.
-
-Although not directly relevant to this experiment, the `BlankFlag` filter can be
-used to flag features based on the intensity relationship between blank and QC
-samples. More information can be found in the documentation of the filter:
-
-```{r}
-# Retrieve documentation for the main function and the specific filter.
-?filterFeatures
-?BlankFlag
-```
-
-## Normalization
-
-Normalizing features' signal intensities is required, but at present not (yet)
-supported in `xcms` (some methods might be added in near future). It is advised
-to use the `SummarizedExperiment` returned by the `quantify()` method for any
-further data processing, as this type of object stores feature definitions,
-sample annotations as well as feature abundances in the same object. For the
-identification of e.g. features with significant different
-intensities/abundances it is suggested to use functionality provided in other R
-packages, such as Bioconductor's excellent *limma* package.
-

## Alignment to an external reference dataset

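The `PercentMissingFilter` paragraph added to `vignettes/xcms.Rmd` states the rule in words: per sample group given by `f`, compute the percentage of missing values of each feature, and drop features whose percentage exceeds `threshold` in every group. Below is a minimal base-R sketch of that rule on a made-up matrix, meant to make the logic concrete rather than to reproduce the xcms implementation; `keep_by_missing()` and the data are hypothetical, and treating `NA` entries in `f` as excluded samples is an assumption based on how the vignette sets QC or non-QC samples to `NA`.

```r
# Hypothetical feature abundance matrix: rows are features, columns samples.
m <- rbind(FT1 = c(1, 2, 3, 4, NA, 6),
           FT2 = c(NA, NA, NA, 4, 5, NA),
           FT3 = c(NA, NA, 3, NA, NA, 6))
# Sample groups; NA marks a sample ignored by the filter (e.g. a QC).
f <- c("CTR", "CTR", "CTR", "KO", "KO", NA)

keep_by_missing <- function(m, f, threshold = 30) {
    use <- !is.na(f)
    # Column indices of the samples in each group.
    groups <- split(seq_along(f)[use], f[use])
    # Percentage of NA values per feature (rows) within each group (columns).
    pct <- sapply(groups, function(idx)
        rowMeans(is.na(m[, idx, drop = FALSE])) * 100)
    # Keep a feature if it is at or below the threshold in at least one group.
    rowSums(pct <= threshold) > 0
}

keep <- keep_by_missing(m, f)
m[keep, , drop = FALSE]
```

`FT1` (0% missing in `CTR`) and `FT2` (0% missing in `KO`) are kept, while `FT3` exceeds 30% in both groups and is removed; its value in the excluded sixth sample does not count.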
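The `RsdFilter` lines in the vignette describe removing features whose relative standard deviation (coefficient of variation) across QC samples is strictly larger than `threshold` (0.3 in the vignette). A minimal base-R sketch of that arithmetic on made-up data, not a copy of the xcms implementation; `rsd_keep()` and the values are hypothetical. Feature `FT1` has QC values 100, 110 and 90 (mean 100, standard deviation 10, RSD 0.1) and is kept, while `FT2` varies far more across the QCs (RSD about 0.85) and is removed.

```r
# Made-up abundances: rows are features, columns are samples.
m <- rbind(FT1 = c(100, 110, 90, 400),
           FT2 = c(100, 200, 20, 50))
# Logical index of the QC samples (the last sample is a study sample).
qcIndex <- c(TRUE, TRUE, TRUE, FALSE)

rsd_keep <- function(m, qcIndex, threshold = 0.3) {
    qc <- m[, qcIndex, drop = FALSE]
    # RSD (coefficient of variation): standard deviation divided by the
    # mean, computed per feature on the QC samples only.
    rsd <- apply(qc, 1, sd, na.rm = TRUE) / rowMeans(qc, na.rm = TRUE)
    # Features with an RSD strictly larger than the threshold are dropped.
    rsd <= threshold
}

keep <- rsd_keep(m, qcIndex)
m[keep, , drop = FALSE]
```

The study sample's value of 400 for `FT1` does not affect the decision, because only the QC columns enter the calculation.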