-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathREADME.Rmd
executable file
·603 lines (404 loc) · 39.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
---
mainfont: Arial
output: github_document
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/research-institute-for-nature-and-forest.csl
link-citations: true
lang: en-US
zotero: true
bibliography: vignettes/references.bib
---
```{r setup, include = FALSE}
knit_table <- function(df, caption){
for(i in 1:ncol(df)){
df[,i] <- gsub("`", "$", arguments[,i])
}
df %>% knitr::kable("latex", linesep = "\\addlinespace", col.names = NULL, booktabs = TRUE, escape = F) %>%
kableExtra::kable_styling(full_width = F, latex_options = "HOLD_position") %>% kableExtra::column_spec(column=1, width = "3cm") %>% kableExtra::column_spec(column=2, width = "14cm",)
}
library(magrittr)
```
```{=latex}
\definecolor{codegray}{HTML}{efefef}
\let\textttOrig\texttt
\renewcommand{\texttt}[1]{\textttOrig{\colorbox{codegray}{#1}}}
```
```{r logo, dpi = 300, out.width="70%", fig.align='center', echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE}
knitr::include_graphics("https://i.imgur.com/aUqBv3S.png", error = F)
```
<br/>
<font size="4">***QurvE* is an open-source, R package and software that provides a fully automated pipeline for fitting time-resolved biological data, including curve fitting, statistical evaluation, model selection, dose-response analysis, and built-in functions for data visualization.**</font>
## Citation
If you use `QurvE` in your published work, please cite:
Wirth, N.T., Funk, J., Donati, S. et al. *QurvE: user-friendly software for the analysis of biological growth and fluorescence data.* _Nat Protoc_ (2023). https://doi.org/10.1038/s41596-023-00850-7
## Deployed application
[Download the Windows setup](https://github.com/NicWir/QurvE/releases/download/v.1.0.0/QurvE_1.1.1_setup_windows.exe)
[User manual for *QurvE* application](https://nicwir.github.io/QurvE/articles/shiny_app_manual.html)
## R package
[R package on CRAN](https://cran.r-project.org/package=QurvE)
[User manual for growth curve analysis](https://nicwir.github.io/QurvE/articles/vignette_growth.html)
[User manual for fluorescence curve analysis](https://nicwir.github.io/QurvE/articles/vignette_fluorescence.html)
# Introduction
In virtually all disciplines of biology dealing with living organisms, from classical microbiology to applied biotechnology, it is routine to characterize the **growth** of the species under study. `QurvE` provides a suite of analysis tools to make such growth profiling quick, efficient, and reproducible. In addition, it allows the characterization of **fluorescence** data for, e.g., biosensor characterization in plate reader experiments (further discussed in the vignette *Quantitiative Fluorescence Curve Evaluation with Package `QurvE`*). All computational steps to obtain an in-depth characterization are combined into user-friendly *workflow* functions and a range of plotting functions allow the visualization of fits and the comparison of organism performances.
Any physiological parameter calculated (e.g., growth rate µ, doubling time t~D~, lag time $\lambda$, growth measurement increase ΔY, or equivalent fluorescence parameters) can be used to perform a dose-response analysis to determine the *half-maximal effective concentration* (EC~50~).
The package is build on the foundation of the two R packages from @kahm2010 and @petzoldt2022. `QurvE` was designed to be usable with minimal prior knowledge of the R programming language or programming in general. You will need to be familiar with the idea of running commands from a console or writing basic scripts. For R beginners, [this](https://moderndive.netlify.app/1-getting-started.html) is a great starting point, there are some good resources [here](https://education.rstudio.com/learn/beginner/) and we suggest using the [RStudio application](https://posit.co/products/open-source/rstudio/). It provides an environment for writing and running R code.
With consideration for `R` novices, `QurvE` establishes a framework in which a complete, detailed growth curve analysis can be performed in two simple steps:
1. *Read data* in custom format or parse data from a plate reader experiment.
2. *Run workflow*, including fitting of growth curves, dose-response analysis, and rendering a report that summarizes the results.
All computational results of a workflow are stored in a data container (list) and can be visualized by passing them to the generic function `plot(list_object)`. `QurvE` further extends the user's control over the fits by defining thresholds and quality criteria, allows the direct parsing of data from plate reader result files, and calculates parameters for an additional growth phase (bi-phasic growth).
\pagebreak
# Installation
## Release version
The most recent release version can be found on CRAN:
```r
install.packages("QurvE")
```
## Development version
Install the most current version with package `devtools`:
```r
install.packages("devtools")
library(devtools)
install_github("NicWir/QurvE")
```
# Shiny app
*QurvE* features a graphical user interface (GUI) developed as a Shiny app, which has been designed to be user-friendly and intuitive. You can start the app by running:
```r
QurvE::run_app()
```
See the *QurvE User Manual* for details on how to use the front-end application.
# Growth profiling methods
Three methods are available to characterize growth curves:
1. Fit parametric growth models to (log-transformed) growth data
2. Determine maximum growth rates (µ~max~) from the log-linear part of a growth curve using a heuristic approach proposed as the "growth rates made easy"-method by @hall2014. Do do so, `QurvE` uses code from the package @petzoldt2022, but adds user-defined thresholds for (i) R^2^ values of linear fits, (ii) relative standard deviations (RSD) of estimates slopes, and (iii) the minimum fraction of total growth value increase (ΔY) a regression window should cover to be considered for the analysis. These thresholds ensure a more robust and reproducible identification of the linear range that best describes the growth curve. Additionally, parameters for a *secondary growth phase* can be extracted for bi-linear growth curves.[^1]\
The algorithm works as follows:
i. Fit linear regressions [with the Theil-Sen estimator [@sen1968; @theil1992]] to all subsets of `h` consecutive, log-transformed data points (sliding window of size `h`). If, for example, `h=5`, fit a linear regression to points 1 $\dots$ 5, 2 $\dots$ 6, 3 $\dots$ 7 and so forth.
ii. Find the subset with the highest slope $\mu_{max}$. Do the R^2^ and RSD values of the regression meet the defined thresholds and do the data points within the regression window account for at least a defined fraction of the total growth measurement increase? If not, evaluate the regression with the second highest slope, and so forth.
iii. Include also the data points of adjacent subsets that have a slope of at least a $defined \space quota \times \mu_{max}$, e.g., all regression windows that have at least 95% of the maximum slope.
iv. Fit a new linear model to the extended data window identified in step iii.
If `biphasic = TRUE` (see section \@ref(run-workflow)), the following steps are performed to define a second growth phase:
i. Perform a smooth spline fit on the data with a smoothing factor of 0.5.
ii. Calculate the second derivative of the spline fit and perform a smooth spline fit of the derivative with a smoothing factor of 0.4.
iii. Determine local maxima and minima in the second derivative.
iv. Find the local minimum following $\mu_{max}$ and repeat the heuristic linear method for later time values.
v. Find the local maximum before $\mu_{max}$ and repeat the heuristic linear method for earlier time values.
vi. Choose the greater of the two independently determined slopes as $\mu_{max}2$.
3. Perform a smooth spline fit on (log-transformed) growth data and extract µ~max~ as the maximum value of the first derivative^1^.
If `biphasic = TRUE` (see section \@ref(run-workflow)), the following steps are performed to define a second growth phase:
i. Determine local minima within the first derivative of the smooth spline fit.
ii. Remove the *'peak'* containing the highest value of the first derivative (i.e., $\mu_{max}$) that is flanked by two local minima.
iii. Repeat the smooth spline fit and identification of maximum slope for later time values than the local minimum after $\mu_{max}$.
iv. Repeat the smooth spline fit and identification of maximum slope for earlier time values than the local minimum before $\mu_{max}$.
v. Choose the greater of the two independently determined slopes as $\mu_{max}2$.
[^1]: For linear and nonparametric (i.e, smooth spline) fits, the lag time is calculated as the intersect of the tangent at µ~max~ and a horizontal line at the first data point (y~0~). For the analysis of bi-phasic growth curves, the lag time is extracted as the lower of the two obtained $\lambda$ values.
# Dose-response analysis methods
The purpose of a dose-response analysis is to define the *sensitivity* of a given organism to the effects of a compound or the *potency* of a substance, respectively. Such effects can be either beneficial (e.g., a nutrient compound) or detrimental (e.g., an antibiotic). The sensitivity is reflected in the half-maximal effective concentration (*EC~50~*), i.e., the concentration (dose) at which the half-maximal response (e.g., $\mu_{max}$ or ΔY) is observed. `QurvE` provides two methods to determine the *EC~50~*:
1. Perform a smooth spline fit on response vs. concentration data and extract the *EC~50~* as the concentration at the midpoint between the largest and smallest response value.
2. Apply up to 20 (parametric) dose-response models to response vs. concentration data and choose the best model based on the Akaike information criterion (AIC). This is done using the excellent package `drc` [@ritz2016].
# Data formats
`QurvE` accepts files with the formats *.xls*, *.xlsx*, *.csv*, *.tsv*, and *.txt* (tab separated). The data in the files should be structured as shown in Figure \@ref(fig:data-layout). Alternatively, data parsers are available that allow direct import of raw data from different culture instruments. For a list of currently supported devices, please run `?parse_data`.
*Please note: I recommend always converting .xls or .xlsx files to an alternate format first to speed up the parsing process. Reading Excel files may require orders of magnitude longer processing time.*
## Custom format
To ensure compatibility with any type of measurement and data type (e.g., optical density, cell count, measured dimensions), `QurvE` uses a custom data layout. Here the first column contains *time values* and **'Time'** in the top cell, cells #2 and #3 are ignored. The remaining columns contain *measurement values* and the following sample identifiers in the top three rows:
1. Sample name; usually a combination of organism and condition, or 'blank'.
2. Replicate number; replicates are identified by *identical* names and concentration values. If only one type of replicate (biological or technical) was performed, enter numerical values here. If **both** biological and technical replicates of these biological replicates have been performed, the technical replicates should have the same replicate number. The technical replicates are then combined by their average value.
3. (*optional*) Concentration values of an added compound; this information is used to perform a dose-response analysis.
Several experiments (e.g., runs on different plate readers) can be combined into a single file and analyzed simultaneously. Therefore, different experiments are marked by the presence of separate time columns. Different lengths and values in these time columns are permitted.\
\
<img src="https://i.imgur.com/T7LkFDi.png" alt="\label{fig:data-layout} Custom QurvE data layout" width="100%" style="display: block; margin: auto;" />
\
To read data in custom format, run:
```r
grodata <- read_data(data.growth = 'path_to_data_file',
csvsep = ';', # or ','
dec = '.', # or ','
sheet.growth = 1, # number (or "name") of the EXCEL file sheet containing data
subtract.blank = TRUE,
calib.growth = NULL)
```
The <span style='color: red;'>`data.growth`</span> argument takes the path to the file or the name of an R dataframe object containing experimental data in custom format. <span style='color: red;'>`csvsep`</span> specifies the separator symbol (only required for .csv files; default: `';'`). <span style='color: red;'>`dec`</span> is the decimal separator (only required for .csv, .tsv, or .txt files; default: `'.'`). If an Excel file format is used, <span style='color: red;'>`sheet.growth`</span> specifies the number or name (in quotes) of the sheet containing the data.\
If <span style='color: red;'>`subtract.blank = TRUE`</span>, columns with name 'blank' will be combined by their row-wise average, and the mean values will be subtracted from the measurements of all remaining samples. For the <span style='color: red;'>`calib.growth`</span> argument, a formula can be provided in the form *'y = function(x)'* (e.g., `calib.growth = 'y = x * 2 + 0.5'`) to transform growth measurement values.
## Tidy format
The `QurvE` package is designed to be flexible and user-friendly, and it is fully compatible with data in 'tidy' format. This format, also known as 'long' format, is a standard way of organizing data that suits many types of analyses and visualizations in R.
In tidy format, each row is an observation and each column is a variable. In the context of `QurvE`, your data should include the following columns:
- **"Time"**: This column should contain the time values for your observations.
- **"Description"**: This column should contain a description of the sample. This could be a combination of organism and condition or any other relevant descriptor.
- **"Values"**: This column should contain the measurement values for your experiment (e.g., optical density, cell count, etc.).
- **"Replicate"** (optional): If you have multiple replicate measurements for the same condition ("Description" labels), you can indicate the replicate number in this column.
- **"Concentration"** (optional): If there's a compound of interest added to the sample, you can record its concentration in this column.
To load your tidy format data into `QurvE`, use the `read_data` function in the following way:
```r
grodata <- read_data(data.growth = 'path_to_tidy_data_file',
csvsep = ';', # or ','
dec = '.', # or ','
sheet.growth = 1, # number (or "name") of the EXCEL file sheet containing data
subtract.blank = TRUE,
calib.growth = NULL)
```
This code block works the same way as for the custom format described above; `QurvE` detects automatically if data is in tidy format. Be sure to replace `'path_to_tidy_data_file'` with the path to your own data file.
Note: Always ensure your data meets the tidy data standard. This means that every variable has its own column, every observation has its own row, and every value has its own cell. Additionally, make sure that your column headers exactly match those specified above, as `QurvE` will look for these specific headers when processing your data.
## Data parser
The data generated by culture devices (e.g., plate readers) from different manufacturers come in different formats. If these data are to be used directly, they must first be "parsed" from the plate reader into the `QurvE` standard format. In this scenario, sample information must be provided in a separate table that *maps* samples with their respective identifiers.The *mapping table* must have the following layout (Figure \@ref(fig:mapping-layout)):
<img src="https://i.imgur.com/bQCFJba.png" alt="\label{fig:mapping-layout} Data parser mapping layout" width="40%" style="display: block; margin: auto;" />
To parse data, run:
```r
grodata <- parse_data(data.file = 'path_to_data_file',
map.file = 'path_to_mapping_file',
software = 'used_software_or_device',
csvsep.data = ';', # or ','
dec.data = '.', # or ','
csvsep.map = ';', # or ','
dec.map = '.', # or ','
sheet.data = 1, # number (or "name") of the EXCEL file sheet containing data
sheet.map = 1, # number (or "name") of the EXCEL file sheet containing
# mapping information
subtract.blank = TRUE,
calib.growth = NULL,
convert.time = NULL)
```
The <span style='color: red;'>`data.file`</span> argument takes the path to the file containing experimental data exported from a culture device, <span style='color: red;'>`map.file`</span> the path to the file with mapping information. With <span style='color: red;'>`software`</span>, you can specify the device (or software) that was used to generate the data. <span style='color: red;'>`csvsep.data`</span> and <span style='color: red;'>`csvsep.map`</span> specify the separator symbol for data and mapping file, respectively (only required for .csv files; default: `';'`). <span style='color: red;'>`dec.data`</span> and <span style='color: red;'>`dec.map`</span> are the decimal separator used in data and mapping file, respectively (only required for .csv, .tsv, or .txt files; default: `'.'`). If an Excel file format is used for both or one of data or mapping file, <span style='color: red;'>`sheet.data`</span> and/or <span style='color: red;'>`sheet.map`</span> specify the number or name (in quotes) of the sheet containing the data or mapping information, respectively. If the same Excel file contains both data and mapping information in different worksheets, the file path needs to be specified for both `data.file`and `map.file`. If <span style='color: red;'>`subtract.blank = TRUE`</span>, samples with name 'blank' will be combined by their row-wise average, and the mean values will be subtracted from the measurements of all remaining samples. The argument <span style='color: red;'>`convert.time`</span> accepts a function *'y = function(x)'* to transform time values (e.g., `convert.time = 'y = x/3600'` to convert seconds to hours).
If more than one read type is identified in the provided data file, the user will be prompted to specify which measurements belong to **growth**, **fluorescence**, and **fluorescence2**, respectively.
# Run a complete growth analysis workflow
`QurvE` reduces all computational steps required to create a complete growth profiling to two steps, **read data** and **run workflow**.
After loading the package:
```r
library(QurvE)
```
we load experimental data from the publication @wirth2021 in which *Pseudomonas putida* KT2440 and an engineered strain were tested for their sensitivity towards the product 2-fluoromuconic acid:
## Load data
```r
grodata <- read_data(data.growth = system.file("2-FMA_toxicity.csv",
package = "QurvE"), csvsep = ";")
```
The created object `grodata` can be inspected with `View(grodata)`. It is a list of class `grodata` containing:
- a `time` matrix with 66 rows, each corresponding to one sample in the dataset, and 161 columns, i.e., time values for each sample.
- a `growth` data frame with 66 rows and 161+3 columns. The three additional columns contain the sample identifiers `condition`, `replicate`, and `concentration`.
- `fluorescence1` (here: `NA`)
- `fluorescence2` (here: `NA`)
- `norm.fluorescence1` (here: `NA`)
- `norm.fluorescence2` (here: `NA`)
- `expdesign`, a data frame containing the `label`, `condition`, `replicate`, and `concentration` for each sample:
```r
head(grodata$expdesign)
#> label condition replicate concentration
#> 1 KT2440 | 1 | 90 KT2440 1 90
#> 2 KT2440 | 1 | 70 KT2440 1 70
#> 3 KT2440 | 1 | 50 KT2440 1 50
#> 4 KT2440 | 1 | 25 KT2440 1 25
#> 5 KT2440 | 1 | 20 KT2440 1 20
#> 6 KT2440 | 1 | 15 KT2440 1 15
```
We can plot the raw data. Applying the generic `plot()` function to `grodata` objects calls the function `plot.grodata()`.:
```r
plot(grodata, data.type = "growth", log.y = FALSE,
x.lim = c(NA, 32), legend.position = "right",
exclude.conc = c(50, 70, 90),
basesize = 10, legend.ncol = 1, lwd = 0.7)
```
<img src="vignettes/vigfig-raw-data-plot-1.png" alt="\label{fig:raw-data-plot} Raw data plot.
Conditions can be selected or deselected using the `names = c('grp1', 'grp2')` argument or `exclude.nm = c('grp3', 'grp4')` argument, respectively. Similarly, concentrations can be (de-selected) via the `conc` and `exclude.conc` arguments. To plot individual samples instead of grouping replicates, add `mean = FALSE`. See `?plot.grodata` for further options." width="80%" style="display: block; margin: auto;" />
## Run Workflow
To perform a complete growth profiling of all samples in the input dataset, we call the `growth.workflow()` function on the `grodata` object. With <span style='color: red;'>`supress.messages = TRUE`</span>, we avoid printing information about every sample's fit in the sample to the console. By default, the selected response parameter to perform a dose-response analysis is 'mu.linfit'. To choose a different parameter, provide the argument <span style='color: red;'>`dr.parameter = 'choice'`</span>. A list of appropriate parameters is provided within the function documentation (`?growth.workflow`).
```r
grofit <- growth.workflow(grodata = grodata, fit.opt = "a", ec50 = TRUE,
suppress.messages = TRUE, export.res = FALSE) # Prevent creating TXT table and RData files with results
```
If option <span style='color: red;'>`export.res`</span> is set to <span style='color: red;'>`TRUE`</span>, tab-delimited .txt files summarizing the computation results are created, as well as the `grofit` object (an object of class `grofit`) as .RData file. This object (or the .RData file) contains all raw data, fitting options, and computational results. Figure \@ref(fig:grofit-container) shows the structure of the generated `grofit` object. In RStudio, `View(grofit)` allows interactive inspection of the data container.
If you want to create a report summarizing all computational results including a graphical representation of every fit, provide the desired output format(s) as <span style='color: red;'>`report = 'pdf'`</span>, <span style='color: red;'>`report = 'html'`</span>, or <span style='color: red;'>`report = c('pdf', 'html')`</span>. The advantage of having the report in HTML format is that every figure can be exported as (editable) PDF file.
*<span style='color: orange;'>In the spirit of good scientific practice (data transparency), I would encourage anyone using QurvE to attach the .RData file and generated reports to their publication.</span>*
Arguments that are commonly modified:
```{r workflow-arguments, echo = F}
arguments <- data.frame(args = c("`fit.opt`",
"`model.type`",
"`log.y.lin`<br>`log.y.spline`<br>`log.y.model`",
"`biphasic`",
"`interactive`",
"`nboot.gc`",
"`dr.method`",
"`dr.parameter`"),
descr = c("Which growth fitting methods to perform; a string containing `'l'` for linear fits, `'s'` for spline fits, `'m'` for model fits, or `'a'`(the default) for all three methods. Combinations can be also given as a vector of strings, e.g., `c('l', 's').`",
"Which growth models to apply; a string containing one of, or a vector of strings containing any combination of 'logistic', 'richards', 'gompertz', 'gompertz.exp', 'huang', and 'baranyi'.",
"Should Ln(y/y0) be applied to the growth data for the respective fits?",
"Extract growth parameters for two different growth phases (as observed with, e.g., diauxic shifts)",
"Controls interactive mode. If `TRUE`, each fit is visualized in the Plots pane and the user can adjust fitting parameters and confirm the reliability of each fit per sample",
"Number of bootstrap samples used for nonparametric growth curve fitting. See `?growth.gcBootSpline` for details.",
"Define the method used to perform a dose-responde analysis: smooth spline fit (`'spline'`) or model fitting (`'model'`, the default). See section 4",
"The response parameter in the output table to be used for creating a dose response curve. See `?growth.drFit` for further details."
))
knitr::kable(arguments)
```
\
Please consult `?growth.workflow` for further arguments to customize the workflow.\
\
<img src="https://i.imgur.com/ExsQQqV.jpg" alt="\label{fig:grofit-container} Internal structure of a `grofit`object generated by `growth.workflow()`." width="90%" style="display: block; margin: auto;" />
\pagebreak
## Tabular results
A `grofit` object contains two tables summarizing the computational results:
- `grofit$gcFit$gcTable` lists all calculated physiological parameters for every sample and fit
- `grofit$drFit$drTable` contains the results of the dose-response analysis
```r
# show the first three rows and first 14 columns of gcTable
gcTable <- grofit$gcFit$gcTable
gcTable[1:3, 1:14]
```
TestId AddId concentration reliability_tag used.model log.x log.y.lin log.y.spline
1 KT2440 1 90 TRUE <NA> FALSE TRUE TRUE
2 KT2440 1 70 TRUE <NA> FALSE TRUE TRUE
3 KT2440 1 50 TRUE <NA> FALSE TRUE TRUE
log.y.model nboot.gc mu.linfit tD.linfit lambda.linfit dY.linfit
1 TRUE 0 0 <NA> <NA> 0
2 TRUE 0 0 <NA> <NA> 0
3 TRUE 0 0 <NA> <NA> 0
```r
# Show drTable. The function as.data.frame() ensures that it is shown in table format.
drTable <- as.data.frame(grofit$drFit$drTable)
```
Additionally, the dedicated functions `table_group_growth_linear()`, `table_group_growth_model()`, and `table_group_growth_spline()` allow the generation of grouped results tables for each of the three fit types with averages and standard deviations. The column headers in the resulting data frames are formatted with HTML for visualization in shiny and with `DT::datatable()`.
A summary of results for each individual fit can be obtained by applying the generic function `summary()` to any fit object within `grofit`.
## Visualize results
Several generic `plot()` methods have been written to allow easy plotting of results by merely accessing list items within the `grofit` object structure (Figure \@ref(fig:grofit-container)).
### Grouped spline fits
Applying `plot()` to the `grofit` object produces a figure of all spline fits performed as well as the first derivative (slope) over time. The generic function calls `plot.grofit()` with `data.type = 'spline'` and thus, the same options are available as described for Figure \@ref(fig:raw-data-plot).
```r
plot(grofit,
data.type = "spline",
log.y = TRUE,
deriv = TRUE,
conc = c(0,5,10,15,20),
legend.position = "right",
legend.ncol = 1,
x.lim = c(NA, 32),
y.lim = c(0.01,NA),
n.ybreaks = 10,
basesize=10,
lwd = 0.7)
```
<img src="vignettes/vigfig-group-spline-plot-1.png" alt="\label{fig:group-spline-plot} Combined plot of all spline fits performed.
In addition to the options available with `data.type = 'raw'`, further arguments can be defined that control the appearance of the secondary panel showing the slope over time. See `?plot.grofit` for all options." width="80%" style="display: block; margin: auto;" />
\pagebreak
### Compare growth parameters
A convenient way to compare the performance of different organisms under different conditions is to plot the calculated growth parameters by means of the function `plot.parameter()`.
```r
# Parameters obtained from linear regression
plot.parameter(grofit, param = "mu.linfit", basesize = 10, legend.position = "bottom")
plot.parameter(grofit, param = "dY.linfit", basesize = 10, legend.position = "bottom")
# Parameters obtained from nonparametric fits
plot.parameter(grofit, param = "mu.spline", basesize = 10, legend.position = "bottom")
plot.parameter(grofit, param = "dY.spline", basesize = 10, legend.position = "bottom")
# Parameters obtained from model fits
plot.parameter(grofit, param = "mu.model", basesize = 10, legend.position = "bottom")
plot.parameter(grofit, param = "dY.orig.model", basesize = 10,
legend.position = "bottom")
```
```{r plot-parameter, dpi = 300, fig.show="hold", out.width="47%", fig.height = 6, fig.align="center", fig.pos = "H", echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE, fig.cap = "\\label{fig:plot-parameter} Parameter plots. If `mean = TRUE`, the results of replicates are combined and shown as their mean \u00B1 95\\% confidence interval. As with the functions for combining different growth curves, the arguments `name`, `exclude.nm`, `conc` and `exclude.conc` allow (de)selection of specific samples or conditions. Since we applied growth models to log-transformed data, calling 'dY.orig.model' or 'A.orig.model' instead of 'dY.model' or 'A.model' provides the respective values on the original scale. For linear and spline fits, this is done automatically. For details about this function, run `?plot.parameter`."}
knitr::include_graphics("https://i.imgur.com/h9rk8ab.png", error = F)
knitr::include_graphics("https://i.imgur.com/BCehPfg.png", error = F)
knitr::include_graphics("https://i.imgur.com/KY2mthG.png", error = F)
knitr::include_graphics("https://i.imgur.com/eSNhq6l.png", error = F)
knitr::include_graphics("https://i.imgur.com/OObCQV6.png", error = F)
knitr::include_graphics("https://i.imgur.com/TYnoeH7.png", error = F)
```
From the parameter plot for ´mu.linfit´ (the growth rates determined with linear regression), we can see that there is an outlier for strain KT2440 at concentration 0. We can plot the individual fits for this condition to find out if this is due to the fit quality:
```r
plot(grofit$gcFit$gcFittedLinear$`KT2440 | 1 | 0`, cex.lab = 1.2,
cex.axis = 1.2)
plot(grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0`, cex.lab = 1.2,
cex.axis = 1.2)
```
```{r plot-linear, dpi = 300, fig.ncol = 1, fig.show="hold", out.width="70%", fig.height = 4.5, fig.pos = "H", fig.align='center', echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE, fig.cap = "\\label{fig:plot-linear} Linear fit plots to identify sample outliers. For details about this function, run `?plot.gcFitLinear`.", fig.subcap=c("Sample 'KT2440 | 1 | 0'", "Sample 'KT2440 | 2 | 0'")}
knitr::include_graphics("https://i.imgur.com/V1Z1IGS.png", error = F)
knitr::include_graphics("https://i.imgur.com/gfxB1LY.png", error = F)
```
Apparently, the algorithm to find the maximum slope in the growth curve with the standard threshold of `lin.R2 = 0.97` could not find an appropriate fit within the first stage of growth due to insufficient linearity. We can manually re-run the fit for this sample with adjusted parameters. Thereby, we lower the R2 threshold and increase the size of the sliding window to cover a larger fraction of the growth curve. Then, we update the respective entries in the `gcTable` object that summarizes all fitting results (and that plot.parameter() accesses to extract relevant data). The generic function `summary()`, when applied to a the fit object of a single sample within `grofit`, provides the required parameters to update the table. Lastly, we also have to re-run the dose-response analysis since 'mu.linfit' was used as response parameter (the default), including the erroneous value.
*Note: This process of manually updating `grofit`elements with adjusted fits can be avoided by re-running `growth.workflow` with adjusted global parameters or my running the workflow in interactive mode (`interactive = TRUE`). In interactive mode, each individual fit is printed and the user can decide to re-run a single fit with adjusted parameters.*
```r
# Replace the existing linear fit entry for sample `KT2440 | 2 | 0`
# with a new fit
grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0` <-
growth.gcFitLinear(time = grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0`$raw.time,
data = grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0`$raw.data,
control = growth.control(lin.R2 = 0.95, lin.h = 10))
# extract row index of sample `KT2440 | 2 | 0`
ndx.row <- grep("KT2440 \\| 2 \\| 0", grofit$expdesign$label)
# get column indices of linear fit parameters (".linfit")
ndx.col <- grep("\\.linfit", colnames(grofit$gcFit$gcTable) )
# Replace previous growth parameters stored in gcTable
grofit$gcFit$gcTable[ndx.row, ndx.col] <-
summary(grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0`)
# Replace existing dose-response analysis with new fit
grofit$drFit <- growth.drFit(
gcTable = grofit$gcFit$gcTable,
control = grofit$control) # we can copy the control object from the original workflow.
```
And we can validate the quality of the updated fit:
```r
plot(grofit$gcFit$gcFittedLinear$`KT2440 | 2 | 0`, cex.lab = 1.2)
```
```{r plot-linear-update, dpi = 300, fig.show="hold", out.width="70%", fig.height = 4.5, results='hold' ,fig.align="center", fig.pos = "H", echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE, fig.cap = "\\label{fig:plot-linear-update} Updated linear fit for the outlier sample 'KT2440 | 2 | 0'."}
knitr::include_graphics("https://i.imgur.com/tTh2BD1.png", error = F)
```
That looks better!
```r
# Parameters obtained from linear regression
plot.parameter(grofit, param = "mu.linfit", basesize = 15)
```
```{r plot-parameter-update, dpi = 300, out.width="70%", fig.height = 5, fig.align="center", fig.pos = "H", echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE, fig.cap = "\\label{fig:plot-parameter-update} Parameter plot with updated fit."}
# Parameters obtained from linear regression
knitr::include_graphics("https://i.imgur.com/AZIq9rW.png", error = F)
```
### Plot grids
By arranging the individual samples in a grid, we can create a visual representation similar to a heat map that illustrates the values of a chosen parameter. This can be a helpful way to gain insights and understand trends within the data.:
```r
plot.grid(grofit,
param = "mu.linfit",
pal = "Mint",
log.y = FALSE,
sort_by_conc = FALSE,
basesize = 9)
```
```{r grid-plot, dpi = 300, out.width="90%", fig.height=3, fig.pos = "H", fig.align='center', echo = F, eval = T, results='hide', message=FALSE, error=FALSE, warning=FALSE, fig.cap = "\\label{fig:plot} Plot grid of all spline fits performed . See `?plot.grid` for available options.", tidy = FALSE}
knitr::include_graphics("https://i.imgur.com/kvp45QE.png", error = F)
```
### Dose-response analysis
The results of the dose-response analysis can be visualized by calling `plot()` on the `drFit` object that is stored within `grofit`. This action calls `plot.drFit()` which in turn runs `plot.drFitSpline()` or `plot.drFitModel()` (depending on the choice of <span style='color: red;'>`dr.method`</span> in the workflow) on every condition for which a dose-response analysis has been performed. Alternatively, you can call `plot()` on the list elements in `grofit$drFit$drFittedModels` or `grofit$drFit$drFittedSplines`, respectively.
```r
plot(grofit$drFit, cex.point = 1, basesize = 12)
```
```{r plot-drFit, dpi = 300, fig.ncol = 1, fig.show="hold", out.width="70%", fig.height = 4.5, fig.pos = "H", fig.align='center', echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = TRUE, fig.cap = "\\label{fig:plot-drFit} Dose response analysis - model fits. For details about this function, run `?plot.drFit`."}
knitr::include_graphics("https://i.imgur.com/Wd4veyj.png", error = F)
```
# Bootstrapping
When growth experiments are performed on a larger scale with manual growth measurements, technical deviations can result in outliers. Such outliers can lead to a distortion of the curve fits, especially if fewer data points are available than is usual in plate reading experiments. In this instance, bootstrapping can provide a more realistic estimation of growth parameters. Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This is done by randomly drawing data points from a dataset with replacement until the original number of data points has been reached. The analysis (here: growth fitting) is then performed individually on each bootstrapped replicate. The variation in the resulting estimated parameters is a reasonable approximation of the variance in those parameters. To include bootstrapping into the `QurvE` workflow, we define the argument <span style='color: red;'>`nboot.gc`</span>.
Similarly, we can include bootstrapping in the dose-response analysis if done with <span style='color: red;'>`dr.method = 'spline'`</span> by defining argument <span style='color: red;'>`nboot.dr`</span>.
```r
grofit_bt <- growth.workflow(grodata = grodata,
fit.opt = "s", # perform only nonparametric growth fitting
nboot.gc = 50,
ec50 = T,
dr.method = "spline",
dr.parameter = "mu.spline",
nboot.dr = 50,
smooth.dr = 0.25,
suppress.messages = TRUE,
export.res = F)
```
To plot the results of a growth fit with bootstrapping, we call `plot()` on a `gcBootSpline` object:
```r
plot(grofit_bt$gcFit$gcBootSpline[[7]], # Double braces serve as an alternative to
# access list items and allow their access by number
combine = TRUE, # combine both growth curves and parameter plots in the same window
lwd = 0.7)
```
```{r plot-gcBootSpline, dpi = 300, fig.show="hold", out.width="85%", fig.height = 10, fig.pos = "H", fig.align='center', echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = FALSE, fig.cap = "\\label{fig:plot-gcBootSpline} Nonparametric growth fit with bootstrapping. For details about this function, run `?plot.gcBootSpline`."}
knitr::include_graphics("https://i.imgur.com/JUG2edR.png", error = F)
```
And by applying `plot()` to a `drBootSpline` object, we can plot the dose-response bootstrap results:
```r
plot(grofit_bt$drFit$drBootSpline[[1]],
combine = TRUE, # combine both dose-response curves and parameter plots in the same window
lwd = 0.7)
```
```{r plot-drBootSpline, dpi = 300, fig.show="hold", out.width="85%", fig.height = 6, fig.pos = "H", fig.align='center', echo = F, eval = T, message=FALSE, error=FALSE, warning=FALSE, tidy = FALSE, fig.cap = "\\label{fig:plot-drBootSpline} Dose-response analysis with bootstrapping. For details about this function, run `?plot.drBootSpline`."}
knitr::include_graphics("https://i.imgur.com/8J3taed.png", error = F)
```
\pagebreak
# References
<div id="refs"></div>