---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup2, include = FALSE}
library(BuzzardsBay)
library(knitr)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
warning = FALSE
)
# Delete preexisting example directory
parent_dir <- tempdir(check = TRUE)
example_base <- file.path(parent_dir, "BB_Data")
if (file.exists(example_base)) {
unlink(example_base, recursive = TRUE)
Sys.sleep(.1)
}
rm(parent_dir, example_base)
```
# BuzzardsBay
<!-- badges: start -->
<!-- badges: end -->
The goal of **BuzzardsBay** (R package) is to process and analyze data for the
[Continuous Oxygen Monitoring in Buzzards Bay (COMBB) project](https://www.woodwellclimate.org/project/combb/).
This package is split into two modules,
the **QC Module**,
which prepares calibrated data and produces a report in preparation
for a quality control editing of deployment files,
and the **Analysis Module**,
which stitches deployment files for a site and year into a set of result files,
produces a daily statistical summary file,
and a report with seasonal statistics and a set of graphs.
## Installation
You can install or update **BuzzardsBay** from [GitHub](https://github.com/) with:
```{r install, eval=FALSE}
# install.packages("devtools")
devtools::install_github("UMassCDS/BuzzardsBay")
```
## QC Module
The primary function in the QC module is `qc_deployment()`.
It reads in calibrated data for a deployment
and generates files containing merged and flagged copies
of the data, along with an HTML report with plots of the data.
A second function `make_deployment_report()` is typically called by
`qc_deployment()` but can also be called independently to recreate the
report from CSV files created by `qc_deployment()`.
### File structure
`qc_deployment()` and the **BuzzardsBay** package in general assume
[a specific file structure and naming convention](https://docs.google.com/document/d/1kJttcEXzpNNknGwjkVwdYHw9LzZyjJ-FaX0CrU7H7NU).
In particular:
* All the data associated with a deployment should be stored in a deployment
directory with a path `/BB_Data/<year>/<site>/<year>-<month>-<day>/`, e.g.,
`"~/BB_Data/2022/SR4/2022-07-08"`.
* The path to the base directory and its name (`"BB_Data"` in the example) can
be any valid path.
* Within the deployment directory there should be a "Calibrated/" sub-directory
with the calibrated files in one of the three supported formats:
    1. Simple import based on a CSV and a YAML file.
    2. HOBOware output with a `.csv` and `details.txt` file for both
    the conductivity and DO logger.
    These files likely come from `U24` and `U26` loggers.
    3. A single `.xlsx` file as generated with the `MX801` logger.

See help for `import_calibrated_data()` for a detailed description
of these files.
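As a minimal sketch of the layout described above, the deployment path can be assembled with `file.path()` and checked before calling `qc_deployment()`. The helper name `deployment_path()` and the base directory used here are illustrative assumptions, not part of the package.

```r
# Hypothetical helper (not part of BuzzardsBay): build a deployment directory
# path of the form <base>/<year>/<site>/<year>-<month>-<day>.
deployment_path <- function(base_dir, site, date) {
  date <- as.Date(date)
  file.path(base_dir, format(date, "%Y"), site, format(date, "%Y-%m-%d"))
}

# Site and date are the example values from the text; the base directory is
# an arbitrary throwaway location.
base_dir <- file.path(tempdir(), "BB_Data_layout_demo")
deployment_dir <- deployment_path(base_dir, site = "SR4", date = "2022-07-08")

# Create the deployment directory and its "Calibrated" sub-directory.
calibrated_dir <- file.path(deployment_dir, "Calibrated")
dir.create(calibrated_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm the layout exists before processing.
stopifnot(dir.exists(calibrated_dir))
print(deployment_dir)
```

The calibrated files themselves still need to be copied into `Calibrated/` in one of the supported formats.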
### Usage
Once the files are in place, we can process the calibrated data in the deployment and generate the HTML report with the code below.
```{r run, eval = FALSE}
library(BuzzardsBay)
deployment_dir <- "" # Provide full path here. Don't include `Calibrated/`
qc_deployment(deployment_dir)
```
### Running on example data
Create an example data folder structure and deployment with `setup_example_dir()`.
If you specify a path with the `parent_dir` argument the example will be created
there. Otherwise, as in the example below, it creates the example files within
the location specified by `tempdir()`.
```{r run_example, echo = TRUE, results = "hide"}
paths <- setup_example_dir()
print(paths)
```
With that in place we can then call `qc_deployment()` on the example
deployment stored in `paths$deployment` which will also be where the output
files are created.
```{r qc_deployment, echo=TRUE, message=FALSE, results=FALSE}
a <- qc_deployment(paths$deployment)
```
The code above will only work once, because `qc_deployment()` throws an error if
the output already exists. To run it again, create a fresh example directory with
`setup_example_dir(delete_old = TRUE)`.
#### Output
`qc_deployment()` writes two identical CSV files, a metadata file, and an
HTML report to the deployment directory.
* `Auto_QC_<deployment>.csv` is a permanent record of this state of the process
and should not be edited.
* `Preliminary_QC_<deployment>.csv` is for subsequent review and editing.
`Preliminary_` should be dropped from the name when review is complete.
* `Metadata_<deployment>.yml` contains the metadata written as a YAML file.
* `QC_<deployment>_report.html` contains plots and summary information about the
deployment. It is a very large file, so it should be deleted after the QC is
complete. It can be recreated later with
`make_deployment_report(deployment_dir)` as long as the Auto QC and metadata
files are present.
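The renaming step for the reviewed file could be scripted along these lines. This is a sketch only: `finalize_qc_file()` is a hypothetical helper, not a package function, and the deployment ID in the demo file name is a placeholder.

```r
# Hypothetical helper (not part of BuzzardsBay): drop the "Preliminary_" prefix
# from the reviewed QC file in a deployment directory.
finalize_qc_file <- function(deployment_dir) {
  prelim <- list.files(deployment_dir, pattern = "^Preliminary_QC_.*\\.csv$",
                       full.names = TRUE)
  if (length(prelim) != 1) {
    stop("Expected exactly one Preliminary_QC_ file, found ", length(prelim))
  }
  final <- file.path(dirname(prelim), sub("^Preliminary_", "", basename(prelim)))
  if (file.exists(final)) {
    stop("Refusing to overwrite existing file: ", final)
  }
  if (!file.rename(prelim, final)) {
    stop("Could not rename ", prelim)
  }
  final
}

# Demonstration on a throwaway directory with a placeholder file name.
demo_dir <- file.path(tempdir(), "qc_rename_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
file.create(file.path(demo_dir, "Preliminary_QC_EXAMPLE.csv"))
final_path <- finalize_qc_file(demo_dir)
print(basename(final_path))
```

The helper stops instead of overwriting, so an already finalized file is never replaced by accident.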
## Analysis Module
The Analysis Module consists of three primary functions: `stitch_site()`, `check_site()`, and
`report_site()`. Each takes the site path as the primary argument, `/BB_Data/<year>/<site>` e.g.
`"~/BB_Data/2022/AB2"`.
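A sketch of calling the three functions in sequence, assuming only what is stated above: each takes the site path as its first argument. The site path is a placeholder, and the calls are guarded so the snippet does nothing if the package or the directory is absent.

```r
# Placeholder site directory of the form <base>/<year>/<site>.
site_dir <- file.path("~", "BB_Data", "2022", "AB2")

# Only run when the package is installed and the site directory exists.
if (requireNamespace("BuzzardsBay", quietly = TRUE) && dir.exists(site_dir)) {
  BuzzardsBay::stitch_site(site_dir)  # merge deployments into result files
  BuzzardsBay::check_site(site_dir)   # verify result files are up to date
  BuzzardsBay::report_site(site_dir)  # daily statistics and site report
}
```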
1. `stitch_site()` reads the deployment files for the specified site and year and merges them into
a single file. Gaps between deployments are filled with missing values. A warning is displayed if
the gap is greater than 1 hour; this threshold may be changed with the `max_gap` option, which sets
the maximum gap (in hours) to accept silently. Three result files are written to
`/BB_Data/<year>/<site>/combined/`:
    a. The **archive** file, e.g., `archive_AB2_2024.csv`, with all columns and values, including rejected
    data. Missing data are represented by `#N/A`. This file is intended for long-term storage of
    the complete set of QC'd data.
    b. The **WPP** file, e.g., `WPP_AB2_2024.csv`, with all columns. Rejected values are replaced with
    `DR` (for "data rejected"). Missing data are represented by `#N/A`. This file is in the format
    that MassDEP wants.
    c. The **core** file, e.g., `core_AB2_2024.csv`, with selected columns. Rejected values and missing
    values are represented by blanks. This is the file used for statistics and graphs in the report, and
    is also intended for sharing with collaborators.

    `stitch_site()` can optionally run `report_site()` (use `report = TRUE`) to stitch and produce
    a report for the site in one call.
2. `check_site()`, intended to be run after some time has elapsed since running `stitch_site()`, checks
to see if any of the deployment files have been updated or if any of the result data files have changed
(likely through inadvertent editing). It also checks for missing files. If a site passes `check_site()`,
the result data files are up to date. If it fails, result files and the report need to be recreated by
running `stitch_site()` again.
3. `report_site()` reads the `core` file produced by `stitch_site()` and produces the daily statistics
file, e.g., `combined/daily_stats_AB2_2024.csv`, and the site report, `combined/site_report_AB2_2024.pdf`.
**Note: the site report is not yet implemented**. `report_site()` normally runs `check_site()` before
producing the report and throws an error if the check fails; you can override this with `check = FALSE`.
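Because the three result files encode missing and rejected values differently, reading them back into R needs matching `na.strings`. The sketch below uses base `read.csv()` on a small synthetic WPP-style file; the column names and values are made up for illustration and do not reflect the actual file schema.

```r
# Synthetic WPP-style file: "#N/A" marks missing data, "DR" marks rejected data.
# Column names and values are invented for this demo.
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Date_Time,DO,Temp",
  "2024-07-01 00:00,7.1,21.3",
  "2024-07-01 00:15,DR,21.4",
  "2024-07-01 00:30,#N/A,#N/A"
), demo_file)

# Treat both markers as NA so numeric columns are parsed as numbers.
# read.csv() does not treat "#" as a comment character by default.
wpp <- read.csv(demo_file, na.strings = c("#N/A", "DR"))

str(wpp)
```

For the archive file only `#N/A` needs mapping. For the core file, blank fields in numeric columns are read as `NA` by `read.csv()` without any extra arguments.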