
Commit e591d7f

Merge pull request #5 from jk-brown/2-edits-to-workflow-documents-to-prep-for-publication
Updating workflow for reproducability
2 parents cd21e18 + 20db10d commit e591d7f

12 files changed (+303, -259 lines)

workflows/01_building_dataset_from_Sherwood2020.Rmd

+2 -4
@@ -9,14 +9,12 @@ knitr::opts_chunk$set(echo = TRUE)
 
 # Goal
 
-Build and save dataset of ECS values from different ECS configuraiton distirbutions as provided by S20 supplemental information.
+Build and save data set of ECS values from different ECS configuration distributions as provided by S20 supplemental information.
 
 # Building data set
 
 Here we are building ECS data sets for each of the evidence configurations of interest. We are using data from [Sherwood et al. 2020](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019RG000678) henceforth referred to as S20. The data was drawn from the supplemental information in S20 and represents ECS percentile estimates from likelihood distributions quantified in S20 for each of the five evidence configurations (each containing different combinations of lines of evidence). There are other configurations available in S20 beyond the five we have chosen here. The percentile estimates reported in this data represent the 5th, 10th, 17th, 20th, 25th, 50th, 75th, 80th, 83rd, 90th, and 95th percentile, as well as the mode and the mean.
 
-Names of each vector list are coded to represent the evidence configuration the data are associated with.
-
 ```{r}
 ecs_data_list <- list(
   "Baseline" = c(
@@ -90,5 +88,5 @@ ecs_data_list <- list(
 Write data frame and store in the `data` directory.
 
 ```{r}
-saveRDS(ecs_data_list, "data/ecs_posterior_data_S20.RDS")
+saveRDS(ecs_data_list, "data/ecs_prior_data_S20.RDS")
 ```
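The saved object is a named list of numeric percentile vectors, one per evidence configuration. A minimal sketch of that shape (values here are illustrative placeholders, not the S20 estimates):

```r
# Named list of ECS percentile vectors; configuration names follow the diff,
# but the numbers below are placeholders, not the S20 supplemental values
ecs_data_list <- list(
  "Baseline"      = c(2.3, 2.5, 2.8, 3.1, 4.2, 4.7),
  "No_Historical" = c(2.0, 2.3, 2.7, 3.0, 4.5, 5.2)
)

# serialize to the data directory, matching the renamed file in this commit
saveRDS(ecs_data_list, "data/ecs_prior_data_S20.RDS")
```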

workflows/02_fitting_and_sampling_ecs_distributions.Rmd

+5 -50
@@ -11,8 +11,9 @@ knitr::opts_chunk$set(echo = TRUE)
 
 The goal of this script is to fit an easy-to-sample parametric distribution to the ECS values provided in S20. We will then sample the distributions, save the data, and visualize them.
 
+## Source all setup, mapping, and helper functions stored in `sourced`
 ```{r}
-library(MASS)
+source("sourced/source_all.R")
 ```
 
 # Fit Distribution to Sherwood et al. 2020 data
@@ -25,7 +26,7 @@ Some notes on `fitdistr()` usage: The function fits a maximum likelihood distrib
 
 ```{r}
 # read in the ecs_data and store as a list
-ecs_data <- readRDS("data/ecs_posterior_data_S20.RDS")
+ecs_data <- readRDS("data/ecs_prior_data_S20.RDS")
 
 ```
 
@@ -70,10 +71,6 @@ The result is a new list (`ecs_sample_list`) of data frames, one for each eviden
 
 # Visualize Simulated Samples
 
-```{r}
-library(ggplot2)
-```
-
 Once the ECS samples are produced and stored in a list of data frames (`ecs_sample_list`), we visualize the sample distribution with `ggplot2`.
 
 Before working in `ggplot2`, we build a single data frame with all samples and the name of the evidence configurations for each sample to make it easier to plot.
@@ -125,46 +122,9 @@ quantile(ecs_sample_list$Baseline_Emergent_constraints$ECS, probs = c(0.05,0.50,
 
 Here we want to determine the optimal distribution hyperparameters for a skew normal distribution that meets specified quantile requirements. For this task we utilize the `sn` and `optimx` packages. The objective is to fit a skew normal distribution such that the 5%, 50%, and 95% quantiles reflect the IPCC ECS range (3, 2-5).
 
-
-We need the following packages:
-```{r}
-# Load the required libraries
-library(sn)
-library(optimx)
-
-```
-
 ## Define objective function
 
-Define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values.
-
-```{r}
-# Objective function to find the optimal xi, omega, and alpha
-objective_function <- function(params) {
-  xi <- params[1]
-  omega <- params[2]
-  alpha <- params[3]
-
-  # Calculate the quantiles for the given parameters
-  q5 <- qsn(0.05, xi = xi, omega = omega, alpha = alpha)
-  q50 <- qsn(0.50, xi = xi, omega = omega, alpha = alpha)
-  q95 <- qsn(0.95, xi = xi, omega = omega, alpha = alpha)
-
-  # Desired quantiles
-  desired_q5 <- 2.0
-  desired_q50 <- 3.0
-  desired_q95 <- 5.0
-
-  # Calculate the squared differences
-  error <- (q5 - desired_q5)^2 + (q50 - desired_q50)^2 + (q95 - desired_q95)^2
-
-  # Print intermediate values for debugging
-  cat("xi:", xi, "omega:", omega, "alpha:", alpha, "error:", error, "\n")
-
-  return(error)
-}
-
-```
+We define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values. This function is stored in `sourced/helper_functions.R`.
 
 Here, the hyperparameters (`xi`, `omega`, and `alpha`) represent the location, scale, and shape of the skew normal distribution, respectively. The function computes quantiles for the 5th, 50th, and 95th percentiles from the skew normal distribution provided initial hyperparameters (`qsn()`). It then compares these values to the targeted IPCC values (3.0, 2.0-5.0) by calculating squared differences. This approach allows us to determine how well the hyperparameters of the skew normal distribution align with the desired percentiles.
 
@@ -233,7 +193,7 @@ print(likely)
 ```
 Here we can see that the very likely distribution aligns exactly with the IPCC expectation. However, the likely range is slightly more constrained than what has been proposed by IPCC AR6.
 
-## Visualize the distribution
+## Preliminary visualization of the distribution
 ```{r}
 # convert the data to data frame
 IPCC_est_df <- data.frame(ECS = data)
@@ -249,18 +209,13 @@ ECS_sn <-
             xlim = c(2.0, 5.0), fill = "#F21A00", alpha = 0.5) + # Shade area between 2.0 and 5.0
   geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
             xlim = c(2.6, 3.43), fill = "#F21A00", alpha = 0.7) +
-  # geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
-  #           xlim = c(0, 2.0), fill = "grey", alpha = 0.5) +
-  # geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
-  #           xlim = c(5.0, 8.0), fill = "grey", alpha = 0.5) +
   geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
             xlim = c(3.0, 3.01), fill = "black", alpha = 0.5)
 
 
 ECS_sn
 ```
 
-
 ## Add the IPCC AR6 distribution value to ECS sample list
 ```{r}
 # add estimated IPCC ECS values to ecs_sample_list
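The quantile-matching step that this commit moves into `sourced/helper_functions.R` can be sketched as follows. This is a reconstruction based on the objective function removed above; base-R `optim()` stands in for the `optimx` call, and the starting values are a rough guess:

```r
library(sn)  # provides qsn(), the skew normal quantile function

# Squared deviation of the 5th/50th/95th skew normal quantiles from the
# IPCC AR6 targets (2.0, 3.0, 5.0), as in the removed objective function
objective_function <- function(params) {
  xi    <- params[1]  # location
  omega <- params[2]  # scale
  alpha <- params[3]  # shape
  (qsn(0.05, xi = xi, omega = omega, alpha = alpha) - 2.0)^2 +
    (qsn(0.50, xi = xi, omega = omega, alpha = alpha) - 3.0)^2 +
    (qsn(0.95, xi = xi, omega = omega, alpha = alpha) - 5.0)^2
}

# minimize the quantile error; c(2.5, 1.0, 2.0) is an arbitrary start
fit <- optim(c(2.5, 1.0, 2.0), objective_function)
fit$par  # fitted xi, omega, alpha
```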

workflows/03_plotting_ecs_samples_S1.Rmd

+13 -31
@@ -11,9 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)
 
 Produce a figure with the ECS samples. This may serve as a good supplemental figure for the paper to show the distribution shapes for the evidence configurations from S20, based on a gamma distribution assumption.
 
-```{r}
-library(tidyverse)
-```
 # Produce figure
 
 We produce PDF curves in a panel using the color palette and then save the figure in the `figures` directory.
@@ -27,54 +24,39 @@ ecs_sample_list <- readRDS("data/ecs_samples_plot_data.RDS")
 ecs_df_S1 <- data.frame(
 
   value = unlist(ecs_sample_list),
-  evidence = rep(names(ecs_sample_list), each = n),
+  scenario = rep(names(ecs_sample_list), each = n),
   row.names = NULL
 
 )
 
 # recode evidence configurations for aesthetics
-ecs_df_S1$evidence <- ecs_df_S1$evidence %>%
-  recode(
-    IPCC = "IPCC AR6",
-    No_Process = "No Process",
-    No_Historical = "No Historical",
-    No_Paleoclimate = "No Paleoclimate",
-    Baseline_Emergent_constraints = "Baseline + Emergent constraints"
-  )
-
-# edit order of the facet panels
-facet_order <- c(
-  "IPCC AR6",
-  "Baseline",
-  "No Historical",
-  "No Process",
-  "No Paleoclimate",
-  "Baseline + Emergent constraints"
-)
+ecs_df_S1_fig_data <- recode_scenarios(ecs_df_S1)
 
 # convert the evidence configuration to factor with the facet order we want
-ecs_df_S1$evidence <- factor(ecs_df_S1$evidence, levels = facet_order)
+ecs_df_S1_fig_data$scenario <- factor(ecs_df_S1_fig_data$scenario, levels = scenario_order)
 
 # plot with ggplot
 ggplot() +
-  geom_density(data = ecs_df_S1,
+  geom_density(data = ecs_df_S1_fig_data,
                aes(x = value,
-                   color = evidence,
-                   fill = evidence),
+                   color = scenario,
+                   fill = scenario),
                linewidth = 0.75,
                alpha = 0.2,
                bw = 0.3) +
   scale_color_manual(values = ECS_COLORS,
-                     name = "ECS Configurations") +
+                     guide = "none") +
   scale_fill_manual(values = ECS_COLORS,
-                    name = "ECS Configurations") +
+                    guide = "none") +
   scale_x_continuous(breaks = seq(from = 0, to = 9,
                                   by = 1)) +
   labs(x = "Equilibrium Climate Sensitivity Value",
-       y = "Densisty") +
-  facet_wrap(~ evidence) +
+       y = "Density") +
+  facet_wrap(~ scenario) +
   theme_light() +
-  theme(legend.position = "bottom")
+  theme(legend.position = "bottom",
+        strip.text = element_text(size = 10, color = "black"),
+        strip.background = element_rect(fill = "white"))
 
 ggsave("figures/S1_fig_ECS_distributions.png",
        device = "png",
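The inline recode removed in this file is replaced by the sourced helper `recode_scenarios()`. Based on the deleted code, a minimal version of that helper presumably looks like the following (the function body is a reconstruction, not the actual contents of `sourced`):

```r
library(dplyr)  # for recode()

# Map raw configuration names to display labels; the mapping is taken
# directly from the code removed in this diff, and the function assumes
# the input data frame has a `scenario` column
recode_scenarios <- function(df) {
  df$scenario <- recode(
    df$scenario,
    IPCC = "IPCC AR6",
    No_Process = "No Process",
    No_Historical = "No Historical",
    No_Paleoclimate = "No Paleoclimate",
    Baseline_Emergent_constraints = "Baseline + Emergent constraints"
  )
  df
}
```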

workflows/04_running_the_model_computing_probabilities.Rmd

+32 -14
@@ -11,13 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)
 
 The goal of this script is to run Matilda with each of the ECS distributions we sampled previously.
 
-```{r}
-library(matilda)
-options(matilda.verbose = FALSE)
-library(parallel)
-library(tidyverse)
-```
-
 # Using ECS samples to run Matilda
 
 We use ECS values sampled from the estimated parametric distributions from S20 to propagate the varying levels of uncertainty associated with evidence configurations to probabilistic climate projections. This provides an opportunity to better understand how different ECS evidence configurations affect temperature trajectories from a simple carbon cycle climate model.
We use ECS values sampled from the estimated parametric distributions from S20 to propagate the varying levels of uncertainty associated with evidence configurations to probabilistic climate projections. This provides an opportunity to better understand how different ECS evidence configurations affect temperature trajectories from a simple carbon cycle climate model.
@@ -45,8 +38,6 @@ The result will be a new core object that can will be a required input to run th
 
 # Generate values for other model parameters
 
-**This needs to be edited** I think for this experiment it will be more straight forward to keep all parameters aside from ECS fixed. This will reduce the complexity that is introduced from parameter interactions and will isolate the affect of the ECS distributions from different ECS evidence configurations. Below are notes that will be edited accordingly and the code chunk in this section will be skipped.
-
 Matilda works by running Hector multiple times to build a perturbed parameter ensemble, thus applying parameter uncertainty to model outputs. We need to produce parameter values to accompany the ECS values we sampled in previous steps of the workflow.
 
 Parameter sets are generated in Matilda using `generate_params`. We use this function to produce `n` initial parameter sets (`init_params`). In this analysis I do not think I am going to run samples of the other parameters. Instead we can isolate the behavior of Hector to different ECS distributions by using Hector defaults for all parameters aside from ECS.
@@ -182,7 +173,11 @@ gmst_criteria_data <- gmst_criteria_data %>% rename("value" = "Anomaly..deg.C.",
 # Temperature anomaly error vector
 gmst_error <- read.csv("data-raw/annual_gmst_SD_quant.csv")
 temp_error_vec <- gmst_error$value
+```
+
+Plot temperature data with time-varying uncertainty as a sanity check:
 
+```{r}
 # for plotting add error values to criteria data
 gmst_criteria_data$error <- gmst_error$value
 ggplot() +
@@ -195,12 +190,17 @@ ggplot() +
                 ymax = value + error),
             alpha = 0.2)
 
+```
 
-# create a new ocean c uptake criterion
+Create new temperature criterion:
+```{r}
+# create a GMST criterion
 gmst_criterion <- new_criterion("gmst", years = gmst_criteria_data$year, obs_values = gmst_criteria_data$value)
 
 ```
 
+*The Matilda-provided gmst criterion only covers the 1961-2023 time frame. We create the new criterion to use the 1850-2024 time frame.*
+
 Create criterion using ocean carbon uptake from the Global Carbon Project:
 
 ```{r}
@@ -253,10 +253,10 @@ names(model_weights)
 ## model_weights before culling constraint
 saveRDS(model_weights, "data/pre_culled_model_weights.RDS")
 ```
+
 Filter out ensemble members with near-zero weight:
-```{r}
-## CURRENTLY TESTING CONSTRAINING RESULT BY MINIMUM WEIGHT
 
+```{r}
 # filter out ensemble members that do not meet the minimum weight constraint (> 1e-6)
 constrained_weights <- lapply(model_weights, function(df) {
@@ -265,6 +265,7 @@ constrained_weights <- lapply(model_weights, function(df) {
   return(filtered_result)
 
 })
+
 ```
 
 Constrained weights can be merged with the constrained results. This would produce a list (based on ECS scenario) of the constrained model ensemble and the assigned weights for each run. However, because some of the models have been filtered out during the constraint, we need to re-normalize so weights sum to 1. This way we can still use the resulting ensembles to compute metrics and probabilities accurately.
@@ -278,6 +279,10 @@ weighted_ensemble <- Map(function(a, b) {
 
 }, constrained_weights, model_result)
 
+```
+
+Sanity check to ensure names are correct and the re-weighted ensemble sums to 1.0:
+```{r}
 # sanity check names
 names(weighted_ensemble)
 
@@ -299,6 +304,10 @@ norm_weights_ensemble <- lapply(names(weighted_ensemble), function(df_name) {
   return(df)
 
 })
+```
+
+Sanity check that element names are preserved and final weights sum to 1.0:
+```{r}
 # make sure element names are preserved
 names(norm_weights_ensemble) <- names(weighted_ensemble)
 
@@ -309,9 +318,10 @@ sapply(norm_weights_ensemble, function(df) {
 
 print(sum)
 })
+```
 
-
-# saving as a list does this work?
+```{r}
+# save normalized weighted ensemble
 saveRDS(norm_weights_ensemble, "data/weighted_ensemble.RDS")
 
 ```
@@ -381,6 +391,10 @@ eoc_gsat_metrics <- Map(function(a, b){
 
 }, eoc_warming_results, norm_weights_ensemble)
 
+```
+
+Sanity check to verify the normalized metric weights sum to 1.0:
+```{r}
 # Apply function to calculate and print sum of the specified column
 # Verify the final normalized weights
 sapply(eoc_gsat_metrics, function(df) {
@@ -390,6 +404,10 @@ sapply(eoc_gsat_metrics, function(df) {
 print(sum)
 })
 
+```
+
+Save weighted metric data:
+```{r}
 # save the result for future visualization
 saveRDS(eoc_gsat_metrics, "data/weighted_gsat_metrics.RDS")
 
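The cull-and-renormalize pattern used throughout this file (drop ensemble members below the minimum weight constraint, then rescale the survivors to sum to 1) can be sketched in base R with hypothetical weights; the column names below are illustrative, not Matilda's actual output schema:

```r
# Hypothetical ensemble weights (column names illustrative only)
weights <- data.frame(run_number = 1:5,
                      weight = c(0.5, 0.3, 0.1999, 1e-7, 1e-8))

# cull members that do not meet the minimum weight constraint (> 1e-6)
constrained <- subset(weights, weight > 1e-6)

# re-normalize so the remaining weights again sum to 1
constrained$weight <- constrained$weight / sum(constrained$weight)

sum(constrained$weight)  # sanity check: should equal 1
```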
