Implement the join_inputs_design into the current SDA workflows. #3634
Open
DongchenZ wants to merge 74 commits into PecanProject:develop from DongchenZ:SDA_data
Changes from all commits (74 commits):
8276e71
Bug fixes for the missing ensemble.size error.
05f82da
Implement joint input sampling in the SDA workflows.
884a69d
Update the joint inputs sampling function.
f7f7736
Merge branch 'PecanProject:develop' into SDA_data
447b08d
Update the ensemble configs function.
92cd3f1
Clean the multisite sda function.
f2a96d1
Revert the changes.
40701fe
Revert changes.
4893c11
revert changes.
f6d473a
Revert changes.
f787a17
Add documentation and bug fixes.
c82acac
Update document.
f6f2232
Update the function to work with the newest functions.
a30c9e8
Remove the samples argument from the function.
74d5a1e
Add changes to the sda workflow.
70cfcb1
Add namespace.
1f87740
Bug fix.
8b0a640
Apply Mike's comments.
fe6bdca
Add the merge nc feature.
660186b
Improve scripts.
dd0ee2e
Bug fix.
5ddcaa7
Update the parallel registration.
ea8332a
Fix the dimension error when YN equals one.
5cb5a38
Update the parallel registration. And add the else control when it's …
fdacd5f
Convert from SoilMoist to SoilMoistFrac.
abecd42
Add the recursive file deletion.
fa18299
Revert changes to resolve conflicts.
81f035b
revert change.
3fc6907
Pull from pecan develop.
6fba821
change by make.
02ca610
Combine previous changes with the debias implementation.
3575463
Merge the debiasing feature into the parallel SDA workflow.
bebf893
Update document.
1409564
Update document.
870593f
Add the debiasing feature to the parallel job submission.
10310bc
map forecast results to the variable boundary.
300a900
Update comments.
408a390
update observation paths.
5b3cc63
Update computation configuration.
393c16e
Bug fix.
4ac6250
Bug fix.
50c2324
Revert change.
993e98b
Revert change.
78e4702
Add debias code.
d6d06f3
Move the previous debias script to the inst folder.
460e0ee
remove previous debias functions.
6025d83
Add the main SDA debias workflow.
85c59fe
Add the new debias workflow to the multi-site version.
ec38802
Add the new bias correction workflow to the sda_local script.
f1d10a1
Add the new SDA debiasing workflow to the SDA job submission function.
77bda48
Move out of the if control.
9c3fd8e
Bug fix.
e6ef100
Bug fix
087f18a
Remove the condition for the wrapping where we have more than 20 char…
5a04563
Update the runner script.
7ba598e
remove arg.
19db026
Update script.
2f14acc
Revert "Move out of the if control."
caefddf
Resolve conflicts.
6da0cd6
Revert back.
cb2379a
Revert to using the new debias workflow.
668c4b1
Update document.
e543c23
Update document.
a44399d
Update the runner script.
4701b41
Update the bias correction function.
2754b91
Update the usage of the bias correction workflow.
459c945
Merge branch 'PecanProject:develop' into SDA_data
0e08b79
Update observation paths.
af5c081
Update namespace.
300dfae
Merge branch 'SDA_data' of https://github.com/DongchenZ/pecan into SD…
c8a8206
Modify the test to match the change in the logger.message function.
441f732
Change the default value of wrap in the logger message function.
9099aa0
Revert back.
986930b
Merge branch 'PecanProject:develop' into SDA_data
New file, 143 lines added:

```r
#' @description
#' This function corrects biases in the current forecast using a machine-learning model
#' (random forest by default) trained on residuals from the previous time point.
#' @title sda_bias_correction
#'
#' @param site.locs data.frame: site locations with longitude and latitude in the first and second
#'  columns, named Lon and Lat.
#' @param t numeric: index of the current time point (e.g., t = 1 for the first time point).
#' @param pre.X data.frame: model forecast at the previous time point with n (ensemble size) rows
#'  and n.var (number of variables) times n.site (number of locations) columns
#'  (e.g., 100 ensembles, 4 variables, and 8,000 locations yield a data frame of 100 rows and 32,000 columns).
#' @param X data.frame: model forecast at the current time point.
#' @param obs.mean list: named by time point (date); each element is a list named by site id containing
#'  the observation means for each state variable at that site and time point.
#' @param state.interval matrix: upper and lower boundaries for each state variable (one row per variable).
#' @param cov.dir character: path to the directory containing the time-series covariate maps.
#' @param pre.states list: previously accumulated training records (covariates and residuals) for each variable.
#' @param py.init function: R function that initializes the Python helper functions. Default is NULL;
#'  if NULL, the built-in random forest is used.
#'
#' @return list: the bias-corrected X; the ML model (or variable importance) for each variable;
#'  the predicted residuals; and the updated pre.states.
#'
#' @author Dongchen Zhang
#' @importFrom dplyr %>%
sda_bias_correction <- function(site.locs,
                                t, pre.X, X,
                                obs.mean,
                                state.interval,
                                cov.dir,
                                pre.states,
                                py.init = NULL) {
  # if a Python script is prescribed, load its functions.
  if (!is.null(py.init)) {
    py <- py.init()
  }
  # grab variable names.
  var.names <- rownames(state.interval)
  # create terra spatial points.
  pts <- terra::vect(cbind(site.locs$Lon, site.locs$Lat), crs = "epsg:4326")
  # grab the current year.
  y <- lubridate::year(names(obs.mean))[t]
  # grab the covariate file path for the previous year.
  cov.file.pre <- list.files(cov.dir, full.names = TRUE)[which(grepl(y - 1, list.files(cov.dir)))] # previous covariates.
  # extract covariates for the previous time point.
  cov.pre <- terra::extract(x = terra::rast(cov.file.pre), y = pts)[, -1] # remove the first ID column.
  # factorize the land cover band.
  if ("LC" %in% colnames(cov.pre)) {
    cov.pre[, "LC"] <- factor(cov.pre[, "LC"])
  }
  # extract covariates for the current time point.
  cov.file <- list.files(cov.dir, full.names = TRUE)[which(grepl(y, list.files(cov.dir)))] # current covariates.
  cov.current <- terra::extract(x = terra::rast(cov.file), y = pts)[, -1] # remove the first ID column.
  complete.inds <- which(stats::complete.cases(cov.current))
  cov.current <- cov.current[complete.inds, ]
  # factorize the land cover band.
  if ("LC" %in% colnames(cov.current)) {
    cov.current[, "LC"] <- factor(cov.current[, "LC"])
  }
  cov.names <- colnames(cov.current) # grab band names of the covariate map.
  # initialize the model and residual lists for each variable.
  models <- res.vars <- vector("list", length = length(var.names)) %>% purrr::set_names(var.names)
  # loop over variables.
  for (v in var.names) {
    message(paste("processing", v))
    # train on residuals from the previous time point.
    # grab column indices for the current variable.
    inds <- which(grepl(v, colnames(pre.X)))
    # grab observations for the current variable.
    obs.v <- obs.mean[[t - 1]] %>% purrr::map(function(obs) {
      if (is.null(obs[[v]])) {
        return(NA)
      } else {
        return(obs[[v]])
      }
    }) %>% unlist
    # calculate residuals for the previous time point.
    res.pre <- colMeans(pre.X[, inds]) - obs.v
    # prepare the training data set.
    ml.df <- cbind(cov.pre, colMeans(pre.X)[inds], res.pre)
    colnames(ml.df)[length(ml.df) - 1] <- "raw_dat" # rename the forecast-mean column.
    ml.df <- rbind(pre.states[[v]], ml.df) # append previously accumulated covariates.
    ml.df <- ml.df[which(stats::complete.cases(ml.df)), ]
    pre.states[[v]] <- ml.df # store the historical covariates for future use.
    # prepare the prediction covariates.
    cov.df <- cbind(cov.current, colMeans(X)[inds[complete.inds]])
    colnames(cov.df)[length(cov.df)] <- "raw_dat"
    if (nrow(ml.df) == 0) next # skip this variable if there are zero records.
    if (is.null(py.init)) {
      # random forest training.
      formula <- stats::as.formula(paste("res.pre", "~", paste(cov.names, collapse = " + ")))
      model <- randomForest::randomForest(formula,
                                          data = ml.df,
                                          ntree = 1000,
                                          na.action = stats::na.omit,
                                          keep.forest = TRUE,
                                          importance = TRUE)
      var.imp <- randomForest::importance(model)
      models[[v]] <- var.imp # store the variable importance.
      # predict residuals for the current time point.
      res.current <- stats::predict(model, cov.df)
    } else {
      # use the functions from the Python script.
      # training.
      fi_ret <- py$train_full_model(
        name = as.character(v), # current variable name.
        X = as.matrix(ml.df[, -length(ml.df)]), # covariates + previous forecast means.
        y = as.numeric(ml.df[["res.pre"]]), # residuals.
        feature_names = colnames(ml.df[, -length(ml.df)])
      )
      # prediction.
      res.current <- py$predict_residual(as.character(v), as.matrix(cov.df))
      # store model outputs.
      # weights.
      w_now <- try(py$get_model_weights(as.character(v)), silent = TRUE)
      w_now <- min(max(as.numeric(w_now), 0), 1)
      w_named <- c(KNN = w_now, TREE = 1 - w_now)
      # variable importance.
      fi_ret <- tryCatch(reticulate::py_to_r(fi_ret), error = function(e) fi_ret)
      fn <- as.character(unlist(fi_ret[["names"]], use.names = FALSE))
      fv <- as.numeric(unlist(fi_ret[["importances"]], use.names = FALSE)) %>% purrr::set_names(fn)
      models[[v]] <- list(weights = w_named, var.imp = fv) # store the weights and variable importance.
    }
    # assign NAs to locations with no observations at the previous time point.
    res <- rep(NA, length(obs.v)) %>% purrr::set_names(unique(attributes(X)$Site))
    res[complete.inds] <- res.current
    res[which(is.na(obs.v))] <- NA
    res.vars[[v]] <- res
    # correct the current forecasts.
    for (i in seq_along(inds)) {
      if (is.na(res[i])) next
      X[, inds[i]] <- X[, inds[i]] - res[i]
    }
  }
  # map forecasts to the prescribed variable boundaries.
  for (i in seq_len(ncol(X))) {
    int.save <- state.interval[which(startsWith(colnames(X)[i], var.names)), ]
    X[X[, i] < int.save[1], i] <- int.save[1]
    X[X[, i] > int.save[2], i] <- int.save[2]
  }
  return(list(X = X, models = models, res = res.vars, pre.states = pre.states))
}
```
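The Python helper that `py.init` is expected to load is not shown in this section. Based only on the calls made above (`train_full_model()`, `predict_residual()`, and `get_model_weights()`), a minimal sketch of a compatible `py.init` might look like the following; the module name and path are assumptions, not part of this PR:

```r
# Hypothetical py.init; the module name and path below are illustrative only.
# sda_bias_correction() just needs the returned object to expose
# train_full_model(name, X, y, feature_names), predict_residual(name, X),
# and get_model_weights(name).
my_py_init <- function() {
  reticulate::import_from_path(
    module = "debias_ml",               # assumed name of the Python file (debias_ml.py)
    path   = "path/to/python/scripts"   # assumed location of the script
  )
}
```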
Elsewhere in this diff: one file was renamed without changes, and modules/assim.sequential/man/debias_get_covariates_for_date.Rd was deleted (52 deletions), along with other deleted files whose contents are not rendered here.
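For orientation, here is a minimal calling sketch of the new function inside an SDA analysis loop. All object names (`forecasts`, `site.locs`, `state.interval`, the covariate path) are illustrative and not taken from this PR:

```r
# Hypothetical driver loop; every object below is illustrative.
pre.states <- list()   # accumulated training records across time points
pre.X <- NULL          # forecast matrix from the previous time point
for (t in seq_along(obs.mean)) {
  X <- forecasts[[t]]  # ensemble forecast for the current time point
  if (t > 1) {
    corrected <- sda_bias_correction(
      site.locs      = site.locs,        # data.frame with Lon and Lat columns
      t              = t,
      pre.X          = pre.X,
      X              = X,
      obs.mean       = obs.mean,
      state.interval = state.interval,   # lower/upper bounds per state variable
      cov.dir        = "/path/to/covariate/maps",
      pre.states     = pre.states,
      py.init        = NULL              # NULL uses the built-in random forest
    )
    X <- corrected$X                     # bias-corrected forecast
    pre.states <- corrected$pre.states   # carry training data forward
  }
  # ... run the SDA analysis step on X here ...
  pre.X <- X
}
```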