Fixed bug with empty runs in parallel bootstrap
Holmin authored and Holmin committed Feb 8, 2019
1 parent 473a87f commit 7e5c65e
Showing 6 changed files with 97 additions and 87 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: Rstox
Title: Running Stox functionality independently in R
- Version: 1.10
+ Version: 1.10.1
Authors@R: c(
person("Arne Johannes", "Holmin", role = c("aut","cre"), email = "[email protected]"),
person("Edvin", "Fuglebakk", role = "ctb"),
79 changes: 79 additions & 0 deletions NEWS
@@ -735,6 +735,8 @@ Fixed bug with parallel R sessions run by StoX writing to and sourcing the same

########## Version 1.10 (2019-02-07) ##########

(Identical to NEWS for Rstox_1.10)

Added support for NMD biotic API version 3 and biotic version 3.0 in getNMDdata(). Removed search for 'serialno'. Only full year files can be downloaded, but filtering on species through 'tsn' and serial number through 'serialno' is incorporated in the StoX project generated at download.

Added support for NMD reference API version 2 in getNMDinfo() and getNMDdata(). The output from getNMDinfo("platform") now coincides with that from getNMDinfo("v"), being a data frame with all entries of all platforms, where some platforms can have multiple entries. Before, getNMDinfo("v") returned only the last entry of each platform.
@@ -807,3 +809,80 @@ Fixed bug in createProject(), where template now can be given as the first eleme

Fixed bug with parallel R sessions run by StoX writing to and sourcing the same temporary R script at once. This is solved by the argument tempRScriptFileName.


########## Version 1.11 (2019-02-08) ##########

Added support for NMD biotic API version 3 and biotic version 3.0 in getNMDdata(). Removed search for 'serialno'. Only full year files can be downloaded, but filtering on species through 'tsn' and serial number through 'serialno' is incorporated in the StoX project generated at download.

Added support for NMD reference API version 2 in getNMDinfo() and getNMDdata(). The output from getNMDinfo("platform") now coincides with that from getNMDinfo("v"), being a data frame with all entries of all platforms, where some platforms can have multiple entries. Before, getNMDinfo("v") returned only the last entry of each platform.

Changed the output of getNMDinfo("taxa") to have one column per species, with the scientific, Norwegian, English and Russian name as columns, as well as a column OldNames for old (non-preferred) names, given as strings such as "Norwegian: rødkrabbe, Scientific: Geryon tridens, English: red crab".

Added the functions prepareRECA(), runRECA() for preparing and running ECA (estimated catch at age), and plotRECA() and reportRECA() for plotting and reporting the results. These functions depend on the Reca package, which currently only works on Mac and Windows 7 (not Windows 10).

Added the functions writeBioticXML() and writeAcousticXML() (with supporting functions), which take a data frame as input and write the data frame to an XML file given XML-schema (XSD) files shipped along with Rstox. The input data frame has one column per combination of variable and attribute, where the attributes are coded into the column names in the following manner: variableName..attributeName.attributeValue. If there are variables with identical names at different levels in the XML hierarchy, the level (i.e., the name of the parent node) can be given in the column name by separation of a dot: variableName.level.
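
As an illustration of the column-name convention variableName..attributeName.attributeValue, the following minimal sketch splits such names into their parts. The helper parseColumnName() and the example column names are hypothetical and not part of Rstox; the sketch does not handle the variableName.level form.

```r
# Hypothetical helper (not an Rstox function): split a column name of the form
# variableName..attributeName.attributeValue into its three parts.
parseColumnName <- function(x) {
  # The double dot separates the variable name from the attribute part:
  parts <- strsplit(x, "..", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # No attribute coded into the column name:
    return(list(variable = parts[1], attributeName = NA_character_, attributeValue = NA_character_))
  }
  # A single dot separates the attribute name from the attribute value:
  att <- strsplit(parts[2], ".", fixed = TRUE)[[1]]
  list(
    variable = parts[1],
    attributeName = att[1],
    attributeValue = paste(att[-1], collapse = ".")
  )
}

# Made-up column names, used only to show the convention:
parsed <- lapply(c("weight..unit.kg", "species"), parseColumnName)
```

Here the first name is split into the variable "weight", the attribute name "unit" and the attribute value "kg", whereas the second name carries no attribute.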

Added the option getNMDinfo("cruise") or getNMDinfo("cruises"), for returning a list of all cruises containing biotic data.

Added the function prepareDATRAS() for converting a biotic file to a DATRAS csv file. The function is included in the StoX template DATRAS conversion.

Added the function closeAllProjects().

Changed the parameter 'margin' in plotStratum() to the new parameter 'zoom', which has default 1.5 and where 1 indicates no zooming. The old parameter named 'zoom' has been removed along with the parameter 'google' for Google-maps type of plots due to new requirements for users to create an API project and get this authorized.

Added the function exportDatras() which takes the output from a baseline model including the StoX function DATRASConvert and exports a DATRAS file. In the current version only one biotic file can be included in the project.

Increased default memory from 2 to 6 GB to accommodate ECA projects, which are generally large.

Fixed bug in getNMDinfo("v"), where the returned data frame only contained NA.

Fixed bug in surveyPlanner(), where the output data frame 'Transect' contained stratum indices instead of stratum names.

Fixed bug in plotStratum(), where transect=FALSE did not suppress plotting transects, whereas transect=NULL did. In the new version all values other than TRUE suppress plotting transects.
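
One way to obtain the behavior "all values other than TRUE suppress plotting" in R is a check with isTRUE(). The sketch below is only an assumption about how such a check could look and is not taken from the plotStratum() source; the helper name shouldPlotTransects() is hypothetical.

```r
# Hypothetical helper (not an Rstox function): plot transects only if the
# argument is exactly TRUE. FALSE, NULL, NA and any non-logical value all
# lead to no plotting.
shouldPlotTransects <- function(transect) {
  isTRUE(transect)
}

plotTrue  <- shouldPlotTransects(TRUE)
plotFalse <- shouldPlotTransects(FALSE)
plotNull  <- shouldPlotTransects(NULL)
```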

Fixed bug in getNMDinfo("platform") and getNMDinfo("v"), where the columns validFrom and validTo did not correspond to the correct platform codes/names due to a sorting of these dates.

Fixed bug in getNMDinfo, where UTF-8 encoding was ineffective (the corresponding bugfix in Version 1.3.2 was discovered to be ineffective).

Removed error in readStrataPolygons(), where multipolygons were not supported. Now only the first polygon of multipolygons is kept in the outputs 'lonlat' and 'lonlatAll', and all are kept in 'lonlatFull'.

Changed surveyPlanner() to discard strata with zero effort, but keep those strata in the Input list for plotting.

Added the parameter "JavaMem" in bootstrap functions, used for setting the Java memory of each bootstrap replicate (useful to reduce memory for parallelized bootstrapping).

Removed LICENSE file, as this is unambiguously stated in the DESCRIPTION file.

Changed from using 'formulardato' to using 'sistefangstdato' in baseline2eca().

Fixed bug in runBaseline(), where exportCSV==TRUE now implies resetting (and rerunning) the baseline.

Changed 'covariateLink' in the output from baseline2eca() to link to the union of the covariate values in biotic and landing (and not only to landing as before). Also changed name of the third column of the data frames in covariateDefinition from "Value" to "Definition".

Removed the covariate 'season', which was used in conjunction with 'year'. Now, these are concatenated in the temporal covariate.

Changed the name of baseline2eca() to getCovData(). The old function is kept for backwards compatibility.

Removed the option 'quiet' in downloadXML(), which is replaced by 'msg'.

Avoided strange behavior by rgeos::readWKT() on Windows, where long strings were "unable to parse" (500 characters).

Fixed bug in runBaseline(), where when reset=TRUE, and startProcess was a later process than the first process to be changed using parlist or ..., the baseline was not run from that first process, but from the specified startProcess. Using parlist or ... now forces running the baseline from the first necessary process.

Cleaned up how startProcess and endProcess interact with 'reset' and the unrun and changed processes. The documentation is updated with the following: The parameters startProcess and endProcess specify the range of processes to run (startProcess : endProcess). If the model has been run already for all or some of the processes in this range, only the unrun processes are run. If there are processes in or prior to this range for which parameters have changed, all processes from the first changed process and throughout the range startProcess : endProcess are rerun. The range of processes to run is extended to any changed processes beyond the range. If reset=TRUE, the range of processes to run is extended to the range startProcess : endProcess regardless of whether the processes have previously been run.
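
The rules above can be sketched as a small selection function. This is one reading of the documentation text and not the Rstox implementation: processesToRun(), hasRun and changed are hypothetical names, processes are represented by their indices, and the extension to changed processes beyond the range is deliberately left out.

```r
# Hypothetical helper (not an Rstox function): return the indices of the
# processes to run, given logical vectors 'hasRun' and 'changed' with one
# element per process in the model.
processesToRun <- function(hasRun, changed, startProcess, endProcess, reset = FALSE) {
  requested <- seq(startProcess, endProcess)
  # With reset=TRUE the whole requested range is run, otherwise only the unrun processes:
  toRun <- if (reset) requested else requested[!hasRun[requested]]
  # A changed process in or prior to the range forces a rerun from that process
  # and throughout the range:
  firstChanged <- which(changed)[1]
  if (!is.na(firstChanged) && firstChanged <= endProcess) {
    toRun <- union(toRun, seq(firstChanged, endProcess))
  }
  sort(toRun)
}

# Five processes, of which the first three have been run already:
hasRun <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
noChange <- rep(FALSE, 5)
onlyUnrun <- processesToRun(hasRun, noChange, startProcess = 2, endProcess = 5)
withReset <- processesToRun(hasRun, noChange, startProcess = 2, endProcess = 5, reset = TRUE)
firstProcessChanged <- processesToRun(hasRun, c(TRUE, rep(FALSE, 4)), startProcess = 2, endProcess = 5)
```

In this sketch onlyUnrun holds processes 4 and 5, withReset holds processes 2 through 5, and firstProcessChanged holds all five processes, since the change in process 1 precedes the requested range.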

Added the predefined values as attributes to the parameter values returned from getBaseline().

Changed default of keepMissing to TRUE in readBaselineParametersJava(), thus including all parameters, even those which are not set when running getBaseline().

Renamed the parameter 'API' to 'server' in getNMDinfo() and getNMDdata(). No backwards compatibility.

Fixed bug with Java v11.

Fixed bug with subset in getNMDdata().

Fixed bug in createProject(), where template now can be given as the first element of a list of processes in 'model'.

Fixed bug with parallel R sessions run by StoX writing to and sourcing the same temporary R script at once. This is solved by the argument tempRScriptFileName.



3 changes: 2 additions & 1 deletion R/onAttach.R
@@ -1,7 +1,8 @@
.onAttach <- function(libname, pkgname){

- packageStartupMessage("Rstox_1.10
+ packageStartupMessage("Rstox_1.10.1
**********
WARNING: This version of Rstox is an unofficial/developer version and bugs should be expected.
If problems with Java Memory such as java.lang.OutOfMemoryError occurs, see ?setJavaMemory.
**********
", appendLF=FALSE)
2 changes: 1 addition & 1 deletion R/rstox_base.r
@@ -1423,7 +1423,7 @@ writeMessageToConsoleOrFile <- function(text, msg, add.time=FALSE){
}
}


# Function for setting the name of the temporary R script written by StoX. These scripts are placed in dirname(tempdir()):
setTempRScriptFileName <- function(projectName, tempRScriptFileName, msg=TRUE){

# Get project object:
16 changes: 9 additions & 7 deletions R/rstox_boostrap.r
@@ -34,7 +34,7 @@ bootstrapOneIteration <- function(i, projectName, assignments, strataNames, psuN

# Get the baseline object (run if not already run), as this is needed to insert biostation weighting and meanNASC values into. The warningLevel = 1 continues with a warning when the baseline encounters warnings:
# 2019-02-08: Added the tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), which ensures that R-functions run by StoX are sourced from differently named files, so that we avoid any complications caused by multiple parallel sessions writing to and sourcing the same file at the same time:
- temp <- runBaseline(projectName=projectName, out="baseline", msg=FALSE, warningLevel=1, ...)
+ temp <- runBaseline(projectName=projectName, out="baseline", msg=FALSE, warningLevel=1, tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), ...)

# Perform sampling drawing and replacement by stratum
BootWeights <- data.frame()
@@ -88,11 +88,13 @@ bootstrapOneIteration <- function(i, projectName, assignments, strataNames, psuN

# Run the sub baseline within Java. The argument reset=TRUE is essential to obtain the bootstrapping:
# 2019-02-08: Added the tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), which ensures that R-functions run by StoX are sourced from differently named files, so that we avoid any complications caused by multiple parallel sessions writing to and sourcing the same file at the same time:


getBaseline(projectName, startProcess=startProcess, endProcess=endProcess, proc=endProcess, input=FALSE, msg=FALSE, save=FALSE, reset=TRUE, drop=FALSE, warningLevel=1, tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), ...)$outputData


# Test done to identify the source of the random failure of one core when running bootstrap in parallel. The conclusion was that the error occurs in AbundanceByLength, which may read or write to some common resource for all cores. See bootstrapParallel():
- # getBaseline(projectName, startProcess=startProcess, endProcess=endProcess, proc=c("AbundanceByLength", "SuperIndAbundance"), input=FALSE, msg=FALSE, save=FALSE, reset=TRUE, drop=FALSE, warningLevel=1, tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), ...)$outputData
+ #getBaseline(projectName, startProcess=startProcess, endProcess=endProcess, proc=c("AbundanceByLength", "SuperIndAbundance"), input=FALSE, msg=FALSE, save=FALSE, reset=TRUE, drop=FALSE, warningLevel=1, tempRScriptFileName=paste("tempRScriptFile", i, sep="_"), ...)$outputData
# "TotalLengthDist", "AcousticDensity", "MeanDensity_Stratum", "SumDensity_Stratum", "AbundanceByLength", "IndividualDataStations", "IndividualData", "SuperIndAbundance"
}
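
The effect of tempRScriptFileName=paste("tempRScriptFile", i, sep="_") can be illustrated with a minimal sketch in base R. The iterations are run sequentially here, and the directory and the script contents are hypothetical stand-ins for the scripts that StoX writes and sources; the point is only that each iteration gets its own file name, so no two iterations write to and source the same file.

```r
# One script name per bootstrap iteration, built as in the calls above:
iterations <- 1:4
scriptNames <- vapply(
  iterations,
  function(i) paste("tempRScriptFile", i, sep = "_"),
  character(1)
)

# Hypothetical stand-in for what each session does: write a small R script to
# its own file and source it into a private environment.
scriptDir <- tempfile("scripts")
dir.create(scriptDir)
results <- vapply(seq_along(scriptNames), function(i) {
  path <- file.path(scriptDir, paste0(scriptNames[i], ".R"))
  writeLines(sprintf("value <- %d", i), path)
  env <- new.env()
  source(path, local = env)
  env$value
}, numeric(1))

# Clean up the temporary directory:
unlink(scriptDir, recursive = TRUE)
```

With a single shared file name, a session could source the file while another session was overwriting it, which is the collision the per-iteration names are meant to avoid.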

@@ -196,11 +198,11 @@ bootstrapParallel <- function(projectName, assignments, psuNASC=NULL, stratumNAS
}

# Test done to identify the source of the random failure of one core when running bootstrap in parallel. The conclusion was that the error occurs in AbundanceByLength, which may read or write to some common resource for all cores:
- #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$TotalLengthDist)))) Crash at run 5
- #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$AcousticDensity)))) No crashes after 20 runs
- #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$MeanDensity_Stratum)))) Crash at run 5
- #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$SumDensity_Stratum)))) Crash at run 2, 3
- # print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$AbundanceByLength)))) Crash at run 4, AbundanceByLength also returning empty data, thus being the source of the error.
+ #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$TotalLengthDist)))) # Crash at run 5
+ #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$AcousticDensity)))) # No crashes after 20 runs
+ #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$MeanDensity_Stratum)))) # Crash at run 5
+ #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$SumDensity_Stratum)))) # Crash at run 2, 3
+ #print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$AbundanceByLength)))) # Crash at run 4, AbundanceByLength also returning empty data, thus being the source of the error.
#print(as.data.frame(sapply(seq_along(out), function(x) dim(out[[x]]$SuperIndAbundance))))
#return(list())
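
The commented-out diagnostics above inspect the dimensions of each bootstrap replicate's output to find runs that returned empty data. A minimal sketch of the same check on a made-up list is given below. The structure of out (one list per replicate with a data frame AbundanceByLength) follows the diagnostics above, while the column names and values are invented for illustration.

```r
# Made-up output of three bootstrap replicates, the second of which is empty:
out <- list(
  list(AbundanceByLength = data.frame(LengthGroup = c(10, 11), Abundance = c(5, 7))),
  list(AbundanceByLength = data.frame()),
  list(AbundanceByLength = data.frame(LengthGroup = 12, Abundance = 3))
)

# One column per replicate, holding the number of rows and columns of the output:
dims <- sapply(seq_along(out), function(x) dim(out[[x]]$AbundanceByLength))

# Replicates with zero rows are the empty runs:
emptyRuns <- which(dims[1, ] == 0)
```

For this made-up input emptyRuns identifies the second replicate as the only empty run.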
