2.7 Timeseries padder: variable vs. constant
The QAQC module needs data from before and after the time window of interest, so the timeseries padder stages that extra data around the window. For example, say the pipeline only wants to process data from 2020-01-04 00:00:00 through 23:59:59. The padder would pull in data from 2020-01-03 and 2020-01-05, and that extra data would support QAQC scripts that need 'edge' data outside of 2020-01-04.
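The padding idea above can be sketched in a few lines of Python. This is only an illustration, not the padder's actual code: the function name `padded_dates` is hypothetical, and it assumes the pad is measured in whole days on each side of the day being processed.

```python
from datetime import date, timedelta


def padded_dates(day: date, window_size: int) -> list:
    """Return every date from `window_size` days before `day`
    through `window_size` days after it, inclusive.

    Hypothetical helper: assumes the pad is a whole number of days.
    """
    return [
        day + timedelta(days=offset)
        for offset in range(-window_size, window_size + 1)
    ]


# One day of padding on each side of 2020-01-04.
dates = padded_dates(date(2020, 1, 4), 1)
print([d.isoformat() for d in dates])
# ['2020-01-03', '2020-01-04', '2020-01-05']
```

With a pad of one day, the processed day picks up its two neighbors, which matches the 2020-01-03 and 2020-01-05 example in the text.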
The [SHORT-NAME]_timeseries_padder.yaml file may call either a constant or a variable timeseries padder Python module.
Use the constant timeseries padder when you provide a specific time window (WINDOW_SIZE) that pads data on each side of the time window.
Use the variable timeseries padder when the location information drawn from the thresholds.json repo includes data rate.
The constant timeseries padder Python module, timeseries_padder.timeseries_padder.constant_pad_main, takes its arguments from the variables set under env: (e.g. OUT_PATH, WINDOW_SIZE, YEAR_INDEX, etc.). The example below shows how env: is set for the constant timeseries padder:
transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:26
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.constant_pad_main
  env:
    OUT_PATH: /pfs/out
    WINDOW_SIZE: '1'
    LOG_LEVEL: INFO
    RELATIVE_PATH_INDEX: '3'
    YEAR_INDEX: '4'
    MONTH_INDEX: '5'
    DAY_INDEX: '6'
    LOCATION_INDEX: '7'
    DATA_TYPE_INDEX: '8'
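To make the `*_INDEX` variables more concrete, here is a minimal sketch of how index values like these could be used to pick fields out of a file path. It is not taken from the padder source: the function `fields_from_path`, the sample path, and the location name are all hypothetical, and the sketch assumes each index is a position in the path's components, counting the leading `/` as position 0. Environment variables are always strings, which is why the values are quoted in the YAML and converted with `int()` here.

```python
from pathlib import PurePosixPath
from typing import Mapping

# Index variables as they appear under env: in the constant padder example.
INDEX_VARS = (
    "YEAR_INDEX",
    "MONTH_INDEX",
    "DAY_INDEX",
    "LOCATION_INDEX",
    "DATA_TYPE_INDEX",
)


def fields_from_path(path: str, env: Mapping[str, str]) -> dict:
    """Map each *_INDEX variable to the path component at that position.

    Hypothetical helper: assumes indices refer to PurePosixPath(path).parts.
    """
    parts = PurePosixPath(path).parts
    return {
        name[: -len("_INDEX")].lower(): parts[int(env[name])]
        for name in INDEX_VARS
    }


# In a running container this mapping would be os.environ; a plain dict
# with the values from the YAML above keeps the sketch self-contained.
env = {
    "YEAR_INDEX": "4",
    "MONTH_INDEX": "5",
    "DAY_INDEX": "6",
    "LOCATION_INDEX": "7",
    "DATA_TYPE_INDEX": "8",
}

# Hypothetical input path, for illustration only.
sample = "/pfs/in/prt/2020/01/04/CFGLOC1234/data/file.parquet"
fields = fields_from_path(sample, env)
print(fields)
```

Under these assumptions, index 4 lands on the year, 5 on the month, 6 on the day, 7 on the location, and 8 on the data type directory. Check the actual module before relying on this layout.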
The variable timeseries padder Python module does not read its path indices from the env: section of the yaml file. Instead, it takes them as arguments on the python command line, parsed with the argparse Python package. The same approach is used in the [SHORT-NAME]_egress.yaml. The following example shows the corresponding variable timeseries padder as used in the [SHORT-NAME]_timeseries_padder.yaml. Note that timeseries_padder.timeseries_padder.variable_pad_main is now called, followed by the arguments that are parsed in lieu of being specified in env:. In this example, OUT_PATH and LOG_LEVEL are still set under env:.
transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:31
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.variable_pad_main --yearindex 4 --monthindex 5 --dayindex 6 --locindex 7 --subdirindex 8
  env:
    OUT_PATH: /pfs/out
    LOG_LEVEL: INFO
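For readers unfamiliar with argparse, the sketch below shows how a module could accept the flags used in the command above. It is an assumption about the general shape, not the actual variable_pad_main source: the `build_parser` function, the help strings, and the choice to make every flag a required integer are illustrative only. Only the flag names come from the example.

```python
import argparse

# Flag names copied from the example command; everything else is assumed.
FLAGS = ("--yearindex", "--monthindex", "--dayindex", "--locindex", "--subdirindex")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts one integer per path-index flag."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser for path-index arguments."
    )
    for flag in FLAGS:
        parser.add_argument(
            flag,
            type=int,
            required=True,
            help="position of this field in the input path (assumed meaning)",
        )
    return parser


# Parse the same arguments that appear in the stdin: command above.
argv = "--yearindex 4 --monthindex 5 --dayindex 6 --locindex 7 --subdirindex 8".split()
args = build_parser().parse_args(argv)
print(args.yearindex, args.monthindex, args.dayindex, args.locindex, args.subdirindex)
# 4 5 6 7 8
```

Passing the indices on the command line keeps them next to the module call in stdin:, while values such as OUT_PATH and LOG_LEVEL can stay under env: as in the example.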