Implement a better config file system for CWL/WDL options #4666

Merged: 51 commits, Dec 12, 2023

Contributor

@stxue1 stxue1 commented Nov 2, 2023

This should resolve #4632

Changelog Entry

To be copied to the draft changelog by merger:

  • Updated docs for config file
  • Config file will include CWL and WDL arguments
    • Moved nonpositional CWL/WDL argument definitions to common.py
    • Config file will have all options be commented out when generated
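
A minimal sketch of what "all options commented out when generated" could look like, using only the standard library. The function name `render_commented_config` and the exact output layout are assumptions for illustration; this is not Toil's actual generator, which may format the file differently.

```python
import argparse
import json


def render_commented_config(parser: argparse.ArgumentParser) -> str:
    """Render every optional argument as a commented-out `key: default` line.

    Hypothetical helper: the layout is illustrative, not Toil's real format.
    """
    lines = []
    for action in parser._actions:
        # Skip positional arguments and the built-in --help action.
        if not action.option_strings or action.dest == "help":
            continue
        # Only echo help text for options that are not hidden from --help.
        if action.help and action.help != argparse.SUPPRESS:
            lines.append(f"# {action.help}")
        # JSON scalars are also valid YAML scalars (None becomes null).
        lines.append(f"# {action.dest}: {json.dumps(action.default, default=str)}")
    return "\n".join(lines) + "\n"


demo = argparse.ArgumentParser()
demo.add_argument("--retryCount", dest="retryCount", type=int, default=1,
                  help="Number of times to retry a failing job.")
demo.add_argument("--wdlOutputFile", dest="output_file", default=None,
                  help=argparse.SUPPRESS)
print(render_commented_config(demo), end="")
```

Because every line starts with `#`, a freshly generated file changes nothing until the user uncomments a setting.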

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Member

@adamnovak adamnovak left a comment

This looks pretty good. But some of the docs are worded oddly, and I think we need to not have the CWL and WDL front-ends talk about each other.

I also see some optional opportunities for condensing repeated logic.

docs/running/cliOptions.rst (outdated, resolved)
docs/running/cliOptions.rst (outdated, resolved)
docs/running/cliOptions.rst (outdated, resolved)
src/toil/common.py (resolved)
src/toil/common.py (resolved)
Comment on lines 1621 to 1623
    output_file_arguments = ["--wdlOutputFile"] + (["--outputFile", "-m"] if not suppress else [])
    parser.add_argument(*output_file_arguments, dest="output_file", type=str, default=None,
                        help=suppress_help or "File or URI to save output JSON to.")
Member

There's a lot of duplicated logic between these, and between all the CWL options. It would be nice if we had a way to repeat ourselves less somehow. Like maybe a wrapper function that knows how to namespace the options and suppress the help when appropriate? That might be too much extra work though.
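
The wrapper idea above could be sketched roughly as follows. The name `add_frontend_option` and its parameters are hypothetical and are not part of Toil; the sketch only assumes the pattern visible in the quoted lines, where the namespaced flag is always registered and the short aliases and help text appear only when `suppress` is false.

```python
import argparse
from typing import Any, Sequence


def add_frontend_option(
    parser: argparse.ArgumentParser,
    namespaced: str,
    aliases: Sequence[str],
    suppress: bool,
    help: str,
    **kwargs: Any,
) -> argparse.Action:
    """Register one front-end option (hypothetical helper, not Toil API).

    The namespaced flag is always accepted. The aliases and the help text
    are only exposed when this front-end is the one actually running.
    """
    names = [namespaced] + ([] if suppress else list(aliases))
    shown_help = argparse.SUPPRESS if suppress else help
    return parser.add_argument(*names, help=shown_help, **kwargs)


wdl_parser = argparse.ArgumentParser(prog="toil-wdl-runner")
add_frontend_option(
    wdl_parser, "--wdlOutputFile", ["--outputFile", "-m"], suppress=False,
    help="File or URI to save output JSON to.",
    dest="output_file", type=str, default=None,
)
print(wdl_parser.parse_args(["-m", "out.json"]).output_file)
```

Each option definition then shrinks to a single call, and the alias and help-suppression rules live in one place.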

@@ -1198,6 +1283,345 @@ def __call__(self, parser: Any, namespace: Any, values: Any, option_string: Any
caching.add_argument('--disableCaching', dest='enableCaching', action='store_false', help=SUPPRESS)
caching.set_defaults(disableCaching=None)

def add_cwl_options(parser: ArgumentParser, suppress: bool = True) -> None:
Member

I wish the different front-ends didn't all have to have their options in one giant file here. We could consider reorganizing things so we have like src/toil/options/cwl.py, src/toil/options/common.py, etc. so we could keep the different runners' options split up even when we can't import their main files.

Comment on lines 3258 to 3272
    parser = ArgParser()
    addOptions(parser, jobstore_as_flag=True, cwl=True)
    add_cwl_options(parser)
    parser.add_argument("cwltool", type=str, help="CWL file to run.")
    parser.add_argument("cwljob", nargs="*", help="Input file or CWL options. If CWL workflow takes an input, "
                                                  "the name of the input can be used as an option. "
                                                  "For example: \"%(prog)s workflow.cwl --file1 file\". "
                                                  "If an input has the same name as a Toil option, pass '--' before it.")

    wdl_parser = ArgParser()
    add_wdl_options(wdl_parser)
    for action in wdl_parser._actions:
        action.default = SUPPRESS
    possible_wdl_options, _ = wdl_parser.parse_known_args(args)
    if len(vars(possible_wdl_options)) != 0:
        raise parser.error(f"WDL options are not allowed on the command line.")
Member

Can we abstract this out somehow? Maybe use the function we wrote to make the default parser instead of ArgParser()'s constructor, and then move the check for options for front-ends not in use down into the options-parsing code?

We shouldn't have to know in the CWL front-end that the WDL front-end even exists.


parser = ArgParser(description='Runs WDL files with toil.')
addOptions(parser, jobstore_as_flag=True)
addOptions(parser, jobstore_as_flag=True, cwl=False, wdl=True)
Member

We can probably just say wdl=True; we don't want to know anything about the CWL front-end in the WDL front-end either.

Comment on lines 2540 to 2548
    cwl_parser = ArgParser()
    add_cwl_options(cwl_parser)
    for action in cwl_parser._actions:
        action.default = SUPPRESS
    possible_cwl_options, _ = cwl_parser.parse_known_args(args)
    if len(vars(possible_cwl_options)) != 0:
        raise parser.error(f"CWL options are not allowed on the command line.")

    options = parser.parse_args(args)
Member

This should be the same check code as for excluding WDL options from the CWL runner, and it should be somewhere in the option parsing library code we have. Maybe we need something we import that just does this whole block, and we tell it if we support CWL options and if we support WDL options?
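
One way the shared check could look, as a sketch under stated assumptions: `reject_foreign_options`, the `option_adders` mapping, and the two `*_demo` functions are hypothetical stand-ins, not Toil code. It reuses the probing technique from the quoted block (reset every default to `SUPPRESS`, then see whether parsing sets anything), but the caller only declares which front-ends it supports and never names the others.

```python
import argparse
from typing import Callable, Dict, Sequence, Set

OptionAdder = Callable[[argparse.ArgumentParser], None]


def reject_foreign_options(
    parser: argparse.ArgumentParser,
    args: Sequence[str],
    option_adders: Dict[str, OptionAdder],
    enabled: Set[str],
) -> None:
    """Exit via parser.error() if args use options of a front-end not in `enabled`."""
    for name, add_options in option_adders.items():
        if name in enabled:
            continue
        # Throwaway parser: no -h of its own and no prefix matching.
        probe = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
        add_options(probe)
        for action in probe._actions:
            # With SUPPRESS defaults, only options actually present in
            # args end up as attributes on the parsed namespace.
            action.default = argparse.SUPPRESS
        found, _ = probe.parse_known_args(list(args))
        if vars(found):
            parser.error(f"{name} options are not allowed on the command line.")


# Hypothetical stand-ins for the real add_cwl_options / add_wdl_options.
def add_cwl_options_demo(p: argparse.ArgumentParser) -> None:
    p.add_argument("--enable-dev", action="store_true")


def add_wdl_options_demo(p: argparse.ArgumentParser) -> None:
    p.add_argument("--wdlOutputFile", dest="output_file")


ADDERS = {"CWL": add_cwl_options_demo, "WDL": add_wdl_options_demo}
main_parser = argparse.ArgumentParser(prog="toil-cwl-runner")
# Passes silently: only CWL options are present and CWL is enabled.
reject_foreign_options(main_parser, ["wf.cwl", "--enable-dev"], ADDERS, {"CWL"})
```

With something like this in the shared option-parsing module, each runner would make a single call and pass only its own front-end name.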

Contributor

@mr-c mr-c left a comment

toil-cwl-runner --help no longer lists all options

$ toil-cwl-runner --help
usage: toil-cwl-runner [-h]

options:
  -h, --help  show this help message and exit
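
One possible explanation, stated as an assumption and not confirmed anywhere in this thread: if a throwaway parser used to probe for the other front-end's options sees `--help` before the real parser does, argparse prints that probe's help and exits. When every option on the probe was registered with `help=SUPPRESS`, the result is a usage line listing only `-h`. The sketch below reproduces that behavior with plain `argparse`; `make_probe` and `probe_args` are hypothetical names.

```python
import argparse
from typing import List


def make_probe(add_help: bool) -> argparse.ArgumentParser:
    """Build a throwaway parser whose only option is hidden from help."""
    probe = argparse.ArgumentParser(prog="toil-cwl-runner", add_help=add_help)
    # Hypothetical stand-in for an option registered with suppressed help.
    probe.add_argument("--wdlOutputFile", dest="output_file",
                       help=argparse.SUPPRESS)
    return probe


def probe_args(args: List[str], add_help: bool) -> str:
    """Report whether the probe swallowed the arguments or passed them on."""
    try:
        _, leftover = make_probe(add_help).parse_known_args(args)
    except SystemExit:
        return "probe printed its own help and exited"
    return f"probe left {leftover} for the real parser"


# With add_help=True the probe handles --help itself, hiding the real help.
print(probe_args(["--help"], add_help=True))
# With add_help=False the probe ignores --help and hands it back.
print(probe_args(["--help"], add_help=False))
```

If that is what is happening here, constructing the probe with `add_help=False` would let `--help` reach the main parser.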

Before this PR:
usage: toil-cwl-runner [-h] [--logCritical] [--logError] [--logWarning]
                       [--logDebug] [--logInfo] [--logOff]
                       [--logLevel {Critical,Error,Warning,Debug,Info,critical,error,warning,debug,info,CRITICAL,ERROR,WARNING,DEBUG,INFO}]
                       [--logFile LOGFILE] [--rotatingLogging]
                       [--jobstore JOBSTORE] [--workDir WORKDIR]
                       [--coordinationDir COORDINATION_DIR] [--noStdOutErr]
                       [--stats] [--clean {always,onError,never,onSuccess}]
                       [--cleanWorkDir {always,onError,never,onSuccess}]
                       [--clusterStats [CLUSTERSTATS]] [--restart]
                       [--batchSystem {aws_batch,single_machine,grid_engine,lsf,mesos,slurm,torque,htcondor,kubernetes}]
                       [--disableHotDeployment]
                       [--disableAutoDeployment DISABLEAUTODEPLOYMENT]
                       [--maxJobs MAX_JOBS] [--maxLocalJobs MAX_LOCAL_JOBS]
                       [--manualMemArgs] [--runLocalJobsOnWorkers]
                       [--coalesceStatusCalls]
                       [--statePollingWait STATEPOLLINGWAIT]
                       [--batchLogsDir BATCH_LOGS_DIR]
                       [--awsBatchRegion AWS_BATCH_REGION]
                       [--awsBatchQueue AWS_BATCH_QUEUE]
                       [--awsBatchJobRoleArn AWS_BATCH_JOB_ROLE_ARN]
                       [--scale SCALE] [--dont_allocate_mem | --allocate_mem]
                       [--kubernetesHostPath KUBERNETES_HOST_PATH]
                       [--kubernetesOwner KUBERNETES_OWNER]
                       [--kubernetesServiceAccount KUBERNETES_SERVICE_ACCOUNT]
                       [--kubernetesPodTimeout KUBERNETES_POD_TIMEOUT]
                       [--symlinkImports SYMLINKIMPORTS]
                       [--moveOutputs MOVEOUTPUTS] [--caching CACHING]
                       [--provisioner {aws,gce,None}] [--nodeTypes NODETYPES]
                       [--maxNodes MAXNODES] [--minNodes MINNODES]
                       [--targetTime TARGETTIME] [--betaInertia BETAINERTIA]
                       [--scaleInterval SCALEINTERVAL]
                       [--preemptibleCompensation PREEMPTIBLECOMPENSATION]
                       [--nodeStorage NODESTORAGE]
                       [--nodeStorageOverrides NODESTORAGEOVERRIDES]
                       [--metrics METRICS]
                       [--assumeZeroOverhead ASSUME_ZERO_OVERHEAD]
                       [--defaultMemory DEFAULTMEMORY] [--defaultCores FLOAT]
                       [--defaultDisk INT]
                       [--defaultAccelerators ACCELERATOR[,ACCELERATOR...]]
                       [--defaultPreemptible [BOOL]] [--maxCores INT]
                       [--maxMemory INT] [--maxDisk INT]
                       [--retryCount RETRYCOUNT]
                       [--enableUnlimitedPreemptibleRetries ENABLEUNLIMITEDPREEMPTIBLERETRIES]
                       [--doubleMem DOUBLEMEM]
                       [--maxJobDuration MAXJOBDURATION]
                       [--rescueJobsFrequency RESCUEJOBSFREQUENCY]
                       [--maxLogFileSize MAXLOGFILESIZE]
                       [--writeLogs [WRITELOGS]]
                       [--writeLogsGzip [WRITELOGSGZIP]]
                       [--writeLogsFromAllJobs WRITELOGSFROMALLJOBS]
                       [--writeMessages WRITE_MESSAGES]
                       [--realTimeLogging REALTIMELOGGING]
                       [--disableChaining DISABLECHAINING]
                       [--disableJobStoreChecksumVerification DISABLEJOBSTORECHECKSUMVERIFICATION]
                       [--sseKey SSEKEY] [--setEnv NAME=VALUE or NAME]
                       [--servicePollingInterval SERVICEPOLLINGINTERVAL]
                       [--forceDockerAppliance FORCEDOCKERAPPLIANCE]
                       [--statusWait STATUSWAIT]
                       [--disableProgress DISABLEPROGRESS] [--config CONFIG]
                       [--debugWorker] [--disableWorkerOutputCapture]
                       [--badWorker BADWORKER]
                       [--badWorkerFailInterval BADWORKERFAILINTERVAL]
                       [--not-strict] [--enable-dev] [--enable-ext] [--quiet]
                       [--basedir BASEDIR] [--outdir OUTDIR] [--version]
                       [--log-dir LOG_DIR]
                       [--user-space-docker-cmd USER_SPACE_DOCKER_CMD | --singularity | --podman | --no-container | --leave-container]
                       [--custom-net CUSTOM_NET] [--cidfile-dir CIDFILE_DIR]
                       [--cidfile-prefix CIDFILE_PREFIX]
                       [--preserve-environment VAR1 VAR2 [VAR1 VAR2 ...]]
                       [--preserve-entire-environment]
                       [--destBucket DESTBUCKET]
                       [--beta-dependency-resolvers-configuration BETA_DEPENDENCY_RESOLVERS_CONFIGURATION]
                       [--beta-dependencies-directory BETA_DEPENDENCIES_DIRECTORY]
                       [--beta-use-biocontainers] [--beta-conda-dependencies]
                       [--tmpdir-prefix TMPDIR_PREFIX]
                       [--tmp-outdir-prefix TMP_OUTDIR_PREFIX]
                       [--force-docker-pull] [--no-match-user]
                       [--no-read-only] [--strict-memory-limit]
                       [--strict-cpu-limit] [--relax-path-checks]
                       [--default-container DEFAULT_CONTAINER]
                       [--compute-checksum | --no-compute-checksum]
                       [--eval-timeout EVAL_TIMEOUT] [--overrides OVERRIDES]
                       [--mpi-config-file MPI_CONFIG_FILE]
                       [--bypass-file-store] [--disable-streaming]
                       [--provenance PROVENANCE] [--enable-user-provenance]
                       [--disable-user-provenance] [--enable-host-provenance]
                       [--disable-host-provenance] [--orcid ORCID]
                       [--full-name CWL_FULL_NAME]
                       cwltool [cwljob ...]

positional arguments:
cwltool CWL file to run.
cwljob Input file or CWL options. If CWL workflow takes an
input, the name of the input can be used as an option.
For example: "toil-cwl-runner workflow.cwl --file1
file". If an input has the same name as a Toil option,
pass '--' before it.

options:
-h, --help show this help message and exit
--not-strict
--enable-dev Enable loading and running development versions of CWL
--enable-ext Enable loading and running 'cwltool:' extensions to
the CWL standards.
--quiet
--basedir BASEDIR
--outdir OUTDIR
--version show program's version number and exit
--log-dir LOG_DIR Log your tools stdout/stderr to this location outside
of container
--user-space-docker-cmd USER_SPACE_DOCKER_CMD
(Linux/OS X only) Specify a user space docker command
(like udocker or dx-docker) that will be used to call
'pull' and 'run'
--singularity Use Singularity runtime for running containers.
Requires Singularity v2.6.1+ and Linux with kernel
version v3.18+ or with overlayfs support backported.
--podman Use Podman runtime for running containers.
--no-container Do not execute jobs in a Docker container, even when
DockerRequirement is specified under hints.
--leave-container Do not delete Docker container used by jobs after they
exit
--preserve-environment VAR1 VAR2 [VAR1 VAR2 ...]
Preserve specified environment variables when running
CommandLineTools
--preserve-entire-environment
Preserve all environment variable when running
CommandLineTools.
--destBucket DESTBUCKET
Specify a cloud bucket endpoint for output files.
--beta-dependency-resolvers-configuration BETA_DEPENDENCY_RESOLVERS_CONFIGURATION
--beta-dependencies-directory BETA_DEPENDENCIES_DIRECTORY
--beta-use-biocontainers
--beta-conda-dependencies
--tmpdir-prefix TMPDIR_PREFIX
Path prefix for temporary directories
--tmp-outdir-prefix TMP_OUTDIR_PREFIX
Path prefix for intermediate output directories
--force-docker-pull Pull latest docker image even if it is locally present
--no-match-user Disable passing the current uid to docker run --user
--no-read-only Do not set root directory in the container as read-
only
--strict-memory-limit
When running with software containers and the Docker
engine, pass either the calculated memory allocation
from ResourceRequirements or the default of 1 gigabyte
to Docker's --memory option.
--strict-cpu-limit When running with software containers and the Docker
engine, pass either the calculated cpu allocation from
ResourceRequirements or the default of 1 core to
Docker's --cpu option. Requires docker version >=
v1.13.
--relax-path-checks Relax requirements on path names to permit spaces and
hash characters.
--default-container DEFAULT_CONTAINER
Specify a default docker container that will be used
if the workflow fails to specify one.
--compute-checksum Compute checksum of contents while collecting outputs
--no-compute-checksum
Do not compute checksum of contents while collecting
outputs
--eval-timeout EVAL_TIMEOUT
Time to wait for a Javascript expression to evaluate
before giving an error, default 20s.
--overrides OVERRIDES
Read process requirement overrides from file.
--mpi-config-file MPI_CONFIG_FILE
Platform specific configuration for MPI (parallel
launcher, its flag etc). See the cwltool README
section 'Running MPI-based tools' for details of the
format: https://github.com/common-workflow-
language/cwltool#running-mpi-based-tools-that-need-to-
be-launched
--bypass-file-store Do not use Toil's file store and assume all paths are
accessible in place from all nodes.
--disable-streaming Disable file streaming for files that have
'streamable' flag True

Logging Options:
--logCritical Turn on loglevel Critical. Default: INFO.
--logError Turn on loglevel Error. Default: INFO.
--logWarning Turn on loglevel Warning. Default: INFO.
--logDebug Turn on loglevel Debug. Default: INFO.
--logInfo Turn on loglevel Info. Default: INFO.
--logOff Same as --logCRITICAL.
--logLevel {Critical,Error,Warning,Debug,Info,critical,error,warning,debug,info,CRITICAL,ERROR,WARNING,DEBUG,INFO}
Set the log level. Default: INFO. Options:
['Critical', 'Error', 'Warning', 'Debug', 'Info',
'critical', 'error', 'warning', 'debug', 'info',
'CRITICAL', 'ERROR', 'WARNING', 'DEBUG', 'INFO'].
--logFile LOGFILE File to log in.
--rotatingLogging Turn on rotating logging, which prevents log files
from getting too big.

Toil core options.:
Options to specify the location of the Toil workflow and turn on stats
collation about the performance of jobs.

--jobstore JOBSTORE, --jobStore JOBSTORE
The location of the job store for the workflow. A job
store holds persistent information about the jobs,
stats, and files in a workflow. If the workflow is run
with a distributed batch system, the job store must be
accessible by all worker nodes. Depending on the
desired job store implementation, the location should
be formatted according to one of the following
schemes: file:<path> where <path> points to a
directory on the file systen aws:<region>:<prefix>
where <region> is the name of an AWS region like
us-west-2 and <prefix> will be prepended to the names of
any top-level AWS resources in use by job store, e.g.
S3 buckets. google:<project_id>:<prefix> TODO: explain
For backwards compatibility, you may also specify
./foo (equivalent to file:./foo or just file:foo) or
/bar (equivalent to file:/bar).
--workDir WORKDIR Absolute path to directory where temporary files
generated during the Toil run should be placed.
Standard output and error from batch system jobs
(unless --noStdOutErr is set) will be placed in this
directory. A cache directory may be placed in this
directory. Temp files and folders will be placed in a
directory toil-<workflowID> within workDir. The
workflowID is generated by Toil and will be reported
in the workflow logs. Default is determined by the
variables (TMPDIR, TEMP, TMP) via mkdtemp. This
directory needs to exist on all machines running jobs;
if capturing standard output and error from batch
system jobs is desired, it will generally need to be
on a shared file system. When sharing a cache between
containers on a host, this directory must be shared
between the containers. [env var: TOIL_WORKDIR]
--coordinationDir COORDINATION_DIR
Absolute path to directory where Toil will keep state
and lock files.When sharing a cache between containers
on a host, this directory must be shared between the
containers. [env var: TOIL_COORDINATION_DIR]
--noStdOutErr Do not capture standard output and error from batch
system jobs.
--stats Records statistics about the toil workflow to be used
by 'toil stats'.
--clean {always,onError,never,onSuccess}
Determines the deletion of the jobStore upon
completion of the program. Choices: ['always',
'onError', 'never', 'onSuccess']. The --stats option
requires information from the jobStore upon completion
so the jobStore will never be deleted with that flag.
If you wish to be able to restart the run, choose
'never' or 'onSuccess'. Default is 'never' if stats is
enabled, and 'onSuccess' otherwise.
--cleanWorkDir {always,onError,never,onSuccess}
Determines deletion of temporary worker directory upon
completion of a job. Choices: ['always', 'onError',
'never', 'onSuccess']. Default = always. WARNING: This
option should be changed for debugging only. Running a
full pipeline with this option could fill your disk
with excessive intermediate data.
--clusterStats [CLUSTERSTATS]
If enabled, writes out JSON resource usage statistics
to a file. The default location for this file is the
current working directory, but an absolute path can
also be passed to specify where this file should be
written. This options only applies when using scalable
batch systems.

Toil options for restarting an existing workflow.:
Allows the restart of an existing workflow

--restart If --restart is specified then will attempt to restart
existing workflow at the location pointed to by the
--jobStore option. Will raise an exception if the
workflow does not exist

Toil options for specifying the batch system.:
Allows the specification of the batch system.

--batchSystem {aws_batch,single_machine,grid_engine,lsf,mesos,slurm,torque,htcondor,kubernetes}
The type of batch system to run the job(s) with,
currently can be one of aws_batch, single_machine,
grid_engine, lsf, mesos, slurm, torque, htcondor,
kubernetes. default=single_machine
--disableHotDeployment
Hot-deployment was renamed to auto-deployment. Option
now redirects to --disableAutoDeployment. Left in for
backwards compatibility.
--disableAutoDeployment DISABLEAUTODEPLOYMENT
Should auto-deployment of the user script be
deactivated? If True, the user script/package should
be present at the same location on all workers.
Default = False.
--maxJobs MAX_JOBS Specifies the maximum number of jobs to submit to the
backing scheduler at once. Not supported on Mesos or
AWS Batch. Use 0 for unlimited. Defaults to unlimited.
--maxLocalJobs MAX_LOCAL_JOBS
Specifies the maximum number of housekeeping jobs to
run sumultaneously on the local system. Use 0 for
unlimited. Defaults to the number of local cores (8).
--manualMemArgs Do not add the default arguments: 'hv=MEMORY' &
'h_vmem=MEMORY' to the qsub call, and instead rely on
TOIL_GRIDGENGINE_ARGS to supply alternative arguments.
Requires that TOIL_GRIDGENGINE_ARGS be set.
--runLocalJobsOnWorkers, --runCwlInternalJobsOnWorkers
Whether to run jobs marked as local (e.g. CWLScatter)
on the worker nodes instead of the leader node. If
false (default), then all such jobs are run on the
leader node. Setting this to true can speed up CWL
pipelines for very large workflows with many sub-
workflows and/or scatters, provided that the worker
pool is large enough.
--coalesceStatusCalls
Ask for job statuses from the batch system in a batch.
Deprecated; now always enabled where supported.
--statePollingWait STATEPOLLINGWAIT
Time, in seconds, to wait before doing a scheduler
query for job state. Return cached results if within
the waiting period. Only works for grid engine batch
systems such as gridengine, htcondor, torque, slurm,
and lsf.
--batchLogsDir BATCH_LOGS_DIR
Directory to tell the backing batch system to log
into. Should be available on both the leader and the
workers, if the backing batch system writes logs to
the worker machines' filesystems, as many HPC
schedulers do. If unset, the Toil work directory will
be used. Only works for grid engine batch systems such
as gridengine, htcondor, torque, slurm, and lsf. [env
var: TOIL_BATCH_LOGS_DIR]
--awsBatchRegion AWS_BATCH_REGION
The AWS region containing the AWS Batch queue to
submit to. [env var: TOIL_AWS_REGION]
--awsBatchQueue AWS_BATCH_QUEUE
The name or ARN of the AWS Batch queue to submit to.
[env var: TOIL_AWS_BATCH_QUEUE]
--awsBatchJobRoleArn AWS_BATCH_JOB_ROLE_ARN
The ARN of an IAM role to run AWS Batch jobs as, so
they can e.g. access a job store. Must be assumable by
ecs-tasks.amazonaws.com. [env var:
TOIL_AWS_BATCH_JOB_ROLE_ARN]
--scale SCALE A scaling factor to change the value of all submitted
tasks's submitted cores. Used in the single_machine
batch system. Useful for running workflows on smaller
machines than they were designed for, by setting a
value less than 1. (default: 1)
--dont_allocate_mem A flag that can block allocating memory with '--mem'
for job submissions on SLURM since some system servers
may reject any job request that explicitly specifies
the memory allocation. The default is to always
allocate memory.
--allocate_mem A flag that can block allocating memory with '--mem'
for job submissions on SLURM since some system servers
may reject any job request that explicitly specifies
the memory allocation. The default is to always
allocate memory.
--kubernetesHostPath KUBERNETES_HOST_PATH
Path on Kubernetes hosts to use as shared inter-pod
temp directory. (default: None) [env var:
TOIL_KUBERNETES_HOST_PATH]
--kubernetesOwner KUBERNETES_OWNER
Username to mark Kubernetes jobs with. If the provided
value is None, the value will be generated at runtime.
(Generated default: michael) [env var:
TOIL_KUBERNETES_OWNER]
--kubernetesServiceAccount KUBERNETES_SERVICE_ACCOUNT
Service account to run jobs as. (default: None) [env
var: TOIL_KUBERNETES_SERVICE_ACCOUNT]
--kubernetesPodTimeout KUBERNETES_POD_TIMEOUT
Seconds to wait for a scheduled Kubernetes pod to
start running. (default: 120) [env var:
TOIL_KUBERNETES_POD_TIMEOUT]

Toil options for configuring storage.:
Allows configuring Toil's data storage.

--symlinkImports SYMLINKIMPORTS
When using a filesystem based job store, CWL input
files are by default symlinked in. Setting this option
to True instead copies the files into the job store,
which may protect them from being modified externally.
When set to False, as long as caching is enabled, Toil
will protect the file automatically by changing the
permissions to read-only.default=True
--moveOutputs MOVEOUTPUTS
When using a filesystem based job store, output files
are by default moved to the output directory, and a
symlink to the moved exported file is created at the
initial location. Setting this option to True instead
copies the files into the output directory. Applies to
filesystem-based job stores only.default=False
--caching CACHING Enable or disable caching for your workflow,
specifying this overrides default from job store

Toil options for autoscaling the cluster of worker nodes.:
Allows the specification of the minimum and maximum number of nodes in an
autoscaled cluster, as well as parameters to control the level of
provisioning.

--provisioner {aws,gce,None}, -p {aws,gce,None}
The provisioner for cluster auto-scaling. This is the
main Toil '--provisioner' option, and defaults to None
for running on single machine and non-auto-scaling
batch systems. The currently supported choices are
['aws', 'gce', None]. The default is None.
--nodeTypes NODETYPES
Specifies a list of comma-separated node types, each
of which is composed of slash-separated instance
types, and an optional spot bid set off by a colon,
making the node type preemptible. Instance types may
appear in multiple node types, and the same node type
may appear as both preemptible and non-preemptible.
Valid argument specifying two node types:
c5.4xlarge/c5a.4xlarge:0.42,t2.large Node types:
c5.4xlarge/c5a.4xlarge:0.42 and t2.large Instance
types: c5.4xlarge, c5a.4xlarge, and t2.large
Semantics: Bid $0.42/hour for either c5.4xlarge or
c5a.4xlarge instances, treated interchangeably, while
they are available at that price, and buy t2.large
instances at full price. default=[]
--maxNodes MAXNODES Maximum number of nodes of each type in the cluster,
if using autoscaling, provided as a comma-separated
list. The first value is used as a default if the list
length is less than the number of nodeTypes.
default=[10]
--minNodes MINNODES Mininum number of nodes of each type in the cluster,
if using auto-scaling. This should be provided as a
comma-separated list of the same length as the list of
node types. default=[0]
--targetTime TARGETTIME
Sets how rapidly you aim to complete jobs in seconds.
Shorter times mean more aggressive parallelization.
The autoscaler attempts to scale up/down so that it
expects all queued jobs will complete within
targetTime seconds. default=1800
--betaInertia BETAINERTIA
A smoothing parameter to prevent unnecessary
oscillations in the number of provisioned nodes. This
controls an exponentially weighted moving average of
the estimated number of nodes. A value of 0.0 disables
any smoothing, and a value of 0.9 will smooth so much
that few changes will ever be made. Must be between
0.0 and 0.9. default=0.1
--scaleInterval SCALEINTERVAL
The interval (seconds) between assessing if the scale
of the cluster needs to change. default=60
--preemptibleCompensation PREEMPTIBLECOMPENSATION, --preemptableCompensation PREEMPTIBLECOMPENSATION
The preference of the autoscaler to replace
preemptible nodes with non-preemptible nodes, when
preemptible nodes cannot be started for some reason.
This value must be between 0.0 and 1.0, inclusive. A
value of 0.0 disables such compensation, a value of
0.5 compensates two missing preemptible nodes with a
non-preemptible one. A value of 1.0 replaces every
missing pre-emptable node with a non-preemptible one.
default=0.0
--nodeStorage NODESTORAGE
Specify the size of the root volume of worker nodes
when they are launched in gigabytes. You may want to
set this if your jobs require a lot of disk space.
(default=50).
--nodeStorageOverrides NODESTORAGEOVERRIDES
Comma-separated list of nodeType:nodeStorage that are
used to override the default value from --nodeStorage
for the specified nodeType(s). This is useful for
heterogeneous jobs where some tasks require much more
disk than others.
--metrics METRICS Enable the prometheus/grafana dashboard for monitoring
CPU/RAM usage, queue size, and issued jobs.
--assumeZeroOverhead ASSUME_ZERO_OVERHEAD
Ignore scheduler and OS overhead and assume jobs can
use every last byte of memory and disk on a node when
autoscaling.

Toil options for limiting the number of service jobs and detecting service deadlocks:
Allows the specification of the maximum number of service jobs in a
cluster. By keeping this limited we can avoid nodes occupied with services
causing deadlocks.

Toil options for cores/memory requirements.:
The options to specify default cores/memory requirements (if not specified
by the jobs themselves), and to limit the total amount of memory/cores
requested from the batch system.

--defaultMemory DEFAULTMEMORY
The default amount of memory to request for a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0 Gi.
--defaultCores FLOAT The default amount of cpu to request for a job. Only
applicable to jobs that do not specify an explicit
value for this requirement. Fractions of a core (for
example 0.1) are supported on some batch systems
[mesos, single_machine]. Default is 1.
--defaultDisk INT The default amount of disk to request for a job. Only
applicable to jobs that do not specify an explicit
value for this requirement. Standard suffixes like K,
Ki, M, Mi, G or Gi are supported. Default is 2.0 Gi.
--defaultAccelerators ACCELERATOR[,ACCELERATOR...]
The default amount of accelerators to request for a
job. Only applicable to jobs that do not specify an
explicit value for this requirement. Each accelerator
specification can have a type (gpu [default], nvidia,
amd, cuda, rocm, opencl, or a specific model like
nvidia-tesla-k80), and a count [default: 1]. If both a
type and a count are used, they must be separated by a
colon. If multiple types of accelerators are used, the
specifications are separated by commas. Default is [].
--defaultPreemptible [BOOL], --defaultPreemptable [BOOL]
Make all jobs able to run on preemptible (spot) nodes
by default.
--maxCores INT The max amount of cpu to request for a job. Only
applicable to jobs that do not specify an explicit
value for this requirement. Fractions of a core (for
example 0.1) are supported on some batch systems
[mesos, single_machine]. Default is
9223372036854775807.
--maxMemory INT The max amount of memory to request for a job. Only
applicable to jobs that do not specify an explicit
value for this requirement. Standard suffixes like K,
Ki, M, Mi, G or Gi are supported. Default is 8.0 Ei.
--maxDisk INT The max amount of disk to request for a job. Only
applicable to jobs that do not specify an explicit
value for this requirement. Standard suffixes like K,
Ki, M, Mi, G or Gi are supported. Default is 8.0 Ei.

Toil options for rescuing/killing/restarting jobs.:
The options for jobs that either run too long/fail or get lost (some batch
systems have issues!).

--retryCount RETRYCOUNT
Number of times to retry a failing job before giving
up and labeling job failed. default=1
--enableUnlimitedPreemptibleRetries ENABLEUNLIMITEDPREEMPTIBLERETRIES, --enableUnlimitedPreemptableRetries ENABLEUNLIMITEDPREEMPTIBLERETRIES
If set, preemptible failures (or any failure due to an
instance getting unexpectedly terminated) will not
count towards job failures and --retryCount.
--doubleMem DOUBLEMEM
If set, batch jobs which die due to reaching memory limit
on batch schedulers will have their memory doubled and
they will be retried. The remaining retry count will
be reduced by 1. Currently supported by LSF.
--maxJobDuration MAXJOBDURATION
Maximum runtime of a job (in seconds) before we kill
it (this is a lower bound, and the actual time before
killing the job may be longer).
default=9223372036854775807
--rescueJobsFrequency RESCUEJOBSFREQUENCY
Period of time to wait (in seconds) between checking
for missing/overlong jobs, that is jobs which get lost
by the batch system. Expert parameter. default=60

Toil log management options.:
Options for how Toil should manage its logs.

--maxLogFileSize MAXLOGFILESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the
last X bytes. Setting this option to zero will prevent
any truncation. Setting this option to a negative
value will truncate from the beginning. Default=62.5
Ki
--writeLogs [WRITELOGS]
Write worker logs received by the leader into their
own files at the specified path. Any non-empty
standard output and error from failed batch system
jobs will also be written into files at this path. The
current working directory will be used if a path is
not specified explicitly. Note: By default only the
logs of failed jobs are returned to leader. Set log
level to 'debug' or enable '--writeLogsFromAllJobs' to
get logs back from successful jobs, and adjust
'maxLogFileSize' to control the truncation limit for
worker logs.
--writeLogsGzip [WRITELOGSGZIP]
Identical to --writeLogs except the log files are
gzipped on the leader.
--writeLogsFromAllJobs WRITELOGSFROMALLJOBS
Whether to write logs from all jobs (including the
successful ones) without necessarily setting the log
level to 'debug'. Ensure that either --writeLogs or
--writeLogsGzip is set if enabling this option.
--writeMessages WRITE_MESSAGES
File to send messages from the leader's message bus
to.
--realTimeLogging REALTIMELOGGING
Enable real-time logging from workers to leader

Toil miscellaneous options.:
Everything else.

--disableChaining DISABLECHAINING
Disables chaining of jobs (chaining uses one job's
resource allocation for its successor job if
possible).
--disableJobStoreChecksumVerification DISABLEJOBSTORECHECKSUMVERIFICATION
Disables checksum verification for files transferred
to/from the job store. Checksum verification is a
safety check to ensure the data is not corrupted
during transfer. Currently only supported for non-
streaming AWS files.
--sseKey SSEKEY Path to file containing 32 character key to be used
for server-side encryption on awsJobStore or
googleJobStore. SSE will not be used if this flag is
not passed.
--setEnv NAME=VALUE or NAME, -e NAME=VALUE or NAME
Set an environment variable early on in the worker. If
VALUE is null, it will be looked up in the current
environment. Independently of this option, the worker
will try to emulate the leader's environment before
running a job, except for some variables known to vary
across systems. Using this option, a variable can be
injected into the worker process itself before it is
started.
--servicePollingInterval SERVICEPOLLINGINTERVAL
Interval of time service jobs wait between polling for
the existence of the keep-alive flag. Default: 60.0
--forceDockerAppliance FORCEDOCKERAPPLIANCE
Disables sanity checking the existence of the docker
image specified by TOIL_APPLIANCE_SELF, which Toil
uses to provision mesos for autoscaling.
--statusWait STATUSWAIT
Seconds to wait between reports of running jobs.
--disableProgress DISABLEPROGRESS
Disables the progress bar shown when standard error is
a terminal.
--config CONFIG Get options from a config file.

Toil debug options.:
Debug options for finding problems or helping with testing.

--debugWorker Experimental no forking mode for local debugging.
Specifically, workers are not forked and stderr/stdout
are not redirected to the log.
--disableWorkerOutputCapture
Let worker output go to worker's standard out/error
instead of per-job logs.
--badWorker BADWORKER
For testing purposes randomly kill --badWorker
proportion of jobs using SIGKILL. default=0.0
--badWorkerFailInterval BADWORKERFAILINTERVAL
When killing the job pick uniformly within the
interval from 0.0 to --badWorkerFailInterval seconds
after the worker starts. default=0.01

--custom-net CUSTOM_NET
Specify docker network name to pass to docker run
command

Options for recording the Docker container identifier into a file.:
--cidfile-dir CIDFILE_DIR
Store the Docker container ID into a file in the
specified directory.
--cidfile-prefix CIDFILE_PREFIX
Specify a prefix to the container ID filename. Final
file name will be followed by a timestamp. The default
is no prefix.

Options for recording provenance information of the execution:
--provenance PROVENANCE
Save provenance to specified folder as a Research
Object that captures and aggregates workflow execution
and data products.
--enable-user-provenance
Record user account info as part of provenance.
--disable-user-provenance
Do not record user account info in provenance.
--enable-host-provenance
Record host info as part of provenance.
--disable-host-provenance
Do not record host info in provenance.
--orcid ORCID Record user ORCID identifier as part of provenance,
e.g. https://orcid.org/0000-0002-1825-0097 or
0000-0002-1825-0097. Alternatively the environment
variable ORCID may be set.
--full-name CWL_FULL_NAME
Record full name of user as part of provenance, e.g.
Josiah Carberry. You may need to use shell quotes to
preserve spaces. Alternatively the environment
variable CWL_FULL_NAME may be set.

Args that start with '--' can also be set in a config file (specified via
--config). The config file uses YAML syntax and must represent a YAML
'mapping' (for details, see http://learn.getgrav.org/advanced/yaml). In
general, command-line values override environment variables which override
config file values which override defaults.
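
The precedence described in the epilog above (command line over environment over config file over defaults) can be sketched as a layered lookup. This is only an illustration with plain dictionaries, not how Toil or configargparse implements it. The function name `resolve_options` and the non-default values are hypothetical, and the sketch does not claim that every option shown has a corresponding environment variable.

```python
from collections import ChainMap


def resolve_options(cli, env, config_file, defaults):
    """Merge option sources; mappings listed earlier take precedence."""
    # ChainMap searches its mappings left to right, so the first
    # source that defines a key supplies the effective value.
    return dict(ChainMap(cli, env, config_file, defaults))


# Defaults taken from the help text above.
defaults = {"retryCount": 1, "defaultCores": 1, "rescueJobsFrequency": 60}
# Hypothetical values, standing in for a parsed YAML mapping,
# environment-derived settings, and command-line flags.
config_file = {"retryCount": 3, "defaultCores": 2}
env = {"retryCount": 5}
cli = {"defaultCores": 4}

effective = resolve_options(cli, env, config_file, defaults)
print(effective)
```

Here `retryCount` comes from the environment layer, `defaultCores` from the command line, and `rescueJobsFrequency` falls through to its default.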

@stxue1 stxue1 requested a review from mr-c December 4, 2023 17:16
Contributor

@mr-c mr-c left a comment

From a diff of the old and new options

  1. --version no longer has a description (was "show program's version number and exit")
  2. --custom-net is now listed as being part of extra_dockergroup (that should probably not be printed)

From a diff of toil 5.12

  1. --runCwlInternalJobsOnWorkers is missing from the summary section
  2. [--linkImports | --noLinkImports] is missing
  3. [--moveExports | --noMoveExports] is missing
  4. --disableCaching is missing

The metavars don't seem to be very helpful except to say that a particular command line option requires a value. Example: --disableAutoDeployment DISABLEAUTODEPLOYMENT ; wouldn't it be better to say --disableAutoDeployment BOOLEAN or similar? Likewise --config PATH instead of --config CONFIG, etc..
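
A minimal sketch of the metavar suggestion, using only the standard-library `argparse` rather than Toil's actual parser setup. The program name `toil-example` and the helpers `parse_bool` and `build_parser` are hypothetical, introduced just for this illustration.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Convert common true/false spellings to a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="toil-example")
    # metavar controls the placeholder shown in help output; without it
    # argparse falls back to the upper-cased destination name.
    parser.add_argument("--disableAutoDeployment", metavar="BOOLEAN",
                        type=parse_bool, default=False,
                        help="Turn off auto-deployment.")
    parser.add_argument("--config", metavar="PATH",
                        help="Get options from a config file.")
    return parser


parser = build_parser()
help_text = parser.format_help()
args = parser.parse_args(["--disableAutoDeployment", "true",
                          "--config", "toil.yaml"])
print(help_text)
```

With explicit metavars the help output reads `--disableAutoDeployment BOOLEAN` and `--config PATH`, which tells the reader what kind of value is expected and not merely that one is required.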

I appreciate that the environment variables are listed directly, thank you!

src/toil/options/common.py (outdated, resolved)
@stxue1 stxue1 enabled auto-merge (squash) December 12, 2023 01:39
@stxue1 stxue1 merged commit 9aa2da9 into master Dec 12, 2023
2 checks passed
michael-kotliar added a commit to michael-kotliar/toil that referenced this pull request May 14, 2024
* Update docs to hide Mesos (#4413)

* Update docs to hide Mesos

* address review comments

* remove invisible characters?

* replace mesos in more places

* Document Kubernetes-managed autoscaling, with in-workflow Mesos autoscaling as deprecated

* Reword some documentation and messages

* Chase out more Mesoses

* Don't insist on processes actually running promptly in parallel

* Ask for a compatible set of Sphinx packages

* Keep back astroid

We can't use astroid 3 until sphinx-autoapi releases a fix for
https://github.com/readthedocs/sphinx-autoapi/issues/392

---------

Co-authored-by: Adam Novak <[email protected]>

* Avoid concurrent modification in cluster scaler tests (#4600)

This will fix #4599 by making the mock leader thread safe.

* Add String to File functionality into toil-wdl-runner (#4589)

* monkeypatch coerce for workflow related nodes

* Fix task inputs string coerce

* Disable kubernetes

* Comment out cwl kubernetes

* Maybe markers are wrong and comment out cactus-on-kubernetes

* Add docstrings to changed functions + change input list to dict

* Deal with nonetype

---------

Co-authored-by: Adam Novak <[email protected]>

* Separate out integration tests to run on a schedule (#4612)

* Reorganize tests and move integration tests to scheduled pipeline runs

* Also handle tags

* Add config file support (#4569)

* Centralize defaults

* Add requirements

* Grab logLevel

grabbed logLevel used to be the default in Config(), so grab effective
logLevel that is set

* Satisfy mypy

mypy might still complain about missing stubs for configargparser
though

* Fix wrong default

* add config tool

* temp fix

config sets defaults but so does argparse, runs twice in workflows but
deals with tests

* Fix create_config for tests instead

* Fix setting of config defaults

* Go back to previous method, create defaults at init

* Fix default cli options set

* Centralize, config util, and read properly

* Fix type hinting to support 3.9

* mypy

* Fix cwl edge case

* Fix tests

* fix typos, always generate config, fix some tests

* Remove subprocess as maybe tests are flaky on CI with it?

* just run quick_test_offline

* make CI print stuff

* Harden default config creation against races

* Cleanup and argument renaming

* Fix bad yaml and toil status bug

* Fix mypy

* Change behavior of --stats and --clean

* Change test behavior as options namespace and config now have the same
behavior

* Put forgotten line

ouch

* Batchsystem, requirements, fixes for tests

* Mypy conformance

* Mypy conformance

* Fix retryCount argument and kubernetesPodTimeout type

* Only run batchsystem and slurm_test tests on CI

* Whoops, this implementation never worked

* Add pyyaml to requirements for slurm to pass

* Add rest of gitlab CI back and run all tests

* Update stub file to be compatible with updated mypy

* Fix environment CLI option

* Update provisioner test to use configargparse

* Code cleanup and add jobstore_as_flag to DefaultArgumentParser etc

* Fix toil config test

* Add suggestions

* Deprecate options, add underscore CLI options only for newly deprecated options

* Update docs/argparse help and fix bug with deprecated options
also make most generic arg as default for runLocalJobsOnWorkers

* Add config file section to docs

* Remove upper bound for ruamel requirements

* Remove redundancies and improve disableCaching's destination name

* Update src/toil/batchSystems/kubernetes.py

Co-authored-by: Adam Novak <[email protected]>

* Remove redundant line in status util

* Remove comments in configargparse stub

* Workaround to get action=append instead of nargs and get proper backwards compatibility
Fix wrong name for link_imports and move_exports, remove new unused functions

* Import SYS_MAX_SIZE from common rather than duplicating it

* Mypy and syntax errors

* Move config options back to the old naming syntax

* Change names for link_imports and move_exports to camelCase options

* Fix formatting

* Bring back old --restart and --clean functionality where they collide and raise an error

* Make debug less spammy and remove unused types

* Disable kubernetes temporarily

* Revert changes to --restart and --clean collision

* Typo in tests

* Change some comments and add member fields to config

* Fix pickling error when jobstate file doesnt exist and fix threading error when lock file exists then disappears (#4575)

Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Adam Novak <[email protected]>

* Reduce the number of assert statements (#4590)

* Change all asserts to raising errors for central toil files

Co-authored-by: Adam Novak <[email protected]>

* Fix mypy and update docs to match options in common

* Update src/toil/common.py

Co-authored-by: Adam Novak <[email protected]>

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>

* take any nvidia-smi exception as not having gpu (#4611)

Co-authored-by: Adam Novak <[email protected]>

* Make WDLOutputJob collect all task outputs (#4602)

Co-authored-by: Adam Novak <[email protected]>

* Ensure sibling files in toil-wdl-runner (#4610)

* Ensure sibling files stay sibling files when downloaded

* Fix incorrect argument order

* Fix directory collisions with sibling files

* Make sure the `--batchLogsDir` exists if it is set (#4635)

* Make sure the batch logs dir exists if it is set

* Test Slurm with nonexistent --batchLogsDir

* Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)

Fixes: https://github.com/DataBiosphere/toil/issues/4638

* cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)

* Report errors in WDL using MiniWDL's error location printer (#4637)

* Report errors in WDL using MiniWDL's error location printer

* Decorate actual tasks with fancy WDL error reporting

* Slap WDL error reporting on main

* Remove banned ignore comment

* Support Python3.11 and drop Python 3.7 (#4646)

* Remove python 3.7 and add python 3.11 and make python3.11 the main python package

* Move main python package back to 3.9

* Include python3.11 in docker

* Test 3.11 in CI

* Add python3.11 to CI dockerfile

* Add 3.11 to setup.py and debugging statements

* Python 3.7 backwards compatibility

* Update to py 3.12 and run 3.12 on gitlab CI

* Comment out fstring and try importlib

* Debug lint

* Ensure mypy is using python3.12

* Print python version before mypy

* Fix virtualenv, pip for python3.12

* Get rid of mesos tests/builds

* 3.12

* Revert debug change

* Go back to 3.11 and update docker package to make requests work again

* use an available htcondor package closest to 3.10 version

* update htcondor for all

* get pip for all python versions

* get virtualenv for all python versions

* needs specific ordering

* Separate mesos tests

* remove 3.7 from CI image

* Remove debug statement from makefile

* Fix configargparse in CWL (#4618)

* Parse config file separately from rest of args

* Mypy

* update configargparse stub

* Dont try to eat cwl arguments

* Use simpler workaround

* Revert to just CWL

* Change REMAINDER to "*", add help statements and test command line inputs

* Remove extradockergroup name

* Declare type

* Add proper relative path to cwl file

* Remove unnecessary test

---------

Co-authored-by: Adam Novak <[email protected]>

* Update ruamel-yaml requirement from <0.17.33,>=0.15 to >=0.15,<0.18.4 (#4659)

Updates the requirements on ruamel-yaml to permit the latest version.

---
updated-dependencies:
- dependency-name: ruamel-yaml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix CI Appliance Builds (#4655)

* Properly build 3.11, fix dependencies and move aws stubs/mock into dev

* only keep htcondor installs in appliance builds

* Remove unused import

* Fix extras_require syntax

* Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)

* Bump mypy from 1.5.1 to 1.6.1 (#4660)

* Bump mypy from 1.5.1 to 1.6.1

Bumps [mypy](https://github.com/python/mypy) from 1.5.1 to 1.6.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.5.1...v1.6.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* type fix

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Move around reqs and move aws dev libraries to aws (#4664)

* Turn batch system tests back on (#4649)

This should fix #4648 by turning on the batch system tests again. The Mesos-specific ones are already moved elsewhere.

Co-authored-by: Lon Blauvelt <[email protected]>

* Bump miniwdl from 1.10.0 to 1.11.1 (#4669)

Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.10.0 to 1.11.1.
- [Release notes](https://github.com/chanzuckerberg/miniwdl/releases)
- [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.10.0...v1.11.1)

---
updated-dependencies:
- dependency-name: miniwdl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Move TES batch system to a plugin (#4650)

* Implement new batch system finding API and plugin scan

* Satisfy MyPy

* Implement deprecation for the old constants

* Get plugin loader to actually load, and drop TES

* Remove TES Kubernetes setup we don't use

* Stop asking for needs_tes

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* skip unwanted networkx version (#4450)

* skip unwanted networkx version

* Limit to released major versions of networkx

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* CWL Pipefish compatibility (#4636)

* Add a bunch of value resolving logging

* Quiet debugging a bit

* Move default setting for workflows so it works on subworkflows

* Remember to keep making a ToilFsAccess on the leader

* Satisfy MyPy

* Stop giving CWL containers directories full of broken symlinks

* Update test to expect no symlinks

* Move CWL integration tests for bioconda/biocontainers to integration test runs

* Wrap mkdtemp to fix #4644

* Sort imports in example scripts

* Use absolute-ized paths for work and coordination directories

* Bump cwltool from 3.1.20231020140205 to 3.1.20231114134824 (#4685)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231020140205 to 3.1.20231114134824.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231020140205...3.1.20231114134824)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump mypy from 1.6.1 to 1.7.0 (#4684)

* Bump mypy from 1.6.1 to 1.7.0

Bumps [mypy](https://github.com/python/mypy) from 1.6.1 to 1.7.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.6.1...v1.7.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* mypy 1.7.0 type updates

* format modified files

* remove unused imports

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Remove the parasol batch system. (#4678)

Co-authored-by: Adam Novak <[email protected]>

* Reenable Cactus on Kubernetes CI test (#4604)

* Reenable kubernetes tests that don't require a local cluster, eg CWL on
ARM and Cactus integration on kubernetes

* Disable CWL kubernetes

* enable cactus tests

* Add to scheduled integration tests

* Add forgotten file

* Remove print statements

* Remove unnecessary env var and move file

* Run test when updated

Co-authored-by: Adam Novak <[email protected]>

* update gitlab

* Fix typo in path

* Add virtualenv and prepare build to gitlab CI to run tests properly

* add gitlab setup scripts

* add gitlab setup scripts

---------

Co-authored-by: Adam Novak <[email protected]>

* Only count output file usage when using the file store (#4692)

* Bump mypy from 1.7.0 to 1.7.1 (#4697)

Bumps [mypy](https://github.com/python/mypy) from 1.7.0 to 1.7.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.7.0...v1.7.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)

Ignore errors when cleaning up the FileJobStoreTest

* Make sure cwltool always knows we have an outdir to fix #4698 (#4699)

* remove usage of the deprecated pkg_resources (#4701)

setup.py: make clear that Python 3.7 is no longer supported

Co-authored-by: Lon Blauvelt <[email protected]>

* more resiliency (#4395)

* Support CWL 1.2.1 (#4682)

* cwl: use the latest commit from the proposed CWL v1.2.1 branch
* Double default CWL conformance test timeout
* Support abs path for directory outputs
* Better comment for why local paths are permitted
* add relax-path-checks to CI tests

---------

Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Remove the WDL compiler. (#4679)

* Remove the WDL compiler.

* Linting.

* Update WDL stand-alone.

* Weird linting error?

* Cut compiler docs

* Stop trying to run removed WDL compiler tests

---------

Co-authored-by: Adam Novak <[email protected]>

* Allow working with remote files in CWL and WDL workflows (#4690)

* Start implementing real ToilFsAccess URL operations

* Implement URL opening for CWL

* Implement other ToilFsAccess operations without local copies

* Remove getSize spelling and pass mypy

* Add missing import

* Remove check for extremely old setuptools

* Add --reference-inputs option to toil-cwl-runner

* Allow files to be gotten by URI on the nodes

* Add some tests to exercise URL references

* Implement URI access and import logic in WDL interpreter

* Remove duplicated test

* Fix some merge problems

* Satisfy MyPy

* Spell default correctly

* Actually hook up import bypass flag

* Actually pass self test when using URLs

* Make file job store volunteer for non-schemed URIs

* Revert "Make file job store volunteer for non-schemed URIs"

This reverts commit 3d1e8f6761bd29f5bfedfd055f025943ab6ed1b8.

* Handle size requests for bare filenames

* Handle polling for URL existence

* Add a make test_debug target for getting test logs

* Add more logging to CWL streaming tests

* Contemplate multi-threaded access to the CachingFileStore from user code

* Allow downloading URLs in structures, and poll AWS directory existence right

* Update tests to a Debian with ARM Docker images

* Undo permission changes

* Add missing import

---------

Co-authored-by: Michael R. Crusoe <[email protected]>

* upgrade to cwltool 3.1.20231207110929 (#4707)

Co-authored-by: Michael R. Crusoe <[email protected]>

* Update docker requirement from <7,>=3.7.2 to >=3.7.2,<8 (#4713)

Updates the requirements on [docker](https://github.com/docker/docker-py) to permit the latest version.
- [Release notes](https://github.com/docker/docker-py/releases)
- [Commits](https://github.com/docker/docker-py/compare/3.7.2...7.0.0)

---
updated-dependencies:
- dependency-name: docker
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Implement a better config file system for CWL/WDL options (#4666)

* Strip leading whitespace from WDL commands (#4720)

* Strip leading whitespace from WDL commands

* Work around MiniWDL's wrong type

* Add __init__.py to options folder (#4723)

* Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)

* Point CI at the new public URLs for stuff we host

* Bump mypy from 1.7.1 to 1.8.0 (#4731)

Bumps [mypy](https://github.com/python/mypy) from 1.7.1 to 1.8.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.7.1...v1.8.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Tolerate a failed AMI polling attempt (#4727)

* Tolerate a failed AMI polling attempt

* Start marking Internet-related tests to keep them out of the offline step

* Update flake8 requirement from <7,>=3.8.4 to >=3.8.4,<8 (#4738)

Updates the requirements on [flake8](https://github.com/pycqa/flake8) to permit the latest version.
- [Commits](https://github.com/pycqa/flake8/compare/3.8.4...7.0.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix --printJobInfo (#4709)

* Add a test for --printJobInfo

* Move file name listing into the FileJobStore so it can sort of work again

* Fix Toil subcommand usage to include the subcommand

* Satisfy MyPy

* Fix =True syntax and find files even when their jobs are gone or they are no-job

* Add a test for actually rerunning a job

* Make the test for running a job alone pass

* Address review comments

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* remove extraneous dependency on old 'mock' (#4739)

'mock' has been integrated in the standard library as 'unittest.mock'

* Improve WDL documentation (#4732)

* Fix code block boundary

* Make the CWL quickstart the main one

* Talk about Python workflows instead of user scripts

* Chase away all the Sphinx warnings so we know the docs should look right

* Fail the docs build if the docstrings don't parse cleanly

* Encourage installing with cwl and wdl extras

* Qualify Python development

* Reorganize docs to plug the workflow languages more

* Talk a bit about WDL

* Add conformance test and install info

* Stop trying to draw inheritance diagrams since RtD doesn't give us a dot anyway

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Fix scheduled CI tests (#4742)

* Actually filter to Mesos tests

Also run Mesos tests if we touch Mesos.

It looks like https://github.com/DataBiosphere/toil/pull/4646 added a
bunch of Mesos test run steps but didn't include tests= so they just run
all tests, even if the dependencies aren't there.

* Don't import boto when it may not be installed

* Stop pinning very old setuptools and pyyaml

This basically reverts 60096d89eb7233b2791000da87a9754399fcb9c4 and
should let us use a setuptools that is new enough for the Python
versions we are using.

* Run all tests on -fix-ci branches

* Put Mesos AWS tests in the Mesos step

* Indent docstring to fix doc build failure

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Update EC2 instances and EC2 update script. (#4745)

* Update EC2 instances and EC2 update script.

* Minor details.

* Clean up.

* Linting.

* Ignore a perfectly good import.

---------

Co-authored-by: Adam Novak <[email protected]>

* Log more usefully for CWL workflows (#4736)

* Log files going in and out and the various CWL workflow phases

* Log CWL job executions to the leader just as text; replace logToMaster

* Log runtime context name

* Revise other logging messages to improve CWL logs

* Fix test to allow trailing newline

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Don't mark inputs (or outputs) executable for no reason (#4728)

* Be explicit about executable representation

* Add testing to make sure outputs aren't unexpectedly executable

* Let js expressions in the scatters take a long time to start Node

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Bump cwltool from 3.1.20231207110929 to 3.1.20240112164112 (#4751)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231207110929 to 3.1.20240112164112.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231207110929...3.1.20240112164112)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update flake8-bugbear requirement from <24,>=20.11.1 to >=20.11.1,<25 (#4752)

Updates the requirements on [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) to permit the latest version.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/20.11.1...24.1.15)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add pure Python fallback for getDirSizeRecursively() (#4753)

* add pure Python fallback for getDirSizeRecursively()

* Fix spelling

---------

Co-authored-by: Adam Novak <[email protected]>

* Update version_template.py for release

* Store chaining information just once (#4737)

* Keep around old names of chained jobs

* Get rid of chainedJobs

* Just pull log names from jobDesc

* Use an accessor to just get the whole chain together

* Improve comment and formatting

* Fix wrong name in import

* Stop marking HTTP registry as insecure (#4757)

This should fix #4756 and hopefully the intermittent test failures where buildkit tries to speak HTTPS to our Docker cache.

* CWL: don't clear out user-provided values for the --default-container (#4730)

* CWL: don't clear out user-provided values for the --default-container

Fixes https://stackoverflow.com/questions/77684785/toil-cwl-runner-not-using-default-container-option-with-singularity-option

* mypy --strict for the CWL tests

* soften cap on ruamel.yaml dependency

* remove ruamel.yaml.string dependency for a simpler solution (#4760)

* Try to mitigate filling up the coordination directory (#4749)

* Complain more usefully about a bad coordination directory

* Don't pick tiny filesystems for coordination, and organize everything in toilwf- directories

* Put cleanup arena so it shares a prefix with but isn't in the directory it protects

* Fix variable name

* Don't catch any old thing, which doesn't work anymore anyway

* Allow toil-wdl-runner to run on Kubernetes and Mesos (#4754)

* Change docker security rules, remove --containall on singularity, add tzdata as dependency

* remove link for tzdata and add integration test

* Add test to gitlab and remove provisioner option

---------

Co-authored-by: Adam Novak <[email protected]>

* Ship User Logs to Leader (#4755)

* Document the stats and logging design as it stands

* Plug WDL task stdout and stderr into the --writeLogs system as new user streams

* Log CWL and WDL output and error logs that aren't captured by the workflow itself

* Name CWL and WDL log files usefully

This goes back to using displayName for stats and logging.

It also adds a WDL "task path" which is like the namespace but includes
numbers for scatters, and uses that to name the log files.

* Log more to illustrate https://github.com/moby/buildkit/issues/4458

* Document the user log system architecture

* Satisfy mypy

* Go back to using displayName for stats again

* Clarify CWL output handling

* Revise test to allow new '_'

* Update pytest requirement from <8,>=6.2.1 to >=6.2.1,<9 (#4772)

Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/6.2.1...8.0.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add workflow to automatically update PRs when other PRs merge (#4774)

* Stop complaining about XDG_RUNTIME_DIR (#4769)

* Update setuptools requirement from <69,>=65.5.1 to >=65.5.1,<70 (#4693)

Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v65.5.1...v69.0.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Stop failing auto-update workflow on every merge conflict

* read the docs: enable generating graphs like inheritance trees. (#4734)

* read the docs: enable generating graphs like inheritance trees.

* Add Graphviz to CI image

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Docs: Always show Python execution using `python3` (#4764)

In case a virtualenv is not used

Co-authored-by: Andreas Tille <[email protected]>
Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Make formatting do all the code (#4777)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* avoid unnecessary boto{,3} imports (#4763)

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* remove use of distutils by copying in strtobool() (#4765)

* remove use of distutils by copying in strtobool()

Copied code is MIT licensed

https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/distutils/util.py#L340
https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/LICENSE

* Add type hints and replace distutils code with our own

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>
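
For readers unfamiliar with `strtobool()`, a minimal stand-in might look like the sketch below. It accepts the same spellings that `distutils.util.strtobool` accepted; returning a `bool` instead of the `1`/`0` that distutils returned is an assumption made here for illustration, and this is not necessarily the code that was merged.

```python
def strtobool(val: str) -> bool:
    """Convert a string representation of truth to a bool.

    Illustrative sketch: accepts y, yes, t, true, on, 1 as true and
    n, no, f, false, off, 0 as false, ignoring case.
    """
    lowered = val.lower()
    if lowered in ("y", "yes", "t", "true", "on", "1"):
        return True
    if lowered in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"invalid truth value {val!r}")
```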

* Revert --disableProgress to old flag-style behavior (#4778)

* Change default Singularity cache paths to be global (#4762)

* Change default cache paths to piggyback off of singularity and miniwdl defaults + set cache paths on cloud to /var/lib/toil

* Improve documentation

* Revert block quote and bold instead

* Change singularity cache directory to the right default directory

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* CPU count fallback (#4780)

* Fall back to 1 core when # CPUs unavailable

* Apply all limits and then fall back to 1

---------

Co-authored-by: Theodore Ni <[email protected]>
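
The "apply all limits and then fall back to 1" idea can be sketched as follows. The function name `available_cpu_count` and its `limits` parameter are hypothetical; the real change may consult other sources, which this sketch only approximates with caller-supplied limits.

```python
import os
from typing import Iterable, List, Optional


def available_cpu_count(limits: Iterable[Optional[int]] = ()) -> int:
    """Return the smallest applicable CPU limit, never less than 1.

    Hypothetical helper: `limits` stands in for any extra caps the
    caller knows about; None or non-positive entries are ignored.
    """
    candidates: List[int] = []

    detected = os.cpu_count()  # May be None when the count is unavailable
    if detected:
        candidates.append(detected)

    if hasattr(os, "sched_getaffinity"):  # Not available on every platform
        try:
            candidates.append(len(os.sched_getaffinity(0)))
        except OSError:
            pass

    candidates.extend(n for n in limits if n is not None and n > 0)

    # Apply every limit we found, then fall back to a single core.
    return max(1, min(candidates)) if candidates else 1
```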

* Fix special characters in filenames with the FileJobStore (#4781)

* Remove extraneous unquote

* Log task standard error to the worker log if it fails and MiniWDL hasn't already logged it

* Hack around having to dedent the command at the wrong time by keying on the first line

* Remove extra logging and cross-checks

* Add back missing line end

* Work around boto stubs regression in https://github.com/python/typeshed/issues/11381

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update sphinx-autodoc-typehints requirement (#4784)

Updates the requirements on [sphinx-autodoc-typehints](https://github.com/tox-dev/sphinx-autodoc-typehints) to permit the latest version.
- [Release notes](https://github.com/tox-dev/sphinx-autodoc-typehints/releases)
- [Changelog](https://github.com/tox-dev/sphinx-autodoc-typehints/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tox-dev/sphinx-autodoc-typehints/compare/1.24.0...2.0.0)

---
updated-dependencies:
- dependency-name: sphinx-autodoc-typehints
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Use a default log limit of 100MiB (#4788)

* Use a default log limit of 100MiB

* Update documented default

* Require a new enough Docker to fix #4794 (#4795)

* Log CWL command output inline on failure, and to logging system whether it succeeds or not (#4793)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Unify devirtualization to fix output name collisions (#4792)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow setting WDL container engine with --container (#4787)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Request and handle Slurm timeout signal (#4804)

* Add a Slurm termination signal for timeouts

* Use SIGINT for Slurm timeouts instead of SIGTERM

* Make the interrupt signal actually get to the worker process

* Run worker orderly cleanup even if asked to stop

* Preserve exit code from user code

* Enforce failure when Slurm jobs time out (#4802)

* Don't let 0 exit codes out of the Slurm batch system if the job isn't completed.

* Add missing import

* Teach Slurm and part of LSF to use the Toil exit reason system

* Report unavailable exit status better

* Make sure exit reasons come out as readable strings when logged on Python 3.11+

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix caching being accidentally set to True instead of None (#4805)

* Better stats for WDL workflows (#4770)

* Split up WDL input evaluation and command execution

* Rename task parts to inputs and command

* Deduplicate across scatters for stats

* Report CPU wait accurately with multiple cores, and improve titles

Fixes #4768

* Fix memory units in stats and on Mac

* Move job disk usage tracking and warning to AbstractFileStore

* Save disk to stats

* Fix imports and variable name

* Remove duplicated stat printing code

* Unify stat computation

* Use the category metadata globals to drive everything and sync the width and print code

* Stop coming up with negative wait when jobs don't report cores

* Allow setting WDL container engine with --container

* Use a default log limit of 100MiB

* Update documented default

* Require a new enough Docker to fix #4794

* Add a unit notion to stats

* Be consistent about printing units in toil stats

* Rename functions to snake_case

* Improve error reporting and split cluster and normal utils

* Start documenting the parts of the stats

* Swap over to a stats example that is more illustrative

* Fix counting the jobs per worker

* Explain all the job columns and the sorting

* Fix typing of jobs list

* Fix documentation build

* Fix white-box stats test

* Move the cluster utils out of the cloud providers ToC section

* Update worker.py

---------

Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update EC2 instance list. (#4808)

* Bump version.

* Update README.rst

Couple of small doc changes.

* Respect job local-ness when chaining (#4809)

* Add test to make sure local jobs don't chain to nonlocal ones

* Implement chaining block for local to nonlocal

* Scale down stats tutorial test to fit on small CI runners

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix Python 3.8 support (#4823)

* Add all the supported Python versions as scheduled tests

* Don't let the Docker build succeed when Toil can't run at all

* Use 3.8-compatible type hints

* Fix missing description on PyPI (#4820)

* setuptools: Include README in the package metadata.

Currently https://pypi.org/project/toil/#description is
> The author of this package has not provided a project description

* Makefile: use isolated builds, add dist target (sdist+wheel) and deprecate the sdist target.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Install build (#4826)

* Use a sentinel location instead of an unmodified location to mark missing files (#4818)

* Use a sentinel location instead of an unmodified location to mark missing files

* Fix spelling

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump mypy from 1.8.0 to 1.9.0 (#4830)

Bumps [mypy](https://github.com/python/mypy) from 1.8.0 to 1.9.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.8.0...1.9.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Make sure output directory exists before using it (#4832)

* Pass through statusCode to prevent infinite loop (#4829)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add tests for environment pickling (#4837)

* Add a test for the environment coming from environment.pickle over top of anything on the leader

* Make sure test works with slow job stores like AWS

* Bump sphinxcontrib-autoprogram from 0.1.8 to 0.1.9 (#4838)

Bumps [sphinxcontrib-autoprogram](https://github.com/sphinx-contrib/autoprogram) from 0.1.8 to 0.1.9.
- [Release notes](https://github.com/sphinx-contrib/autoprogram/releases)
- [Changelog](https://github.com/sphinx-contrib/autoprogram/blob/master/doc/changelog.rst)
- [Commits](https://github.com/sphinx-contrib/autoprogram/compare/0.1.8...0.1.9)

---
updated-dependencies:
- dependency-name: sphinxcontrib-autoprogram
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add colored logging (#4828)

* Add coloredlogs

* type ignore

* Fix test to get around how coloredlogs deals with handlers

* Fix option, functionname, license, formatting, and colors

* Remove excess datetime

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove unused CI test (#4843)

* Measure CPU and memory usage in WDL Docker containers (#4819)

* Inject code into the container like MiniWDL to get Docker CPU and memory usage

* Remove not a real ref

* Keep resource monitoring state in a class

* Fix lingering old import

* Get import name right

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Allow debugging jobs by name (and status improvements) (#4840)

* Report tag parsing errors better in case you mix up type and tag

* Fix toil status per-job status report to be per-job

* Shorten toil status option names

* Report completely failed jobs

* Rearrange per-job stats to make it easier to find runnable and failed jobs

* Add printing failed jobs specifically

* Stop making a config just to get status

* Implement search for job by name in debug-job by cribbing from status

* Document the toil status flags a bit

* Write up some debug-job examples

* Explain names more and drop distracting log line

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Improve exception handling to not output tracebacks (#4839)

* Improve exception handling, don't output tracebacks when possible

* Remove excess code in test

* Fix test to use subprocess to accommodate for changed exception handling

* Reword check_initialized()

Co-authored-by: Adam Novak <[email protected]>

* Move comments and make LocatorException take a prefix instead

* Change config to options as it no longer exists

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Update pytest-cov requirement from <5,>=2.12.1 to >=2.12.1,<6 (#4851)

Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v2.12.1...v5.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update docutils requirement from <0.21,>=0.16 to >=0.16,<0.22 (#4866)

Updates the requirements on [docutils](https://docutils.sourceforge.io) to permit the latest version.

---
updated-dependencies:
- dependency-name: docutils
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update galaxy-util requirement from <23 to <25 (#4862)

Updates the requirements on [galaxy-util](https://github.com/galaxyproject/galaxy) to permit the latest version.
- [Release notes](https://github.com/galaxyproject/galaxy/releases)
- [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-util-19.9.0...v24.0)

---
updated-dependencies:
- dependency-name: galaxy-util
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update galaxy-tool-util requirement from <23 to <25 (#4861)

Updates the requirements on [galaxy-tool-util](https://github.com/galaxyproject/galaxy) to permit the latest version.
- [Release notes](https://github.com/galaxyproject/galaxy/releases)
- [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-tool-util-19.9.0...v24.0)

---
updated-dependencies:
- dependency-name: galaxy-tool-util
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Bump cwltool from 3.1.20240112164112 to 3.1.20240404144621 (#4870)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20240112164112 to 3.1.20240404144621.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20240112164112...3.1.20240404144621)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump gunicorn from 21.2.0 to 22.0.0 (#4871)

Bumps [gunicorn](https://github.com/benoitc/gunicorn) from 21.2.0 to 22.0.0.
- [Release notes](https://github.com/benoitc/gunicorn/releases)
- [Commits](https://github.com/benoitc/gunicorn/compare/21.2.0...22.0.0)

---
updated-dependencies:
- dependency-name: gunicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Retry Slurm interactions more (#4869)

* Hook up grid engine batch systems to the normal retry system and add --statePollingTimeout

* Remove extra word

* Insist on understanding the Slurm states and stop if we don't

* Change how we think of REVOKED and SPECIAL_EXIT

* Add missing argument

* Import missing exception type

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Replace use of boto with boto3 for `awsProvisioner.py` (#4859)

* Take out boto2 from awsProvisioner.py

* Add mypy stub file for s3

* Lazy import aws to avoid dependency if extra is not installed yet

* Also lazy import in tests

* Separate out wdl kubernetes test to avoid missing dependency

* Add unittest main

* Fix wdl CI to run separated tests

* Fix typo in lookup

* Update moto and remove leftover line in node.py

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Apply fixes

* Abstract AWS ErrorCondition server errors into a constant instance

* Move AWSServiceErrors declaration to a better place

* Prevent aliasing from confusing sphinx and remove cached autoapi in clean

* Update src/toil/lib/aws/__init__.py

Co-authored-by: Adam Novak <[email protected]>

* Change retry loop

* Replace assert with raise

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow fetching job inputs for debugging (#4848)

* Reformat worker

* Actually change kwarg name

* Enable stopping WDL (and probably CWL) jobs after files are downloaded

* Make sure WDL commands get logged before we stop

* Add type hints

* Add debug flag accessor

* Make debug-job default to debug logging

* Build fake container environments for CWL and WDL jobs when debugging them

* Add an example of dumping job files to the docs

* Add tests for the file retrieval and container faking

* Add missing imports

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Make leader wait for expected updates to be visible in the job store, or fail the job (#4811)

* Implement expecting version bumps and fail src/toil/test/batchSystems/batchSystemTest.py::MaxCoresSingleMachineBatchSystemTest::testServices

* Actually turn on debug logging for service test

* Refer to jobs for space usage accounting by stringified job description and not body file

* Use exponential backoff when polling for job updates

* Fix comparison direction

* Plug the new CLI option

* Include version writers in warnings

* Make return type annotation correct

* Don't wait for new versions of failed jobs because then we're too slow to pass the badWorker tests

* Scale down stats tutorial test to fit on small CI runners

* Work out that command overrides aren't being removed

* Stop having an overloaded command field on JobDescriptions

* Fix typos and update architecture to lean less on command

* Fix calling the checkpoint restore

* Handle None vs. empty successors in tests

* Handle places that didn't expect nextSuccessors() to ever be None

* Remove extra the

* Fix handling jobs that had no bodies, and consolidate warning logic

* Always actually do a reset even if no new version is ready.

* Use has_body accessor more

* Rename loadJob variables

* Rename _body_spec and use more has_body()

* Use a NamedTuple instead of a command-style string to point to the body

* Improve JobDescription docstring and fix typoed argument name

* Remove worker command from JobDescription

* Eliminate references to get_worker_command/set_worker_command

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
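
The exponential backoff mentioned in the entry above, used when polling for job updates, follows a common pattern that can be sketched as below. The function `poll_with_backoff` and all of its parameters are hypothetical and do not reflect Toil's actual interface; the `sleep` and `clock` hooks exist only so the sketch can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_with_backoff(
    check: Callable[[], bool],
    timeout: float,
    initial_delay: float = 0.1,
    factor: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() until it returns True or timeout seconds elapse.

    The wait between attempts starts at initial_delay and is multiplied
    by factor after each failed attempt, capped at max_delay.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # Gave up: the update never became visible
        sleep(min(delay, remaining))
        delay = min(delay * factor, max_delay)
```

A caller that gets `False` back can then treat the expected update as missing and fail the job instead of waiting forever.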

* Enable FUSE for privileged Toil clusters (#4824)

* Add option for privileged clusters and enable privileges for toil-managed clusters

* Fix syntax error and add back namespace rules

* packages might be broken

* Dependencies

* Move apt clean

* Create test image

* Create test image 2

* Try just creating the base docker image

* test image creation, typo

* Try focal debian package

* Try the last docker build command

* remove nontoil makefile dependencies to test

* Successfully build docker images at least for amd64

* Remove unprivileged fuse mount code

* Bring back rest of docker builds

* Remove unnecessary env var in dockerfile

* Fix setuptools and virtualenv to some version and revert whitespace

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Move SINGULARITY_CACHEDIR comment

* Formatting and move strtobool

* Reflect moved functions for imports

* Remove debug_mute flag and print debugging statement outside instead

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Detect if the GridEngine worker thread has crashed to prevent hanging the workflow (#4873)

* Debug envvar

* add error to message

* Add logic for unexpected background thread failure

* Set block back to true

* Don't duplicate thread exception message and print at end

* Revert "Debug envvar"

This reverts commit 13392858db352da75c8ddfe3b4d13b5d88eccf14.

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Bump mypy from 1.9.0 to 1.10.0 (#4878)

Bumps [mypy](https://github.com/python/mypy) from 1.9.0 to 1.10.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* remove SLURM caching override to support caching (#4884)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add more debug logging for when the job is attempted and the worker is started (#4881)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update WDL conformance tests on CI (#4876)

* Update wdltoil_test.py

* Fix typo

* Fix version for integration tests

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Replace all usage of boto2 with boto3 (#4868)

* Take out boto2 from awsProvisioner.py

* Add mypy stub file for s3

* Lazy import aws to avoid dependency if extra is not installed yet

* Also lazy import in tests

* Separate out wdl kubernetes test to avoid missing dependency

* Add unittest main

* Fix wdl CI to run separated tests

* Fix typo in lookup

* Update moto and remove leftover line in node.py

* Remove all instances of boto

* Fix issues with boto return types and grab attributes before deleting

* Remove some unnecessary abstraction

* Fix improper types in ec2.py

* Ensure UUID is a string for boto3

* No more boto

* Remove comments

* Move attribute initialization

* Properly delete all attributes of the item

* Move out pager and use pager for select to get around output limits

* Turn getter into method

* Remove comment in setup.py

* Remove commented dead import

* Remove stray boto import

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Rename, rearrange some code

* Revert not passing Value's to attributes when deleting attributes in SDB

* Fix missed changed var names

* Change ordering of jobstorexists exception to fix improper output on exception

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Revert ensurepip to get-pip (#4900)

* docs cleanup (#4889)

* fix incorrect file extensions.

* fix typos

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump to a new major version (#4885)

Since #4811 made the batch systems take the command as an argument, we now have to bump the major version to signal incompatibility with any old batch system plugins.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warn user. (#4893)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow symlinks to inputs as WDL outputs (#4883)

* Detect missing files at the offending step and announce the problem conspicuously

* Log the offending expression

* Resolve symlinks against container mounts during file virtualization

* Try and forward along original virtualized filenames

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* bye pytz (#4890)

* pytz is not needed in Python 3.9+, or with the zoneinfo backport

* make diff_mypy: quieter and target the correct branch

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: DailyDreaming <[email protected]>

* Stop suggesting infinity when validating half-open intervals (#4887)

This should fix #4886 by not suggesting to the user that "infinity" is an option value that can be used.

It also explains the option intervals in words instead of interval notation, which people might not be expecting.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
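
Describing an option's allowed range in words rather than interval notation, as the entry above proposes, might be done along these lines. The helper `describe_range` and its wording are assumptions for illustration, not the messages Toil actually prints. An open-ended side is simply left out, so "infinity" never appears.

```python
from typing import Optional


def describe_range(
    minimum: Optional[float],
    maximum: Optional[float],
    min_inclusive: bool = True,
    max_inclusive: bool = True,
) -> str:
    """Describe an allowed range in plain words.

    Hypothetical helper: a bound of None means that side is unbounded
    and is omitted from the description.
    """
    parts = []
    if minimum is not None:
        prefix = "at least " if min_inclusive else "greater than "
        parts.append(f"{prefix}{minimum}")
    if maximum is not None:
        prefix = "at most " if max_inclusive else "less than "
        parts.append(f"{prefix}{maximum}")
    return " and ".join(parts) if parts else "any value"
```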

* Fix WDL option spelling and tolerate Cromwell-isms (#4906)

* Fix WDL option spelling and tolerate Cromwell-isms

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: DailyDreaming <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove wrapped CWL doc example. (#4892)

* Remove wrapped CWL doc example.

* Patch missing links.

* Remove AWS dependent import/test from cwlTest.py.

* Missing @slow.

* Missing import.

* Make SimpleDB retry on EndpointConnectionError

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Add retries to DockerCheckTest.testBadGoogleRepo (#4909)

* Add retries to flaky test

* get rid of extra import

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix 3.8 backport.timezone import (#4908)

* Fix 3.8 import and remove dead comment in requirements.txt

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Update to Python 3.12 (#4901)

* Add Python 3.12 to CI

* Update sphinx-autoapi and astroid to deal with crash

https://github.com/pylint-dev/pylint/issues/8782

* Remove dead comment

* Add rules to 3.11 build

* update htcondor

* Update use of HTcondor in appliance build

* Ensure tests are instanced and don't jumble relative paths + debug logging

* oops, update utilsTest too

* is this a pytest issue?

* Add some more log messages

* Fix time.sleep

* Remove the debug statement in docker

* Bump flask-cors from 4.0.0 to 4.0.1 (#4916)

Bumps [flask-cors](https://github.com/corydolphin/flask-cors) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/corydolphin/flask-cors/releases)
- [Changelog](https://github.com/corydolphin/flask-cors/blob/main/CHANGELOG.md)
- [Commits](https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1)

---
updated-dependencies:
- dependency-name: flask-cors
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Try /tmp before the workdir (#4914)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* biocontainer tests: use version corresponding to v2 Docker Image Format (#4912)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Revert "Update to Python 3.12 (#4901)" (#4917)

This reverts commit 460846d7ded3820acc505cccb9c866ea9a7a940a.

* Bump miniwdl from 1.11.1 to 1.12.0 (#4920)

Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.11.1 to 1.12.0.
- [Release notes](https://github.com/chanzuckerberg/miniwdl/releases)
- [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.11.1...v1.12.0)

---
updated-dependencies:
- dependency-name: miniwdl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Support Python 3.12 (#4919)

* Add Python 3.12 to CI

* Update sphinx-autoapi and astroid to deal with crash

https://github.com/pylint-dev/pylint/issues/8782

* Remove dead comment

* Add rules to 3.11 build

* update htcondor

* Update use of HTCondor in appliance build

* Ensure tests are instanced and don't jumble relative paths + debug logging

* oops, update utilsTest too

* is this a pytest issue?

* Add some more log messages

* Fix time.sleep

* Remove the debug statement in docker

* remove logger print statements in utilsTest.py and pin pytest

* Up the timeout on some tests (possibly a timing issue)

* Up the timeout on more tests

* Up the pytest version again

* Add documentation for installing batch system plugins (#4926)

Co-authored-by: Adam Novak <[email protected]>

* Update Werkzeug to appease the GitHub security police (#4925)

It looks like if you give away your debugger PIN, people can use your Werkzeug debugger. This is somehow a security issue and was apparently never fixed on Werkzeug 2.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove unused comment

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: William Gao <[email protected]>
Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: stxue1 <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Glenn Hickey <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: Alexandre Detiste <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andreas Tille <[email protected]>
Co-authored-by: Theodore Ni <[email protected]>
Co-authored-by: Benedict Paten <[email protected]>
Successfully merging this pull request may close these issues.

Good config file support for WDL and CWL runner options
3 participants