AWS Batch Batch System (DataBiosphere#3956)
* Implement untested AWSBatchBatchSystem

* Fix AWS Batch pylint errors

* Fix AWS Batch mypy errors

* Actually use our backoff to respond to SlowDown

* Fix API calling mistakes

* Rearrange session management so we can use the right primitives

* Let AWS Batch batch system have a region

* Refactor zone selection

* Make job names safe for Amazon Batch

* Properly find running and stopped batch jobs, and report status reasons

* Try to apply tags and set example scripts to parse args

* Get owner tagging working
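The "Actually use our backoff to respond to SlowDown" item above refers to retrying throttled AWS calls. A minimal full-jitter exponential backoff sketch follows; the function name and the bare `RuntimeError` check are hypothetical stand-ins, since Toil's real code wraps boto3 calls with its own retry helpers:

```python
import random
import time

def call_with_backoff(operation, max_attempts=6, base_delay=1.0):
    """Retry a throttled AWS-style call with full-jitter exponential backoff.

    Hypothetical sketch: 'SlowDown' is the throttling error code AWS
    services such as Batch can return; this is not Toil's actual helper.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except RuntimeError as error:
            if "SlowDown" not in str(error) or attempt == max_attempts - 1:
                raise
            # Sleep somewhere in [0, base * 2^attempt) before retrying.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Full jitter (a random delay up to the exponential cap, rather than the cap itself) spreads retries from many clients apart, which matters precisely when a service is answering SlowDown.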

Still to do:

* Ability to separate preemptible and non-preemptible jobs

* Ability to find the ephemeral storage on nodes that have it

* Ability to make sure jobs requiring a lot of storage run on queues with a CE
that provides a lot of storage, or maybe use an EFS filesystem

* Satisfy MyPy

* Fix some format strings missing fields

* Get AWSBatchBatchSystemTest::test_run_jobs to pass

* Give AWS Batch tests a longer time to start running jobs

* Actually use the subset arguments

* Add docs for the AWSBatchBatchSystem options

* Remove debug logging that isn't type-correct

* Harmonize specialized and general unit converters

* Get rid of decorator we don't actually decorate with

* Enstatic __ensafen_name

* Revise job-finding loop to break properly

* Replace double underscores with admonishments

* Fix spelling

* De-magic the API's min request limits

* Move boto-needing AWS imports to a new file

* Just use or instead of a fancy higher-order function

* Add logging to debug the busted Docker builder

* Fix conversion function variables

* Fix new MyPy error in threading

* Work around missing overloads in Boto3 stubs

* Find the new client and resource location

* Add a bunch of casts that MyPy insists on

* Fix an actual bug MyPy caught where we used the wrong queries to remove policies

* Appease MyPy's fear of re-used loop variables

* Deal with roles appearing to be a single item

* Fit base-type-shaped peg into no-base-type-shaped hole

* Tolerate and require better-typed boto3 stubs

* Require SDB and IAM stubs

* Adapt to establish_boto3_session moving

* Remove unused import

* Use variable we already have

* Adjust copyright year

* Factor out job packing for contained executor

* Adjust KubernetesBatchSystem variable naming to match other contained batch systems
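The name-safening items above (and the `__ensafen_name` helper they mention) deal with AWS Batch's job-name rules: letters, numbers, hyphens, and underscores only, up to 128 characters. A hypothetical sketch of such a helper, not the commit's actual implementation:

```python
import re

def ensafen_name(name: str) -> str:
    """Make a job name acceptable to AWS Batch.

    Hypothetical stand-in for the commit's __ensafen_name helper: AWS Batch
    job names may only contain letters, numbers, hyphens, and underscores,
    and are limited to 128 characters, so everything else is replaced and
    the result truncated.
    """
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", name)
    return safe[:128] if safe else "job"
```

For example, `ensafen_name("toil job: step/1")` yields `"toil-job--step-1"`.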
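The "harmonize specialized and general unit converters" item above concerns mapping Toil's byte-denominated resource requests onto AWS Batch, which takes container memory in MiB. A minimal sketch, assuming ceiling rounding so a job never gets less memory than it asked for (the function name is illustrative, not Toil's actual converter):

```python
def bytes_to_mib(n_bytes: int) -> int:
    """Convert a byte count to whole MiB, rounding up.

    Hypothetical sketch: rounding up ensures a job submitted to AWS Batch
    is never granted less memory than the workflow requested.
    """
    mib = 1 << 20
    # Ceiling division via negation: -(-a // b) == ceil(a / b) for a >= 0.
    return -(-n_bytes // mib)
```

So a request of exactly 1 MiB maps to 1, while 1 MiB plus one byte maps to 2.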
adamnovak authored Jan 28, 2022
1 parent 7f32521 commit 355d152
Showing 37 changed files with 1,191 additions and 365 deletions.
2 changes: 2 additions & 0 deletions .gitlab-ci.yml
@@ -8,6 +8,8 @@ variables:
   MAIN_PYTHON_PKG: "python3.9"
 
 before_script:
+  # Log where we are running, in case some Kubernetes hosts are busted. IPs are assigned per host.
+  - ip addr
   # Configure Docker to use a mirror for Docker Hub and restart the daemon
   # Set the registry as insecure because it is probably cluster-internal over plain HTTP.
   - |
14 changes: 14 additions & 0 deletions docs/appendices/environment_vars.rst
@@ -107,6 +107,20 @@ There are several environment variables that affect the way Toil runs.
 |                                  | deleted until all associated nodes have been       |
 |                                  | terminated.                                        |
 +----------------------------------+----------------------------------------------------+
+| TOIL_AWS_BATCH_REGION            | Region to use when using the AWS Batch batch       |
+|                                  | system. Can often be autodetected from Boto        |
+|                                  | configuration or the AWS region in which the       |
+|                                  | current machine is running, if any.                |
++----------------------------------+----------------------------------------------------+
+| TOIL_AWS_BATCH_QUEUE             | Name or ARN of an AWS Batch Queue to use with the  |
+|                                  | AWS Batch batch system.                            |
++----------------------------------+----------------------------------------------------+
+| TOIL_AWS_BATCH_JOB_ROLE_ARN      | ARN of an IAM role to run AWS Batch jobs as with   |
+|                                  | the AWS Batch batch system. If the jobs are not    |
+|                                  | run with an IAM role or on machines that have      |
+|                                  | access to S3 and SimpleDB, the AWS job store will  |
+|                                  | not be usable.                                     |
++----------------------------------+----------------------------------------------------+
 | TOIL_GOOGLE_PROJECTID            | The Google project ID to use when generating       |
 |                                  | Google job store names for tests or CWL workflows. |
 +----------------------------------+----------------------------------------------------+
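The environment variables documented above would be set before launching a workflow. A small illustration of how a consumer might read them, with purely made-up values (the region, account number, and queue name are placeholders):

```python
import os

# Illustrative values only: region, account number, and queue name are made up.
os.environ["TOIL_AWS_BATCH_REGION"] = "us-west-2"
os.environ["TOIL_AWS_BATCH_QUEUE"] = (
    "arn:aws:batch:us-west-2:123456789012:job-queue/toil-queue"
)

# A consumer falls back to autodetection (Boto config, instance metadata)
# when the region variable is unset, so .get() with a None default fits.
region = os.environ.get("TOIL_AWS_BATCH_REGION")
queue = os.environ.get("TOIL_AWS_BATCH_QUEUE")
```

Both a bare queue name and a full ARN are accepted per the table, so consumers should not assume the `arn:` prefix is present.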
14 changes: 14 additions & 0 deletions docs/running/cliOptions.rst
@@ -155,6 +155,20 @@ the logging module:
   --tesBearerToken TES_BEARER_TOKEN
                         Bearer token to use for authentication to TES server.
 
+  --awsBatchRegion AWS_BATCH_REGION
+                        Region to use when using the AWS Batch batch system.
+                        Can often be autodetected from Boto configuration or
+                        the AWS region in which the current machine is running,
+                        if any.
+  --awsBatchQueue AWS_BATCH_QUEUE
+                        Name or ARN of an AWS Batch Queue to use with the AWS
+                        Batch batch system.
+  --awsBatchJobRoleArn AWS_BATCH_JOB_ROLE_ARN
+                        ARN of an IAM role to run AWS Batch jobs as with the
+                        AWS Batch batch system. If the jobs are not run with an
+                        IAM role or on machines that have access to S3 and
+                        SimpleDB, the AWS job store will not be usable.
+
   --scale SCALE         A scaling factor to change the value of all submitted
                         tasks' submitted cores. Used in singleMachine batch
                         system. Useful for running workflows on smaller
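Putting the new command-line options together, a hypothetical invocation might look like the following. The script name, job store, region, and queue are placeholders, and `aws_batch` as the `--batchSystem` value is an assumption about the batch system's registered name:

```python
import shlex

# Hypothetical command line wiring the documented options together.
# "my_workflow.py", "file:jobstore", region, and queue are placeholders;
# "aws_batch" as the --batchSystem value is an assumption.
cmd = [
    "python", "my_workflow.py", "file:jobstore",
    "--batchSystem", "aws_batch",
    "--awsBatchRegion", "us-west-2",
    "--awsBatchQueue", "toil-queue",
]
print(shlex.join(cmd))
```

Using `shlex.join` keeps the printed command copy-pasteable even if an argument later gains spaces or shell metacharacters.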
4 changes: 2 additions & 2 deletions requirements-aws.txt
@@ -1,4 +1,4 @@
 boto>=2.48.0, <3
-boto3>=1.17, <2
-boto3-stubs[s3]>=1.17, <2
+boto3>=1.20.35, <2
+boto3-stubs[s3,sdb,iam]>=1.20.35.post1, <2
 futures>=3.1.1, <4
2 changes: 1 addition & 1 deletion src/toil/__init__.py
@@ -516,7 +516,7 @@ def __init__(self, name, access_key=None, secret_key=None,
             # We will backend into a boto3 resolver for getting credentials.
             # Make sure to enable boto3's own caching, so we can share that
             # cache with pure boto3 code elsewhere in Toil.
-            # Keep synced with toil.lib.ec2.establish_boto3_session
+            # Keep synced with toil.lib.aws.session.establish_boto3_session
             self._boto3_resolver = create_credential_resolver(Session(profile=profile_name), cache=JSONFileCache())
         else:
             # We will use the normal flow
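The hunk above is about sharing one on-disk credential cache between an old-style resolver and pure boto3 code. A toy illustration of why that sharing works: two independently constructed consumers of the same cache directory see each other's entries. The class below merely mimics the shape of botocore's `JSONFileCache` (one JSON file per key); it is not Toil or botocore code:

```python
import json
import os
import tempfile

class TinyJSONFileCache:
    """Toy stand-in for botocore's JSONFileCache: one JSON file per key."""

    def __init__(self, directory: str):
        self._dir = directory

    def __setitem__(self, key: str, value: dict) -> None:
        # Persist the value as JSON so any process sharing the directory can read it.
        with open(os.path.join(self._dir, key + ".json"), "w") as f:
            json.dump(value, f)

    def __getitem__(self, key: str) -> dict:
        with open(os.path.join(self._dir, key + ".json")) as f:
            return json.load(f)

# Two caches built at different times share one directory, so a credential
# stored by one is immediately visible to the other.
cache_dir = tempfile.mkdtemp()
first = TinyJSONFileCache(cache_dir)
second = TinyJSONFileCache(cache_dir)
first["assumed-role"] = {"AccessKeyId": "AKIDEXAMPLE"}
print(second["assumed-role"]["AccessKeyId"])  # prints AKIDEXAMPLE
```

This is why the comment insists on keeping the resolver construction synced with `establish_boto3_session`: both sides must agree on the cache location and key scheme for the sharing to happen.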