Simplify s3 functions: create_s3_bucket. #4566

Open
wants to merge 62 commits into
base: master
Changes from 7 commits
Commits
062da74
Simplify create s3 bucket.
DailyDreaming Aug 16, 2023
1142681
Remove old function.
DailyDreaming Aug 16, 2023
25ab299
Resolve one more mk bucket location.
DailyDreaming Aug 16, 2023
5bf34bc
Simplify tagging.
DailyDreaming Aug 17, 2023
cd47aa1
Typing.
DailyDreaming Aug 17, 2023
b39d561
Typing.
DailyDreaming Aug 17, 2023
5428f5e
Merge branch 'master' into issues/4088-mv-s3-functions-mk-bucket
adamnovak Aug 17, 2023
5b25a35
Simplify s3 functions: delete_s3_bucket. (#4567)
DailyDreaming Aug 31, 2023
7646a5d
Merge branch 'master' into issues/4088-mv-s3-functions-mk-bucket
adamnovak Aug 31, 2023
095ba36
Rebase.
DailyDreaming Apr 30, 2024
29a2ecc
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 1, 2024
8870d67
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 1, 2024
db585b7
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 1, 2024
2bb9092
Missing import.
DailyDreaming May 1, 2024
cb9e474
Another missing import.
DailyDreaming May 2, 2024
0e08474
Merge branch 'master' into issues/4088-mv-s3-functions-mk-bucket
DailyDreaming May 2, 2024
3ca63ed
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 2, 2024
3e21094
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 6, 2024
7f4f0d6
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 6, 2024
a4a395e
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 6, 2024
0a950bf
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 6, 2024
03941e9
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 6, 2024
3323e08
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 7, 2024
9341962
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 9, 2024
c0eea1b
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 14, 2024
59b4b8a
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 14, 2024
52be927
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 14, 2024
2242ace
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 17, 2024
8c3a876
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 17, 2024
a9c62db
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 18, 2024
f779127
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 20, 2024
d13fc62
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 20, 2024
cc6ab3c
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 21, 2024
fbdc80a
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 21, 2024
392064b
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 21, 2024
74d8ee8
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 23, 2024
004f139
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] May 24, 2024
455da2a
Merge branch 'master' into issues/4088-mv-s3-functions-mk-bucket
DailyDreaming Jun 6, 2024
53dcd42
Remove cruft.
DailyDreaming Jun 6, 2024
f6876ad
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 6, 2024
797624d
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 10, 2024
b468005
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 10, 2024
04c47e2
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 10, 2024
77303d6
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 11, 2024
0c48203
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 11, 2024
8e35f61
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 11, 2024
56b6ad6
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 19, 2024
ed97974
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 19, 2024
90d54a8
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 24, 2024
21a221e
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Jun 25, 2024
39f9c4c
Rebase from master.
DailyDreaming Jun 28, 2024
21bbf5e
Patch test_utils.py.
DailyDreaming Jun 28, 2024
8b3c897
Merge branch 'master' into issues/4088-mv-s3-functions-mk-bucket
DailyDreaming Aug 19, 2024
6be49c0
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 19, 2024
85558eb
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 19, 2024
5ac40e0
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 21, 2024
b9efc4a
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 22, 2024
cc6be4b
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 22, 2024
e89e6ac
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 22, 2024
9a3f53a
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 27, 2024
68954a2
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Aug 27, 2024
3aaf340
Merge master into issues/4088-mv-s3-functions-mk-bucket
github-actions[bot] Sep 3, 2024
2 changes: 1 addition & 1 deletion src/toil/common.py
@@ -69,7 +69,7 @@
                       QueueSizeMessage,
                       gen_message_bus_path)
 from toil.fileStores import FileID
-from toil.lib.aws import zone_to_region, build_tag_dict_from_env
+from toil.lib.aws import zone_to_region
 from toil.lib.compatibility import deprecated
 from toil.lib.conversions import bytes2human, human2bytes
 from toil.lib.io import try_path
137 changes: 52 additions & 85 deletions src/toil/jobStores/aws/jobStore.py
@@ -13,16 +13,13 @@
 # limitations under the License.
 import hashlib
 import itertools
-import json
 import logging
 import os
 import pickle
 import re
 import reprlib
 import stat
 import time
-import urllib.error
-import urllib.request
 import uuid
 from contextlib import contextmanager
 from io import BytesIO
@@ -35,7 +32,6 @@
 from botocore.exceptions import ClientError

 import toil.lib.encryption as encryption
-from toil.lib.aws import build_tag_dict_from_env
 from toil.fileStores import FileID
 from toil.jobStores.abstractJobStore import (AbstractJobStore,
                                              ConcurrentFileModificationException,
@@ -57,10 +53,8 @@
                                   ReadableTransformingPipe,
                                   WritablePipe)
 from toil.lib.aws.session import establish_boto3_session
-from toil.lib.aws.utils import (create_s3_bucket,
-                                enable_public_objects,
-                                flatten_tags,
-                                get_bucket_region,
+from toil.lib.aws.s3 import create_s3_bucket
+from toil.lib.aws.utils import (get_bucket_region,
                                 get_object_for_url,
                                 list_objects_for_url,
                                 retry_s3,
@@ -721,85 +715,58 @@ def bucket_retry_predicate(error):
             return False

         bucketExisted = True
-        for attempt in retry_s3(predicate=bucket_retry_predicate):
-            with attempt:
-                try:
-                    # the head_bucket() call makes sure that the bucket exists and the user can access it
-                    self.s3_client.head_bucket(Bucket=bucket_name)
-
-                    bucket = self.s3_resource.Bucket(bucket_name)
-                except ClientError as e:
-                    error_http_status = get_error_status(e)
-                    if error_http_status == 404:
-                        bucketExisted = False
-                        logger.debug("Bucket '%s' does not exist.", bucket_name)
-                        if create:
-                            bucket = create_s3_bucket(
-                                self.s3_resource, bucket_name, self.region
-                            )
-                            # Wait until the bucket exists before checking the region and adding tags
-                            bucket.wait_until_exists()
-
-                            # It is possible for create_bucket to return but
-                            # for an immediate request for the bucket region to
-                            # produce an S3ResponseError with code
-                            # NoSuchBucket. We let that kick us back up to the
-                            # main retry loop.
-                            assert (
-                                get_bucket_region(bucket_name) == self.region
-                            ), f"bucket_name: {bucket_name}, {get_bucket_region(bucket_name)} != {self.region}"
-
-                            tags = build_tag_dict_from_env()
-
-                            if tags:
-                                flat_tags = flatten_tags(tags)
-                                bucket_tagging = self.s3_resource.BucketTagging(bucket_name)
-                                bucket_tagging.put(Tagging={'TagSet': flat_tags})
-
-                            # Configure bucket so that we can make objects in
-                            # it public, which was the historical default.
-                            enable_public_objects(bucket_name)
-                        elif block:
-                            raise
-                        else:
-                            return None
-                    elif error_http_status == 301:
-                        # This is raised if the user attempts to get a bucket in a region outside
-                        # the specified one, if the specified one is not `us-east-1`. The us-east-1
-                        # server allows a user to use buckets from any region.
-                        raise BucketLocationConflictException(get_bucket_region(bucket_name))
-                    else:
-                        raise
-                else:
-                    bucketRegion = get_bucket_region(bucket_name)
-                    if bucketRegion != self.region:
-                        raise BucketLocationConflictException(bucketRegion)
-
-                if versioning and not bucketExisted:
-                    # only call this method on bucket creation
-                    bucket.Versioning().enable()
-                    # Now wait until versioning is actually on. Some uploads
-                    # would come back with no versions; maybe they were
-                    # happening too fast and this setting isn't sufficiently
-                    # consistent?
-                    time.sleep(1)
-                    while not self._getBucketVersioning(bucket_name):
-                        logger.warning(f"Waiting for versioning activation on bucket '{bucket_name}'...")
-                        time.sleep(1)
-                elif check_versioning_consistency:
-                    # now test for versioning consistency
-                    # we should never see any of these errors since 'versioning' should always be true
-                    bucket_versioning = self._getBucketVersioning(bucket_name)
-                    if bucket_versioning != versioning:
-                        assert False, 'Cannot modify versioning on existing bucket'
-                    elif bucket_versioning is None:
-                        assert False, 'Cannot use a bucket with versioning suspended'
-                if bucketExisted:
-                    logger.debug(f"Using pre-existing job store bucket '{bucket_name}'.")
+        try:
+            # the head_bucket() call makes sure that the bucket exists and the user can access it
+            self.s3_client.head_bucket(Bucket=bucket_name)
+            bucket = self.s3_resource.Bucket(bucket_name)
+        except ClientError as e:
+            error_http_status = get_error_status(e)
+            if error_http_status == 404:
+                bucketExisted = False
+                logger.debug("Bucket '%s' does not exist.", bucket_name)
+                if create:
+                    bucket = create_s3_bucket(self.s3_resource, bucket_name, self.region)
+                elif block:
+                    raise
                 else:
-                    logger.debug(f"Created new job store bucket '{bucket_name}' with versioning state {versioning}.")
+                    return None
+            elif error_http_status == 301:
+                # This is raised if the user attempts to get a bucket in a region outside
+                # the specified one, if the specified one is not `us-east-1`. The us-east-1
+                # server allows a user to use buckets from any region.
+                raise BucketLocationConflictException(get_bucket_region(bucket_name))
+            else:
+                raise
+        else:
+            bucketRegion = get_bucket_region(bucket_name)
+            if bucketRegion != self.region:
+                raise BucketLocationConflictException(bucketRegion)
+
+        if versioning and not bucketExisted:
+            # only call this method on bucket creation
+            bucket.Versioning().enable()
+            # Now wait until versioning is actually on. Some uploads
+            # would come back with no versions; maybe they were
+            # happening too fast and this setting isn't sufficiently
+            # consistent?
+            time.sleep(1)
+            while not self._getBucketVersioning(bucket_name):
+                logger.warning(f"Waiting for versioning activation on bucket '{bucket_name}'...")
+                time.sleep(1)
+        elif check_versioning_consistency:
+            # now test for versioning consistency
+            # we should never see any of these errors since 'versioning' should always be true
+            bucket_versioning = self._getBucketVersioning(bucket_name)
+            if bucket_versioning != versioning:
+                assert False, 'Cannot modify versioning on existing bucket'
+            elif bucket_versioning is None:
+                assert False, 'Cannot use a bucket with versioning suspended'
+        if bucketExisted:
+            logger.debug(f"Using pre-existing job store bucket '{bucket_name}'.")
+        else:
+            logger.debug(f"Created new job store bucket '{bucket_name}' with versioning state {versioning}.")

-                return bucket
+        return bucket

     def _bindDomain(self, domain_name, create=False, block=True):
         """
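The control flow that the jobStore.py hunk above keeps after dropping the retry loop (probe the bucket, create it on a 404, report a region conflict on a 301, re-raise anything else) can be illustrated in isolation. The following is a minimal sketch, not Toil's code: `FakeClientError`, `BucketLocationConflict`, `bind_bucket`, and the `head`/`make` callables are hypothetical stand-ins for botocore's `ClientError`, `BucketLocationConflictException`, `head_bucket`, and `create_s3_bucket`.

```python
from typing import Callable, Optional


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore's ClientError; holds only an HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class BucketLocationConflict(Exception):
    """Hypothetical analogue of BucketLocationConflictException."""


def bind_bucket(
    name: str,
    head: Callable[[str], None],
    make: Callable[[str], str],
    create: bool = False,
    block: bool = True,
) -> Optional[str]:
    """Return a handle for an existing bucket, or create one on a 404."""
    try:
        head(name)  # raises FakeClientError if the bucket is missing or inaccessible
    except FakeClientError as e:
        if e.status == 404:
            if create:
                return make(name)  # bucket did not exist, so create it
            if block:
                raise  # caller asked to fail loudly when the bucket is missing
            return None  # caller asked for a quiet "not found"
        if e.status == 301:
            # The bucket exists, but in a different region than the one requested.
            raise BucketLocationConflict(name) from e
        raise  # any other status is unexpected; propagate it
    return name
```

Passing the probe and the creator in as callables is only a device to keep the sketch runnable without AWS credentials; the real method also checks the bucket region and the versioning state afterwards.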
26 changes: 11 additions & 15 deletions src/toil/lib/aws/__init__.py
@@ -168,29 +168,25 @@ def file_begins_with(path, prefix):
     except (URLError, socket.timeout, HTTPException):
         return False

+
 def running_on_ecs() -> bool:
     """
     Return True if we are currently running on Amazon ECS, and false otherwise.
     """
     # We only care about relatively current ECS
     return 'ECS_CONTAINER_METADATA_URI_V4' in os.environ

-def build_tag_dict_from_env(environment: MutableMapping[str, str] = os.environ) -> Dict[str, str]:
-    tags = dict()
-    owner_tag = environment.get('TOIL_OWNER_TAG')
+
+def tags_from_env() -> Dict[str, str]:
+    try:
+        tags = json.loads(os.environ.get('TOIL_AWS_TAGS', '{}'))
+    except json.decoder.JSONDecodeError:
+        logger.error('TOIL_AWS_TAGS must be in JSON format: {"key" : "value", ...}')
+        exit(1)
+
+    # TODO: Remove TOIL_OWNER_TAG and only use TOIL_AWS_TAGS .
+    owner_tag = os.environ.get('TOIL_OWNER_TAG')
     if owner_tag:
         tags.update({'Owner': owner_tag})

-    user_tags = environment.get('TOIL_AWS_TAGS')
-    if user_tags:
-        try:
-            json_user_tags = json.loads(user_tags)
-            if isinstance(json_user_tags, dict):
-                tags.update(json.loads(user_tags))
-            else:
-                logger.error('TOIL_AWS_TAGS must be in JSON format: {"key" : "value", ...}')
-                exit(1)
-        except json.decoder.JSONDecodeError:
-            logger.error('TOIL_AWS_TAGS must be in JSON format: {"key" : "value", ...}')
-            exit(1)
     return tags
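To make the tag handling in this hunk concrete, here is a small sketch of reading `TOIL_AWS_TAGS` and `TOIL_OWNER_TAG` and turning the result into the `TagSet` list that S3 tagging calls expect. It is an illustration under assumptions, not the PR's code: `tags_from_mapping` and `to_tag_set` are hypothetical names, the environment is passed in as a mapping so the example is testable, a `ValueError` replaces `exit(1)`, and `to_tag_set` only approximates what `flatten_tags` is assumed to do. The sketch keeps the `dict` check that the removed `build_tag_dict_from_env` performed, and applies the owner tag last, as `tags_from_env` does.

```python
import json
from typing import Dict, List, Mapping


def tags_from_mapping(environment: Mapping[str, str]) -> Dict[str, str]:
    """Build a tag dict from TOIL_AWS_TAGS (JSON object) and TOIL_OWNER_TAG."""
    raw = environment.get("TOIL_AWS_TAGS", "{}")
    try:
        tags = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError('TOIL_AWS_TAGS must be in JSON format: {"key" : "value", ...}') from e
    if not isinstance(tags, dict):
        # Valid JSON that is not an object (e.g. a list) cannot be used as tags.
        raise ValueError('TOIL_AWS_TAGS must be a JSON object: {"key" : "value", ...}')

    owner = environment.get("TOIL_OWNER_TAG")
    if owner:
        tags["Owner"] = owner  # applied last, so it wins over an "Owner" key in TOIL_AWS_TAGS
    return {str(k): str(v) for k, v in tags.items()}


def to_tag_set(tags: Mapping[str, str]) -> List[Dict[str, str]]:
    """Convert {'k': 'v'} into [{'Key': 'k', 'Value': 'v'}] for a Tagging payload."""
    return [{"Key": k, "Value": v} for k, v in tags.items()]
```

A caller would typically pass `os.environ` as the mapping and skip the tagging request when the resulting dict is empty.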
84 changes: 84 additions & 0 deletions src/toil/lib/aws/s3.py
@@ -0,0 +1,84 @@
+# Copyright (C) 2015-2023 Regents of the University of California
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+import sys
+
+from typing import (Any,
+                    Dict,
+                    Optional,
+                    Union)
+
+from toil.lib.retry import retry
+from toil.lib.aws import tags_from_env
+from toil.lib.aws.utils import enable_public_objects, flatten_tags
+
+if sys.version_info >= (3, 8):
+    from typing import Literal
+else:
+    from typing_extensions import Literal
+
+try:
+    from boto.exception import BotoServerError, S3ResponseError
+    from botocore.exceptions import ClientError
+    from mypy_boto3_iam import IAMClient, IAMServiceResource
+    from mypy_boto3_s3 import S3Client, S3ServiceResource
+    from mypy_boto3_s3.literals import BucketLocationConstraintType
+    from mypy_boto3_s3.service_resource import Bucket, Object
+    from mypy_boto3_sdb import SimpleDBClient
+except ImportError:
+    BotoServerError = Exception  # type: ignore
+    S3ResponseError = Exception  # type: ignore
+    ClientError = Exception  # type: ignore
+    # AWS/boto extra is not installed
+
+
+logger = logging.getLogger(__name__)
+
+
+@retry(errors=[BotoServerError, S3ResponseError, ClientError])
+def create_s3_bucket(
+    s3_resource: "S3ServiceResource",
+    bucket_name: str,
+    region: Union["BucketLocationConstraintType", Literal["us-east-1"]],
+    tags: Optional[Dict[str, str]] = None,
+    public: bool = True
+) -> "Bucket":
+    """
+    Create an AWS S3 bucket, using the given Boto3 S3 session, with the
+    given name, in the given region.
+
+    Supports the us-east-1 region, where bucket creation is special.
+
+    *ALL* S3 bucket creation should use this function.
+    """
+    logger.debug("Creating bucket '%s' in region %s.", bucket_name, region)
+    if region == "us-east-1":  # see https://github.com/boto/boto3/issues/125
+        bucket = s3_resource.create_bucket(Bucket=bucket_name)
+    else:
+        bucket = s3_resource.create_bucket(
+            Bucket=bucket_name,
+            CreateBucketConfiguration={"LocationConstraint": region},
+        )
+    # wait until the bucket exists before adding tags
+    bucket.wait_until_exists()
+
+    tags = tags_from_env() if tags is None else tags
+    bucket_tagging = s3_resource.BucketTagging(bucket_name)
+    bucket_tagging.put(Tagging={'TagSet': flatten_tags(tags)})  # type: ignore
+
+    # enabling public objects is the historical default
+    if public:
+        enable_public_objects(bucket_name)
+
+    return bucket
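The us-east-1 special case in `create_s3_bucket` comes down to whether a `CreateBucketConfiguration` is sent at all. A minimal sketch of that branching, written as a pure function so it can be checked without AWS access, might look like the following; `create_bucket_kwargs` is a hypothetical helper and is not part of this PR.

```python
from typing import Any, Dict


def create_bucket_kwargs(bucket_name: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for an S3 create_bucket call.

    For us-east-1 no LocationConstraint is sent, mirroring the branch in
    create_s3_bucket; every other region is passed as a LocationConstraint.
    """
    kwargs: Dict[str, Any] = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With such a helper the two `create_bucket` calls would collapse into one, e.g. `s3_resource.create_bucket(**create_bucket_kwargs(bucket_name, region))`, at the cost of one more level of indirection.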
23 changes: 0 additions & 23 deletions src/toil/lib/aws/utils.py
@@ -195,29 +195,6 @@ def delete_s3_bucket(
         printq(f'\n * S3 bucket no longer exists: {bucket}\n\n', quiet)


-def create_s3_bucket(
-    s3_resource: "S3ServiceResource",
-    bucket_name: str,
-    region: Union["BucketLocationConstraintType", Literal["us-east-1"]],
-) -> "Bucket":
-    """
-    Create an AWS S3 bucket, using the given Boto3 S3 session, with the
-    given name, in the given region.
-
-    Supports the us-east-1 region, where bucket creation is special.
-
-    *ALL* S3 bucket creation should use this function.
-    """
-    logger.debug("Creating bucket '%s' in region %s.", bucket_name, region)
-    if region == "us-east-1":  # see https://github.com/boto/boto3/issues/125
-        bucket = s3_resource.create_bucket(Bucket=bucket_name)
-    else:
-        bucket = s3_resource.create_bucket(
-            Bucket=bucket_name,
-            CreateBucketConfiguration={"LocationConstraint": region},
-        )
-    return bucket
-
 @retry(errors=[ClientError])
 def enable_public_objects(bucket_name: str) -> None:
     """
10 changes: 2 additions & 8 deletions src/toil/provisioners/aws/awsProvisioner.py
@@ -50,7 +50,7 @@
                               get_policy_permissions,
                               policy_permissions_allow)
 from toil.lib.aws.session import AWSConnectionManager
-from toil.lib.aws.utils import create_s3_bucket
+from toil.lib.aws.s3 import create_s3_bucket
 from toil.lib.conversions import human2bytes
 from toil.lib.ec2 import (a_short_time,
                           create_auto_scaling_group,
@@ -259,14 +259,8 @@ def _write_file_to_cloud(self, key: str, contents: bytes) -> str:
             bucket = s3.Bucket(bucket_name)
         except ClientError as err:
             if get_error_status(err) == 404:
-                bucket = create_s3_bucket(s3, bucket_name=bucket_name, region=self._region)
-                bucket.wait_until_exists()
+                bucket = create_s3_bucket(s3, bucket_name, self._region)
                 bucket.Versioning().enable()
-
-                owner_tag = os.environ.get('TOIL_OWNER_TAG')
-                if owner_tag:
-                    bucket_tagging = s3.BucketTagging(bucket_name)
-                    bucket_tagging.put(Tagging={'TagSet': [{'Key': 'Owner', 'Value': owner_tag}]})
             else:
                 raise

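Both call sites in this PR enable versioning right after bucket creation, and the job store then polls `_getBucketVersioning` once per second with no upper bound. If a bounded wait were wanted, one possible shape is sketched below. This is a generic polling helper under stated assumptions, not something the PR adds: `wait_until` and its `interval`, `timeout`, `sleep`, and `clock` parameters are all hypothetical.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or the timeout elapses.

    Returns True if the condition was met and False if the deadline passed.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)  # back off before probing again
```

In the job store this could wrap the versioning probe, for example `wait_until(lambda: bool(self._getBucketVersioning(bucket_name)))`, with the caller deciding what to do when it returns False.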