
Ability to pick S3 download multipart threshold, and other settings as well.  #968

Open
@amircohere

Description


Describe the feature

In Python with boto3, I can do the following:

from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=4 * 1024 * 1024 * 1024,  # 4 GB: objects below this are fetched with a single GET
    max_concurrency=1,  # no parallel part downloads
    multipart_chunksize=32 * 1024 * 1024,  # 32 MB per part when multipart is used
)

# some code here...

self.s3.download_file(
    Bucket="commoncrawl",
    Key="path_to_file.txt",
    Filename="local.txt",
    Config=config,
)

I would like a way to do this from aws-sdk-rust, or at least to have the SDK honor the locally configured settings, for example:

> aws configure set s3.multipart_threshold 4GB
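The AWS CLI accepts human-readable sizes for this setting. As an illustration only (not the SDK's or CLI's actual code), here is a minimal sketch of what honoring that config entails: converting a size string like 4GB into a byte count, assuming the CLI's binary-unit suffixes.

```python
import re

# Hypothetical suffix table mirroring the AWS CLI's binary units.
_SUFFIXES = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(value: str) -> int:
    """Convert '4GB', '32MB', or a plain integer string into bytes."""
    match = re.fullmatch(r"(\d+)\s*([KMGT]B)?", value.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"invalid size: {value!r}")
    number, suffix = match.groups()
    return int(number) * (_SUFFIXES[suffix.upper()] if suffix else 1)

print(parse_size("4GB"))   # 4294967296
print(parse_size("32MB"))  # 33554432
```

The SDK would then compare the object's size against this threshold to decide between a single GET and a multipart download.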

Use Case

Common Crawl's bucket rate-limits requests (but not bandwidth), so avoiding multipart downloads is the only way to download from it reliably. The difference is between a 10-second and a 10-minute download.

Proposed Solution

No response

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

A note for the community


  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment
