[DO NOT MERGE] TransferManager: DirectoryUploader & DirectoryDownloader #3288
Draft: jterapin wants to merge 49 commits into `version-3` from `tm-directory-features`.
Commits (49, all by jterapin):

- `6cb37fe` Initial setup
- `d31ae4d` Minor adjustments
- `74ed189` Directory downloader impl
- `4e8db17` Directory uploader impl
- `098049d` Merge branch 'version-3' into tm-directory-features
- `8f387d2` Merge branch 'version-3' into tm-directory-features
- `7749ba5` Add default executor
- `99f0de6` Add running check to default executor
- `441fa82` Refactor MultipartFileUploader with executor
- `c792439` Fix typo in MultipartFileUploader
- `adce496` Update TM upload file with executor
- `012c2bc` Merge branch 'version-3' into tm-directory-features
- `ee9c9da` Merge branch 'version-3' into tm-directory-features
- `75df844` Merge from version-3
- `e5d3245` Merge branch 'version-3' into tm-directory-features
- `173f5e4` Merge branch 'version-3' into tm-directory-features
- `cf88ff2` Merge branch 'version-3' into tm-directory-features
- `2758c4d` Update to only spawn workers when needed
- `b92d3b3` Update directory uploader
- `6afb495` Update directory uploader
- `86b53e8` Update uploader
- `d587ae1` Merge branch 'version-3' into tm-directory-features
- `eae3814` Add minor improvements to directory uploader
- `14010ef` Merge branch 'version-3' into tm-directory-features
- `8ab4edc` Fix specs
- `face84d` Minor updates to multipart file uploader
- `36a1e87` Minor refactors
- `7dd9f98` Fix options
- `77ab1ba` Refactor DirectoryUploader
- `e843137` Merge version-3 into branch
- `009127d` Update multipartfileuploader
- `39912fd` Refactor FileDownloader
- `f9fb117` Implement Directory Downloader
- `d307555` Add TODO
- `a14649a` Merge version-3 into branch
- `b9231e7` Feedback - update default executor
- `d991128` Refactor file downloader
- `bc533a0` Support FileDownloader changes
- `9efc77f` Extra updates to FileDownloader
- `1cc3fcf` Address feedback for FileUploader and MultipartFileUploader
- `7b6b220` Merge branch 'version-3' into tm-directory-features
- `45d2f5d` Add improvements to directory uploader
- `64d481e` Update DirectoryDownloader based on feedbacks
- `2ab63fb` Minor feedback updates
- `7af3e32` Merge branch 'version-3' into tm-directory-features
- `747965f` Update executor
- `2230478` Improve Directory Uploader
- `cb145a0` Handle failure cases correctly
- `0cb35cd` Improve Executor
Changelog hunk (`@@ -1,6 +1,8 @@`):

```
Unreleased Changes
------------------

* Feature - TODO

1.199.1 (2025-09-25)
------------------
```
New file (`@@ -0,0 +1,98 @@`):

```ruby
# frozen_string_literal: true

module Aws
  module S3
    # @api private
    class DefaultExecutor
      RUNNING = :running
      SHUTTING_DOWN = :shutting_down
      SHUTDOWN = :shutdown

      def initialize(options = {})
        @max_threads = options[:max_threads] || 10
        @state = RUNNING
        @queue = Queue.new
        @pool = []
        @mutex = Mutex.new
      end

      def post(*args, &block)
        @mutex.synchronize do
          raise 'Executor has been shutdown and is no longer accepting tasks' unless @state == RUNNING

          @queue << [args, block]
          ensure_worker_available
        end
        true
      end

      def kill
        @mutex.synchronize do
          @state = SHUTDOWN
          @pool.each(&:kill)
          @pool.clear
          @queue.clear
        end
        true
      end

      def shutdown(timeout = nil)
        @mutex.synchronize do
          return true if @state == SHUTDOWN

          @state = SHUTTING_DOWN
          @pool.size.times { @queue << :shutdown }
        end

        if timeout
          deadline = Time.now + timeout
          @pool.each do |thread|
            remaining = deadline - Time.now
            break if remaining <= 0

            thread.join([remaining, 0].max)
          end
          @pool.select(&:alive?).each(&:kill)
        else
          @pool.each(&:join)
        end

        @pool.clear
        @state = SHUTDOWN
        true
      end

      def running?
        @state == RUNNING
      end

      def shutting_down?
        @state == SHUTTING_DOWN
      end

      def shutdown?
        @state == SHUTDOWN
      end

      private

      def ensure_worker_available
        return unless @state == RUNNING

        @pool.select!(&:alive?)
        @pool << spawn_worker if @pool.size < @max_threads
      end

      def spawn_worker
        Thread.new do
          while (job = @queue.shift)
            break if job == :shutdown

            args, block = job
            block.call(*args)
          end
        end
      end
    end
  end
end
```
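`DefaultExecutor` spawns worker threads lazily (one per `post` until `max_threads` is reached), and `shutdown` enqueues one `:shutdown` sentinel per worker. Because `Queue` is FIFO, every job posted before `shutdown` sits ahead of the sentinels, so the workers drain the backlog before exiting. The sketch below reproduces only that core in a hypothetical `MiniExecutor` class (not SDK code; no `kill` and no timeout handling) so the behaviour can be exercised without the gem.

```ruby
# Hypothetical, simplified stand-in for the pattern used by DefaultExecutor:
# lazily spawned workers pull [args, block] jobs from a shared FIFO Queue,
# and shutdown enqueues one :shutdown sentinel per worker, then joins them.
class MiniExecutor
  def initialize(max_threads: 4)
    @max_threads = max_threads
    @queue = Queue.new
    @pool = []
    @mutex = Mutex.new
    @running = true
  end

  def post(*args, &block)
    @mutex.synchronize do
      raise 'executor is shut down' unless @running

      @queue << [args, block]
      # Drop dead workers, then add one if the pool is still below the cap.
      @pool.select!(&:alive?)
      @pool << spawn_worker if @pool.size < @max_threads
    end
    true
  end

  def shutdown
    workers = @mutex.synchronize do
      @running = false
      # Sentinels land behind every job already queued (FIFO order).
      @pool.size.times { @queue << :shutdown }
      @pool.dup
    end
    workers.each(&:join)
    true
  end

  private

  def spawn_worker
    Thread.new do
      while (job = @queue.pop)
        break if job == :shutdown

        args, block = job
        block.call(*args)
      end
    end
  end
end

results = Queue.new
executor = MiniExecutor.new(max_threads: 3)
10.times { |i| executor.post(i) { |n| results << n * n } }
executor.shutdown
```

The original class additionally offers `kill` and a `shutdown(timeout)` variant that kills any worker still alive at the deadline; those paths are left out of this sketch.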
New file (`@@ -0,0 +1,175 @@`):

```ruby
# frozen_string_literal: true

module Aws
  module S3
    # Raised when DirectoryDownloader fails to download objects from S3 bucket
    class DirectoryDownloadError < StandardError
      def initialize(message, errors = [])
        @errors = errors
        super(message)
      end

      # @return [Array<StandardError>] The list of errors encountered when downloading objects
      attr_reader :errors
    end

    # @api private
    class DirectoryDownloader
      def initialize(options = {})
        @client = options[:client]
        @executor = options[:executor]
        @abort_requested = false
        @mutex = Mutex.new
      end

      attr_reader :abort_requested

      def download(destination, bucket:, **options)
        if File.exist?(destination)
          raise ArgumentError, 'invalid destination, expected a directory' unless File.directory?(destination)
        else
          FileUtils.mkdir_p(destination)
        end

        download_opts = build_download_opts(destination, bucket, options)
        downloader = FileDownloader.new(client: @client, executor: @executor)
        producer = ObjectProducer.new(download_opts.merge(client: @client, directory_downloader: self))
        downloads, errors = process_download_queue(producer, downloader, download_opts)
        build_result(downloads, errors)
      ensure
        @abort_requested = false
      end

      private

      def request_abort
        @mutex.synchronize { @abort_requested = true }
      end
      def build_download_opts(destination, bucket, opts)
        {
          destination: destination,
          bucket: bucket,
          s3_prefix: opts.delete(:s3_prefix),
          ignore_failure: opts.delete(:ignore_failure) || false,
          filter_callback: opts.delete(:filter_callback),
          progress_callback: opts.delete(:progress_callback)
        }
      end

      def build_result(download_count, errors)
        if @abort_requested
          msg = "directory download failed: #{errors.map(&:message).join('; ')}"
          raise DirectoryDownloadError.new(msg, errors)
        else
          {
            completed_downloads: [download_count - errors.count, 0].max,
            failed_downloads: errors.count,
            errors: errors.any? ? errors : nil
          }.compact
        end
      end

      def handle_error(executor, opts)
        return if opts[:ignore_failure]

        request_abort
        executor.kill
      end

      def process_download_queue(producer, downloader, opts)
        # Separate executor for lightweight queuing tasks,
        # avoiding interference with main @executor lifecycle
        queue_executor = DefaultExecutor.new
        progress = DirectoryProgress.new(opts[:progress_callback]) if opts[:progress_callback]
        download_attempts = 0
        errors = []
        begin
          producer.each do |object|
            break if @abort_requested

            download_attempts += 1
            queue_executor.post(object) do |o|
              dir_path = File.dirname(o[:path])
              FileUtils.mkdir_p(dir_path) unless dir_path == opts[:destination] || Dir.exist?(dir_path)

              downloader.download(o[:path], bucket: opts[:bucket], key: o[:key])
              progress&.call(File.size(o[:path]))
            rescue StandardError => e
              errors << e
              handle_error(queue_executor, opts)
            end
          end
        rescue StandardError => e
          errors << e
          handle_error(queue_executor, opts)
        end
        queue_executor.shutdown
        [download_attempts, errors]
      end

      # @api private
      class ObjectProducer
        include Enumerable

        DEFAULT_QUEUE_SIZE = 100

        def initialize(options = {})
          @destination_dir = options[:destination]
          @client = options[:client]
          @bucket = options[:bucket]
          @s3_prefix = options[:s3_prefix]
          @filter_callback = options[:filter_callback]
          @directory_downloader = options[:directory_downloader]
          @object_queue = SizedQueue.new(DEFAULT_QUEUE_SIZE)
        end

        def each
          producer_thread = Thread.new do
            stream_objects
          ensure
            @object_queue << :done
          end

          # Yield objects from internal queue
          while (object = @object_queue.shift) != :done
            break if @directory_downloader.abort_requested

            yield object
          end
        ensure
          producer_thread.join
        end

        private

        def build_object_entry(key)
          { path: File.join(@destination_dir, normalize_key(key)), key: key }
        end

        # TODO: double check handling of objects that ends with /
        def stream_objects(continuation_token: nil)
          resp = @client.list_objects_v2(bucket: @bucket, prefix: @s3_prefix, continuation_token: continuation_token)
          resp.contents.each do |o|
            break if @directory_downloader.abort_requested
            next if o.key.end_with?('/')
            next unless include_object?(o.key)

            @object_queue << build_object_entry(o.key)
          end
          stream_objects(continuation_token: resp.next_continuation_token) if resp.next_continuation_token
        end

        def include_object?(key)
          return true unless @filter_callback

          @filter_callback.call(key)
        end

        def normalize_key(key)
          key = key.delete_prefix(@s3_prefix) if @s3_prefix
          File::SEPARATOR == '/' ? key : key.tr('/', File::SEPARATOR)
        end
      end
    end
  end
end
```
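`ObjectProducer#each` joins its listing thread in an `ensure`, while that thread pushes into a `SizedQueue` of `DEFAULT_QUEUE_SIZE` entries. If the consumer stops early (for example once `abort_requested` is set) while the listing thread is already blocked on a full queue, nothing pops again and the `join` may never return. One way to make an early exit safe, sketched below assuming Ruby 2.5+ (`Thread::Queue#close`, `rescue` inside `do` blocks), is to close the queue before joining: `close` wakes a blocked `push`, which then raises `ClosedQueueError`. `BoundedProducer` is a hypothetical standalone class, not SDK code, and it relies on `pop` returning `nil` from a closed, drained queue instead of a `:done` sentinel (so items must never be `nil` or `false`).

```ruby
# Hypothetical sketch (not SDK code): a single-use bounded producer whose
# consumer may stop early without leaving the producer thread blocked.
class BoundedProducer
  include Enumerable

  def initialize(items, queue_size: 2)
    @items = items
    @queue = SizedQueue.new(queue_size)
  end

  def each
    producer = Thread.new do
      @items.each { |item| @queue << item }
    rescue ClosedQueueError
      # The consumer closed the queue early; stop producing quietly.
    ensure
      # Signals "no more items": pop returns nil once the queue is drained.
      @queue.close
    end

    while (item = @queue.pop)
      yield item
    end
  ensure
    # Closing wakes a producer blocked in #push (it raises ClosedQueueError),
    # so the join below cannot wait forever after an early break.
    @queue.close
    producer&.join
  end
end

first_three = BoundedProducer.new((1..1000).to_a, queue_size: 2).first(3)
```

`Queue#close` is idempotent, so closing from both the producer's `ensure` and the consumer's `ensure` is harmless.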
New file (`@@ -0,0 +1,24 @@`):

```ruby
# frozen_string_literal: true

module Aws
  module S3
    # @api private
    class DirectoryProgress
      def initialize(progress_callback)
        @transferred_bytes = 0
        @transferred_files = 0
        @progress_callback = progress_callback
        @mutex = Mutex.new
      end

      def call(bytes_transferred)
        @mutex.synchronize do
          @transferred_bytes += bytes_transferred
          @transferred_files += 1

          @progress_callback.call(@transferred_bytes, @transferred_files)
        end
      end
    end
  end
end
```
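`DirectoryProgress` folds each finished file's size into running totals and invokes the user's `progress_callback` with `(total_bytes, total_files)` while holding its mutex. The standalone sketch below copies that logic into a hypothetical `ProgressTally` class (so it runs without the SDK) and drives it from several threads to show what a callback observes; the file sizes are made up for illustration.

```ruby
# Hypothetical standalone copy of the DirectoryProgress logic: per-file byte
# counts are folded into running totals under a Mutex, and the callback is
# invoked with (total_bytes, total_files) while the lock is held.
class ProgressTally
  def initialize(callback)
    @callback = callback
    @bytes = 0
    @files = 0
    @mutex = Mutex.new
  end

  def call(bytes_transferred)
    @mutex.synchronize do
      @bytes += bytes_transferred
      @files += 1
      @callback.call(@bytes, @files)
    end
  end
end

updates = []
tally = ProgressTally.new(->(bytes, files) { updates << [bytes, files] })
# Simulate three download workers each finishing one file.
threads = [100, 250, 50].map { |size| Thread.new { tally.call(size) } }
threads.each(&:join)
```

Because the callback runs inside the lock, updates arrive one at a time and the totals never go backwards; the trade-off is that a slow callback also delays every worker that finishes a file at the same moment.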
Review comment: By convention, we were putting these in separate files, right? If you want to promote the other two (multipart errors) to the files where they are used, that's fine too, but let's stay consistent.
Reply: Yup, I'm planning to separate them out.