Description
#102 (at 5276167#diff-b445c304223f019653cd681dd06bbff5) replaced the hostname in the S3 object key (filename) with a UUID, to avoid a small possibility of clashes between multiple Logstash hosts that have the same hostname and timestamp (see review comment 58ad2a1#r79946161):
logstash-output-s3/lib/logstash/outputs/s3/temporary_file_factory.rb, lines 66 to 74 in 9d02bc2:
```ruby
def generate_name
  filename = "ls.s3.#{SecureRandom.uuid}.#{current_time}"
  if tags.size > 0
    "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
  else
    "#{filename}.part#{counter}.#{extension}"
  end
end
```
This also has the useful side effect of spreading keys across many different prefixes in the S3 bucket. However, that is no longer necessary: in July 2018, Amazon S3 announced increased request-rate performance and stated that randomizing object prefixes is no longer required to achieve it.
Therefore, we could improve this filename format while still meeting the original intent of avoiding clashes.
Proposed solution
A common pattern used by various products (including AWS services) is a static prefix, followed by the dynamic timestamp, followed by the UUID.
This has two major benefits:
- we can filter files by a coarse time range (year, month, day, hour, etc.) by making an S3 ListBucket request for all keys that begin with a given prefix
- we can request files created after a certain time by making an S3 ListBucket request for all keys that are lexicographically after a specific object key
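Both benefits can be illustrated without a real bucket. S3 returns listing results in UTF-8 binary order, so sorting an array of keys and filtering it by prefix, or by "sorts after a given key", roughly models a ListObjectsV2 call with its `prefix` and `start_after` parameters (in the AWS SDK for Ruby, `Aws::S3::Client#list_objects_v2`). This is a minimal local sketch: the helper names (`proposed_key`, `current_key`, `list_with_prefix`, `list_after`) and the `%Y-%m-%dT%H.%M` timestamp format are assumptions for illustration, not the plugin's code.

```ruby
require "securerandom"

# Hypothetical helper: proposed layout (static prefix, timestamp, then UUID).
# The "%Y-%m-%dT%H.%M" timestamp format is an assumption for illustration.
def proposed_key(time)
  "ls.s3.#{time.strftime('%Y-%m-%dT%H.%M')}.#{SecureRandom.uuid}.part0.txt"
end

# Hypothetical helper: current layout (UUID before timestamp), for comparison.
def current_key(time)
  "ls.s3.#{SecureRandom.uuid}.#{time.strftime('%Y-%m-%dT%H.%M')}.part0.txt"
end

# Local model of a prefix listing: S3 returns keys in UTF-8 binary order,
# so sorting the array and keeping keys that start with the prefix mimics it.
def list_with_prefix(keys, prefix)
  keys.sort.select { |k| k.start_with?(prefix) }
end

# Local model of a "start after" listing: only keys that sort strictly
# after the given key are returned.
def list_after(keys, start_after)
  keys.sort.select { |k| k > start_after }
end

times = [
  Time.utc(2018, 7, 16, 23, 59),
  Time.utc(2018, 7, 17, 10, 30),
  Time.utc(2018, 7, 17, 11, 5),
  Time.utc(2018, 7, 18, 0, 0)
]
proposed = times.map { |t| proposed_key(t) }
current  = times.map { |t| current_key(t) }

# Benefit 1: every object from 17 July 2018 with a single prefix.
july17 = list_with_prefix(proposed, "ls.s3.2018-07-17")
puts july17.size # => 2

# Benefit 2: every object written from 11:00 on 17 July 2018 onwards.
later = list_after(proposed, "ls.s3.2018-07-17T11")
puts later.size # => 2

# With the UUID first, the same prefix query finds nothing.
puts list_with_prefix(current, "ls.s3.2018-07-17").size # => 0
```

With the current layout, every key starts with `ls.s3.` followed by random hex, so no date-based prefix can match and listing order carries no information about time.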
Just swapping the position of the UUID and timestamp would be sufficient.
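A minimal sketch of that swap, assuming the attributes the quoted `generate_name` reads (`tags`, `counter`, `extension`, `current_time`). `SketchFilenameFactory` is a hypothetical stand-in for the plugin's factory, and its `STRFTIME` value is an assumed minute-resolution format; only the order of the UUID and timestamp differs from the quoted method.

```ruby
require "securerandom"

# Hypothetical stand-in for the factory: only the pieces generate_name reads.
# Attribute names mirror the quoted method; the timestamp format is assumed.
class SketchFilenameFactory
  STRFTIME = "%Y-%m-%dT%H.%M" # assumption: minute-resolution timestamp

  attr_reader :tags, :counter, :extension

  def initialize(tags: [], counter: 0, extension: "txt", time: Time.now)
    @tags = tags
    @counter = counter
    @extension = extension
    @time = time
  end

  def current_time
    @time.strftime(STRFTIME)
  end

  # Same as the quoted generate_name, but with the timestamp moved ahead
  # of the UUID so that keys sort (and can be prefix-filtered) by time.
  def generate_name
    filename = "ls.s3.#{current_time}.#{SecureRandom.uuid}"
    if tags.size > 0
      "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
    else
      "#{filename}.part#{counter}.#{extension}"
    end
  end
end

factory = SketchFilenameFactory.new(tags: ["app"], time: Time.utc(2018, 7, 17, 10, 30))
puts factory.generate_name
# e.g. ls.s3.2018-07-17T10.30.<uuid>.tag_app.part0.txt
```

The UUID still guarantees uniqueness across hosts that share a hostname and timestamp; moving the timestamp first only changes the sort order. Keys written within the same timestamp unit are still ordered randomly by UUID, so time-based listing is only as precise as the timestamp format.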
Related: #134 (for fully configurable filenames?)