
Place UUID after timestamp in filename #197

@cflee


#102 (at 5276167#diff-b445c304223f019653cd681dd06bbff5) replaced the hostname in the S3 object key (filename) with a UUID. The aim was to rule out the small chance of clashes between multiple Logstash hosts that share the same hostname and timestamp (see review comment 58ad2a1#r79946161):

    def generate_name
      filename = "ls.s3.#{SecureRandom.uuid}.#{current_time}"
      if tags.size > 0
        "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
      else
        "#{filename}.part#{counter}.#{extension}"
      end
    end

This also has the nice property of spreading object keys across different prefixes in the S3 bucket. However, that is no longer needed: in July 2018 Amazon S3 announced that randomizing object prefixes is no longer required for performance.

Therefore, it seems we could improve this filename format while still meeting the original intent of avoiding clashes.

Proposed solution

There is a common pattern used by various products (including AWS services): a common static prefix, followed by the dynamic values, with the timestamp placed before the UUID.

There are two large benefits to this:

  • we can filter for files in a rough time range (year, month, day, hour, etc.) by making an S3 ListObjects request for all keys that begin with a specified prefix
  • we can request files created after a certain time by making an S3 ListObjects request for all keys that are lexicographically after a specific object key
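
Both lookups can be imitated in plain Ruby without touching S3. This is only a minimal sketch: the keys below are made up, and keys_with_prefix and keys_after are hypothetical helpers standing in for the prefix and start-after parameters of an S3 list request. It relies on the timestamp being fixed-width, so that byte-wise string order matches chronological order.

```ruby
# Made-up object keys in a "ls.s3.<timestamp>.<uuid>" layout.
KEYS = [
  "ls.s3.2018-07-17T09.05.6fa459ea-ee8a-3ca4-894e-db77e160355e.part0.txt",
  "ls.s3.2018-07-16T23.59.1b4e28ba-2fa1-11d2-883f-0016d3cca427.part0.txt",
  "ls.s3.2018-07-18T00.01.16fd2706-8baf-433b-82eb-8c7fada847da.part0.txt",
  "ls.s3.2018-07-17T08.30.886313e1-3b8a-5372-9b90-0c9aee199e5d.part0.txt"
].freeze

# Stand-in for listing by prefix: keep keys that begin with the prefix.
def keys_with_prefix(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }.sort
end

# Stand-in for listing "start after": keep keys that sort strictly after the given key.
def keys_after(keys, start_after)
  keys.select { |key| key > start_after }.sort
end

# Everything written on 17 July 2018.
puts keys_with_prefix(KEYS, "ls.s3.2018-07-17")

# Everything written from 09:00 on 17 July 2018 onwards.
puts keys_after(KEYS, "ls.s3.2018-07-17T09")
```

With the UUID placed first, as in the current format, neither query is possible because the random component would dominate the sort order.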

Just swapping the position of the UUID and timestamp would be sufficient.
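
As a rough illustration of the swap, here is a standalone variant of generate_name. It is a sketch, not the plugin's code: the keyword arguments are hypothetical (the real method reads current_time, tags, counter and extension from the plugin instance), the uuid argument exists only to make the output reproducible, and the "%Y-%m-%dT%H.%M" timestamp format is an assumption.

```ruby
require "securerandom"

# Hypothetical standalone variant of generate_name: the timestamp now comes
# before the UUID, so keys sort chronologically under the "ls.s3." prefix.
def generate_name(current_time:, counter:, extension:, tags: [], uuid: SecureRandom.uuid)
  filename = "ls.s3.#{current_time}.#{uuid}"
  if tags.size > 0
    "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
  else
    "#{filename}.part#{counter}.#{extension}"
  end
end

# Assumed fixed-width timestamp format, so string order follows time order.
stamp = Time.utc(2018, 7, 17, 9, 5).strftime("%Y-%m-%dT%H.%M")
puts generate_name(current_time: stamp, counter: 0, extension: "txt")
```

The UUID still guards against clashes between hosts that write at the same minute; it only moves to a position where it no longer disturbs the ordering.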

Related: #134 (for fully configurable filenames?)
