Use a more resilient s3 object name by default #606
Closed
### What's in this PR?

This changes the default `s3_object_key_format`.

### Why?
The s3 plugin uses a default object key that is problematic in a few ways:

- If you have uploaded 2000 log files within the same time slice, it will make 2001 HEAD requests just to figure out whether the next key exists (fluent/fluent-plugin-s3#160).
- Concurrent uploads can race and use the same `%{index}` value, with the loser of the race overwriting the chunk from the winner (fluent/fluent-plugin-s3#326).
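Both failure modes above fall out of the same probing loop. Here is a minimal Python simulation of that behavior (not the plugin's actual Ruby code; the key naming is illustrative):

```python
# Simulate how a %{time_slice}_%{index} object key is chosen: probe
# keys with HEAD requests, incrementing the index until one is free.

def probe_free_index(existing_keys, time_slice):
    """Return (index, head_requests) for the first unused index."""
    head_requests = 0
    index = 0
    while True:
        head_requests += 1  # one HEAD request per probe
        key = f"{time_slice}_{index}.gz"
        if key not in existing_keys:
            return index, head_requests
        index += 1

# 2000 objects already uploaded within this time slice ...
bucket = {f"20240101-12_{i}.gz" for i in range(2000)}
index, heads = probe_free_index(bucket, "20240101-12")
print(index, heads)  # → 2000 2001

# The race from fluent-plugin-s3#326: two uploaders probe concurrently,
# both see the same index free, and both write the same key -- the
# second PUT silently overwrites the first chunk.
a, _ = probe_free_index(bucket, "20240101-12")
b, _ = probe_free_index(bucket, "20240101-12")
assert a == b == 2000  # same key chosen by both uploaders
```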
This is planned to change for v2, but there's no clear path to v2 right now.
The plugin does already warn if you use multiple threads and don't use either `%{chunk_id}` or `%{uuid_hash}` in the object key.

### Additional context
This does, in a sense, break the API of the logging-operator: the final names of uploaded artifacts will be different after this is merged. Given that the default object key format is not very descriptive, I'm assuming that people who need more explicit names have already overridden this variable.
This preserves the ordering of keys by putting the UUID after the `%{time_slice}`, in case people use the time-based keys to determine when a log was uploaded rather than any metadata.

I chose `%{uuid_hash}` over `%{chunk_id}` because I'm a little wary of choosing a string that may or may not have strong uniqueness guarantees. A `%{random_hex}` variable also exists, but using it still triggers the warning in the plugin.

This seems like a good trade-off: the least impact while mitigating a common pitfall that users can easily run into. We ran into it ourselves, and it caused major issues in our logging infrastructure, between incredibly high latency and increased S3 costs due to both requests and access logging.