
Use a more resilient s3 object name by default #606

Closed (wants to merge 2 commits)

Conversation

@worr (Contributor) commented Oct 22, 2020

| Q | A |
|---|---|
| Bug fix? | kinda |
| New feature? | kinda |
| API breaks? | kinda |
| Deprecations? | no |
| License | Apache 2.0 |

What's in this PR?

This changes the default `s3_object_key_format`.

Why?

The s3 plugin uses a default object key that is problematic in a few ways:

1. It makes HEAD requests for each chunk it uploads, starting from 1 each time.
   If you have uploaded 2000 log files within the same time slice, it will make
   2001 HEAD requests just to figure out whether the next key already exists.

   fluent/fluent-plugin-s3#160

2. The above check is not thread-safe: two threads can race and decide to use
   the same `%{index}` value, with the loser of the race overwriting the chunk
   from the winner.

   fluent/fluent-plugin-s3#326

This is planned to change in v2, but there's no clear path to v2 right now.
The plugin already warns if you use multiple threads and don't include either
`%{chunk_id}` or `%{uuid_hash}` in the object key.
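As an illustrative sketch (the bucket name and path below are placeholders, and the exact format strings are my assumptions rather than a copy of the PR diff), the change amounts to swapping the index-based key format for a uuid-based one in the s3 output configuration:

```
# Hypothetical fluentd s3 output snippets; bucket and path are placeholders.

# Index-based default: the plugin must probe existing indices with HEAD
# requests, and the probe is racy across multiple flush threads.
<match **>
  @type s3
  s3_bucket my-logs
  path logs/
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
</match>

# uuid-based key: no probing and unique per chunk. Keys still sort by time
# because %{uuid_hash} comes after %{time_slice}.
<match **>
  @type s3
  s3_bucket my-logs
  path logs/
  s3_object_key_format %{path}%{time_slice}_%{uuid_hash}.%{file_extension}
</match>
```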

Additional context

This does, in a way, break the API of the logging-operator: the final names of artifacts will be different after this is merged. Given that the default object key format is not very descriptive, I'm assuming that people who need more explicit names have already overridden this variable.

This preserves the ordering of keys by putting the uuid after `%{time_slice}`, in case people use the time-based keys to determine when a log was uploaded rather than relying on object metadata.

I chose `%{uuid_hash}` over `%{chunk_id}` because I'm a little wary of choosing a string that may or may not have strong uniqueness guarantees. There is also a `%{random_hex}` variable, but using it still triggers the plugin's warning.

This seems like a good trade-off: minimal impact while mitigating a common pitfall that users can easily run into. We ran into it ourselves, and it caused major issues in our logging infrastructure, between very high latency and increased s3 costs from both the requests and the access logging.
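To make the cost concrete, here is a hypothetical simulation (not the plugin's actual code) of how an index-based key gets chosen: probe index after index with an existence check, which maps to one S3 HEAD request per probe. The key format and helper below are invented for illustration.

```python
def choose_key(existing_keys, time_slice):
    """Return (key, head_requests) for the next upload in this time slice.

    Models index-based key selection: probe index 0, 1, 2, ... until a
    free slot is found, paying one HEAD request per probed index.
    """
    head_requests = 0
    index = 0
    while True:
        candidate = f"logs/{time_slice}_{index}.gz"
        head_requests += 1  # one HEAD request per probed index
        if candidate not in existing_keys:
            return candidate, head_requests
        index += 1

bucket = set()
slice_ = "20201022-23"
for _ in range(2000):  # upload 2000 chunks into the same time slice
    key, _heads = choose_key(bucket, slice_)
    bucket.add(key)

# The next upload alone re-probes every existing index before finding a slot:
_, heads_for_next = choose_key(bucket, slice_)
print(heads_for_next)  # 2001 HEAD requests just to place one more chunk
```

A uuid-based key needs none of this probing, since the generated key is unique without consulting the bucket at all.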

@worr force-pushed the feature/fluentd-s3-efficient branch from 13a9617 to 8c4ead6 on October 22, 2020
@worr force-pushed the feature/fluentd-s3-efficient branch from 8c4ead6 to 3878f7c on October 22, 2020
@ahma requested a review from bshifter on October 24, 2020
@tarokkk (Contributor) commented Nov 3, 2020

We introduced a similar concept with the One Eye format in PR #609. I hope that resolves this PR as well.

@tarokkk closed this Nov 4, 2020
@worr (Contributor, Author) commented Nov 4, 2020

@tarokkk Looks like it does. Thanks so much!!
