Skip to content

Adding feature to support dynamic prefix #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

sdiazb
Copy link

@sdiazb sdiazb commented Feb 17, 2016

No description provided.

@elasticsearch-release
Copy link

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

@sdiazb
Copy link
Author

sdiazb commented Feb 24, 2016

Hi guys @purbon @suyograo @ph !

First of all I would like to thank all of you for your work. Just to get an idea (no pressure), do you think we could have this reviewed any time soon?

Thanks in advance!

@lmello
Copy link

lmello commented Mar 27, 2016

any chance this can get merged ??
this is a really important configuration for people that are using s3 and want to store the logs following a dynamic path like yyyy/MM/dd/hostname/ as prefix.

@ph
Copy link
Contributor

ph commented Mar 29, 2016

The problem with that pull request is this change isn't thread safe.
Also since the uploading to S3 is async the value of @prefix can possibility changes between the received events so this code cannot guarantee it will be in the right place.

The use case you are looking to have is to dynamically chose where the events are send? I suggest you configure multiple outputs in your configuration and use the conditionals to route the events to the right output.

@sdiazb
Copy link
Author

sdiazb commented Apr 6, 2016

You are absolutely right and if it was thread safe for us it was because we used conditionals to route to different prefixes.

Anyway, having dynamic temporal partitions by date could be an interesting feature, so I have provided a new fix where the prefix is rendered when uploading data to S3 based on the current date and not based on events. Maybe this is a valid solution that allow us to have temporal partitions with a thread safe implementation.

@ph what is your opinion about this approach? Thanks in advance.

@suyograo suyograo added the P2 label Apr 26, 2016
@ph
Copy link
Contributor

ph commented Apr 26, 2016

Anyway, having dynamic temporal partitions by date could be an interesting feature, so I have provided a new fix where the prefix is rendered when uploading data to S3 based on the current date and not based on events. Maybe this is a valid solution that allow us to have temporal partitions with a thread safe implementation.

This make more sense to me, to use the current time as the diviser of the file.

@ph
Copy link
Contributor

ph commented Apr 26, 2016

@sdiazb Can we make that option configurable and make it off by default?

@sdiazb
Copy link
Author

sdiazb commented Apr 26, 2016

Hi @ph!

I am not sure that I have understood the last comment. I think that, with the current implementation, the option is configurable and it is off by default.

In the conf file, it is not mandatory to explicit the prefix, so you can have a conf like this:

...
output {
  s3 {
     access_key_id => "AK"
     secret_access_key => "SK"
     region => "eu-west-1"
     bucket => "mybucket"
     time_file => 1
     codec => "plain"
  }
}

Also it is possible to specify a static prefix like this:

...
output {
  s3 {
     access_key_id => "AK"
     secret_access_key => "SK"
     region => "eu-west-1"
     bucket => "mybucket"
     time_file => 1
     codec => "plain"
     prefix => "test/"
  }
}

And with this feature it would be possible to specify a dynamic prefix, so the date is rendered in the prefix as follows:

...
output {
  s3 {
     access_key_id => "AK"
     secret_access_key => "SK"
     region => "eu-west-1"
     bucket => "mybucket"
     time_file => 1
     codec => "plain"
     prefix => "test/%Y%m%d%H%M%S/"
  }
}

I am afraid that I missed something from your last comment. Could you clarify it a little bit so we can improve the PR?

Thanks in advance!

@lremurphy
Copy link

There are already a number of workarounds for dynamically adding time/date prefixes in the s3 bucket. Is this dynamic prefixing only going to deal with temporal data? What about actual dynamic folder names based on (for instance) root-level attributes passed in from filebeat? Or the hostname of the originating log data (like @lmello suggested)? This can be dealt with using conditional statements but as the infrastructure gets more complicated this can be difficult to maintain....

@ph
Copy link
Contributor

ph commented Sep 13, 2016

@lremurphy everything that string support will be supported in the prefix, so it could be any fields or date.

@ph
Copy link
Contributor

ph commented Feb 24, 2017

Fixed in 4.0, by #102

@ph ph closed this Feb 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants