Description
Hi,
We receive the following error when pushing a "chunk" to ElasticSearch:
2017-07-18 10:11:32 +0000 [warn]: failed to flush the buffer. plugin_id="object:15aa644" retry_time=10 next_retry=2017-07-18 10:20:36 +0000 chunk="554949b8663b0bd8416988071dcd1bf3" error_class=Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge error="[413] {\"Message\":\"Request size exceeded 10485760 bytes\"}"
2017-07-18 10:11:32 +0000 [debug]: chunk taken back instance=22660880 chunk_id="554949b8663b0bd8416988071dcd1bf3" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.lib.rkt.pods.run.410d6112-b970-40d7-8b71-c5ee25452c17.stage1.rootfs.opt.stage2.hyperkube.rootfs.var.log.containers.service-2066068252-ns9xw_int_service-9058641f27539563ca400a4c1507ef500c48a6a2daa60d0b9330f0fd9c91b63e.log", variables=nil>
What happens next is that the plugin retries delivering this chunk indefinitely and never moves past it to continue processing chunks that are not too large.
This effectively blocks all progress: logs stop arriving in our ElasticSearch cluster, which is functionally no different from FluentD being down.
The plugin should not retry the chunk in this scenario, as there is almost no chance that the ElasticSearch cluster will lift its maximum payload limit.
I would expect the plugin to:
- Log this as an "error" rather than a "warning"?
- If the plugin receives this error back from ES, write the chunk to disk somewhere (a dead letter queue?) and move on; a rough buffer-level workaround sketch follows below.
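In the meantime, a partial mitigation on our side might be to keep chunks under the 10485760-byte request limit and to cap retries so a poison chunk is eventually handed to a secondary output instead of blocking the queue. This is only a sketch, assuming Fluentd v0.14/v1 buffer syntax, a match block tagged `kubernetes.**`, and example values that would need tuning:

```
<match kubernetes.**>
  @type elasticsearch
  # ... existing host/port/index settings ...

  <buffer>
    # Keep each flushed chunk (and thus each bulk request) well below the
    # 10485760-byte limit reported in the 413 response. 8m is an example value.
    chunk_limit_size 8m

    # Give up on a chunk after a few attempts instead of retrying forever.
    retry_max_times 5
  </buffer>

  # Once retry_max_times is exhausted, route the chunk to a secondary output
  # so it lands on disk (a crude dead letter queue) rather than being dropped.
  <secondary>
    @type secondary_file
    directory /var/log/fluentd/failed_chunks
  </secondary>
</match>
```

This does not change the plugin's error handling itself; it only bounds the damage a single oversized chunk can do.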
We are also looking at filtering out very large log entries before they hit the plugin.
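For that pre-filtering, something along these lines might work; it is only a sketch, assuming the container log text lives in a `log` field, that the bundled record_transformer filter with `enable_ruby` is acceptable, and that a 32768-byte cap (an arbitrary example value) is tolerable:

```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Truncate oversized log lines so a single huge record cannot push a
    # bulk request over the cluster's payload limit.
    log ${record["log"].to_s[0, 32768]}
  </record>
</filter>
```

Truncation obviously loses data for those records, but it keeps the pipeline moving while the retry behaviour is discussed.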