
Unable to handle ElasticSearch "RequestEntityTooLarge" error correctly #30


Description

@srkiNZ84

Hi,

We receive the following error when pushing a "chunk" to ElasticSearch:

2017-07-18 10:11:32 +0000 [warn]: failed to flush the buffer. plugin_id="object:15aa644" retry_time=10 next_retry=2017-07-18 10:20:36 +0000 chunk="554949b8663b0bd8416988071dcd1bf3" error_class=Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge error="[413] {\"Message\":\"Request size exceeded 10485760 bytes\"}"

2017-07-18 10:11:32 +0000 [debug]: chunk taken back instance=22660880 chunk_id="554949b8663b0bd8416988071dcd1bf3" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.lib.rkt.pods.run.410d6112-b970-40d7-8b71-c5ee25452c17.stage1.rootfs.opt.stage2.hyperkube.rootfs.var.log.containers.service-2066068252-ns9xw_int_service-9058641f27539563ca400a4c1507ef500c48a6a2daa60d0b9330f0fd9c91b63e.log", variables=nil>

What happens next is that the plugin retries delivering this chunk indefinitely and never "moves past it" to continue processing chunks that are not "too large".

This effectively blocks all progress: logs stop arriving in our ElasticSearch cluster, which is functionally no different from FluentD being "down".

The plugin should not "retry" the "chunk" in this scenario, as there is almost no chance that the ElasticSearch cluster will lift its "max payload" limit.

I would expect the plugin to:

  • Log this as an "error", not a "warning"?
  • If the plugin receives this error back from ES, write the "chunk" to disk somewhere (a dead letter queue?) and "move on" (see the sketch after this list)
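For the dead letter queue idea, something along these lines might already be possible with the standard buffer/secondary options; this is an untested sketch (tag, paths and limits are placeholders), assuming the elasticsearch output honours the generic `<buffer>` retry parameters and a `<secondary>` section:

```
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.example.internal
  port 9200
  <buffer>
    @type file
    path /var/log/fluentd/buffer/es
    # Stop retrying a poison chunk after a handful of attempts
    # instead of blocking the queue forever.
    retry_forever false
    retry_max_times 5
  </buffer>
  # Chunks that exhaust their retries are handed to the secondary
  # output, i.e. dumped to disk as a crude dead letter queue.
  <secondary>
    @type secondary_file
    directory /var/log/fluentd/failed-chunks
  </secondary>
</match>
```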

We are also looking at filtering out very large log entries before they hit the plugin.
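One option we are considering (again just a sketch, assuming the container log text sits under the "log" key as it does in our Kubernetes setup) is truncating oversized records with record_transformer, and capping the chunk size so the bulk request stays below the 10485760-byte limit from the error above:

```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Truncate the log field to ~32 KB so a single huge line
    # cannot dominate a chunk (size chosen arbitrarily).
    log ${record["log"].to_s[0, 32768]}
  </record>
</filter>

<match kubernetes.**>
  @type elasticsearch
  # (same elasticsearch output as above, only showing the buffer cap)
  <buffer>
    # Keep chunks comfortably below the 10485760-byte request limit.
    chunk_limit_size 8MB
  </buffer>
</match>
```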
