Out of memory in case of Splunk indexer slowness/failure #423

Closed
ludovic-boutros opened this issue Feb 28, 2024 · 9 comments

@ludovic-boutros
Contributor

Hello,
We are using the Splunk Sink Connector with these main parameters:

{
	"name": "SplunkHECSinkConnector",
	"config":{
		"connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
		"tasks.max": "6",
		"splunk.hec.ack.enabled": "true",
		"splunk.hec.max.outstanding.events": "50000",
		"splunk.hec.max.retries": "-1",
		"splunk.hec.backoff.threshhold.seconds": "60",
		"splunk.hec.threads": "1"
	}
}

As I understand it, we should never have more than 50000 events per task kept in memory.
But that is not the case if Splunk indexers encounter slowness or failures.
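To make explicit what I expect from these settings (a minimal sketch based on my reading of the documentation, not the connector's actual code): the task should track events that have been sent to HEC but not yet acknowledged, and stop pulling new records while that count is at or above splunk.hec.max.outstanding.events. Class and method names below are mine.

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper illustrating the expected backpressure behavior.
class OutstandingEventGate {
    private final int maxOutstanding;                  // e.g. 50000 per task in our config
    private final AtomicInteger outstanding = new AtomicInteger(0);

    OutstandingEventGate(int maxOutstanding) {
        this.maxOutstanding = maxOutstanding;
    }

    boolean canAcceptMore() {                          // checked before taking new records
        return outstanding.get() < maxOutstanding;
    }

    void onBatchSent(int eventCount) {                 // events now in flight to HEC
        outstanding.addAndGet(eventCount);
    }

    void onBatchAcked(int eventCount) {                // acked events can leave memory
        outstanding.addAndGet(-eventCount);
    }
}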

We can observe errors and messages such as the following in the Kafka Connect logs:

[2024-02-27 06:39:24,527] INFO [SplunkHECSinkConnector|task-5] handled 394 failed batches with 193452 events (com.splunk.kafka.connect.SplunkSinkTask:154)

I have attached the Kafka Connect metrics captured during a Splunk indexer stress test.
You can observe the out-of-memory error and the number of active records.

[Screenshot: out-of-memory-splunk-sink]
@VihasMakwana
Contributor

@ludovic-boutros thanks for this.
I will take a look and share my thoughts.

@VihasMakwana
Contributor

VihasMakwana commented Apr 1, 2024

@ludovic-boutros can you attach the entire Kafka Connect logs?

@ludovic-boutros
Contributor Author

@VihasMakwana we will open a case on the Splunk side in order to send you the complete logs more securely.
We decreased the outstanding events property to 10000 and we still have OOM issues.

[Screenshots: Kafka Connect metrics attached]

@ludovic-boutros
Contributor Author

I still don't understand how the sink-record-active-count can be so high. Shouldn't it stay under the splunk.hec.max.outstanding.events value?

@ludovic-boutros
Contributor Author

I have set some classes to debug log level.
Here is an interesting one:

[2024-04-09 10:54:52,049] DEBUG [SplunkHECSinkIntConnector|task-0] tid=152 received 261 records with total -4656978 outstanding events tracked (com.splunk.kafka.connect.SplunkSinkTask:83)

I don't think this negative number is normal ;)

@ludovic-boutros
Contributor Author

My understanding is that this incorrect outstanding event count makes the outstanding events limit ineffective, which leads to an out-of-memory error in case of Splunk slowness or failure.
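To spell out the reasoning with the value from the debug log above (my own illustration, not the connector's code): the room-for-more check compares the tracked count against the configured limit, and a large negative count always passes, so backpressure never engages while failed batches keep accumulating for retry.

// Illustration only; the constant values come from our config and the debug log above.
public class NegativeCountDemo {
    public static void main(String[] args) {
        int maxOutstanding = 10000;      // splunk.hec.max.outstanding.events
        int tracked = -4656978;          // corrupted outstanding event count from the log

        boolean roomForMore = tracked < maxOutstanding;  // always true with a negative count
        System.out.println(roomForMore); // prints true: the task keeps accepting records,
                                         // failed batches pile up for retry,
                                         // and the heap eventually fills up
    }
}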

@ludovic-boutros
Contributor Author

@VihasMakwana I did not manage to understand how the outstanding event count could drop below zero.
Do you think we could just add a small check in the event tracker to keep it greater than or equal to zero, for example along the lines of the sketch below?
I don't know whether it would have side effects, but it would at least ensure that the max outstanding events limit is effective.
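Something like this, as a minimal sketch with my own class and method names (not a patch against the actual event tracker):

import java.util.concurrent.atomic.AtomicInteger;

class ClampedEventTracker {
    private final AtomicInteger outstanding = new AtomicInteger(0);

    void addEvents(int events) {
        outstanding.addAndGet(events);
    }

    void removeEvents(int events) {
        // Clamp at zero so a double decrement can never drive the count
        // negative; this treats the symptom, not the root cause.
        outstanding.updateAndGet(current -> Math.max(0, current - events));
    }

    int outstandingEvents() {
        return outstanding.get();
    }
}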

@ludovic-boutros
Contributor Author

ludovic-boutros commented Apr 24, 2024

@VihasMakwana I have patched the connector to prevent a negative event count, and we can see the effect on the number of events kept in memory. It does not fix the root cause, but it at least addresses the symptoms.
[Screenshot: Kafka Connect metrics after the patch]

@ludovic-boutros
Contributor Author

Resolved by #431
