`batched-parallel` Updates for Occurrence Consumer

`batched-parallel` mode has performance issues because it uses a thread pool. Even with rate limiting, it is still performance-limited in its current form.

There are some ideas for how to fix this:
Add a pre-step that partitions messages by fingerprint and passes them to `run_task_with_multiprocessing`
- Similar to https://github.com/getsentry/sentry/blob/7d723dc6feecd8949910ce730c288b01816a923e/src/sentry/spans/consumers/process/factory.py#L315-L324, we could do something along the lines of https://gist.github.com/roggenkemper/a782981eed3739d9ee1f4b36160365a4
- A `BatchStep` stage takes a batch of messages and produces a list of sub-batches, where each sub-batch contains messages with the same fingerprint
- An unbatch step then emits each sub-batch individually, so the multiprocessing step receives one batch of messages per fingerprint rather than individual messages
- Overall order: a parallel step to deserialize first, then batch, then process
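
The partitioning idea in the list above can be sketched in plain Python. This is only an illustration, not the consumer's code: `partition_by_fingerprint` is a hypothetical helper, and it assumes each deserialized payload is a mapping with a `"fingerprint"` key holding a list of strings.

```python
from collections import defaultdict
from typing import Any, Mapping, Sequence

# Assumed payload shape: a mapping with a "fingerprint" list of strings.
Payload = Mapping[str, Any]


def partition_by_fingerprint(batch: Sequence[Payload]) -> list[list[Payload]]:
    """Split one batch into sub-batches that share a fingerprint.

    Message order is preserved inside each sub-batch, and sub-batches come
    out in order of first appearance.
    """
    groups: dict[tuple[str, ...], list[Payload]] = defaultdict(list)
    for payload in batch:
        # Lists are not hashable, so use a tuple of the fingerprint as the key.
        key = tuple(payload["fingerprint"])
        groups[key].append(payload)
    return list(groups.values())


batch = [
    {"id": 1, "fingerprint": ["a"]},
    {"id": 2, "fingerprint": ["b"]},
    {"id": 3, "fingerprint": ["a"]},
]
sub_batches = partition_by_fingerprint(batch)
```

Each sub-batch could then be handed to a single worker, so messages with the same fingerprint are never processed concurrently by different workers.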

Adding a timer to https://github.com/getsentry/sentry/blob/7cceb28b0e35d5ef49da29ab102ec2e5c2b1459d/src/sentry/issues/occurrence_consumer.py#L444 could also be useful for gaining insight into performance.
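
A rough sketch of what such a timer could look like, using only the standard library. The `timed` helper, the `durations` sink, the metric key, and `process_batch` are all hypothetical; in the consumer itself the existing metrics tooling would presumably be used instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator

# Hypothetical in-memory sink; a real consumer would emit to its metrics backend.
durations: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timed(key: str) -> Iterator[None]:
    """Record how long the wrapped block takes, in seconds, under `key`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the wrapped block raises.
        durations[key].append(time.perf_counter() - start)


def process_batch(batch: list[int]) -> int:
    # Stand-in for the real per-batch work.
    return sum(batch)


with timed("occurrence_consumer.process_batch"):
    result = process_batch([1, 2, 3])
```

Timing the whole batch call rather than individual messages keeps the overhead low and makes it easy to compare the thread pool path against a multiprocessing one.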

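	# Steps are built back to front: each one is handed the step that follows it.
	# Runs batch_write_to_redis on each batch, then passes the result to commit_step.
	batch_processor = RunTask(
	    function=batch_write_to_redis,
	    next_step=commit_step,
	)

	# Accumulates messages into batches bounded by size and time.
	batch_step = BatchStep(
	    max_batch_size=self.max_batch_size,
	    max_batch_time=self.max_batch_time,
	    next_step=batch_processor,
	)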
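
To show how the same back-to-front wiring could extend to batch, group, unbatch, then process, here is a toy sketch. The `Step`, `Batch`, `Run`, `Unbatch`, and `Collect` classes are simplified stand-ins written for this example, not Arroyo's API, and messages are assumed to be `(fingerprint, id)` tuples.

```python
from typing import Any, Callable, List, Optional


class Step:
    """Toy stand-in for a processing step: handle a value, then hand it on."""

    def __init__(self, next_step: Optional["Step"] = None) -> None:
        self.next_step = next_step

    def submit(self, value: Any) -> None:
        raise NotImplementedError

    def forward(self, value: Any) -> None:
        if self.next_step is not None:
            self.next_step.submit(value)


class Batch(Step):
    """Buffer values and forward them as a list once the buffer is full."""

    def __init__(self, max_batch_size: int, next_step: Step) -> None:
        super().__init__(next_step)
        self.max_batch_size = max_batch_size
        self.buffer: List[Any] = []

    def submit(self, value: Any) -> None:
        self.buffer.append(value)
        if len(self.buffer) >= self.max_batch_size:
            batch, self.buffer = self.buffer, []
            self.forward(batch)


class Run(Step):
    """Apply a function to each value and forward the result."""

    def __init__(self, function: Callable[[Any], Any], next_step: Step) -> None:
        super().__init__(next_step)
        self.function = function

    def submit(self, value: Any) -> None:
        self.forward(self.function(value))


class Unbatch(Step):
    """Forward each element of a list individually."""

    def submit(self, value: Any) -> None:
        for item in value:
            self.forward(item)


class Collect(Step):
    """Terminal step that records what it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: List[Any] = []

    def submit(self, value: Any) -> None:
        self.seen.append(value)


def group_by_fingerprint(batch):
    groups = {}
    for fingerprint, message_id in batch:
        groups.setdefault(fingerprint, []).append(message_id)
    return list(groups.items())


def process_group(group):
    fingerprint, ids = group
    return f"{fingerprint}:{len(ids)}"


# Wire the chain from the last step to the first.
sink = Collect()
process = Run(process_group, sink)
unbatch = Unbatch(process)
group = Run(group_by_fingerprint, unbatch)
pipeline = Batch(4, group)

for message in [("a", 1), ("b", 2), ("a", 3), ("b", 4)]:
    pipeline.submit(message)
```

After the fourth message the batch is flushed, grouped, and unbatched, so the processing step sees one group per fingerprint instead of four separate messages.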

`batched-parallel` Updates for Occurrence Consumer #82044
