@fangnx fangnx commented Sep 11, 2025

What

This PR implements comprehensive performance benchmarking for AIOConsumer and adds critical performance guidance to help developers optimize their async Kafka consumer applications.

What This PR Does

  • Adds parameterized benchmarking for AIOConsumer with configurable batch sizes (batch_size=[1, 5, 20])
  • Implements comprehensive metrics collection and reporting for consumer performance analysis
  • Documents performance characteristics directly in AIOConsumer.poll() and consume() method docstrings
  • Creates a complete testing framework for comparing sync vs async consumer performance across different configurations

Performance Results

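| Consumer Type | Method    | Batch Size | Throughput (msg/s) | Performance Ratio    |
|---------------|-----------|------------|--------------------|----------------------|
| SYNC          | poll()    | 1          | 111,683            | 1.0x (baseline)      |
| SYNC          | consume() | 1          | 110,995            | 1.0x                 |
| SYNC          | consume() | 5          | 110,663            | 1.0x                 |
| SYNC          | consume() | 20         | 111,017            | 1.0x                 |
| ASYNC         | poll()    | 1          | 16,559             | 0.15x (7x slower)    |
| ASYNC         | consume() | 1          | 14,903             | 0.13x (7.5x slower)  |
| ASYNC         | consume() | 5          | 65,066             | 0.58x (1.7x slower)  |
| ASYNC         | consume() | 20         | 112,532            | 1.0x (matches sync)  |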
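
The ratio column can be reproduced from the throughput column. The sketch below is only arithmetic on the figures reported above; the helper names `performance_ratio` and `slowdown` are made up for illustration and are not part of the benchmark code. Note that the async poll() slowdown works out to roughly 6.7x, which the table rounds to 7x.

```python
# Throughput figures (msg/s) copied from the results table above.
BASELINE = 111_683  # SYNC poll(), batch size 1

ASYNC_RESULTS = {
    ("poll()", 1): 16_559,
    ("consume()", 1): 14_903,
    ("consume()", 5): 65_066,
    ("consume()", 20): 112_532,
}


def performance_ratio(throughput, baseline=BASELINE):
    """Fraction of the sync baseline throughput that was achieved."""
    return throughput / baseline


def slowdown(throughput, baseline=BASELINE):
    """How many times slower than the sync baseline (inverse of the ratio)."""
    return baseline / throughput


for (method, batch_size), throughput in ASYNC_RESULTS.items():
    print(
        f"ASYNC {method:<10} batch_size={batch_size:<3} "
        f"ratio={performance_ratio(throughput):.2f}x "
        f"slowdown={slowdown(throughput):.2f}x"
    )
```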

Performance Explanation

The dramatic performance difference stems from AIOConsumer's use of ThreadPoolExecutor to make blocking librdkafka calls async-compatible. For single-message operations (poll() or consume(1)), every message pays the full thread-pool hand-off cost, which leaves throughput roughly 7x below the sync baseline. With larger batch sizes, that fixed per-call cost is amortized across all messages in the batch, and the async consumer reaches parity with the sync consumer at batch_size=20.
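
The amortization effect can be illustrated with a simplified model. This is a sketch only, not AIOConsumer's actual implementation: `blocking_fetch` is a hypothetical stand-in for a blocking client call, and the only thing measured is the cost of handing work to a ThreadPoolExecutor through `loop.run_in_executor`. For a fixed number of messages, a larger batch size means fewer hand-offs.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(num_messages):
    """Hypothetical stand-in for a blocking client call; returns dummy messages."""
    return [b"msg"] * num_messages


async def drain(total, batch_size, executor):
    """Fetch `total` messages in batches, counting executor hand-offs."""
    loop = asyncio.get_running_loop()
    consumed = 0
    handoffs = 0
    while consumed < total:
        want = min(batch_size, total - consumed)
        # Each await is one round trip between the event loop and the pool.
        batch = await loop.run_in_executor(executor, blocking_fetch, want)
        handoffs += 1
        consumed += len(batch)
    return consumed, handoffs


async def main(total=2_000):
    handoffs_by_batch_size = {}
    with ThreadPoolExecutor(max_workers=1) as executor:
        for batch_size in (1, 5, 20):
            start = time.perf_counter()
            consumed, handoffs = await drain(total, batch_size, executor)
            elapsed = time.perf_counter() - start
            handoffs_by_batch_size[batch_size] = handoffs
            print(
                f"batch_size={batch_size:<3} consumed={consumed} "
                f"handoffs={handoffs} elapsed={elapsed:.4f}s"
            )
    return handoffs_by_batch_size


results = asyncio.run(main())
```

Absolute timings from this model will not match the benchmark numbers, since no broker or librdkafka work is involved; it only shows how the number of hand-offs shrinks as the batch grows.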

Developer Guidance Added

  • High-throughput applications: Use async consume() with batch_size >= 20, the smallest tested size that matched sync throughput
  • Latency-sensitive applications: Use async consume() with batch_size=5 for balanced performance (65K msg/s)
  • Avoid: async poll() for high-throughput scenarios (roughly 7x performance penalty)
  • Sync consumers: Consistent throughput (about 111K msg/s) regardless of batch size
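
As a rough illustration of the batched pattern recommended above, here is a minimal sketch. It assumes the async consumer exposes an awaitable `consume(num_messages, timeout)` mirroring the sync `Consumer.consume()` signature; that is an assumption, not something confirmed here. `consume_in_batches`, `FakeAsyncConsumer`, and `max_empty_polls` are hypothetical names introduced so the example runs without a broker.

```python
import asyncio


async def consume_in_batches(consumer, handle, batch_size=20, timeout=1.0,
                             max_empty_polls=3):
    """Drain `consumer` in batches until several consecutive empty results.

    `consumer` is any object with an awaitable
    consume(num_messages=..., timeout=...) method (assumed interface).
    """
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        # One awaited call returns up to `batch_size` messages, so the
        # per-call overhead is shared across the whole batch.
        batch = await consumer.consume(num_messages=batch_size, timeout=timeout)
        if not batch:
            empty_polls += 1
            continue
        empty_polls = 0
        for msg in batch:
            # With a real Kafka message you would check msg.error() here.
            handle(msg)
            processed += 1
    return processed


class FakeAsyncConsumer:
    """Hypothetical in-memory stand-in for an async Kafka consumer."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def consume(self, num_messages=1, timeout=-1):
        batch = self._messages[:num_messages]
        self._messages = self._messages[num_messages:]
        return batch


seen = []
processed = asyncio.run(
    consume_in_batches(FakeAsyncConsumer(range(45)), seen.append,
                       batch_size=20, timeout=0)
)
print(processed)
```

Swapping `FakeAsyncConsumer` for a real async consumer would also require subscribing to topics and closing the consumer, which this sketch leaves out.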

This PR provides developers with concrete, data-driven guidance for optimizing their Kafka consumer performance based on their specific throughput and latency requirements.

Checklist

  • Contains customer facing changes? Including API/behavior changes
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • If not, please explain why it is not required

References

JIRA: https://confluentinc.atlassian.net/browse/DGS-22195

Test & Review

Open questions / Follow-ups

@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@k-raina k-raina left a comment

Thanks for PR! Added initial review comments.

@@ -0,0 +1,363 @@
"""

Maybe we can use the same file, tests/ducktape/benchmark_metrics.py, for both producer and consumer benchmarks, and put default producer and consumer bounds in bounds.json.

Reason: the benchmark metrics for producer and consumer should be mostly the same, i.e. latency, throughput, messages processed, etc.

Member Author

Yes, I created this new file to avoid all the merge conflicts we have to deal with otherwise :). I think let's refactor the test suite (consolidating shared code, renaming files properly, etc) once both of our PRs are merged

@fangnx fangnx marked this pull request as ready for review September 11, 2025 14:30
@fangnx fangnx requested review from MSeal and a team as code owners September 11, 2025 14:30
@fangnx fangnx changed the title from "WIP: ducktape benchmark tests for consumer (sync + async)" to "Add ducktape benchmark tests for consumer (sync + async)" Sep 11, 2025
@sonarqube-confluent

Failed

  • 68.00% Coverage on New Code (is less than 80.00%)

Analysis Details

21 Issues

  • 2 Bugs
  • 0 Vulnerabilities
  • 19 Code Smells

Coverage and Duplications

  • Coverage: 68.00% (64.60% estimated after merge)
  • Duplications: no duplication information (5.20% estimated after merge)

Project ID: confluent-kafka-python


return messages_consumed


class AsyncConsumerStrategy(ConsumerStrategy):
Contributor

A lot of duplicate code in this and the prior class. Would be nice if we could remove some more of that duplication. Not blocking merge on it though
