Update AsyncIO producer architecture to improve performance #2044
base: async
Conversation
🎉 All Contributor License Agreements have been signed. Ready to merge.
I still think the future-resolving contention was a root cause of the perf issue, but since moving the producer to a direct call is viable, that's a better solution, so long as we don't hit any periodic latency issues with the queue insertion into librdkafka locking its thread.
# Call user's callback on successful delivery
if user_callback:
    try:
        user_callback(err, msg)  # err is None here
I'd do the check here for whether it's a sync or async function with asyncio.iscoroutinefunction(func), and await it if it's true. Since the parent context is async, it's likely the user may pass an async callback without thinking much about it.
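A minimal sketch of that dispatch, reusing the user_callback/err/msg names from the diff above (the helper itself is hypothetical):

```python
import asyncio

async def _dispatch_user_callback(user_callback, err, msg):
    # Await coroutine callbacks; call plain functions synchronously.
    if asyncio.iscoroutinefunction(user_callback):
        await user_callback(err, msg)
    else:
        user_callback(err, msg)
```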
try:
    user_callback(err, msg)  # err is None here
except Exception:
    # Log but don't propagate user callback errors to avoid breaking delivery confirmation
Don't swallow/catch general exceptions. This leads to silent failures and can catch system signals that should terminate. If the user really wants to eat exceptions, they can do so within the function they pass through.
tests/ducktape/producer_strategy.py
def create_producer(self):
    producer = Producer({'bootstrap.servers': self.bootstrap_servers})

def create_producer(self, config_overrides=None):
    config = {
Fun trick: you can use ** expansion to update dicts inline:

{'bootstrap.servers': self.bootstrap_servers, **(config_overrides or {})}
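Applied to the method above, the suggestion would look roughly like this (a sketch, assuming the same Producer import as the surrounding file):

```python
def create_producer(self, config_overrides=None):
    # Later keys win in ** expansion, so overrides replace the defaults.
    config = {'bootstrap.servers': self.bootstrap_servers,
              **(config_overrides or {})}
    return Producer(config)
```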
"""Return final metrics summary for the sync producer""" | ||
if self.metrics: | ||
return self.metrics | ||
return None |
Technically not needed, since a function with no return statement defaults to returning None, but it's OK to be explicit.
Number of callbacks processed during this call
"""
if timeout > 0 or timeout == -1:
    # Blocking call - use ThreadPool to avoid blocking event loop
Why do you say the auto poll was blocking the event loop before this change? The wait is executed in one of the threads of the ThreadPoolExecutor.
He's referring to the fact that the future assignment was getting blocked by poll, because the code was awaiting that future instead of returning it. I had suggested trying to return the future without awaiting, but I think he was still hitting contention. We should talk through that more.
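Roughly the two shapes under discussion (a sketch, not the PR's code; produce_with_future is a hypothetical helper that wires the delivery callback to the future):

```python
import asyncio

async def produce_awaited(producer, topic, value):
    # Awaiting here ties produce() latency to delivery-report progress:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    produce_with_future(producer, topic, value, fut)  # hypothetical helper
    return await fut  # resolves only after poll() fires the callback

def produce_deferred(producer, topic, value):
    # Returning the unawaited future lets callers decide when to await:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    produce_with_future(producer, topic, value, fut)  # hypothetical helper
    return fut
```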
pass

kwargs['on_delivery'] = on_delivery
self._producer.produce(topic, value, key, *args, **kwargs)
producer.produce isn't exactly non-blocking. If everything goes well it is, but rd_kafka_producev takes locks (rd_kafka_wrlock(rk), rd_kafka_topic_rdlock(rkt), mtx_lock(&rk->rk_curr_msgs.lock)), may run a custom partitioner (which at the moment cannot be used in Python because of a deadlock still to be solved in librdkafka), and runs interceptors (rd_kafka_interceptors_on_send, rd_kafka_interceptors_on_acknowledgement) that could contain arbitrary code and be used later for tracing.

We can see if it's possible to implement rd_kafka_produce_batch in Python, so that if we have N futures still not completed we accumulate messages into a list, and when one of them finishes we send the list of messages (through the ThreadPoolExecutor).
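A sketch of that accumulation idea. produce_batch is the proposed binding, not a current public API, and the executor is assumed to be the producer's existing ThreadPoolExecutor:

```python
import asyncio

class ProduceAccumulator:
    """Accumulate produce() calls while a previous batch is in flight."""

    def __init__(self, producer, executor):
        self._producer = producer   # underlying confluent_kafka Producer
        self._executor = executor   # shared ThreadPoolExecutor
        self._pending = []          # messages waiting for the next batch
        self._draining = False      # True while a batch send is in flight

    async def produce(self, topic, value, key=None):
        self._pending.append((topic, value, key))
        if self._draining:
            return  # this message rides the next batch
        self._draining = True
        loop = asyncio.get_running_loop()
        try:
            while self._pending:
                batch, self._pending = self._pending, []
                # produce_batch is the hypothetical binding discussed above;
                # with today's API this lambda would loop over produce().
                await loop.run_in_executor(
                    self._executor,
                    lambda b=batch: self._producer.produce_batch(b))
        finally:
            self._draining = False
```

Since the event loop is single-threaded and there is no await between the flag check and set, only one coroutine drains at a time; calls arriving during a send get batched into the next one.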
Do you think introducing batch produce at the Python client layer (my understanding is that librdkafka already batches messages internally) could drastically improve async produce() performance, similar to async consume() with batching, since the ThreadPool overhead would be amortized?
n = (Py_ssize_t)rd_kafka_consume_batch_queue(rkqu, |
Right, as happened in the consume benchmark. In the consumer, though, we cannot implement a single-message poll on top of a larger consume call, because when we call consume the full batch can be auto-stored; we would need manual offset management, and we don't want to do complex things here, just follow the existing API. So we can just tell users to use the consume call for high throughput.

For the produce call, instead, we can accumulate the calls coming from produce and, when a previous batch has finished sending, send the next one. This will change the ordering of messages a bit, but in the context of concurrent production we aren't guaranteeing ordering anyway.
If we make an async batch_produce that just sends the batch with a single future to resolve, does perf catch up to sync produce times? If so, we can narrow down whether it's future generation or something else in the logic that's slowing things down. If not, then we need to look at why batch context swapping isn't working. All of my comments were about correctness or improvements, but I don't think any of them will dramatically affect perf -- maybe the asyncio lock acquisition is affecting things, but I doubt it.
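One way to run that experiment (a sketch; batch_produce here is a standalone helper, not an existing API, and it resolves a single future for the whole batch):

```python
import asyncio

async def batch_produce(producer, executor, messages):
    # One worker-thread hop and one future per batch, instead of one
    # future per message, to isolate future-creation overhead.
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def _send():
        try:
            for topic, value in messages:
                producer.produce(topic, value)
            producer.flush()  # blocks the worker thread, not the event loop
            loop.call_soon_threadsafe(done.set_result, len(messages))
        except BaseException as e:
            loop.call_soon_threadsafe(done.set_exception, e)

    executor.submit(_send)
    return await done
```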
…r class
- Create new AsyncCallbackHandler class to handle sync/async user callbacks
- Move callback execution logic from AIOProducer._handle_user_callback
- Improve separation of concerns and testability
- Add comprehensive tests for AsyncCallbackHandler
- No breaking changes to public API
- All existing tests pass

This is Step 1 of the architecture refactoring to improve code organization and reduce coupling between components.

…dependence
- Update CallbackPool constructor to accept AsyncCallbackHandler instead of producer
- Remove producer dependency from ReusableMessageCallback
- Update ProducerBatchProcessor to pass callback handler to CallbackPool
- Remove aio_producer parameter from _assign_callbacks_to_messages method
- Update AIOProducer to inject callback handler into batch processor
- Fix all test cases to use new method signatures
- Improve separation of concerns and reduce coupling
- All tests pass with no breaking changes to public API

This is Step 2 of the architecture refactoring to eliminate circular dependencies and improve component independence.

- Create new KafkaBatchExecutor class for Kafka batch operations
- Move _execute_batch logic from ProducerBatchProcessor to KafkaBatchExecutor
- Update ProducerBatchProcessor to use injected KafkaBatchExecutor
- Update AIOProducer to create and inject KafkaBatchExecutor
- Improve separation of concerns: BatchProcessor handles batching, KafkaExecutor handles Kafka
- Add comprehensive test suite for KafkaBatchExecutor with 6 test cases
- Update existing tests to work with new architecture
- All tests pass with no breaking changes to public API

This is Step 3 of the architecture refactoring to achieve single responsibility and better separation between batch processing and Kafka operations.

…ue independence
- Remove aio_producer parameter from flush_buffer() method
- Remove aio_producer parameter from _handle_batch_failure() method
- Use injected AsyncCallbackHandler instead of AIOProducer._handle_user_callback
- Update AIOProducer._flush_buffer() to call BatchProcessor without self reference
- Fix all test cases to use new method signatures
- Achieve true component independence - no circular dependencies
- All tests pass with no breaking changes to public API

This is Step 4 of the architecture refactoring - the critical step that eliminates circular dependencies and achieves clean unidirectional component relationships.

Architecture now: AIOProducer → [AsyncCallbackHandler, KafkaBatchExecutor] → BatchProcessor
… interfaces

Step 5 - Extract BufferManager:
- Create dedicated BufferManager class for timeout handling
- Move timeout logic out of AIOProducer into focused component
- Remove 78 lines of timeout code from AIOProducer
- Clean separation between producer lifecycle and timeout management

Step 6 - Introduce MessageBatch Value Object:
- Create immutable MessageBatch namedtuple for better data modeling
- Add create_batches() method to ProducerBatchProcessor
- Update BufferManager to use proper MessageBatch objects
- Improve type safety and data consistency

Step 7 - Clean Up AIOProducer Interface:
- Simplify AIOProducer to pure orchestration layer
- Add create_batches_preview() method demonstrating clean architecture
- Update documentation to reflect new component relationships
- AIOProducer now focuses solely on API and coordination

Architecture Benefits:
✅ Single Responsibility: Each component has one clear purpose
✅ Clean Interfaces: Well-defined boundaries between components
✅ Immutable Data: MessageBatch objects prevent accidental mutations
✅ Better Testing: Components can be tested independently
✅ Maintainable: Clear separation makes changes safer

All tests pass with no breaking changes to public API.

- Rename class from BufferManager to BufferTimeoutManager
- Rename file from _buffer_manager.py to _buffer_timeout_manager.py
- Update all references in AIOProducer to use new name
- Update variable names from _buffer_manager to _buffer_timeout_manager
- Enhance documentation to emphasize timeout-specific responsibility

Benefits:
✅ More descriptive name clearly indicates timeout management purpose
✅ Eliminates ambiguity about what kind of 'buffer management' this does
✅ Follows naming convention of being specific about component responsibility
✅ Easier to understand the architecture at a glance

The name 'BufferTimeoutManager' immediately tells developers this component handles timeout logic for message buffers, preventing messages from being held indefinitely without being flushed.

All tests pass with no functional changes.

…r/ module
- Create new src/confluent_kafka/aio/producer/ directory structure
- Move all 7 producer-related files into organized module:
  * _AIOProducer.py (main producer API)
  * _producer_batch_processor.py (batching logic)
  * _kafka_batch_executor.py (Kafka operations)
  * _buffer_timeout_manager.py (timeout management)
  * _callback_handler.py (callback execution)
  * _callback_pool.py (performance pooling)
  * _message_batch.py (value objects)
- Add comprehensive producer/__init__.py with module documentation
- Update all internal import paths to new structure
- Update main aio/__init__.py to import from producer module
- Update all test files with new import paths
- Fix: convert tuple to list for produce_batch in KafkaBatchExecutor
- Maintain backward compatibility - public API unchanged

Benefits:
✅ Better Organization: Related components grouped logically
✅ Clear Module Boundaries: Producer functionality isolated
✅ Scalable Structure: Easy to add consumer/ or other modules
✅ Professional Layout: Follows Python packaging best practices
✅ Maintainable: Easier to navigate and understand codebase

Directory Structure:
src/confluent_kafka/aio/
├── _AIOConsumer.py
├── _common.py
├── __init__.py
└── producer/
    ├── __init__.py
    ├── _AIOProducer.py
    ├── _producer_batch_processor.py
    ├── _kafka_batch_executor.py
    ├── _buffer_timeout_manager.py
    ├── _callback_handler.py
    ├── _callback_pool.py
    └── _message_batch.py

All tests pass with no breaking changes to public API.
…llbackManager

Consolidate callback handling into a single cohesive component that combines both callback execution and performance pooling responsibilities.

Changes:
- Create new CallbackManager class that unifies:
  * AsyncCallbackHandler functionality (sync/async callback execution)
  * CallbackPool functionality (object pooling for performance)
- Update ReusableMessageCallback to work with unified manager
- Remove separate _callback_handler.py and _callback_pool.py files
- Update all components to use CallbackManager:
  * AIOProducer: Use CallbackManager instead of separate components
  * ProducerBatchProcessor: Simplified constructor and method calls
- Update all test files to test unified CallbackManager
- Rename test classes and methods to reflect new unified component

Benefits:
✅ Simpler Architecture: 6 components instead of 7 (reduced complexity)
✅ Single Responsibility: One component handles all callback concerns
✅ Cleaner Interfaces: Eliminated dependency injection between pool and handler
✅ Better Cohesion: Related functionality grouped together
✅ Easier Testing: Single component to test instead of two interacting ones
✅ Reduced Coupling: Fewer dependencies between components

Architecture Impact:
- Reduces total component count from 8 to 7 components
- Maintains all existing functionality
- No breaking changes to public API
- All tests pass with updated unified component

Rename core batch processing classes to better reflect their responsibilities:

Class Name Changes:
• ProducerBatchProcessor → ProducerBatchManager
  - Better reflects its role in managing batching logic
  - Handles message buffering, topic grouping, and batch organization
• KafkaBatchExecutor → ProducerBatchExecutor
  - More consistent with Producer naming convention
  - Executes Kafka batch operations via thread pool

Updated Components:
✅ Core class definitions in _producer_batch_processor.py and _kafka_batch_executor.py
✅ All import statements across the codebase
✅ Class instantiations in AIOProducer and BufferTimeoutManager
✅ Test files and class references
✅ Method calls and documentation

Benefits:
• Clearer naming that reflects actual responsibilities
• More consistent with overall Producer naming convention
• Easier to understand the role of each component
• Maintains all existing functionality and interfaces

All tests pass with the new class names.

Remove duplicate test directory structure to avoid confusion and maintenance overhead.

Removed:
• tests/aio/producer/ - duplicate test directory
• tests/aio/ - now empty parent directory

Kept:
• tests/test_*producer*.py - original test files in main tests directory
• tests/test_*callback*.py - original callback test files
• tests/test_*kafka_batch*.py - original executor test files
• tests/integration/producer/ - integration test directory

Benefits:
✅ Single source of truth for test files
✅ Eliminates duplicate maintenance
✅ Cleaner test directory structure
✅ Consistent with existing test organization

Remove accidentally committed test result files and prevent future commits:

Removed Files:
• ducktape_test_results_merged.log - Ducktape test execution logs
• final_merged_test_results.log - Test result compilation logs
• final_test_results.log - Test execution output logs
• real_metrics_test_results.log - Performance metrics logs
• ducktape_metrics.log - Ducktape metrics output

Updated .gitignore:
• Added .ducktape/ - Ducktape metadata directory
• Added results/ - Test result output directory

Benefits:
✅ Cleaner repository without test artifacts
✅ Prevents future accidental commits of test results
✅ Follows best practices for excluding build/test outputs
✅ Reduces repository size and noise
Overview
This comprehensive PR introduces high-performance AsyncIO producer capabilities with advanced message batching, callback pooling, a modular architecture, and an extensive testing framework. The changes include significant refactoring for better separation of concerns, performance optimizations through object reuse, and robust per-message callback handling that works correctly even with out-of-order delivery reports.
High Level Architecture
Class Diagram
Changes Made
Core Architecture Improvements
1. AIOProducer Modular Redesign (src/confluent_kafka/aio/_AIOProducer.py)

🏗️ Separation of Concerns:
- ProducerBatchProcessor class
- CallbackPool class

⚡ Performance Optimizations:
- poll() method always delegates to _call() for consistent ThreadPool usage
- buffer_timeout/2 with sensible bounds (0.1s-1.0s)

🔒 Robust Shutdown Handling:
- close() and __del__() ensure proper cleanup

2. Per-Message Callback System (src/confluent_kafka/aio/_callback_pool.py)

🎯 Perfect Message-to-Future Mapping:
- asyncio.Future per message

♻️ Object Reuse Optimization:

3. Batch Processing Engine (src/confluent_kafka/aio/_producer_batch_processor.py)

📦 Intelligent Message Batching:

⚡ Optimized Batch Execution:
- produce_batch() + poll(0) executed together in ThreadPool

Testing Framework Enhancements

4. Comprehensive Unit Testing
- 🧪 Callback Pool Testing (tests/test_callback_pool.py)
- 🧪 Batch Processor Testing (tests/test_producer_batch_processor.py)
- 📊 Producer Strategy Integration (tests/ducktape/producer_strategy.py)

Key Technical Improvements
1. Eliminated Callback Ordering Dependencies
Before (Fragile):
After (Robust):
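(The before/after snippets did not survive extraction. As a rough illustration of the robust pattern, here is a sketch of per-message future mapping, where each message's own on_delivery closure resolves its own future, so out-of-order delivery reports still land on the right message:)

```python
import asyncio
from confluent_kafka import KafkaException

def produce_with_future(producer, loop, topic, value):
    fut = loop.create_future()

    def on_delivery(err, msg):
        # Runs on the poll() thread; hand results back to the event loop.
        if err is not None:
            loop.call_soon_threadsafe(fut.set_exception, KafkaException(err))
        else:
            loop.call_soon_threadsafe(fut.set_result, msg)

    producer.produce(topic, value, on_delivery=on_delivery)
    return fut
```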
2. Modular Architecture Benefits
- _AIOProducer.py: 433 lines - Core async operations and lifecycle
- _callback_pool.py: 135 lines - Callback management and pooling
- _producer_batch_processor.py: 269 lines - Batch processing logic

3. Performance Optimizations

Callback Pooling Impact:
Batch Processing Efficiency:
- produce_batch() call per topic instead of per message

Usage Example
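(The original snippet was lost in extraction; the following is a minimal sketch assuming the AIOProducer API described in this PR, i.e. a constructor config dict, an awaitable produce(), and an awaitable close():)

```python
import asyncio
from confluent_kafka.aio import AIOProducer

async def main():
    producer = AIOProducer({'bootstrap.servers': 'localhost:9092'})
    try:
        # Awaiting produce() yields the delivered message's metadata.
        msg = await producer.produce('my-topic', value=b'hello')
        print(f'Delivered to {msg.topic()} [{msg.partition()}]')
    finally:
        await producer.close()

asyncio.run(main())
```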
Performance Comparison Async vs Sync
Based on ducktape test execution on 2025-09-18 with identical configurations and 5-second test duration.
Data Flow
EDGE CASES