feat(python/sedonadb): Expose memory pool and runtime configuration in Python bindings #608
…n Python bindings Add memory_limit, temp_dir, memory_pool_type, and unspillable_reserve_ratio parameters to SedonaContext and connect(), allowing Python users to configure memory pools (greedy/fair) and disk spill settings for out-of-core workloads. Also initializes env_logger in the Python module for debugging support.
Pull request overview
This PR exposes memory pool and runtime environment configuration to Python users through the SedonaContext and connect() functions. These configuration options were previously only available via the sedona-cli tool (added in PR #599). The changes allow Python users to configure memory limits, disk spill behavior, and choose between greedy and fair memory pool types for out-of-core spatial joins.
Changes:
- Added memory pool configuration parameters (`memory_limit`, `temp_dir`, `memory_pool_type`, `unspillable_reserve_ratio`) to `SedonaContext.__init__()` and the `connect()` function
- Initialized `env_logger` in the Python module for debugging support
- Updated `sedona-geo-generic-alg` to use the workspace version of the `log` dependency
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| python/sedonadb/src/context.rs | Adds runtime configuration parameters to `InternalContext::new()` and builds a custom `RuntimeEnv` with memory pool and disk manager |
| python/sedonadb/python/sedonadb/context.py | Adds configuration parameters to `SedonaContext.__init__()` and `connect()` with type hints and documentation |
| python/sedonadb/src/lib.rs | Initializes `env_logger` for debugging support |
| python/sedonadb/Cargo.toml | Adds `env_logger` dependency |
| rust/sedona-geo-generic-alg/Cargo.toml | Updates `log` dependency to use workspace version |
| Cargo.lock | Reflects dependency changes |
paleolimbot left a comment
Just a few high-level notes that I think affect how this is structured:
- We'll need to do this in the ADBC driver, the CLI, in R, and in Python, so if this can be done mostly based on `HashMap<String, String>` and implemented in the `sedona` crate we can reuse the logic in all four places. We may want a `SedonaContextBuilder` given the growing complexity of assembling the pieces.
- In Python, the `connect()` function that is actually used lives in `apache-sedona` and creating a `SessionContext()` manually is not something anybody should really do. Probably implementing this via `sd.options.xxx` and deferring the creation of the local `InternalContext` so that these options can be set before the Rust context is created would work best.
49addaa to ee07b05
Introduce SedonaContextBuilder in the sedona crate to centralize runtime env construction (memory pool, disk manager) so it can be reused across CLI, Python, ADBC, and future entry points. In Python, remove memory params from connect()/SedonaContext.__init__() and add runtime config properties to the Options class. The internal context is now lazily created on first query, allowing users to configure options before initialization. Runtime options are frozen once the context is created.
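The lazy-creation pattern described in this commit message can be sketched in plain Python. This is an illustration only, not the code from the PR: `RuntimeOptions`, `LazyContext`, `as_string_map`, and the `factory` callable are hypothetical stand-ins, and only `memory_limit` and `freeze_runtime()` echo names that appear in this conversation. The sketch freezes the options only after the factory returns, so a failed initialization leaves them editable.

```python
class RuntimeOptions:
    """Hypothetical stand-in for the runtime part of an options object."""

    def __init__(self):
        self._memory_limit = None
        self._frozen = False

    @property
    def memory_limit(self):
        return self._memory_limit

    @memory_limit.setter
    def memory_limit(self, value):
        # Reject changes once the underlying context has been built.
        if self._frozen:
            raise RuntimeError(
                "runtime options cannot change after the context is created"
            )
        self._memory_limit = value

    def freeze_runtime(self):
        """Mark the runtime options as read-only."""
        self._frozen = True

    def as_string_map(self):
        # Only options that were explicitly set are passed along, as strings.
        values = {"memory_limit": self._memory_limit}
        return {key: str(value) for key, value in values.items() if value is not None}


class LazyContext:
    """Hypothetical wrapper that builds its internal context on first use."""

    def __init__(self, factory):
        self.options = RuntimeOptions()
        self._factory = factory  # callable: dict[str, str] -> query runner
        self._impl = None

    def _ensure_impl(self):
        if self._impl is None:
            # If the factory raises, nothing is cached and nothing is frozen.
            self._impl = self._factory(self.options.as_string_map())
            self.options.freeze_runtime()
        return self._impl

    def sql(self, query):
        return self._ensure_impl()(query)
```

In this sketch, assigning `ctx.options.memory_limit = "4gb"` before the first `ctx.sql(...)` call succeeds, while the same assignment afterwards raises `RuntimeError`.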
ee07b05 to 70fa523
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
- Move `_runtime_frozen = True` after successful `InternalContext` creation so users can correct options and retry if context init fails
- Add type validation to `temp_dir` setter (accept str/PathLike/None)
- Add type validation to `unspillable_reserve_ratio` setter before range check to produce clear `TypeError` instead of Python comparison error
- Update PR description to reflect options-based configuration API
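The setter validation listed above might look roughly like the following. This is a sketch under assumptions rather than the PR's code: the function names are hypothetical, and the accepted range of 0.0 to 1.0 for the ratio is an assumption, since the conversation does not state the bounds. The type check runs before the range check so that a value such as `"0.5"` yields a `TypeError` with a clear message instead of a comparison error.

```python
import os
from numbers import Real


def validate_temp_dir(value):
    """Accept str, os.PathLike, or None and normalize paths with os.fspath."""
    if value is None:
        return None
    if isinstance(value, (str, os.PathLike)):
        return os.fspath(value)
    raise TypeError(
        f"temp_dir must be str, os.PathLike, or None, got {type(value).__name__}"
    )


def validate_reserve_ratio(value):
    """Check the type first, then the range (assumed here to be [0.0, 1.0])."""
    if value is None:
        return None
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(
            "unspillable_reserve_ratio must be a real number or None, "
            f"got {type(value).__name__}"
        )
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f"unspillable_reserve_ratio must be between 0.0 and 1.0, got {value}"
        )
    return float(value)
```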
Add 'from sedonadb.utility import sedona' to _options.py so that doctests using 'sedona.db.connect()' can resolve the name, matching the pattern used in context.py, dataframe.py, and other modules.
1721cb2 to 0bfe556
Replace direct _runtime_frozen attribute access from context.py with a public freeze_runtime() method on Options.
paleolimbot left a comment
Thank you for these updates! I'll follow up with the ADBC and R interfaces for setting the memory limit.
I re-ran CI (I think the previous failure was transient)
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Summary
- Introduce `SedonaContextBuilder` in the `sedona` Rust crate to centralize runtime environment construction (memory pool, disk manager) so it can be reused across CLI, Python, ADBC, and future bindings.
- Add runtime configuration properties (`memory_limit`, `temp_dir`, `memory_pool_type`, `unspillable_reserve_ratio`) to the Python `Options` class, allowing Python users to configure memory pools (greedy/fair) and disk spill settings for out-of-core workloads.
- Create the internal context lazily on first query, so runtime options can be set through `sd.options` before execution begins.
- Change `InternalContext::new()` to accept a single `HashMap<String, String>`, delegating to `SedonaContextBuilder::from_options`.
- Add a `size_parser` module so `memory_limit` accepts human-readable strings (e.g., `"4gb"`, `"512m"`) across CLI and Python.
- Initialize `env_logger` in the Python module for debugging support.
- Update the `log` dependency in `sedona-geo-generic-alg` to the workspace version.
Motivation
The memory pool and runtime environment configuration added in #599 was only accessible via `sedona-cli`. This PR exposes the same configuration through the Python bindings so that Python users can control memory limits and spill behavior when running out-of-core spatial joins.
The `SedonaContextBuilder` centralizes pool/runtime construction so it can be shared across CLI, Python, ADBC, and future R bindings without duplicating logic.
Example usage
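As a companion to the human-readable `memory_limit` strings mentioned in the summary, here is a minimal sketch of how values such as `"4gb"` or `"512m"` could be turned into byte counts. It is not the `size_parser` module from the PR: the function name `parse_size`, the accepted suffixes, and the use of binary multiples (1 kb = 1024 bytes) are all assumptions made for illustration.

```python
import re

# Assumed unit table: binary multiples, case-insensitive, optional trailing "b".
_UNITS = {
    "": 1,
    "b": 1,
    "k": 1024,
    "kb": 1024,
    "m": 1024**2,
    "mb": 1024**2,
    "g": 1024**3,
    "gb": 1024**3,
    "t": 1024**4,
    "tb": 1024**4,
}

# A non-negative number, optional whitespace, then an optional alphabetic suffix.
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$", re.IGNORECASE)


def parse_size(text):
    """Parse a size string like '4gb' or '512m' into a whole number of bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown size unit in {text!r}") from None
    # Fractional inputs such as '1.5k' are truncated to whole bytes.
    return int(float(number) * factor)
```

Under these assumptions, a value assigned as `memory_limit = "4gb"` would correspond to `parse_size("4gb")`, that is 4 × 1024³ bytes.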