Skip to content

feat(python/sedonadb): Expose memory pool and runtime configuration in Python bindings #608

paleolimbot merged 7 commits into apache:main from Kontinuation:dev-python-memory-pool-config on Feb 18, 2026

@Kontinuation commented Feb 13, 2026

Summary

  • Add SedonaContextBuilder in the sedona Rust crate to centralize runtime environment construction (memory pool, disk manager) so it can be reused across CLI, Python, ADBC, and future bindings.
  • Add runtime configuration properties (memory_limit, temp_dir, memory_pool_type, unspillable_reserve_ratio) to the Python Options class, allowing Python users to configure memory pools (greedy/fair) and disk spill settings for out-of-core workloads.
  • Implement deferred context initialization in Python: the internal Rust context is created lazily on first query, so users can set runtime options via sd.options before execution begins.
  • Simplify InternalContext::new() to accept a single HashMap<String, String>, delegating to SedonaContextBuilder::from_options.
  • Add shared size_parser module so memory_limit accepts human-readable strings (e.g., "4gb", "512m") across CLI and Python.
  • Initialize env_logger in the Python module for debugging support.
  • Minor cleanup: switch log dependency in sedona-geo-generic-alg to workspace version.
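
The shared size_parser module is Rust code in the sedona crates. To make the "4gb" / "512m" behavior concrete, here is a hypothetical Python sketch of such a parser. The unit rules it uses (binary multiples of 1024, case-insensitive k/m/g/t suffixes, an optional trailing "b") are assumptions for illustration and may differ from what size_parser actually accepts.

```python
import re

# Hypothetical unit table: binary multiples, matched case-insensitively.
_UNITS = {
    "": 1,
    "k": 1024,
    "m": 1024**2,
    "g": 1024**3,
    "t": 1024**4,
}

# A number, an optional unit letter, and an optional trailing "b".
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kmgt]?)b?\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a human-readable size such as "4gb" or "512m" to bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid size string: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.lower()])
```

In this sketch, parse_size("4gb") returns 4 * 1024**3 and a bare number such as "1024" is taken as a byte count.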

Motivation

The memory pool and runtime environment configuration added in #599 was only accessible via sedona-cli. This PR exposes the same configuration through the Python bindings so that Python users can control memory limits and spill behavior when running out-of-core spatial joins.

The SedonaContextBuilder centralizes pool/runtime construction so it can be shared across CLI, Python, ADBC, and future R bindings without duplicating logic.

Example usage

```python
import sedona.db

sd = sedona.db.connect()
sd.options.memory_limit = "4gb"
sd.options.temp_dir = "/tmp/sedona-spill"
sd.options.memory_pool_type = "fair"
sd.options.unspillable_reserve_ratio = 0.2

# Options are frozen once the first query triggers context creation
df = sd.sql("SELECT ST_Point(1.0, 2.0)")
```

…n Python bindings

Add memory_limit, temp_dir, memory_pool_type, and unspillable_reserve_ratio
parameters to SedonaContext and connect(), allowing Python users to configure
memory pools (greedy/fair) and disk spill settings for out-of-core workloads.
Also initializes env_logger in the Python module for debugging support.

Copilot AI left a comment

Pull request overview

This PR exposes memory pool and runtime environment configuration to Python users through SedonaContext.__init__() and the connect() function. These configuration options were previously only available via the sedona-cli tool (added in PR #599). The changes allow Python users to configure memory limits and disk spill behavior, and to choose between greedy and fair memory pool types for out-of-core spatial joins.

Changes:

  • Added memory pool configuration parameters (memory_limit, temp_dir, memory_pool_type, unspillable_reserve_ratio) to SedonaContext.__init__() and the connect() function
  • Initialized env_logger in the Python module for debugging support
  • Updated sedona-geo-generic-alg to use workspace version of log dependency

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

Summary per file:

  • python/sedonadb/src/context.rs: Adds runtime configuration parameters to InternalContext::new() and builds a custom RuntimeEnv with a memory pool and disk manager
  • python/sedonadb/python/sedonadb/context.py: Adds configuration parameters to SedonaContext.__init__() and connect() with type hints and documentation
  • python/sedonadb/src/lib.rs: Initializes env_logger for debugging support
  • python/sedonadb/Cargo.toml: Adds the env_logger dependency
  • rust/sedona-geo-generic-alg/Cargo.toml: Updates the log dependency to use the workspace version
  • Cargo.lock: Reflects dependency changes

@paleolimbot left a comment

Just a few high-level notes that I think affect how this is structured:

  • We'll need to do this in the ADBC driver, the CLI, R, and Python, so if this can be done mostly based on HashMap<String, String> and implemented in the sedona crate, we can reuse the logic in all four places. We may want a SedonaContextBuilder given the growing complexity of assembling the pieces.
  • In Python, the connect() function that is actually used lives in apache-sedona and creating a SessionContext() manually is not something anybody should really do. Probably implementing this via sd.options.xxx and deferring the creation of the local InternalContext so that these options can be set before the Rust context is created would work best.
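
One way to read the HashMap<String, String> suggestion from the Python side: each binding flattens its settings into a string-to-string map and leaves parsing and validation to a single Rust builder. The sketch below shows that flattening for the four runtime options. The function name runtime_options_to_map and the map keys are hypothetical; whether SedonaContextBuilder::from_options expects exactly these keys is an assumption.

```python
from os import PathLike, fspath
from typing import Dict, Optional, Union


def runtime_options_to_map(
    memory_limit: Optional[Union[str, int]] = None,
    temp_dir: Optional[Union[str, PathLike]] = None,
    memory_pool_type: Optional[str] = None,
    unspillable_reserve_ratio: Optional[float] = None,
) -> Dict[str, str]:
    """Flatten runtime settings into a string map (hypothetical key names)."""
    raw = {
        "memory_limit": memory_limit,
        # Normalize path-like objects to plain strings.
        "temp_dir": fspath(temp_dir) if temp_dir is not None else None,
        "memory_pool_type": memory_pool_type,
        "unspillable_reserve_ratio": unspillable_reserve_ratio,
    }
    # Unset options are omitted so the builder can fall back to its defaults.
    return {key: str(value) for key, value in raw.items() if value is not None}
```

Keeping the map stringly typed means every binding hands the same shape of input to one shared builder, which is the reuse argument made above.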

@Kontinuation force-pushed the dev-python-memory-pool-config branch from 49addaa to ee07b05 on February 17, 2026 02:55
Introduce SedonaContextBuilder in the sedona crate to centralize runtime
env construction (memory pool, disk manager) so it can be reused across
CLI, Python, ADBC, and future entry points.

In Python, remove memory params from connect()/SedonaContext.__init__()
and add runtime config properties to the Options class. The internal
context is now lazily created on first query, allowing users to configure
options before initialization. Runtime options are frozen once the
context is created.
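
The deferred-initialization pattern described in this commit can be sketched as a wrapper that stores options and only calls a factory on the first query. LazyContext, set_option, and the factory argument are hypothetical stand-ins rather than the sedonadb API; the factory plays the role of constructing InternalContext. Following a later commit in this PR, the sketch freezes options only after the factory succeeds.

```python
from typing import Any, Callable, Dict, Optional


class LazyContext:
    """Hypothetical wrapper that defers building the internal context."""

    def __init__(self, factory: Callable[[Dict[str, str]], Any]) -> None:
        # factory stands in for constructing the Rust-backed InternalContext.
        self._factory = factory
        self._options: Dict[str, str] = {}
        self._internal: Optional[Any] = None
        self._frozen = False

    def set_option(self, key: str, value: str) -> None:
        if self._frozen:
            raise RuntimeError(
                f"cannot set {key!r}: runtime options are frozen once the context exists"
            )
        self._options[key] = value

    def _get_internal(self) -> Any:
        if self._internal is None:
            # If the factory raises, _frozen stays False, so the caller can
            # correct the options and retry.
            self._internal = self._factory(dict(self._options))
            self._frozen = True
        return self._internal

    def sql(self, query: str) -> Any:
        # The first query is what triggers context creation.
        return self._get_internal().sql(query)
```

Because freezing happens only after a successful call, a failed creation (for example, a rejected memory limit) leaves the options editable.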

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

- Move _runtime_frozen = True after successful InternalContext creation
  so users can correct options and retry if context init fails
- Add type validation to temp_dir setter (accept str/PathLike/None)
- Add type validation to unspillable_reserve_ratio setter before range
  check to produce clear TypeError instead of Python comparison error
- Update PR description to reflect options-based configuration API
Add 'from sedonadb.utility import sedona' to _options.py so that
doctests using 'sedona.db.connect()' can resolve the name, matching
the pattern used in context.py, dataframe.py, and other modules.
@Kontinuation force-pushed the dev-python-memory-pool-config branch from 1721cb2 to 0bfe556 on February 17, 2026 13:15
Replace direct _runtime_frozen attribute access from context.py with
a public freeze_runtime() method on Options.
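
The setter validation and the public freeze_runtime() method from these commits could look roughly like the following. RuntimeOptions is a hypothetical stand-in for the Options class; the accepted range for unspillable_reserve_ratio (0.0 to 1.0 inclusive), the rejection of pool types other than greedy/fair, and the exception messages are assumptions.

```python
import os
from typing import Optional, Union

# Pool types named in this PR; rejecting anything else is an assumption.
_POOL_TYPES = ("greedy", "fair")


class RuntimeOptions:
    """Hypothetical stand-in for the runtime properties on Options."""

    def __init__(self) -> None:
        self._temp_dir: Optional[str] = None
        self._memory_pool_type: Optional[str] = None
        self._unspillable_reserve_ratio: Optional[float] = None
        self._runtime_frozen = False

    def freeze_runtime(self) -> None:
        """Called by the context once the internal context has been created."""
        self._runtime_frozen = True

    def _check_not_frozen(self, name: str) -> None:
        if self._runtime_frozen:
            raise RuntimeError(f"{name} cannot be changed after the context is created")

    @property
    def temp_dir(self) -> Optional[str]:
        return self._temp_dir

    @temp_dir.setter
    def temp_dir(self, value: Optional[Union[str, os.PathLike]]) -> None:
        self._check_not_frozen("temp_dir")
        if value is not None and not isinstance(value, (str, os.PathLike)):
            raise TypeError(
                f"temp_dir must be str, PathLike, or None, not {type(value).__name__}"
            )
        self._temp_dir = None if value is None else os.fspath(value)

    @property
    def memory_pool_type(self) -> Optional[str]:
        return self._memory_pool_type

    @memory_pool_type.setter
    def memory_pool_type(self, value: Optional[str]) -> None:
        self._check_not_frozen("memory_pool_type")
        if value is not None and value not in _POOL_TYPES:
            raise ValueError(f"memory_pool_type must be one of {_POOL_TYPES}")
        self._memory_pool_type = value

    @property
    def unspillable_reserve_ratio(self) -> Optional[float]:
        return self._unspillable_reserve_ratio

    @unspillable_reserve_ratio.setter
    def unspillable_reserve_ratio(self, value: Optional[float]) -> None:
        self._check_not_frozen("unspillable_reserve_ratio")
        if value is not None:
            # Check the type first so that e.g. "0.2" raises a clear TypeError
            # instead of failing inside the range comparison below.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError("unspillable_reserve_ratio must be a number or None")
            if not 0.0 <= value <= 1.0:
                raise ValueError("unspillable_reserve_ratio must be between 0.0 and 1.0")
            value = float(value)
        self._unspillable_reserve_ratio = value
```

Exposing freeze_runtime() as a method keeps the context from reaching into a private attribute, which is the motivation given in the commit above.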

@paleolimbot left a comment

Thank you for these updates! I'll follow up with the ADBC and R interfaces for setting the memory limit.

I re-ran CI (I think the previous failure was transient).

Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
@Kontinuation marked this pull request as ready for review on February 18, 2026 03:03
@paleolimbot merged commit a64e3a6 into apache:main on Feb 18, 2026
17 checks passed