refactor: reuse the global Ray Pool across distributed operations #5199
Open
LuciferYang wants to merge 1 commit into
compact_files, create_scalar_index, create_index and add_columns each created a fresh Ray Pool, ignoring a Pool configured via init_global_pool/set_global_pool. Route them through get_or_create_pool so a configured global Pool is reused. Warn when an active global Pool causes per-call ray_remote_args to be ignored, and tolerate processes=None. Update test_map_async_with_pool_closes_and_joins_pool to patch the Pool in lance_ray.pool, since the Pool is now constructed there rather than in lance_ray.index.
0d04692 to 0116642
`init_global_pool`/`set_global_pool` let you register a process-wide Ray Pool so Lance-Ray operations can share it instead of spinning one up on each call, but only `vector_search` actually reused it. `compact_files`, `create_scalar_index`, `create_index`, and `add_columns` each built their own Pool, so a configured global Pool was ignored and they paid the full pool startup cost every time.

This routes those four through the existing `get_or_create_pool` helper. With no global Pool set, the behavior is unchanged: a local Pool is created and closed per call. When one is set, it's reused.

One wrinkle worth calling out: a reused global Pool is fixed at the size and remote args it was created with, so per-call `num_workers`/`ray_remote_args` no longer apply. The process-count mismatch was already warned about; this adds a matching warning when per-call `ray_remote_args` are dropped, and lets `get_or_create_pool` accept `processes=None` (which `add_columns` passes). Clearing the global Pool restores per-call control.
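For reviewers who want the reuse-or-create semantics in one place, here is a minimal sketch of the behavior described above. It is not the `lance_ray.pool` code: `StubPool` is a hypothetical stand-in for a Ray Pool, the module-level registry is an assumption, and modeling `get_or_create_pool` as a context manager is a guess at its shape rather than its actual signature. The warning texts are illustrative as well.

```python
import warnings
from contextlib import contextmanager


class StubPool:
    """Hypothetical stand-in for a Ray Pool; records its config and close state."""

    def __init__(self, processes=None, ray_remote_args=None):
        self.processes = processes
        self.ray_remote_args = ray_remote_args
        self.closed = False

    def close(self):
        self.closed = True


# Assumed process-wide registry; None means "no global pool configured".
_global_pool = None


def set_global_pool(pool):
    """Register (or clear, with None) the pool shared by all operations."""
    global _global_pool
    _global_pool = pool


@contextmanager
def get_or_create_pool(processes=None, ray_remote_args=None):
    """Yield the global pool if one is set, otherwise a per-call local pool."""
    if _global_pool is not None:
        # A reused pool keeps the size and remote args it was created with,
        # so per-call settings can only be reported, not applied.
        if processes is not None and processes != _global_pool.processes:
            warnings.warn("global pool active; per-call process count ignored")
        if ray_remote_args:
            warnings.warn("global pool active; per-call ray_remote_args ignored")
        yield _global_pool  # left open for the next caller
        return

    # No global pool: create a local one and close it when the call finishes.
    pool = StubPool(processes, ray_remote_args)
    try:
        yield pool
    finally:
        pool.close()


if __name__ == "__main__":
    # No global pool: the local pool is closed once the block exits.
    with get_or_create_pool(processes=None) as pool:
        pass
    print("local pool closed:", pool.closed)

    # Global pool set: it is reused and stays open; per-call settings only warn.
    shared = StubPool(processes=4)
    set_global_pool(shared)
    with get_or_create_pool(processes=2, ray_remote_args={"num_cpus": 1}) as pool:
        print("reused global pool:", pool is shared)
    set_global_pool(None)  # clearing restores per-call control
```

In this sketch `processes=None` skips the size comparison entirely, which is one way to read "tolerate `processes=None`"; the actual helper may handle it differently.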