Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: fast initializer for conda environments #2226

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

saikonen
Copy link
Collaborator

@saikonen saikonen commented Jan 24, 2025

adds a feature flag to switch between regular micromamba and a custom binary for bootstrapping of environments when executing remotely.

also adds deferring of micromamba install until the binary location is actually requested.

@saikonen saikonen requested a review from savingoyal January 24, 2025 18:54
'DISABLE_TRACING=True python -m metaflow.plugins.pypi.bootstrap "%s" %s "%s" linux-64'
% (self.flow.name, id_, self.datastore_type),
'DISABLE_TRACING=True python -m metaflow.plugins.pypi.%s "%s" %s "%s" linux-64'
% (bootstrap_module_name, self.flow.name, id_, self.datastore_type),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can we maintain the singular - metaflow.plugins.pypi.bootstrap module and change the bootstrapping logic within it? easier to benchmark changes that way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -88,6 +58,9 @@ def install_fast_initializer(architecture):
return fast_initializer_path

# TODO: take architecture into account
url = CONDA_FAST_INIT_BIN_URL

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, does the TODO above still make sense? Or are we going to handle the architecture at a different level, by setting CONDA_FAST_INIT_BIN_URL to an architecture-specific url?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The client needs to have knowledge of the target architecture either way as it is the one doing the solving for packages. The client also provides the architecture input for the bootstrap script, so we might as well keep the URL defining logic completely client-side.

@saikonen saikonen marked this pull request as ready for review January 27, 2025 10:34
@@ -433,6 +433,12 @@
# should result in an appreciable speedup in flow environment initialization.
CONDA_DEPENDENCY_RESOLVER = from_conf("CONDA_DEPENDENCY_RESOLVER", "conda")

# Conda Fast init binary url
CONDA_FAST_INIT_BIN_URL = from_conf("CONDA_FAST_INIT_BINARY_URL", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this config and introduce the path to actual binary in the install_fast_initializer method instead? also, if you can note some latency improvements in the PR, that will be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants