Introduce Function Context Feature to TaskVineExecutor #3724
base: master
@require_taskvine
@pytest.mark.taskvine
This mark is what lets you specify that you don't want to run the TaskVine tests.
Can you please clarify? I thought @pytest.mark.taskvine specifies that this test should only be run with the TaskVineExecutor. Does it have other meanings?
@pytest.mark.taskvine
@pytest.mark.parametrize('num_tasks', (1, 50))
def test_function_context_computation(num_tasks, current_config_name):
    if current_config_name != 'taskvine_ex':
If you want to test against a specific configuration, have a look at tests that are marked @pytest.mark.local and have their own with parsl.load() block, rather than relying on an ambient environment that we don't expect the feature to work in.
That would be more consistent with existing tests. parsl/tests/test_monitoring/test_basic.py is a complicated example; parsl/tests/test_htex/test_priority_queue.py is another.
There's nothing special about the configuration: this test runs with parsl/tests/configs/taskvine_ex.py. This is my way of saying that this test should only be run with the TaskVineExecutor rather than with the thread pool executor, htex, etc. Using only @pytest.mark.taskvine didn't work for me.
while written < len(serialized_obj):
    written += f_out.write(serialized_obj[written:])
def _cloudpickle_serialize_object_to_file(self, path, obj):
We talked about this somewhere before, but I can't remember where: you should be using the Parsl serialization libraries, not cloudpickle, unless you have a specific reason that requires different serialization.
The object I serialize is a list containing a function and other Python objects. https://github.com/Parsl/parsl/pull/3724/files#diff-c5ce2bce42f707d31639e986d8fea5c00d31b5eead8fa510f7fe7e3181e67ccfR458-R461
Because it is a list, Parsl's serialize uses methods_for_data to serialize it, which eventually uses pickle, and pickle can't serialize a function by value. So I'm using cloudpickle serialization only for this case. What do you think?
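The by-value point can be checked with the standard library alone: `pickle` records a module-level function as a reference (module name plus qualified name), not as code, so unpickling only works where that module is importable. The sketch below assumes a throwaway module, hypothetically named `dyn_mod` and created with `exec`, standing in for user code that exists only on the submitting side. cloudpickle, by contrast, serializes functions defined interactively or in `__main__` by value.

```python
import pickle
import sys
import types


def pickle_holds_only_a_reference():
    """Show that pickle stores a module-level function by name, not by code."""
    # Hypothetical throwaway module standing in for submit-side user code.
    mod = types.ModuleType("dyn_mod")
    exec("def infer(x):\n    return x * 2\n", mod.__dict__)
    sys.modules["dyn_mod"] = mod
    try:
        # A list holding a function plus other objects, as in the reply above.
        payload = pickle.dumps([mod.infer, 3, {"k": "v"}])
    finally:
        # Simulate a worker on which dyn_mod is not importable.
        del sys.modules["dyn_mod"]
    try:
        pickle.loads(payload)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        return f"unpickling failed: {exc!r}"
    return "unpickling succeeded"


def pickle_rejects_lambdas():
    """A lambda has no importable name, so pickle cannot reference it."""
    try:
        pickle.dumps([lambda x: x * 2, 3])
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return type(exc).__name__
    return "pickled"


if __name__ == "__main__":
    print(pickle_holds_only_a_reference())
    print(pickle_rejects_lambdas())
```

The exact exception raised for the lambda differs across Python versions, which is why the sketch catches several types.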
if not lib_installed:
    # Declare and install common library for serverless tasks.
if task.func_name not in libs_installed:
    # Declare and install one library for serverless tasks per category, and vice versa.
Is this one library per function, not per category?
Yes, one library containing one function, not one per category. Many functions per library also works in certain cases, but in others it doesn't work naively: for example, if functions A and B each load a huge LLM onto a GPU and the node has only one GPU, the library can't host both A and B simultaneously.
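A minimal sketch of the bookkeeping under discussion, assuming hypothetical `Task` and `LibraryRegistry` types (the PR's actual data structures are not shown here): installed libraries are tracked in a set keyed by `func_name`, so each distinct function gets its own library, installed once and reused by later invocations, regardless of category.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in for an executor task record.
    func_name: str
    category: str


@dataclass
class LibraryRegistry:
    """Hypothetical sketch: install at most one library per function name."""

    libs_installed: set = field(default_factory=set)
    install_log: list = field(default_factory=list)

    def ensure_library(self, task):
        # Keying by func_name (not category) means two functions never share
        # a library, even when they share a category.
        if task.func_name not in self.libs_installed:
            self.install_log.append(f"library-{task.func_name}")
            self.libs_installed.add(task.func_name)
        return f"library-{task.func_name}"


if __name__ == "__main__":
    reg = LibraryRegistry()
    for t in [Task("A", "gpu"), Task("B", "gpu"), Task("A", "gpu")]:
        reg.ensure_library(t)
    print(reg.install_log)
```

Keying by category instead would put A and B into a single library, which is exactly the one-GPU conflict described above.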
This runs serverless functions several times faster than current Parsl.
This bypasses the overhead from … This also adds some caching to reduce serialization cost.
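One way such caching can work is to serialize each function once and reuse the bytes for later submissions. The sketch below is hypothetical: `SerializationCache` is an illustrative name, not the PR's implementation, and the serializer is injectable with `pickle.dumps` as a stand-in default.

```python
import pickle


class SerializationCache:
    """Hypothetical sketch: serialize each function once, then reuse the bytes."""

    def __init__(self, serialize=pickle.dumps):
        self._serialize = serialize
        self._cache = {}
        self.misses = 0  # number of times the serializer actually ran

    def get(self, func):
        # Functions are hashable, so the function object itself is the key.
        try:
            return self._cache[func]
        except KeyError:
            self.misses += 1
            data = self._serialize(func)
            self._cache[func] = data
            return data


def double(x):
    return x * 2


if __name__ == "__main__":
    cache = SerializationCache()
    for _ in range(50):
        payload = cache.get(double)
    print(cache.misses, len(payload))
```

With 50 submissions of the same function, the serializer runs once; the remaining 49 calls are dictionary lookups.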
Description
This PR introduces the function context feature in TaskVine to the TaskVineExecutor. In short, a traditional function can now specify its computational context to be shared across multiple invocations of the same function, allowing drastic improvements in execution performance.
For example, machine learning models, especially LLMs, incur a large model-creation overhead before even a single inference can run. Instead of coupling model creation and inference in the same function, a user can now specify model creation as the context of the actual inference function, de-duplicating the model-creation cost across invocations.
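The idea can be sketched without TaskVine: a long-lived process builds the context once and reuses it for every invocation, while the naive approach rebuilds it on each call. All names below (`ContextualLibrary`, `load_model`, `infer`) are hypothetical stand-ins, and the sketch does not reproduce the executor's actual API or scheduling.

```python
class ContextualLibrary:
    """Hypothetical sketch of a long-lived library with a function context.

    The context (e.g. a loaded model) is built on first use and then shared
    by every later invocation served by this library instance.
    """

    def __init__(self, func, context_func, *context_args):
        self._func = func
        self._context_func = context_func
        self._context_args = context_args
        self._context = None
        self._ready = False

    def invoke(self, *args):
        if not self._ready:
            self._context = self._context_func(*self._context_args)
            self._ready = True
        return self._func(self._context, *args)


LOADS = []  # records every (simulated) expensive model load


def load_model(name):
    # Stand-in for expensive setup such as loading LLM weights onto a GPU.
    LOADS.append(name)
    return {"name": name, "scale": 10}


def infer(model, prompt):
    # Stand-in for one inference call that reuses the already-loaded model.
    return f"{model['name']}:{len(prompt) * model['scale']}"


if __name__ == "__main__":
    prompts = ["a", "bb", "ccc"]
    lib = ContextualLibrary(infer, load_model, "tiny-llm")
    shared = [lib.invoke(p) for p in prompts]
    naive = [infer(load_model("tiny-llm"), p) for p in prompts]
    print(shared == naive, len(LOADS))  # shared: 1 load, naive: 3 loads
```

Both approaches return the same results, but the context version performs one model load for the whole batch instead of one per prompt.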
Helpful blog: https://cclnd.blogspot.com/2025/10/reducing-overhead-of-llm-integrated.html.
Tests are added to make sure the feature works as intended.
Changed Behaviour
TaskVineExecutor now has a new feature allowing functions to specify computational contexts to be shared across invocations.
Type of change