Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskVine: Fix library deployment method #3560

Merged
merged 9 commits into from
Dec 5, 2023

Conversation

tphung3
Copy link
Contributor

@tphung3 tphung3 commented Oct 26, 2023

Proposed changes

Before this PR the manager tries to send a library to all known workers, which takes a long time and breaks the logic of the internal wait loop. Now the manager sends the library as necessary only.

Post-change actions

Put an 'x' in the boxes that describe post-change actions that you have done.
The more 'x' ticked, the faster your changes are accepted by maintainers.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update Did you update the manual to reflect your changes, if appropriate? This action should be done after your changes are approved but not merged.
  • Type Labels Select github labels for the type of this change: bug, enhancement, etc.
  • Product Labels Select github labels for the product affected: TaskVine, Makeflow, etc.
  • PR RTM Mark your PR as ready to merge.

@tphung3 tphung3 added TaskVine bug For modifications that fix a flaw in the code. labels Oct 26, 2023
@dthain
Copy link
Member

dthain commented Oct 26, 2023

Could you summarize the principle of operation here?
Previously, the idea was to make library deployment the lowest priority activity, and then to only deploy one library per worker.
I'm sure one-library-per-worker is no longer what we want.
But not sure if we want to increase the priority of deploying libraries.

@tphung3
Copy link
Contributor Author

tphung3 commented Oct 26, 2023

The idea here is to piggyback the library with a function call to send to the worker. Instead of sending all libraries to all known workers at once, we send one function call to one worker as usual, and if that worker doesn't already have the library running, then we send the library (lazily).

@dthain
Copy link
Member

dthain commented Oct 26, 2023

Per our conversation, please do some big runs with this new configuration (large number of tasks and workers) and plot the results to make sure it has the desired effect...

@dthain
Copy link
Member

dthain commented Oct 27, 2023

Another fundamental problem that I see here is that we are not checking for the compatibility of the library task with the worker, nor are we checking to see if the worker has the appropriate resources available for the library. For example, perhaps the library needs a GPU.

We need a scheme by which library tasks are scheduled to workers in a "mostly normal" way so that we are checking all of the properties that are important to the task.

@tphung3
Copy link
Contributor Author

tphung3 commented Oct 27, 2023

A quick experiment shows that the proposed way delivers minimal improvements, even though it is technically more sound than the previous way of delivering libraries to workers. This might be because the scenario when a manager spends a lot of time to send libraries to a large number (100s) of workers isn't real in practice: instead the manager sends to small chunks of workers (in 20-30) as they arrive. Some numbers are provided below to back up this claim. Note that the experiment setup is exactly the same, except the way to deliver libraries:

With 20,000 function calls, each running for 5 seconds, and 100 16-core workers, each can run 16 concurrent function tasks:

  • old: 105.88 seconds
  • new: 104.21 seconds

Same set of function calls, and 400 4-core workers, each can run 4 concurrent function tasks:

  • old: 178.83 seconds
  • new: 175.46 seconds

On the second thought, if a user has rounds of experiments within one workflow, which installs one library then installs another library after 500 workers are connected, then the old way of sending libraries will clearly block the main wait loop. Maybe another experiment can confirm this.

Tagging relevant people @dthain @btovar.

vine_task_copy(hash_table_lookup(q->libraries, t->needs_library)));
debug(D_VINE, "Sending library %s to worker %s\n", t->needs_library, w->workerid);
} else {
debug(D_VINE, "Cannot send library %s to worker %s\n", t->needs_library, w->workerid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug could be misleading b/c vine_manager_send_library_to_worker may return zero if the library simply does not fit the worker, or is incompatible. In short, it's not really a failure, it's just the manager realizing (late in the game) that it doesn't make sense to schedule this task here.

@@ -2723,6 +2728,11 @@ static void commit_task_to_worker(struct vine_manager *q, struct vine_worker_inf
if (result != VINE_SUCCESS) {
debug(D_VINE, "Failed to send task %d to worker %s (%s).", t->task_id, w->hostname, w->addrport);
handle_failure(q, w, t, result);
} else {
if (t->needs_library) {
struct vine_task *library_task = hash_table_lookup(w->libraries, t->needs_library);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use vine_manager_find_library_on_worker to find one of the running libraries on the worker with an available slot, and then store it in t->library_task so that you can correctly un-do the binding when the task ends.

Copy link
Member

@dthain dthain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this approach can work, but it is a mistake to remove the t->library_task pointer. The reason for this pointer is to associate a function call task with one specific library task on that host during scheduling. (This is because a host could run more than one library task at once.). Then the t->library_task pointer is used to perform the needed reference counting on the matched library.

Second, I'm not sure that is necessary to introduce the w->libraries table since the library task is already represented in w->current_tasks. And every table of aliases becomes a new complication to keep track of. You can use the existing vine_manager_find_library_on_worker to find a suitable library, and then keep that binding in t->library_task

@dthain
Copy link
Member

dthain commented Nov 2, 2023

Per our recent discussions, please fix up this PR as follows:

1 - Libraries should consume resources in the same way as normal tasks, using the same routines. (Use explicit resources if given, otherwise fill the worker.)
2 - It is allowed to have multiple libraries per worker, so function call tasks should continue to track what library they are connected to via t->library_task
3 - Libraries should be deployed when needed by function calls, and then removed when the need no longer exists.

I think this can be done with relatively minor changes. And then let's focus on getting this working with Parsl applications at large scale, and evaluate the performance. (And probably there will be some bugs to fix along the way.)

@tphung3 tphung3 requested a review from dthain November 3, 2023 20:23
Copy link
Member

@dthain dthain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think you have convinced me that this is a workable idea. Since it is core to the scheduling loop, please make a few changes to clarify for the next reader.

taskvine/src/manager/vine_manager.c Outdated Show resolved Hide resolved
taskvine/src/manager/vine_manager.c Outdated Show resolved Hide resolved
taskvine/src/manager/vine_manager.c Outdated Show resolved Hide resolved
taskvine/src/manager/vine_manager.c Outdated Show resolved Hide resolved
taskvine/src/manager/vine_schedule.c Show resolved Hide resolved
taskvine/src/manager/vine_schedule.c Show resolved Hide resolved
@dthain
Copy link
Member

dthain commented Nov 6, 2023

Looks like the OSX build just timed out and preahps will succeed with a simple retry.

@tphung3
Copy link
Contributor Author

tphung3 commented Nov 6, 2023

It's the second time the OSX build fails. Can this be related to the recent OSX build changes @dthain?

@dthain
Copy link
Member

dthain commented Nov 7, 2023

Please rebase on master to get a workaround for the OSX build.

@dthain
Copy link
Member

dthain commented Dec 5, 2023

RTM?

@tphung3
Copy link
Contributor Author

tphung3 commented Dec 5, 2023

Yes it is

@dthain dthain merged commit f0248ca into cooperative-computing-lab:master Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For modifications that fix a flaw in the code. TaskVine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants