-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vine: check worker has committable resources #4039
vine: check worker has committable resources #4039
Conversation
Looks good. Are you still testing it? |
Yes, I am going to do some large tests to make sure they work well. |
taskvine/src/manager/vine_schedule.c
Outdated
* @param resource @vine_resources.h:struct vine_resources | ||
* @return 1 if the resource type is defined and can be allocated to a task, 0 otherwise. | ||
*/ | ||
static int is_resource_committable(struct vine_manager *q, struct vine_resource resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are unnecessarily copying the struct vine_resource
data. In this case, I think it will be clearer if you do the resource.total > 0 && resource.inuse < overcommitted_resource_total(q, resource.total);
per resource inside check_worker_have_committable_resources
.
taskvine/src/manager/vine_schedule.c
Outdated
/* If any slots are committable */ | ||
uint64_t task_id; | ||
struct vine_task *t; | ||
ITABLE_ITERATE(w->current_libraries, task_id, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe first check that itable_size > 0
before setting the iteration?
ready to merge? |
RTM |
* init * move empty libraries substract * rename to is_resource_fully_allocated * handle has_cores && has_gpus * check libraries * simplify the code * remove from the data structure * revert * remove is_resource_committable
Proposed Changes
An improved version suggested by @btovar in #4037
The issue is that even though most allocations work fine in #4035 and tasks run pretty densely, the manager still wastes a lot of time checking tasks when there aren’t any usable cores.
Task scheduling seems to be an expensive part of the main loop, we could save a lot of time by making this more efficient, and then the manager has more time to do other jobs like output retrieving and temp file replication.
The most common issue in
check_worker_against_task
is that the worker doesn’t have a free core. But it still goes through the whole resource allocation process, and this keeps happening for a bunch of tasks each time, based on the depth set byq->attempt_schedule_depth
.This PR adds a tag for workers to note whether they have any available resources. This allows the checking process to terminate earlier.
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
make test
Run local tests prior to pushing.make format
Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)make lint
Run lint on source code prior to pushing.