vine: add current_libraries to the worker's data structure #4046

JinZhou5042 · 2025-01-28T22:23:56Z

There are many places that may need to get the running libraries on a worker and operate on them. The current way to do this is to traverse all running tasks and select the libraries.

For example, in kill_empty_libraries_on_worker, we kill unused libraries on a worker to reclaim resources:

static void kill_empty_libraries_on_worker(struct vine_manager *q, struct vine_worker_info *w, struct vine_task *t)
{
	uint64_t task_id;
	struct vine_task *task;
	ITABLE_ITERATE(w->current_tasks, task_id, task)
	{
		if (task->provides_library && task->function_slots_inuse == 0) {
			vine_cancel_by_task_id(q, task->task_id);
		}
	}
}

In check_worker_have_enough_resources, we subtract the in-use resources of libraries that are not running any functions at all:

uint64_t task_id;
struct vine_task *ti;
ITABLE_ITERATE(w->current_tasks, task_id, ti)
{
	if (ti->provides_library && ti->function_slots_inuse == 0) {
		worker_net_resources->disk.inuse -= ti->current_resource_box->disk;
		worker_net_resources->cores.inuse -= ti->current_resource_box->cores;
		worker_net_resources->memory.inuse -= ti->current_resource_box->memory;
		worker_net_resources->gpus.inuse -= ti->current_resource_box->gpus;
	}
}

On function scheduling, we could terminate early by checking whether any library on the worker has a free slot.
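A minimal sketch of that early-termination check, assuming each library records both a slot capacity and the slots in use. The struct is a hypothetical stand-in for the relevant fields of struct vine_task; function_slots_total is an assumed field name, and only function_slots_inuse appears in the snippets above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library-related fields of struct vine_task. */
struct library_task {
	int function_slots_total; /* assumed: how many function calls the library can host */
	int function_slots_inuse; /* function calls currently running in the library */
};

/* Return 1 as soon as any library has a free slot, 0 if none does. */
static int worker_has_free_library_slot(const struct library_task *libraries, size_t count)
{
	assert(libraries != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (libraries[i].function_slots_inuse < libraries[i].function_slots_total) {
			return 1;
		}
	}
	return 0;
}
```

The loop only pays off if it runs over libraries alone; run over every task on the worker, it would still have to skip each non-library entry.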

Does it make sense to add a current_libraries table to the worker's data structure, so that we don't spend time traversing tasks that are not libraries? It would be as simple as calling itable_insert(w->current_libraries, t->task_id, t); on committing and itable_remove(w->current_libraries, t->task_id); on reaping, but it would make those operations much more convenient.
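The bookkeeping could look roughly like the sketch below. It is self-contained, so a small fixed-size array stands in for the itable, and the names struct task, struct worker, on_commit, on_reap, and count_empty_libraries are hypothetical; the real change would call itable_insert and itable_remove at the commit and reap sites instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIBRARIES 16 /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for the fields of struct vine_task used here. */
struct task {
	uint64_t task_id;
	int provides_library;
	int function_slots_inuse;
};

/* Hypothetical stand-in for the worker; the array plays the role of w->current_libraries. */
struct worker {
	struct task *current_libraries[MAX_LIBRARIES];
	size_t library_count;
};

/* On commit: index the task only if it provides a library. */
static void on_commit(struct worker *w, struct task *t)
{
	assert(w != NULL && t != NULL);
	if (t->provides_library && w->library_count < MAX_LIBRARIES) {
		w->current_libraries[w->library_count++] = t;
	}
}

/* On reap: remove the task from the index, if present, by swapping in the last entry. */
static void on_reap(struct worker *w, const struct task *t)
{
	assert(w != NULL && t != NULL);
	for (size_t i = 0; i < w->library_count; i++) {
		if (w->current_libraries[i]->task_id == t->task_id) {
			w->current_libraries[i] = w->current_libraries[--w->library_count];
			return;
		}
	}
}

/* Count libraries with no running functions, visiting only the library index. */
static size_t count_empty_libraries(const struct worker *w)
{
	size_t empty = 0;
	for (size_t i = 0; i < w->library_count; i++) {
		if (w->current_libraries[i]->function_slots_inuse == 0) {
			empty++;
		}
	}
	return empty;
}
```

With this in place, the loops in kill_empty_libraries_on_worker and check_worker_have_enough_resources would no longer need the provides_library test, since every entry they visit is a library.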


btovar · 2025-01-29T13:03:02Z

It makes sense to me. Let's evaluate with a PR, as the changes to the code should be small.

dthain · 2025-01-29T13:34:12Z

Let's keep in mind the expected orders of magnitude in each data structure:

The manager may have millions of tasks overall in q->tasks.
The manager may have millions of tasks in q->ready_list.
The manager may have thousands of running tasks in q->running_table.
The manager may have hundreds of workers in q->worker_table.
Each worker may have a handful of ready/running tasks in w->current_tasks

Because of the sheer number of tasks in q->tasks, there is a lot gained by segregating the tasks by state into q->ready_list and q->running_table, even though that adds complexity.

But if there are only a handful of items at any given time in w->current_tasks, I'm not sure that we gain a lot by dividing it further into several data structures.

Is there some other consideration?

JinZhou5042 · 2025-01-29T14:53:51Z

@dthain One benefit I can see shows up when there are hundreds of running tasks. On task scheduling, send_one_task considers up to a fixed depth of tasks (100) until one is runnable, select_worker_by_files typically traverses all workers to find the best one, and check_worker_have_enough_resources traverses every task on the worker to subtract the resources used by empty libraries. In the worst case, we end up traversing 100*10000 tasks, which might be expensive. If we could directly access the running libraries on each worker, the number of traversals would be reduced by 99%.
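The arithmetic behind that estimate can be sketched as follows. The function names are hypothetical, and the ratio of 100 libraries to 10000 running tasks is an assumption chosen to reproduce the 99% figure; the actual ratio depends on the workload.

```c
#include <assert.h>
#include <stdint.h>

/* Entries visited in one scheduling pass: each of `depth` candidate tasks
 * triggers a scan over `entries_scanned` entries summed across all workers. */
static uint64_t visits_per_pass(uint64_t depth, uint64_t entries_scanned)
{
	return depth * entries_scanned;
}

/* Percentage of visits avoided when only libraries are scanned instead of all tasks. */
static double percent_saved(uint64_t total_tasks, uint64_t total_libraries)
{
	assert(total_tasks > 0 && total_libraries <= total_tasks);
	return 100.0 * (double)(total_tasks - total_libraries) / (double)total_tasks;
}
```

With a depth of 100, scanning 10000 tasks gives visits_per_pass(100, 10000) = 1000000 visits, while scanning an assumed 100 libraries gives visits_per_pass(100, 100) = 10000, which is the 99% reduction.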

It also gives us a way to keep track of all the running libraries across all workers: traverse each worker and collect the running libraries on it. That saves time because workers without libraries can be skipped immediately.

JinZhou5042 added enhancement TaskVine labels Jan 28, 2025

JinZhou5042 mentioned this issue Jan 29, 2025

vine: track running libraries on the worker #4047

Open

7 tasks

JinZhou5042 self-assigned this Jan 29, 2025
