-
Notifications
You must be signed in to change notification settings - Fork 40
Add resident device change call #1517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d53a23f
to
9d7e12c
Compare
include/umf/memory_provider_ops.h
Outdated
/// @brief Adds or removes devices on which allocations should be made | ||
/// resident. | ||
/// @param provider handle to the memory provider | ||
/// @param device_index identifier of device | ||
/// @param is_adding Boolean indicating if peer is to be removed or added | ||
/// @return UMF_RESULT_SUCCESS on success or appropriate error code on | ||
/// failure. | ||
umf_result_t (*ext_resident_device_change)(void *provider, | ||
uint32_t device_index, | ||
bool is_adding); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not be in OPS. This is something very specific to L0 provider. Resident devices are passed to L0 provider thru provider specific params, so control of them should be olso be done through provider specific API, this is why i think with should be implemented through CTL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rejected. We discussed this in teams (4th Aug, 25) we do it as ops (Piotr, sorry in Polish): co do tego API co Ty robisz (w stylu daj mi wszystkie zaalokowane page'e) ja bym chyba sugerował zrobić API a nie robić przez CTLa, Rafał: no to faktycznie chyba dedykowane API lepiej pasuje niż CTL.
CTL should be for statistics, not a universal, hard-to-read tool to implement any API, Łukasz (me): Być może będziecie chcieli uprościć ext_ctl by nie był maszynką, którą mozna zaimplementować wszystko, a jedynie służył do statystyk. Ale to już ja się na tym nie znam - zostawiam do przemyśleń.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about API to iterate over all allocations, not about API to modify some internal settings of the L0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see it both ways. My comment during that discussion was about a generic functionality implementable for all providers. This isn't that.
I also remember making a point about CTL being useful for provider/pool-specific functionality.
On the other hand, adding an API function is simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, adding an API function is simpler.
Thank you for the comment. Chosen simpler way.
/// @brief Set the resident devices in the parameters struct. | ||
/// @param hParams handle to the parameters of the Level Zero Memory Provider. | ||
/// @param hDevices array of devices for which the memory should be made resident. | ||
/// @param deviceCount number of devices for which the memory should be made resident. | ||
/// @param hDevices array of all devices for which the memory can be made resident. | ||
/// @param deviceCount number of devices for which the memory can be made resident. | ||
/// @param residentDevicesIndices array of indices in all devices array to devices for which the memory should be made resident. | ||
/// @param residentDevicesCount number of items in indices array. | ||
/// @return UMF_RESULT_SUCCESS on success or appropriate error code on failure. | ||
umf_result_t umfLevelZeroMemoryProviderParamsSetResidentDevices( | ||
umf_level_zero_memory_provider_params_handle_t hParams, | ||
ze_device_handle_t *hDevices, uint32_t deviceCount); | ||
ze_device_handle_t *hDevices, uint32_t deviceCount, | ||
uint32_t *residentDevicesIndices, uint32_t residentDevicesCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an API(and ABI) break - we are post 1.0 release so you cannot do changes like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I bump some number? Only UR uses this API and I am changing UR right now as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if UR uses only this API, old UR should work with new umf. And we will not do 2.0 release just after 1.0.
You cannot brake API. If changes are needed we have to keep old function working correctly and add new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't break backwards compatibility with SYCL / UR. Changing a major version is a significant undertaking that involves updating the components across all the different layers (UMF is nearly the lowest-most component in the stack). We've been burned on this in the past, and it's very disruptive.
In this case, my suggestion would be to simply create a new function that sets the indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, applied, I rewrote code in 53ec753 to be backward compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 / 16 files reviewed (will be continued)
include/umf/memory_provider_ops.h
Outdated
/// failure. | ||
umf_result_t (*ext_resident_device_change)(void *provider, | ||
uint32_t device_index, | ||
bool is_adding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we have two functions - add/remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will not apply this, please, because this will introduce too much copy-pasted code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? can't you still have 1 private helper function, but expose 2 functions...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a few layers here and in sycl, that would require having copy pasted add/remove code.
…OINTER_TESTING_PROPS
@lukaszstolarczuk , @bratpiorka , @ldorau , @lplewa , please do the same wrt your comments when you're back |
include/umf/memory_provider_ops.h
Outdated
/// failure. | ||
umf_result_t (*ext_resident_device_change)(void *provider, | ||
uint32_t device_index, | ||
bool is_adding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? can't you still have 1 private helper function, but expose 2 functions...?
|
||
uint32_t existing_peer_index = 0; | ||
utils_write_lock(&ze_provider->resident_device_rwlock); | ||
while (existing_peer_index < ze_provider->resident_device_count && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do the incrementation within the while loop and have extra &&
to mark the stop, I believe I agree with Łukasz - this would look better as for
.
// you obviously don't have to init anything in for (e.g. for(; index < count; index++)
add_umf_library( | ||
NAME umf_ze_loopback | ||
TYPE SHARED | ||
SRCS ze_loopback.h ze_loopback.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.h is not needed?
|
||
#include "umf/providers/provider_level_zero.h" | ||
#include "utils_load_library.h" | ||
#include <cstdlib> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <cstdlib>
#include <umf/providers/provider_level_zero.h>
#include "level_zero_mocks.h"
#include "utils_load_library.h"
#ifndef UMF_TEST_PROVIDER_LEVEL_ZERO_MOCKS_H | ||
#define UMF_TEST_PROVIDER_LEVEL_ZERO_MOCKS_H | ||
|
||
#include "utils_log.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <vector>
#include <gmock/gmock.h>
#include "utils_log.h"
#include "ze_loopback.h"
* | ||
*/ | ||
|
||
#include "ze_loopback.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <cstdlib>
#include <iostream>
#include "ze_loopback.h"
// Under the Apache License v2.0 with LLVM Exceptions. See LICENSE.TXT. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
||
#include "../common/level_zero_mocks.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <umf/pools/pool_disjoint.h>
#include <umf/providers/provider_level_zero.h>
#include "gtest/gtest.h"
#include "../common/level_zero_mocks.h"
#include "pool.hpp"
UMF part of https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_peer_access.asciidoc feature.
UR/llvm part is in intel/llvm#19257