[SYCL][L0] Implement reuse of l0 events in plugin #5947
@@ -657,6 +657,13 @@ struct _pi_context : _pi_object {
   // and return the pool to the cache if there are no unreleased events.
   pi_result decrementUnreleasedEventsInPool(pi_event Event);

+  // Return event if cache has an event available, nullptr otherwise.
+  ze_event_handle_t getZeEventFromCache();
+
+  // Put the event in the cache for reuse.
+  void addZeEventToCache(ze_event_handle_t ZeEvent,
+                         bool CamefromEventRelease = false);
+
   // Store USM allocator context(internal allocator structures)
   // for USM shared and device allocations. There is 1 allocator context
   // per each pair of (context, device) per each memory type.
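The hunk above only declares the interface. As a hedged illustration (not the PR's actual implementation), a minimal event cache with this shape could look like the sketch below; the struct name and method names are hypothetical, and the container/mutex mirror the `ZeEventsCache`/`ZeEventsCacheMutex` members declared later in this diff.

```cpp
// Illustrative sketch only: a minimal cache of reusable Level Zero events.
// Only the ze_event_handle_t type comes from the real Level Zero API.
#include <level_zero/ze_api.h>

#include <mutex>
#include <unordered_set>

struct ZeEventCacheSketch {
  std::unordered_set<ze_event_handle_t> Cache;
  std::mutex CacheMutex;

  // Return an event if the cache has one available, nullptr otherwise.
  ze_event_handle_t get() {
    std::lock_guard<std::mutex> Lock(CacheMutex);
    if (Cache.empty())
      return nullptr;
    ze_event_handle_t ZeEvent = *Cache.begin();
    Cache.erase(Cache.begin());
    return ZeEvent;
  }

  // Put the event in the cache for reuse. The caller must guarantee the
  // event is already back in the non-signaled state.
  void add(ze_event_handle_t ZeEvent) {
    std::lock_guard<std::mutex> Lock(CacheMutex);
    Cache.insert(ZeEvent);
  }
};
```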
@@ -718,6 +725,15 @@ struct _pi_context : _pi_object {
   // Mutex to control operations on event pool caches and the helper maps
   // holding the current pool usage counts.
   std::mutex ZeEventPoolCacheMutex;

+  // This counts how many PI events used a particular L0 event. If the counter
+  // becomes zero when we release the PI event, this means it was the last user
+  // of the L0 event and we can safely put it back in the cache.
+  std::unordered_map<ze_event_handle_t, size_t> ZeEventUseCount;
Reviewer:
I don't understand why you need any new counts for this. It should be at the time we currently call
Also, how do you ensure that the pool containing the event is not destroyed too early, but is destroyed when all events in it are destroyed? (Since you are adding a new structure to cache unused events, it needs to stay in sync with the existing pool structure.) And I think they can use the same mutex.
Author:
Since we reuse an L0 event multiple times across different PI events, there is a case where we release a PI event but should not yet add the L0 event to the cache; we can safely add the L0 event to the cache only after the last PI event using it is released.
L0 events from the cache will be removed inside `_pi_context::finalize` right before the pool is destroyed.
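As a sketch of the counting logic described in this reply: `ZeEventUseCount` is the map declared in the diff above, while `releaseUse` is a hypothetical helper, not a function from the PR, and locking under `ZeEventsCacheMutex` is elided for brevity.

```cpp
// Hedged sketch of the use-count logic described above.
#include <level_zero/ze_api.h>

#include <cassert>
#include <unordered_map>

std::unordered_map<ze_event_handle_t, size_t> ZeEventUseCount;

// Called when a PI event is released. Returns true only for the last
// PI event that used this L0 event, i.e. when it is safe to put the
// L0 event back in the cache.
bool releaseUse(ze_event_handle_t ZeEvent) {
  auto It = ZeEventUseCount.find(ZeEvent);
  assert(It != ZeEventUseCount.end() && "releasing an untracked L0 event");
  if (--It->second == 0) {
    ZeEventUseCount.erase(It);
    return true; // last user gone: the L0 event may be cached for reuse
  }
  return false; // other PI events still reference this L0 event
}
```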
+  // Cache of allocated events that are in non-signaled state and can be reused.
+  std::unordered_set<ze_event_handle_t> ZeEventsCache;
+  // Mutex to control operations on L0 event cache and event usage counts.
+  std::mutex ZeEventsCacheMutex;
 };

 struct _pi_queue : _pi_object {
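The reply above says cached events are removed inside `_pi_context::finalize` right before the pool is destroyed. A hedged sketch of that cleanup step follows; the function name is illustrative, while `zeEventDestroy` is the real Level Zero call that must run before `zeEventPoolDestroy` is called on the owning pools.

```cpp
// Sketch of the finalize-time cleanup described in the review thread.
// Locking is elided for brevity.
#include <level_zero/ze_api.h>

#include <unordered_set>

void drainZeEventsCache(std::unordered_set<ze_event_handle_t> &ZeEventsCache) {
  for (ze_event_handle_t ZeEvent : ZeEventsCache)
    zeEventDestroy(ZeEvent); // frees the event itself, not its pool
  ZeEventsCache.clear();     // pools can be destroyed safely afterwards
}
```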
@@ -869,6 +885,10 @@ struct _pi_queue : _pi_object {
     auto CommandBatch = (IsCopy) ? CopyCommandBatch : ComputeCommandBatch;
     return CommandBatch.OpenCommandList != CommandListMap.end();
   }

+  // Reset LastCommandEvent->ZeEvent if necessary and put it in cache for reuse.
+  pi_result resetLastEventIfNeeded(pi_command_list_ptr_t CommandList);
+
   // Attach a command list to this queue, close, and execute it.
   // Note that this command list cannot be appended to after this.
   // The "IsBlocking" tells if the wait for completion is required.
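For context, a hedged sketch of the kind of reset such a helper might append: only `zeCommandListAppendEventReset` is the real Level Zero call; the wrapper is hypothetical, and the PR's actual `resetLastEventIfNeeded` logic is more involved.

```cpp
// Illustrative only: append a reset so the last command's L0 event
// returns to the non-signaled state once the command list executes,
// after which it may be recycled through the cache.
#include <level_zero/ze_api.h>

ze_result_t appendResetForReuse(ze_command_list_handle_t ZeCommandList,
                                ze_event_handle_t LastZeEvent) {
  if (!LastZeEvent)
    return ZE_RESULT_SUCCESS; // nothing to recycle
  return zeCommandListAppendEventReset(ZeCommandList, LastZeEvent);
}
```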
Reviewer:
Why can't the reset of the unused events happen in `executeCommandList`?
Author:
We probably can, but then we would need to add a new parameter to `executeCommandList` to separate the cases where `resetLastEventIfNeeded` is executed from those where it is not. For example, one of the problematic cases: we should not reset the L0 event when closing the command list if we take this event later as a dependency (this situation occurs when the event is from another queue; it happens inside `createAndRetainPiZeEventList` for `LastCommandEvent`, which should be used as a dependency).

I'm trying to say that the general rule is to reset the L0 event immediately after queue operations such as `zeCommandListAppendLaunchKernel`, `zeCommandListAppendMemoryCopy`, etc. (or after `zeCommandListAppendWaitOnEvents` if the queue operation used it; see the next paragraph about it).

On the other hand, in the future we may do some optimization: currently, for cases where we already had `zeCommandListAppendWaitOnEvents`, we could skip it in `resetLastEventIfNeeded`, and for that we can add one parameter to the `resetLastEventIfNeeded` function. But if we move the reset into `executeCommandList`, then for such an optimization we would have to add two parameters to `executeCommandList`. That is, we can do as you suggest; the only question is whether it is necessary.
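To illustrate the general rule stated above (reset the L0 event immediately after the queue operation that consumed it), here is a hedged sketch: `zeCommandListAppendMemoryCopy` and `zeCommandListAppendEventReset` are real Level Zero calls, while the helper and its parameters are illustrative stand-ins for the plugin's internals.

```cpp
// Sketch of "reset the L0 event right after the queue operation that
// waited on it", so it can be recycled once the list has executed.
#include <level_zero/ze_api.h>

ze_result_t copyThenRecyclePrevEvent(ze_command_list_handle_t ZeCommandList,
                                     void *Dst, const void *Src, size_t Size,
                                     ze_event_handle_t SignalEvent,
                                     ze_event_handle_t PrevZeEvent) {
  // The copy waits on the previous command's event and signals its own.
  ze_result_t Res = zeCommandListAppendMemoryCopy(
      ZeCommandList, Dst, Src, Size, SignalEvent, 1, &PrevZeEvent);
  if (Res != ZE_RESULT_SUCCESS)
    return Res;
  // Immediately append a reset of the consumed event so it can go back
  // to the cache once this command list has executed.
  return zeCommandListAppendEventReset(ZeCommandList, PrevZeEvent);
}
```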