
[wip] Add event to sync #768

Open
qyh111 wants to merge 2 commits into ModelEngine-Group:develop from qyh111:dev_sync

Conversation


qyh111 commented Feb 28, 2026

Purpose

Modifications

Test

(Three test screenshots were attached here; the images are not included.)

```cpp
if (!handle.Ready()) {
    auto cacheStream = stream.NextStream();
    if (task->desc.compute_event_handle != 0) {
        auto s = cacheStream->WaitEvent(
```

A task has only one event, so there is no need to wait on it once per shard.

```cpp
using vector::vector; /* Inherit all ctors */
std::string brief; /* Description of Task */
/** Optional: compute-stream event handle for dump. Cache stream waits before D2H. */
uintptr_t compute_event_handle{0};
```

The naming style does not meet the project's requirements.


mag1c-h commented Mar 2, 2026


This field holds the handle to the prerequisite event(s) that task execution depends on, so `prerequisiteHandle` might be a better name.

```python
        super().__init__(device_id)

    def init_device(self):
        torch.cuda.set_device(self.device_id)
```

The context has already been set up by the caller of this class, so it does not need to be set again here.


```python
logger = init_logger(__name__)

class Device(ABC):
```

Code for adapting to runtime differences already exists; it is best to place the new parts alongside it.

```cpp
/** Wait for compute-stream event before D2H. Event ptr is platform-specific
 * (cudaEvent_t or aclrtEvent). No-op when event is nullptr. */
virtual Status WaitEvent(void* event)
{
```

It might be better to declare this as a pure virtual function, making it explicit that every subclass must implement it.



2 participants