Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ucc_mem_map and ucc_mem_unmap #1070

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

wfaderhold21
Copy link
Collaborator

What

Implementation of PR #1037

NOTE: branch is branched from #1037 and will need to be rebased before merging

@swx-jenkins3
Copy link

Can one of the admins verify this patch?

@wfaderhold21 wfaderhold21 force-pushed the topic/explicit-mem-map-impl branch from 61ab0da to 4a2f9c6 Compare January 16, 2025 00:31
ferrol aderholdt added 12 commits January 15, 2025 16:31
CL/BASIC: add support for ucc_mem_map/unmap

CL/HIER: add support for ucc_mem_map/unmap

CL/DOCA_UROM: add support for ucc_mem_map/unmap

TL/CUDA: add support for ucc_mem_map/unmap

TL/MLX5: add support for ucc_mem_map/unmap

TL/NCCL: add support for ucc_mem_map/unmap

TL/RCCL: add support for ucc_mem_map/unmap

TL/SELF: add support for ucc_mem_map/unmap

TL/SHARP: add support for ucc_mem_map/unmap
TL/UCP: add support for communication with memmap

REVIEW: various fixes

TL/UCP: Enable multiple packed buffers
@manjugv
Copy link
Contributor

manjugv commented Jan 18, 2025

@janjust @nsarka Can you test on your setup? Ask @wfaderhold21 for details.

@wfaderhold21 wfaderhold21 force-pushed the topic/explicit-mem-map-impl branch from f02340e to 41f0b4e Compare February 19, 2025 19:54
@wfaderhold21 wfaderhold21 force-pushed the topic/explicit-mem-map-impl branch from 1a327f1 to ebc57c5 Compare February 20, 2025 00:15
@wfaderhold21
Copy link
Collaborator Author

Example user usage of API changes: https://github.com/wfaderhold21/simple-ucc-ex

ucc_context_h context;
void *address;
size_t len;
ucc_rank_t my_ctx_rank;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why my_ctx_rank is needed? my_ctx_rank exists only when context is global. Should we mention in docs that mem_map doesn't work with local contexts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. my_ctx_rank isn't needed. I had used it in a previous iteration of the code, but had removed any use of it. Just removed it the variable completely.

if (flags == UCC_MEM_MAP_IMPORT || flags == UCC_MEM_MAP_IMPORT_OFFLOAD) {
return ucc_mem_map_import(context, flags, params, memh_size, memh);
} else {
if (params->n_segments > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to support multiple mapping at a time in future? what is a use case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A use case would include onesided alltoall where both the destination and global work buffer are associated with the destination memory handles. This would allow use of the current algorithm implementation in UCC already. However, implementing this may not be clean...

pack_params.field_mask = UCP_MEMH_PACK_PARAM_FIELD_FLAGS;
pack_params.flags = UCP_MEMH_PACK_FLAG_EXPORT;

status = ucp_memh_pack(mh, &pack_params, &m_data->packed_memh,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we do ucp_memh_pack and rkey_pack in mem_map function and not in memh_pack function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily for convenience. I've updated this so it's correct and the ucp_memh_pack and rkey_pack are in the memh_pack function now.

/* always treat as a local mem handle */
status = tl_lib->iface->context.mem_map(
(const ucc_base_context_t *)ctx->tl_ctx[i], type,
params->segments[0].address, params->segments[0].len, local_memh,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address and len are allready part of memh, why do we need to pass them separately one more time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants