-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ucc_mem_map and ucc_mem_unmap #1070
base: master
Are you sure you want to change the base?
Add support for ucc_mem_map and ucc_mem_unmap #1070
Conversation
Can one of the admins verify this patch? |
61ab0da
to
4a2f9c6
Compare
CL/BASIC: add support for ucc_mem_map/unmap CL/HIER: add support for ucc_mem_map/unmap CL/DOCA_UROM: add support for ucc_mem_map/unmap TL/CUDA: add support for ucc_mem_map/unmap TL/MLX5: add support for ucc_mem_map/unmap TL/NCCL: add support for ucc_mem_map/unmap TL/RCCL: add support for ucc_mem_map/unmap TL/SELF: add support for ucc_mem_map/unmap TL/SHARP: add support for ucc_mem_map/unmap
TL/UCP: add support for communication with memmap REVIEW: various fixes TL/UCP: Enable multiple packed buffers
@janjust @nsarka Can you test on your setup? Ask @wfaderhold21 for details. |
f02340e
to
41f0b4e
Compare
1a327f1
to
ebc57c5
Compare
Example user usage of API changes: https://github.com/wfaderhold21/simple-ucc-ex |
src/core/ucc_context.h
Outdated
ucc_context_h context; | ||
void *address; | ||
size_t len; | ||
ucc_rank_t my_ctx_rank; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why my_ctx_rank is needed? my_ctx_rank exists only when context is global. Should we mention in docs that mem_map doesn't work with local contexts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. my_ctx_rank
isn't needed. I had used it in a previous iteration of the code, but had removed any use of it. Just removed it the variable completely.
if (flags == UCC_MEM_MAP_IMPORT || flags == UCC_MEM_MAP_IMPORT_OFFLOAD) { | ||
return ucc_mem_map_import(context, flags, params, memh_size, memh); | ||
} else { | ||
if (params->n_segments > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to support multiple mapping at a time in future? what is a use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A use case would include onesided alltoall where both the destination and global work buffer are associated with the destination memory handles. This would allow use of the current algorithm implementation in UCC already. However, implementing this may not be clean...
pack_params.field_mask = UCP_MEMH_PACK_PARAM_FIELD_FLAGS; | ||
pack_params.flags = UCP_MEMH_PACK_FLAG_EXPORT; | ||
|
||
status = ucp_memh_pack(mh, &pack_params, &m_data->packed_memh, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we do ucp_memh_pack and rkey_pack in mem_map function and not in memh_pack function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primarily for convenience. I've updated this so it's correct and the ucp_memh_pack and rkey_pack are in the memh_pack function now.
/* always treat as a local mem handle */ | ||
status = tl_lib->iface->context.mem_map( | ||
(const ucc_base_context_t *)ctx->tl_ctx[i], type, | ||
params->segments[0].address, params->segments[0].len, local_memh, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
address and len are allready part of memh, why do we need to pass them separately one more time?
What
Implementation of PR #1037
NOTE: branch is branched from #1037 and will need to be rebased before merging