Conversation
|
Review updated until commit f8a94fc Description
|
| Relevant files | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Enhancement | 5 files
| ||||||||||
| Tests | 1 files
| ||||||||||
| Configuration changes | |||||||||||
| Documentation | 1 files
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Resource Management
|
samnordmann
left a comment
There was a problem hiding this comment.
Thank you very much! This looks great.
Here are some comments requesting some minor changes or explanation. The only point I'm a bit worried about is that we need a way to make the "wait" not blocking for cpu.
| #ifdef USE_NIXL | ||
| return true; | ||
| #else | ||
| return false; | ||
| #endif |
There was a problem hiding this comment.
The logic LGTM but for consistency in the implementation please mimick what is done with nccl_available_ and ucc_available_
| }; | ||
|
|
||
| // ------------------------------------------------------------------ | ||
| // Todo - those functions should be moved to a more global file |
There was a problem hiding this comment.
These helper functions are only used in csrc/multidevice/nixl.cpp file so let's not move their declaration up for now, and even, I would put those definitions in the .cpp file, not the header. Btw, our convention is to put static definitions (I mean definitions only used in their file, not linked against) inside an anonymous namespace:
namespace {
void helper() {
...
}
} // namespace
void exportedFonction() {
...
helper();
...
}TensorDesc though might be needed in the header (IIUC we cannot communicate at::Tensor outside the nvLink domain otherwise providing the at::Tensor::device crashes). Imo we can leave it here for now, later we could move it up ipc_utils.h or multidevice.h if we feel the need.
| }; | ||
| } | ||
|
|
||
| inline at::Tensor fromTensorDesc(const TensorDesc& desc) { |
| ); | ||
| } | ||
|
|
||
| inline std::vector<uint8_t> serializeTensorsDescs( |
There was a problem hiding this comment.
instead of implementing seerialize/deserialize, could you instead use fonctions in
Fuser/csrc/multidevice/ipc_utils.h
Lines 18 to 28 in 0cfc24b
| uint32_t dev; | ||
| }; | ||
| static_assert(std::is_trivially_copyable_v<TensorDesc>, | ||
| "TensorDesc must be trivially copyable for serialization"); |
There was a problem hiding this comment.
I don't think that's needed
| } | ||
|
|
||
| void NixlBackend::cleanup() { | ||
| cleaned_up_ = true; |
There was a problem hiding this comment.
naybe reuse available_ here ?
|
|
||
| void NixlBackend::cleanup() { | ||
| cleaned_up_ = true; | ||
| impl_.reset(); |
| }; | ||
|
|
||
| NixlTransferHandle::NixlTransferHandle() = default; | ||
| NixlTransferHandle::~NixlTransferHandle() = default; |
| NVF_THROW("Failed to create UCX backend for NIXL agent"); | ||
| } | ||
|
|
||
| // Probe: verify that VRAM (CUDA GPU memory) is actually usable with |
There was a problem hiding this comment.
is that really necessary? It looks suspicious to me, can you help me understand?
| #endif | ||
| } | ||
|
|
||
| void NixlBackend::Impl::waitTransfer(NixlTransferHandle& handle) { |
There was a problem hiding this comment.
This wait function is cpu blocking so in practice it is more or less unusable in our context. Do you have an idea how to make this not blocking for cpu -- and ideally cuda-graph capturable?
|
@x41lakazam Can you provide instructions on how to build nixl, say, from pjnl docker image? We'll probably need to think how to add the library is the base image and/or the CI, unless it is already shipped in some DLFW package |
https://nvidia.slack.com/archives/C08KL9MNQ3U/p1771951941351029 |
No description provided.