Commit ce7f3e2

gcie and pawloch00 authored
Documentation fixes for NCCL tests (#376)
* add draft

Signed-off-by: Piotr Pawłowski <[email protected]>

* Fixed nccl test script and command
* Wording, parametrizing --device-type
* capitalizing ultra, link in readme

---------

Signed-off-by: Piotr Pawłowski <[email protected]>
Co-authored-by: Piotr Pawłowski <[email protected]>
1 parent c1eff0b commit ce7f3e2

2 files changed, +6 -2 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -564,6 +564,8 @@ To submit jobs on a cluster with A3 machines, run the below command. To create a
 ```
 > The docker image flags/arguments introduced in [workloads section](#workload-create) can be used with A3 machines as well.
 
+In order to run NCCL test on A3 Ultra machines check out [this guide](/examples/nccl/nccl.md).
+
 ### Workload Priority and Preemption
 * Set the priority level of your workload with `--priority=LEVEL`

examples/nccl/nccl.md

Lines changed: 4 additions & 2 deletions
@@ -2,11 +2,13 @@
 
 This document provides an introduction to running tests for the NVIDIA Collective Communications Library (NCCL). NCCL is a high-performance, multi-GPU communications library used in deep learning and other applications. The test suite helps verify the correct functionality and performance of NCCL on your system. Please visit [NCCL tests github](https://github.com/NVIDIA/nccl-tests?tab=readme-ov-file#nccl-tests) to learn more about NCCL and running it.
 
-Steps presented in this document are designed to run on A3 ultra machines (`DEVICE_TYPE=h200-141gb-8`).
+Steps presented in this document are designed to run on A3 Ultra machines (`DEVICE_TYPE=h200-141gb-8`).
 
 ### 1. Create cluster
 
-First step is to create a cluster with A3 ultra machine. Execute below step:
+Skip this step if you have already provisioned a GKE cluster with A3 Ultra machines.
+
+First step is to create a cluster with A3 Ultra machine. Execute command below:
 
 ```
 python3 xpk.py cluster create \
