
Commit 63333e2

malfet and gunandrose4u authored
[1.8] Update api doc for enabling TcpStore on Windows (pytorch#52601)
Summary: Fixes #{issue number}

Pull Request resolved: pytorch#51847

Reviewed By: albanD

Differential Revision: D26405678

Pulled By: malfet

fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c

Co-authored-by: Joe Zhu <[email protected]>
1 parent 8e7eebf commit 63333e2


docs/source/distributed.rst

Lines changed: 16 additions & 16 deletions
@@ -58,16 +58,16 @@ distributed (NCCL only when building with CUDA). MPI is an optional backend that
 included if you build PyTorch from source. (e.g. building PyTorch on a host that has MPI
 installed.)

-.. warning ::
-    As of PyTorch v1.7, Windows support for the distributed package only covers collective
-    communications with Gloo backend, `FileStore`, and `DistributedDataParallel`. Therefore,
-    the `init_method` argument in :func:`init_process_group` must point to a file. This works
-    for both local and shared file systems:
+.. note ::
+    As of PyTorch v1.8, Windows supports all collective communications backends but NCCL.
+    If the `init_method` argument of :func:`init_process_group` points to a file, it must adhere
+    to the following schema:

     - Local file system, ``init_method="file:///d:/tmp/some_file"``
     - Shared file system, ``init_method="file://////{machine_name}/{share_folder_name}/some_file"``

-    Similarly, if you directly pass in a `store` argument, it must be a ``FileStore`` instance.
+    As on the Linux platform, you can enable TcpStore by setting the environment variables
+    MASTER_ADDR and MASTER_PORT.

 Which backend to use?
 ^^^^^^^^^^^^^^^^^^^^^
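
Editorial note (not part of the diff): a minimal sketch of what the two rendezvous options in this note look like on Windows with the Gloo backend. The address, port, file path, rank, and world size below are placeholder assumptions; each process would pass its own rank.

    import os
    import torch.distributed as dist

    # Option 1: file-based rendezvous (local or shared file system), e.g.
    #   dist.init_process_group(backend="gloo",
    #                           init_method="file:///d:/tmp/some_file",
    #                           rank=0, world_size=2)

    # Option 2: TCP-based rendezvous backed by a TcpStore, enabled on Windows
    # as of v1.8 by setting MASTER_ADDR and MASTER_PORT (placeholder values).
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group(backend="gloo", init_method="env://",
                            rank=0, world_size=2)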
@@ -330,13 +330,13 @@ as they should never be created manually, but they are guaranteed to support two

 Synchronous and asynchronous collective operations
 --------------------------------------------------
-Every collective operation function supports the following two kinds of operations, 
+Every collective operation function supports the following two kinds of operations,
 depending on the setting of the ``async_op`` flag passed into the collective:

 **Synchronous operation** - the default mode, when ``async_op`` is set to ``False``.
 When the function returns, it is guaranteed that
 the collective operation is performed. In the case of CUDA operations, it is not guaranteed
-that the CUDA operation is completed, since CUDA operations are asynchronous. For CPU collectives, any 
+that the CUDA operation is completed, since CUDA operations are asynchronous. For CPU collectives, any
 further function calls utilizing the output of the collective call will behave as expected. For CUDA collectives,
 function calls utilizing the output on the same CUDA stream will behave as expected. Users must take care of
 synchronization under the scenario of running under different streams. For details on CUDA semantics such as stream
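
Editorial note (not in the commit): a sketch illustrating the ``async_op`` flag for a CPU/Gloo collective. It assumes a process group has already been initialized as above, and previews the ``is_completed()``/``wait()`` methods described in the next hunk.

    import torch
    import torch.distributed as dist

    # Synchronous (async_op=False, the default): when the call returns, the
    # CPU collective has completed and `tensor` holds the reduced value.
    tensor = torch.ones(4)
    dist.all_reduce(tensor)

    # Asynchronous (async_op=True): the call returns a request handle at once.
    tensor2 = torch.ones(4)
    handle = dist.all_reduce(tensor2, async_op=True)
    print(handle.is_completed())  # may still be False while the reduction runs
    handle.wait()                 # blocks until the CPU collective completes
    print(tensor2)                # safe to read the result here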
@@ -347,12 +347,12 @@ See the below script to see examples of differences in these semantics for CPU a
 returns a distributed request object. In general, you don't need to create it manually and it
 is guaranteed to support two methods:

-* ``is_completed()`` - in the case of CPU collectives, returns ``True`` if completed. In the case of CUDA operations, 
-  returns ``True`` if the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the 
-  default stream without further synchronization. 
+* ``is_completed()`` - in the case of CPU collectives, returns ``True`` if completed. In the case of CUDA operations,
+  returns ``True`` if the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the
+  default stream without further synchronization.
 * ``wait()`` - in the case of CPU collectives, will block the process until the operation is completed. In the case
-  of CUDA collectives, will block until the operation has been successfully enqueued onto a CUDA stream and the 
-  output can be utilized on the default stream without further synchronization. 
+  of CUDA collectives, will block until the operation has been successfully enqueued onto a CUDA stream and the
+  output can be utilized on the default stream without further synchronization.

 **Example**

@@ -368,7 +368,7 @@ It shows the explicit need to synchronize when using collective outputs on diffe
     handle = dist.all_reduce(output, async_op=True)
     # Wait ensures the operation is enqueued, but not necessarily complete.
     handle.wait()
-    # Using result on non-default stream.
+    # Using result on non-default stream.
     with torch.cuda.stream(s):
         s.wait_stream(torch.cuda.default_stream())
         output.add_(100)
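
Editorial note: the hunk above touches only the middle of the documented script. A self-contained variant might look roughly like the following, assuming the process group is already initialized, each rank owns one GPU, and `output` is a placeholder tensor.

    import torch
    import torch.distributed as dist

    # Assumes init_process_group(...) has already run and CUDA is available.
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    output = torch.ones(4, device="cuda")
    s = torch.cuda.Stream()

    handle = dist.all_reduce(output, async_op=True)
    # Wait ensures the operation is enqueued, but not necessarily complete.
    handle.wait()
    # Using result on non-default stream.
    with torch.cuda.stream(s):
        # The side stream must wait for the default stream, where the
        # collective was enqueued, before it consumes the output.
        s.wait_stream(torch.cuda.default_stream())
        output.add_(100)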
@@ -382,7 +382,7 @@ It shows the explicit need to synchronize when using collective outputs on diffe
 Collective functions
 --------------------

-.. autofunction:: broadcast
+.. autofunction:: broadcast

 .. autofunction:: broadcast_object_list

@@ -426,7 +426,7 @@ you can find an implementation of those in the `torch.distributed.nn.*` module.
 Functions here are synchronous and will be inserted in the autograd graph, so
 you need to ensure that all the processes that participated in the collective operation
 will do the backward pass for the backward communication to effectively happen and
-don't cause a deadlock. 
+don't cause a deadlock.

 Please notice that currently the only backend where all the functions are guaranteed to work is ``gloo``.
 .. autofunction:: torch.distributed.nn.broadcast
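
Editorial note (not part of the commit): to make the deadlock caveat concrete, a sketch using the autograd-aware ``torch.distributed.nn.broadcast`` documented here. It assumes a Gloo group is already initialized; the tensor shape and values are placeholders.

    import torch
    import torch.distributed as dist
    import torch.distributed.nn

    rank = dist.get_rank()

    # Every rank passes a leaf tensor; only rank 0's values are broadcast.
    t = torch.full((4,), float(rank), requires_grad=True)
    out = torch.distributed.nn.broadcast(t, src=0)

    loss = out.sum()
    # Every participating rank must run backward(): the backward pass of the
    # broadcast reduces the gradients back to rank 0, so a rank that skips it
    # leaves the others blocked (the deadlock the paragraph above warns about).
    loss.backward()

    if rank == 0:
        print(t.grad)  # gradient contributions reduced from all ranks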
