
Commit cf15de7

weifengpy authored and mikaylagawarecki committed
[distributed] point to fsdp2 and remove fsdp1 in distributed overview (#3477)
* [distributed] point to fsdp2 and remove fsdp1 in distributed overview
* use fsdp2
* mark fsdp1 tutorial as deprecated and no page is linking to fsdp1
* revert fsdp1 tutorial change
1 parent d73f64e commit cf15de7

File tree

1 file changed (+5, -5 lines)


beginner_source/dist_overview.rst

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
 PyTorch Distributed Overview
 ============================
-**Author**: `Will Constable <https://github.com/wconstab/>`_
+**Author**: `Will Constable <https://github.com/wconstab/>`_, `Wei Feng <https://github.com/weifengpy>`_
 
 .. note::
    |edit| View and edit this tutorial in `github <https://github.com/pytorch/tutorials/blob/main/beginner_source/dist_overview.rst>`__.
@@ -26,7 +26,7 @@ Parallelism APIs
 These Parallelism Modules offer high-level functionality and compose with existing models:
 
 - `Distributed Data-Parallel (DDP) <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
-- `Fully Sharded Data-Parallel Training (FSDP) <https://pytorch.org/docs/stable/fsdp.html>`__
+- `Fully Sharded Data-Parallel Training (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__
 - `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__
 - `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__
 
@@ -74,11 +74,11 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) <https://pytorch.org/docs/stable/fsdp.html>`__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__ when your model cannot fit on one GPU.
 
-   * See also: `Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+   * See also: `Getting Started with FSDP2 <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
 
-#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP2.
 
    * Try our `Tensor Parallelism Tutorial <https://pytorch.org/tutorials/intermediate/TP_tutorial.html>`__
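For context on the API the updated links point to, here is a minimal sketch of FSDP2's fully_shard (torch.distributed.fsdp.fully_shard). It assumes a recent PyTorch build that exposes this import, a CUDA/NCCL setup, and a torchrun launch; the model, dimensions, optimizer, and filename are illustrative placeholders, not taken from the tutorial.

    # fsdp2_sketch.py (hypothetical filename)
    # Launch with, e.g.: torchrun --nproc_per_node=2 fsdp2_sketch.py
    import os

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import fully_shard  # FSDP2 API


    def main():
        # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for us.
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

        # Toy model; sizes are arbitrary for illustration.
        model = nn.Sequential(
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
        ).cuda()

        # Shard the parameterized submodules first, then the root module,
        # so parameters are grouped per layer and communication can
        # overlap with computation.
        for layer in model:
            if isinstance(layer, nn.Linear):
                fully_shard(layer)
        fully_shard(model)

        optim = torch.optim.AdamW(model.parameters(), lr=1e-3)

        # One illustrative training step.
        loss = model(torch.randn(8, 1024, device="cuda")).sum()
        loss.backward()
        optim.step()

        dist.destroy_process_group()


    if __name__ == "__main__":
        main()

This mirrors the pattern the FSDP2 docs describe (shard submodules, then the root); for the full walkthrough, see the Getting Started with FSDP2 tutorial linked in the diff above.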
