
Commit 4ed2ab7

Committed Jun 7, 2020
Rebuild
1 parent: 6af07fd

File tree

194 files changed (+5870 -9957 lines)


docs/_downloads/032d653a4f5a9c1ec32b9fc7c989ffe1/seq2seq_translation_tutorial.ipynb

+4 -4 (large diff not rendered by default)

docs/_downloads/03a48646520c277662581e858e680809/model_parallel_tutorial.ipynb

+2 -2
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\nSingle-Machine Model Parallel Best Practices\n================================\n**Author**: `Shen Li <https://mrshenli.github.io/>`_\n\nModel parallel is widely-used in distributed training\ntechniques. Previous posts have explained how to use\n`DataParallel <https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html>`_\nto train a neural network on multiple GPUs; this feature replicates the\nsame model to all GPUs, where each GPU consumes a different partition of the\ninput data. Although it can significantly accelerate the training process, it\ndoes not work for some use cases where the model is too large to fit into a\nsingle GPU. This post shows how to solve that problem by using **model parallel**,\nwhich, in contrast to ``DataParallel``, splits a single model onto different GPUs,\nrather than replicating the entire model on each GPU (to be concrete, say a model\n``m`` contains 10 layers: when using ``DataParallel``, each GPU will have a\nreplica of each of these 10 layers, whereas when using model parallel on two GPUs,\neach GPU could host 5 layers).\n\nThe high-level idea of model parallel is to place different sub-networks of a\nmodel onto different devices, and implement the ``forward`` method accordingly\nto move intermediate outputs across devices. As only part of a model operates\non any individual device, a set of devices can collectively serve a larger\nmodel. In this post, we will not try to construct huge models and squeeze them\ninto a limited number of GPUs. Instead, this post focuses on showing the idea\nof model parallel. It is up to the readers to apply the ideas to real-world\napplications.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>For distributed model parallel training where a model spans multiple\n servers, please refer to\n `Getting Started With Distributed RPC Framework <rpc_tutorial.html>`__\n for examples and details.</p></div>\n\nBasic Usage\n-----------\n\n"
+"\nSingle-Machine Model Parallel Best Practices\n================================\n**Author**: `Shen Li <https://mrshenli.github.io/>`_\n\nModel parallel is widely-used in distributed training\ntechniques. Previous posts have explained how to use\n`DataParallel <https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html>`_\nto train a neural network on multiple GPUs; this feature replicates the\nsame model to all GPUs, where each GPU consumes a different partition of the\ninput data. Although it can significantly accelerate the training process, it\ndoes not work for some use cases where the model is too large to fit into a\nsingle GPU. This post shows how to solve that problem by using **model parallel**,\nwhich, in contrast to ``DataParallel``, splits a single model onto different GPUs,\nrather than replicating the entire model on each GPU (to be concrete, say a model\n``m`` contains 10 layers: when using ``DataParallel``, each GPU will have a\nreplica of each of these 10 layers, whereas when using model parallel on two GPUs,\neach GPU could host 5 layers).\n\nThe high-level idea of model parallel is to place different sub-networks of a\nmodel onto different devices, and implement the ``forward`` method accordingly\nto move intermediate outputs across devices. As only part of a model operates\non any individual device, a set of devices can collectively serve a larger\nmodel. In this post, we will not try to construct huge models and squeeze them\ninto a limited number of GPUs. Instead, this post focuses on showing the idea\nof model parallel. It is up to the readers to apply the ideas to real-world\napplications.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>For distributed model parallel training where a model spans multiple\n servers, please refer to\n `Getting Started With Distributed RPC Framework <rpc_tutorial.html>`__\n for examples and details.</p></div>\n\nBasic Usage\n-----------\n"
 ]
 },
 {
@@ -175,7 +175,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.7"
+"version": "3.7.4"
 }
 },
 "nbformat": 4,

0 commit comments
