
Commit 83e6559

beginner_source/bettertransformer_tutorial.rst translation (#916)

* beginner_source/bettertransformer_tutorial.rst translation
1 parent 279d079 commit 83e6559

File tree

1 file changed: +68 -72 lines changed


beginner_source/bettertransformer_tutorial.rst

@@ -1,59 +1,61 @@
Fast Transformer Inference with Better Transformer
===============================================================

**Author**: `Michael Gschwind <https://github.com/mikekgfb>`__
**Translation**: `이진혁 <https://github.com/uddk6215>`__

This tutorial introduces Better Transformer (BT) as part of the PyTorch 1.12 release.
In this tutorial, we show how to use Better Transformer for production
inference with torchtext. Better Transformer is a production-ready fastpath to
accelerate deployment of Transformer models with high performance on CPU and GPU.
The fastpath feature works transparently for models based either directly on
PyTorch core ``nn.module`` or with torchtext.

Models which can be accelerated by Better Transformer fastpath execution are those
using the following PyTorch core ``torch.nn.module`` classes: ``TransformerEncoder``,
``TransformerEncoderLayer``, and ``MultiHeadAttention``. In addition, torchtext has
been updated to use the core library modules to benefit from fastpath acceleration.
(Additional modules may be enabled with fastpath execution in the future.)

Better Transformer offers two types of acceleration:

* Native multihead attention (MHA) implementation for CPU and GPU to improve overall execution efficiency.
* Exploiting sparsity in NLP inference. Because of variable input lengths, input
  tokens may contain a large number of padding tokens for which processing may be
  skipped, delivering significant speedups.

Fastpath execution is subject to some criteria. Most importantly, the model
must be executed in inference mode and operate on input tensors that do not collect
gradient tape information (e.g., running with torch.no_grad).
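
As a minimal sketch of what those criteria look like in code (``model`` and ``model_input`` here stand in for any eligible Transformer encoder model and its input tensor, not a specific object defined at this point in the tutorial):

.. code-block:: python

    # Illustrative only: any nn.Module-based Transformer encoder follows the same pattern.
    model.eval()                      # run in inference mode
    with torch.no_grad():             # do not record gradient tape information
        output = model(model_input)   # now eligible for BT fastpath execution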

To follow this example in Google Colab, `click here
<https://colab.research.google.com/drive/1KZnMJYhYkOMYtNIX5S3AGIYnjyG0AojN?usp=sharing>`__.

Better Transformer Features in This Tutorial
--------------------------------------------

* Load pretrained models (created before PyTorch version 1.12 without Better Transformer)
* Run and benchmark inference on CPU with and without BT fastpath (native MHA only)
* Run and benchmark inference on (configurable) DEVICE with and without BT fastpath (native MHA only)
* Enable sparsity support
* Run and benchmark inference on (configurable) DEVICE with and without BT fastpath (native MHA + sparsity)

Additional Information
-----------------------
Additional information about Better Transformer may be found in the PyTorch.Org blog
`A Better Transformer for Fast Transformer Inference
<https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference//>`__.

1. Setup

1.1 Load pretrained models

We download the XLM-R model from the predefined torchtext models by following the instructions in
`torchtext.models <https://pytorch.org/text/main/models.html>`__. We also set the DEVICE to execute
on-accelerator tests. (Enable GPU execution for your environment as appropriate.)

.. code-block:: python
@@ -74,9 +76,9 @@ on-accelerator tests. (Enable GPU execution for your environment as appropriate
    model = xlmr_large.get_model(head=classifier_head)
    transform = xlmr_large.transform()
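
The earlier lines of this code block fall outside the diff context shown above. A sketch of the setup they would typically correspond to, assuming torchtext's ``XLMR_LARGE_ENCODER`` bundle and a CUDA availability check (details such as ``num_classes`` are assumptions here, not taken from the hunk):

.. code-block:: python

    # Hedged sketch of the elided setup; not the verbatim tutorial code.
    import torch
    import torchtext
    from torchtext.functional import to_tensor

    DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    xlmr_large = torchtext.models.XLMR_LARGE_ENCODER
    classifier_head = torchtext.models.RobertaClassificationHead(num_classes=2, input_dim=1024)
    model = xlmr_large.get_model(head=classifier_head)
    transform = xlmr_large.transform()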

1.2 Dataset Setup

We set up two types of inputs: a small input batch and a big input batch with sparsity.

.. code-block:: python
@@ -104,7 +106,7 @@ We set up two types of inputs: a small input batch and a big input batch with sp
        St. Petersburg, used only by the elite."""
    ]
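
The beginning of this block is outside the diff context; it defines the two input batches. A hedged sketch of the general shape (the actual sentences are elided above, so the strings below are placeholders):

.. code-block:: python

    # Placeholder sketch: a short batch, plus a batch mixing short and long sequences
    # so that padding (i.e. sparsity) appears after tokenization.
    small_input_batch = [
        "Hello world",
        "How are you!",
    ]
    big_input_batch = small_input_batch + [
        """A much longer passage of text, many sentences long, so that the short
        entries in the same batch end up heavily padded.""",
    ]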

Next, we select either the small or large input batch, preprocess the inputs and test the model.

.. code-block:: python
@@ -114,23 +116,23 @@ Next, we select either the small or large input batch, preprocess the inputs and
    output = model(model_input)
    output.shape
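
The line that builds ``model_input`` is elided by the diff context above. Assuming the ``transform`` pipeline and the ``to_tensor`` helper from the setup step (and a padding value of 1, an assumption based on XLM-R's pad token), it would look roughly like:

.. code-block:: python

    # Hedged sketch: preprocess one of the batches and run the classifier once.
    model_input = to_tensor(transform(small_input_batch), padding_value=1)
    output = model(model_input)
    output.shape   # e.g. (batch size, number of classes)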

Finally, we set the benchmark iteration count:

.. code-block:: python

    ITERATIONS=10

2. Execution

2.1 Run and benchmark inference on CPU with and without BT fastpath (native MHA only)

We run the model on CPU, and collect profile information:

* The first run uses traditional ("slow path") execution.
* The second run enables BT fastpath execution by putting the model in inference mode using `model.eval()` and disables gradient collection with `torch.no_grad()`.

You can see an improvement (whose magnitude will depend on the CPU model) when the model is executing on CPU. Notice that the fastpath profile shows most of the execution time
in the native `TransformerEncoderLayer` implementation `aten::_transformer_encoder_layer_fwd`.

.. code-block:: python
@@ -152,29 +154,28 @@ in the native `TransformerEncoderLayer` implementation `aten::_transformer_encod
    print(prof)
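
Only the final ``print(prof)`` of the benchmark block survives the diff context above. A sketch of the pattern it closes, assuming ``torch.autograd.profiler.profile`` (consistent with the surviving ``prof`` variable) and the two runs described earlier:

.. code-block:: python

    # Hedged sketch of the elided benchmark body; not the verbatim tutorial code.
    print("slow path:")
    with torch.autograd.profiler.profile(use_cuda=False) as prof:
        for i in range(ITERATIONS):
            output = model(model_input)
    print(prof)

    model.eval()  # inference mode is one requirement for the BT fastpath

    print("fast path:")
    with torch.autograd.profiler.profile(use_cuda=False) as prof:
        with torch.no_grad():  # the other requirement: no gradient tape
            for i in range(ITERATIONS):
                output = model(model_input)
    print(prof)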

2.2 Run and benchmark inference on (configurable) DEVICE with and without BT fastpath (native MHA only)

We check the BT sparsity setting:

.. code-block:: python

    model.encoder.transformer.layers.enable_nested_tensor

We disable the BT sparsity:

.. code-block:: python

    model.encoder.transformer.layers.enable_nested_tensor=False

We run the model on DEVICE, and collect profile information for native MHA execution on DEVICE:

* The first run uses traditional ("slow path") execution.
* The second run enables BT fastpath execution by putting the model in inference mode using `model.eval()`
  and disables gradient collection with `torch.no_grad()`.

When executing on a GPU, you should see a significant speedup, in particular for the small input batch setting:

.. code-block:: python
@@ -199,20 +200,20 @@ When executing on a GPU, you should see a significant speedup, in particular for
    print(prof)
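
As in section 2.1, only the closing ``print(prof)`` is visible here. For the DEVICE runs, the elided body presumably also moves the model and inputs to the accelerator and profiles with CUDA timing enabled; a sketch of that difference, under those assumptions:

.. code-block:: python

    # Hedged sketch of what differs from the CPU runs: move to DEVICE, profile with CUDA timing.
    model.to(DEVICE)
    model_input = model_input.to(DEVICE)

    model.eval()                        # fastpath run; the slow-path run omits eval()/no_grad()
    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        with torch.no_grad():
            for i in range(ITERATIONS):
                output = model(model_input)
    print(prof)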

2.3 Run and benchmark inference on (configurable) DEVICE with and without BT fastpath (native MHA + sparsity)

We enable sparsity support:

.. code-block:: python

    model.encoder.transformer.layers.enable_nested_tensor = True
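
To actually exercise the sparsity path, the benchmark that follows would be fed the big, padded input batch rather than the small one. A sketch of that switch, assuming the same preprocessing helpers as in the earlier sections:

.. code-block:: python

    # Hedged sketch: the padded big batch is what lets nested-tensor sparsity skip work.
    model_input = to_tensor(transform(big_input_batch), padding_value=1).to(DEVICE)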

We run the model on DEVICE, and collect profile information for native MHA and sparsity support execution on DEVICE:

* The first run uses traditional ("slow path") execution.
* The second run enables BT fastpath execution by putting the model in inference mode using `model.eval()` and disables gradient collection with `torch.no_grad()`.

When executing on a GPU, you should see a significant speedup, in particular for the large input batch setting which includes sparsity:

.. code-block:: python
@@ -237,15 +238,10 @@ When executing on a GPU, you should see a significant speedup, in particular for
    print(prof)

Summary
-------

In this tutorial, we have introduced fast transformer inference with
Better Transformer fastpath execution in torchtext using PyTorch core
Better Transformer support for Transformer Encoder models. We have
demonstrated the use of Better Transformer with models trained prior to
the availability of BT fastpath execution. We have demonstrated and
benchmarked the use of both BT fastpath execution modes, native MHA execution
and BT sparsity acceleration.
