"""
Zeroing out gradients in PyTorch
================================
It is beneficial to zero out gradients when building a neural network.
This is because by default, gradients are accumulated in buffers (i.e.,
not overwritten) whenever ``.backward()`` is called.

Introduction
------------
When training your neural network, models are able to increase their
accuracy through gradient descent. In short, gradient descent is the
process of minimizing our loss (or error) by tweaking the weights and
biases in our model.
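
For instance, a single gradient-descent update on a parameter ``w`` can
be sketched as follows (``w`` and the learning rate ``lr`` are
illustrative names, not part of this recipe's code)::

    with torch.no_grad():
        w -= lr * w.grad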

``torch.Tensor`` is the central class of PyTorch. When you create a
tensor, if you set its attribute ``.requires_grad`` as ``True``, the
package tracks all operations on it. This happens on subsequent backward
passes. The gradient for this tensor will be accumulated into the
``.grad`` attribute. The accumulation (or sum) of all the gradients is
calculated when ``.backward()`` is called on the loss tensor.
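
As a minimal sketch of this accumulation behavior (the tensor ``x`` below
is purely illustrative)::

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    (x * x).sum().backward()
    print(x.grad)  # tensor([2., 4.])
    (x * x).sum().backward()
    print(x.grad)  # tensor([4., 8.]) -- accumulated, not overwritten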

There are cases where it may be necessary to zero out the gradients of a
tensor. For example: when you start your training loop, you should zero
out the gradients so that you can perform this tracking correctly.
In this recipe, we will learn how to zero out gradients using the
PyTorch library. We will demonstrate how to do this by training a neural
network on the ``CIFAR10`` dataset built into PyTorch.

Setup
-----
Since we will be training data in this recipe, if you are in a runnable
notebook, it is best to switch the runtime to GPU or TPU.
Before we begin, we need to install ``torch`` and ``torchvision`` if
they aren't already available.

::

   pip install torch
   pip install torchvision

"""

######################################################################
# Steps
# -----
#
# Steps 1 through 4 set up our data and neural network for training. The
# process of zeroing out the gradients happens in step 5. If you already
# have your data and neural network built, skip to step 5.
#
# 1. Import all necessary libraries for loading our data
# 2. Load and normalize the dataset
# 3. Build the neural network
# 4. Define the loss function
# 5. Zero the gradients while training the network
#
# 1. Import necessary libraries for loading our data
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# For this recipe, we will just be using ``torch`` and ``torchvision`` to
# access the dataset.
#

import torch
import torch.nn as nn
import torch.nn.functional as F

import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

######################################################################
# 2. Load and normalize the dataset
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# PyTorch features various built-in datasets (see the Loading Data recipe
# for more information).
#

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# CIFAR10 training set and loader; ``trainloader`` is used in step 5
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

######################################################################
# 3. Build the neural network
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# We will use a convolutional neural network. To learn more, see the
# Defining a Neural Network recipe.
#

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

######################################################################
# 4. Define a Loss function and optimizer
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Let's use a Classification Cross-Entropy loss and SGD with momentum.
#

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

######################################################################
# 5. Zero the gradients while training the network
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# This is when things start to get interesting. We simply have to loop
# over our data iterator, feed the inputs to the network, and optimize.
#
# Notice that for each batch of data, we zero out the gradients. This is
# to ensure that we aren't tracking any unnecessary information when we
# train our neural network.
#

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

######################################################################
# You can also use ``model.zero_grad()``. This is the same as using
# ``optimizer.zero_grad()`` as long as all your model parameters are in
# that optimizer. Use your best judgement to decide which one to use.
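# For instance, the following two calls are interchangeable here (a
# sketch; it assumes, as above, that every parameter of ``net`` was
# registered with ``optimizer``):

net.zero_grad()        # zeroes .grad on every parameter of the module
optimizer.zero_grad()  # zeroes .grad on the optimizer's parameters

######################################################################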
#
# Congratulations! You have successfully zeroed out gradients in PyTorch.
#
# Learn More
# ----------
#
# Take a look at these other recipes to continue your learning:
#
# - :doc:`/recipes/recipes/loading_data_recipe`
# - :doc:`/recipes/recipes/save_load_across_devices`