forked from PyTorchKorea/tutorials-kr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmodelsyt_tutorial.py
415 lines (357 loc) ยท 17.4 KB
/
modelsyt_tutorial.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
"""
`Introduction <introyt1_tutorial.html>`_ ||
`Tensors <tensors_deeper_tutorial.html>`_ ||
`Autograd <autogradyt_tutorial.html>`_ ||
**Building Models** ||
`TensorBoard Support <tensorboardyt_tutorial.html>`_ ||
`Training Models <trainingyt.html>`_ ||
`Model Understanding <captumyt.html>`_
ํ์ดํ ์น๋ก ๋ชจ๋ธ ๋ง๋ค๊ธฐ
============================
์๋ ๋น๋์ค๋ฅผ ๋ฐ๋ผ ํ๊ฑฐ๋ ๋๋ `youtube <https://www.youtube.com/watch?v=OSqIP-mOWOI>`__์์ ํ์ธํ์ธ์.
.. raw:: html
<div style="margin-top:10px; margin-bottom:10px;">
<iframe width="560" height="315" src="https://www.youtube.com/embed/OSqIP-mOWOI" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
``torch.nn.Module`` and ``torch.nn.Parameter``
----------------------------------------------
In this video, weโll be discussing some of the tools PyTorch makes
available for building deep learning networks.
Except for ``Parameter``, the classes we discuss in this video are all
subclasses of ``torch.nn.Module``. This is the PyTorch base class meant
to encapsulate behaviors specific to PyTorch Models and their
components.
One important behavior of ``torch.nn.Module`` is registering parameters.
If a particular ``Module`` subclass has learning weights, these weights
are expressed as instances of ``torch.nn.Parameter``. The ``Parameter``
class is a subclass of ``torch.Tensor``, with the special behavior that
when they are assigned as attributes of a ``Module``, they are added to
the list of that modules parameters. These parameters may be accessed
through the ``parameters()`` method on the ``Module`` class.
As a simple example, hereโs a very simple model with two linear layers
and an activation function. Weโll create an instance of it and ask it to
report on its parameters:
"""
import torch
class TinyModel(torch.nn.Module):
def __init__(self):
super(TinyModel, self).__init__()
self.linear1 = torch.nn.Linear(100, 200)
self.activation = torch.nn.ReLU()
self.linear2 = torch.nn.Linear(200, 10)
self.softmax = torch.nn.Softmax()
def forward(self, x):
x = self.linear1(x)
x = self.activation(x)
x = self.linear2(x)
x = self.softmax(x)
return x
tinymodel = TinyModel()
print('The model:')
print(tinymodel)
print('\n\nJust one layer:')
print(tinymodel.linear2)
print('\n\nModel params:')
for param in tinymodel.parameters():
print(param)
print('\n\nLayer params:')
for param in tinymodel.linear2.parameters():
print(param)
#########################################################################
# This shows the fundamental structure of a PyTorch model: there is an
# ``__init__()`` method that defines the layers and other components of a
# model, and a ``forward()`` method where the computation gets done. Note
# that we can print the model, or any of its submodules, to learn about
# its structure.
#
# Common Layer Types
# ------------------
#
# Linear Layers
# ~~~~~~~~~~~~~
#
# The most basic type of neural network layer is a *linear* or *fully
# connected* layer. This is a layer where every input influences every
# output of the layer to a degree specified by the layerโs weights. If a
# model has *m* inputs and *n* outputs, the weights will be an *m* x *n*
# matrix. For example:
#
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)
print('Input:')
print(x)
print('\n\nWeight and Bias parameters:')
for param in lin.parameters():
print(param)
y = lin(x)
print('\n\nOutput:')
print(y)
#########################################################################
# If you do the matrix multiplication of ``x`` by the linear layerโs
# weights, and add the biases, youโll find that you get the output vector
# ``y``.
#
# One other important feature to note: When we checked the weights of our
# layer with ``lin.weight``, it reported itself as a ``Parameter`` (which
# is a subclass of ``Tensor``), and let us know that itโs tracking
# gradients with autograd. This is a default behavior for ``Parameter``
# that differs from ``Tensor``.
#
# Linear layers are used widely in deep learning models. One of the most
# common places youโll see them is in classifier models, which will
# usually have one or more linear layers at the end, where the last layer
# will have *n* outputs, where *n* is the number of classes the classifier
# addresses.
#
# Convolutional Layers
# ~~~~~~~~~~~~~~~~~~~~
#
# *Convolutional* layers are built to handle data with a high degree of
# spatial correlation. They are very commonly used in computer vision,
# where they detect close groupings of features which the compose into
# higher-level features. They pop up in other contexts too - for example,
# in NLP applications, where a wordโs immediate context (that is, the
# other words nearby in the sequence) can affect the meaning of a
# sentence.
#
# We saw convolutional layers in action in LeNet5 in an earlier video:
#
import torch.functional as F
class LeNet(torch.nn.Module):
def __init__(self):
super(LeNet, self).__init__()
# 1 input image channel (black & white), 6 output channels, 5x5 square convolution
# kernel
self.conv1 = torch.nn.Conv2d(1, 6, 5)
self.conv2 = torch.nn.Conv2d(6, 16, 3)
# an affine operation: y = Wx + b
self.fc1 = torch.nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension
self.fc2 = torch.nn.Linear(120, 84)
self.fc3 = torch.nn.Linear(84, 10)
def forward(self, x):
# Max pooling over a (2, 2) window
x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
# If the size is a square you can only specify a single number
x = F.max_pool2d(F.relu(self.conv2(x)), 2)
x = x.view(-1, self.num_flat_features(x))
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
def num_flat_features(self, x):
size = x.size()[1:] # all dimensions except the batch dimension
num_features = 1
for s in size:
num_features *= s
return num_features
##########################################################################
# Letโs break down whatโs happening in the convolutional layers of this
# model. Starting with ``conv1``:
#
# - LeNet5 is meant to take in a 1x32x32 black & white image. **The first
# argument to a convolutional layerโs constructor is the number of
# input channels.** Here, it is 1. If we were building this model to
# look at 3-color channels, it would be 3.
# - A convolutional layer is like a window that scans over the image,
# looking for a pattern it recognizes. These patterns are called
# *features,* and one of the parameters of a convolutional layer is the
# number of features we would like it to learn. **This is the second
# argument to the constructor is the number of output features.** Here,
# weโre asking our layer to learn 6 features.
# - Just above, I likened the convolutional layer to a window - but how
# big is the window? **The third argument is the window or kernel
# size.** Here, the โ5โ means weโve chosen a 5x5 kernel. (If you want a
# kernel with height different from width, you can specify a tuple for
# this argument - e.g., ``(3, 5)`` to get a 3x5 convolution kernel.)
#
# The output of a convolutional layer is an *activation map* - a spatial
# representation of the presence of features in the input tensor.
# ``conv1`` will give us an output tensor of 6x28x28; 6 is the number of
# features, and 28 is the height and width of our map. (The 28 comes from
# the fact that when scanning a 5-pixel window over a 32-pixel row, there
# are only 28 valid positions.)
#
# We then pass the output of the convolution through a ReLU activation
# function (more on activation functions later), then through a max
# pooling layer. The max pooling layer takes features near each other in
# the activation map and groups them together. It does this by reducing
# the tensor, merging every 2x2 group of cells in the output into a single
# cell, and assigning that cell the maximum value of the 4 cells that went
# into it. This gives us a lower-resolution version of the activation map,
# with dimensions 6x14x14.
#
# Our next convolutional layer, ``conv2``, expects 6 input channels
# (corresponding to the 6 features sought by the first layer), has 16
# output channels, and a 3x3 kernel. It puts out a 16x12x12 activation
# map, which is again reduced by a max pooling layer to 16x6x6. Prior to
# passing this output to the linear layers, it is reshaped to a 16 \* 6 \*
# 6 = 576-element vector for consumption by the next layer.
#
# There are convolutional layers for addressing 1D, 2D, and 3D tensors.
# There are also many more optional arguments for a conv layer
# constructor, including stride length(e.g., only scanning every second or
# every third position) in the input, padding (so you can scan out to the
# edges of the input), and more. See the
# `documentation <https://pytorch.org/docs/stable/nn.html#convolution-layers>`__
# for more information.
#
# Recurrent Layers
# ~~~~~~~~~~~~~~~~
#
# *Recurrent neural networks* (or *RNNs)* are used for sequential data -
# anything from time-series measurements from a scientific instrument to
# natural language sentences to DNA nucleotides. An RNN does this by
# maintaining a *hidden state* that acts as a sort of memory for what it
# has seen in the sequence so far.
#
# The internal structure of an RNN layer - or its variants, the LSTM (long
# short-term memory) and GRU (gated recurrent unit) - is moderately
# complex and beyond the scope of this video, but weโll show you what one
# looks like in action with an LSTM-based part-of-speech tagger (a type of
# classifier that tells you if a word is a noun, verb, etc.):
#
class LSTMTagger(torch.nn.Module):
def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
super(LSTMTagger, self).__init__()
self.hidden_dim = hidden_dim
self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
# The LSTM takes word embeddings as inputs, and outputs hidden states
# with dimensionality hidden_dim.
self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
# The linear layer that maps from hidden state space to tag space
self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)
def forward(self, sentence):
embeds = self.word_embeddings(sentence)
lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
tag_scores = F.log_softmax(tag_space, dim=1)
return tag_scores
########################################################################
# The constructor has four arguments:
#
# - ``vocab_size`` is the number of words in the input vocabulary. Each
# word is a one-hot vector (or unit vector) in a
# ``vocab_size``-dimensional space.
# - ``tagset_size`` is the number of tags in the output set.
# - ``embedding_dim`` is the size of the *embedding* space for the
# vocabulary. An embedding maps a vocabulary onto a low-dimensional
# space, where words with similar meanings are close together in the
# space.
# - ``hidden_dim`` is the size of the LSTMโs memory.
#
# The input will be a sentence with the words represented as indices of
# one-hot vectors. The embedding layer will then map these down to an
# ``embedding_dim``-dimensional space. The LSTM takes this sequence of
# embeddings and iterates over it, fielding an output vector of length
# ``hidden_dim``. The final linear layer acts as a classifier; applying
# ``log_softmax()`` to the output of the final layer converts the output
# into a normalized set of estimated probabilities that a given word maps
# to a given tag.
#
# If youโd like to see this network in action, check out the `Sequence
# Models and LSTM
# Networks <https://tutorials.pytorch.kr/beginner/nlp/sequence_models_tutorial.html>`__
# tutorial on pytorch.org.
#
# Transformers
# ~~~~~~~~~~~~
#
# *Transformers* are multi-purpose networks that have taken over the state
# of the art in NLP with models like BERT. A discussion of transformer
# architecture is beyond the scope of this video, but PyTorch has a
# ``Transformer`` class that allows you to define the overall parameters
# of a transformer model - the number of attention heads, the number of
# encoder & decoder layers, dropout and activation functions, etc. (You
# can even build the BERT model from this single class, with the right
# parameters!) The ``torch.nn.Transformer`` class also has classes to
# encapsulate the individual components (``TransformerEncoder``,
# ``TransformerDecoder``) and subcomponents (``TransformerEncoderLayer``,
# ``TransformerDecoderLayer``). For details, check out the
# `documentation <https://pytorch.org/docs/stable/nn.html#transformer-layers>`__
# on transformer classes, and the relevant
# `tutorial <https://tutorials.pytorch.kr/beginner/transformer_tutorial.html>`__
# on pytorch.org.
#
# Other Layers and Functions
# --------------------------
#
# Data Manipulation Layers
# ~~~~~~~~~~~~~~~~~~~~~~~~
#
# There are other layer types that perform important functions in models,
# but donโt participate in the learning process themselves.
#
# **Max pooling** (and its twin, min pooling) reduce a tensor by combining
# cells, and assigning the maximum value of the input cells to the output
# cell (we saw this). For example:
#
my_tensor = torch.rand(1, 6, 6)
print(my_tensor)
maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))
#########################################################################
# ์์ ๊ฐ์ ์์ธํ ๋ณด๋ฉด, ๋งฅ์คํ๋ง๋ ์ถ๋ ฅ์ ๊ฐ ๊ฐ์ด 6x6 ์
๋ ฅ์ ๊ฐ ์ฌ๋ถ๋ฉด์์
# ์ต๋๊ฐ์ด๋ผ๋ ๊ฒ์ ์ ์ ์์ต๋๋ค.
#
# **์ ๊ทํ ๋ ์ด์ด**๋ ํ ๋ ์ด์ด์ ์ถ๋ ฅ์ ๋ค๋ฅธ ๋ ์ด์ด์ ์ ๋ฌํ๊ธฐ ์ ์ ๋ค์ ์ค์ฌํํ๊ณ
# ์ ๊ทํํฉ๋๋ค. ์ค๊ฐ ํ
์๋ฅผ ์ค์ฌํํ๊ณ ์ค์ผ์ผ๋งํ๋ ๊ฒ์ ๊ธฐ์ธ๊ธฐ ํญ๋ฐ/์์ค ์์ด ๋
# ๋์ ํ์ต๋ฅ ์ ์ฌ์ฉํ ์ ์๊ฒ ํ๋ ๋ฑ ์ฌ๋ฌ ๊ฐ์ง ์ ์ตํ ํจ๊ณผ๋ฅผ ์ ๊ณตํฉ๋๋ค.
#
my_tensor = torch.rand(1, 4, 4) * 20 + 5
print(my_tensor)
print(my_tensor.mean())
norm_layer = torch.nn.BatchNorm1d(4)
normed_tensor = norm_layer(my_tensor)
print(normed_tensor)
print(normed_tensor.mean())
##########################################################################
# ์์ ์
์ ์คํํ๋ฉด ์
๋ ฅ ํ
์์ ํฐ ์ค์ผ์ผ๋ง ์์์ ์คํ์
์ ์ถ๊ฐํ์ต๋๋ค.
# ์
๋ ฅ ํ
์์ ``mean()`` ๊ฐ์ด ์ฝ 15์ ๊ฐ๊น์ด ๊ฒ์ ๋ณผ ์ ์์ต๋๋ค.
# ์ด๋ฅผ ์ ๊ทํ ๋ ์ด์ด๋ฅผ ํตํด ์คํํ๋ฉด ๊ฐ๋ค์ด ๋ ์์์ง๊ณ 0 ์ฃผ์๋ก ๊ทธ๋ฃนํ๋ฉ๋๋ค.
# ์ค์ ๋ก ํ๊ท ์ ๋งค์ฐ ์์์ผ ํฉ๋๋ค (> 1e-8).
#
# ์ด๋ ์ ์ตํ๋ฐ, ์๋ํ๋ฉด ๋ง์ ํ์ฑํ ํจ์๋ค(์๋์์ ๋
ผ์)์ 0 ๊ทผ์ฒ์์
# ๊ฐ์ฅ ๊ฐํ ๊ธฐ์ธ๊ธฐ๋ฅผ ๊ฐ์ง๋ง, ๋๋๋ก ์
๋ ฅ์ด 0์์ ๋ฉ๋ฆฌ ๋จ์ด์ง๊ฒ ํ๋ ๊ฒฝ์ฐ
# ๊ธฐ์ธ๊ธฐ ์์ค ๋๋ ํญ๋ฐ ๋ฌธ์ ๊ฐ ๋ฐ์ํ ์ ์๊ธฐ ๋๋ฌธ์
๋๋ค.
# ๋ฐ์ดํฐ๋ฅผ ๊ฐ์ฅ ๊ฐํ๋ฅธ ๊ธฐ์ธ๊ธฐ ์ฃผ๋ณ์ ์ ์งํ๋ฉด ์ผ๋ฐ์ ์ผ๋ก ๋ ๋น ๋ฅด๊ณ ,
# ๋ ๋์ ํ์ต๊ณผ ๋ ๋์ ํ์ต๋ฅ ์ด ๊ฐ๋ฅํฉ๋๋ค.
#
# **๋๋กญ์์ ๋ ์ด์ด**๋ ๋ชจ๋ธ ๋ด์์ *ํฌ์ ํํ*์ ์ฅ๋ คํ๊ธฐ ์ํ ๋๊ตฌ์
๋๋ค.
# ์ฆ, ๋ ์ ์ ๋ฐ์ดํฐ๋ก ์ถ๋ก ์ ์ํํ๋๋ก ๋ชจ๋ธ์ ํธ์ํ๋ ๊ฒ์
๋๋ค.
#
# ๋๋กญ์์ ๋ ์ด์ด๋ ํ์ต ์ค์ ์
๋ ฅ ํ
์์ ์ผ๋ถ๋ฅผ ๋ฌด์์๋ก ์ค์ ํ์ฌ ์๋ํฉ๋๋ค
# - ๋๋กญ์์ ๋ ์ด์ด๋ ํญ์ ์ถ๋ก ์์๋ ๊บผ์ ธ ์์ต๋๋ค.
# ์ด๋ ๋ชจ๋ธ์ด ์ด ๋ง์คํน๋๊ฑฐ๋ ์ถ์๋ ๋ฐ์ดํฐ์
์ ํ์ตํ๋๋ก ๊ฐ์ ํฉ๋๋ค.
# ์๋ฅผ ๋ค์ด:
#
my_tensor = torch.rand(1, 4, 4)
dropout = torch.nn.Dropout(p=0.4)
print(dropout(my_tensor))
print(dropout(my_tensor))
##########################################################################
# ์์์ ๋๋กญ์์์ด ์ํ ํ
์์ ๋ฏธ์น๋ ํจ๊ณผ๋ฅผ ๋ณผ ์ ์์ต๋๋ค. ๊ฐ๋ณ ๊ฐ์ค์น๊ฐ
# ๋๋กญ์์๋ ํ๋ฅ ์ ์ค์ ํ๊ธฐ ์ํด ์ ํ์ ์ผ๋ก `p` ์ธ์๋ฅผ ์ฌ์ฉํ ์ ์์ผ๋ฉฐ,
# ์ค์ ํ์ง ์์ผ๋ฉด ๊ธฐ๋ณธ๊ฐ์ 0.5์
๋๋ค.
#
# ํ์ฑํ ํจ์
# ~~~~~~~~~~~~~~~~~~~~
#
# ํ์ฑํ ํจ์๋ ๋ฅ๋ฌ๋์ ๊ฐ๋ฅํ๊ฒ ๋ง๋ญ๋๋ค. ์ ๊ฒฝ๋ง์ ์ฌ์ค ๋ง์ ํ๋ผ๋ฏธํฐ๋ฅผ
# ๊ฐ์ง *์ํ์ ํจ์๋ฅผ ์๋ฎฌ๋ ์ด์
*ํ๋ ํ๋ก๊ทธ๋จ์
๋๋ค. ๋ง์ฝ ์ฐ๋ฆฌ๊ฐ ํ
์๋ฅผ
# ๋ ์ด์ด ๊ฐ์ค์น๋ก ๋ฐ๋ณต์ ์ผ๋ก ๊ณฑํ๊ธฐ๋ง ํ๋ค๋ฉด, *์ ํ ํจ์*๋ง์ ์๋ฎฌ๋ ์ด์
ํ
# ์ ์์ ๋ฟ์
๋๋ค. ๊ฒ๋ค๊ฐ, ๋ชจ๋ ๋ ์ด์ด๋ฅผ ํ๋์ ํ๋ ฌ ๊ณฑ์
์ผ๋ก ์ถ์ํ ์
# ์๊ธฐ ๋๋ฌธ์ ์ฌ๋ฌ ๋ ์ด์ด๋ฅผ ๊ฐ์ง ํ์๊ฐ ์์ ๊ฒ์
๋๋ค. ๋ ์ด์ด ์ฌ์ด์
# *๋น์ ํ* ํ์ฑํ ํจ์๋ฅผ ์ฝ์
ํ๋ ๊ฒ์ด ๋ฅ๋ฌ๋ ๋ชจ๋ธ์ด ๋จ์ํ ์ ํ ํจ์๊ฐ
# ์๋ ์ด๋ค ํจ์๋ ์๋ฎฌ๋ ์ด์
ํ ์ ์๊ฒ ํ๋ ์์์
๋๋ค.
#
# `torch.nn.Module`์ ReLU ๋ฐ ๊ทธ ๋ณํ๋ค, Tanh, Hardtanh, sigmoid ๋ฑ์
# ์ฃผ์ ํ์ฑํ ํจ์๋ฅผ ์บก์ํํ ๊ฐ์ฒด๋ฅผ ํฌํจํ๊ณ ์์ต๋๋ค. ๋ํ, ๋ชจ๋ธ์ ์ถ๋ ฅ
# ๋จ๊ณ์์ ๊ฐ์ฅ ์ ์ฉํ Softmax์ ๊ฐ์ ๋ค๋ฅธ ํจ์๋ค๋ ํฌํจํ๊ณ ์์ต๋๋ค.
#
# ์์ค ํจ์
# ~~~~~~~~~~~~~~
#
# ์์ค ํจ์๋ ๋ชจ๋ธ์ ์์ธก์ด ์ ๋ต๊ณผ ์ผ๋ง๋ ์ฐจ์ด๊ฐ ๋๋์ง๋ฅผ ์๋ ค์ค๋๋ค.
# PyTorch์๋ ์ผ๋ฐ์ ์ธ MSE (ํ๊ท ์ ๊ณฑ ์ค์ฐจ = L2 ๋
ธ๋ฆ), ๊ต์ฐจ ์ํธ๋กํผ
# ์์ค, ๊ทธ๋ฆฌ๊ณ ๋ถ๋ฅ๊ธฐ์ ์ ์ฉํ ์์ ๊ฐ๋ฅ๋ ์์ค ๋ฑ ๋ค์ํ ์์ค ํจ์๊ฐ ํฌํจ๋์ด ์์ต๋๋ค.
#