Commit ac45e8c

translate swap_tensors.py
Co-authored-by: kookyungseon <[email protected]>
Co-authored-by: dlcodns <[email protected]>
1 parent b306ec1 commit ac45e8c

File tree

1 file changed: +91 βˆ’91 lines changed
@@ -1,24 +1,24 @@
"""
Extension points in ``nn.Module`` for ``load_state_dict`` and tensor subclasses
===============================================================================
**Author:** `Mikayla Gawarecki <https://github.com/mikaylagawarecki>`_

This recipe introduces a new utility function ``torch.utils.swap_tensors``
as well as two new extension points where it has been integrated in
``nn.Module``:

* ``nn.Module.to()`` and related methods
* ``nn.Module.load_state_dict()``

.. note::
    This recipe requires PyTorch 2.3.0 or later.
"""

###############################################################################
# ``torch.utils.swap_tensors``
# ----------------------------
# ``torch.utils.swap_tensors`` (hereafter referred to as ``swap_tensors``) is a
# utility function that takes in two Python tensors and swaps them.

import torch
import torch.nn as nn
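# The lines elided between this hunk and the next construct two small tensors
# and swap them. A minimal sketch of that usage follows; the tensor values here
# are illustrative assumptions, not necessarily the recipe's exact ones.
t1 = torch.arange(2)
t2 = torch.arange(3)
print(f"Before swapping, t1: {t1}, t2: {t2}")
torch.utils.swap_tensors(t1, t2)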
@@ -29,19 +29,19 @@
print(f"After swapping, t1: {t1}, t2: {t2}")

################################################################################
# More specifically, ``swap_tensors`` swaps the Python ``__class__``, ``__dict__``
# and ``__slots__`` of the two tensors, as well as their associated ``at::Tensor``.
#
#
# Application to ``nn.Module``
# ----------------------------
# This utility is pertinent to ``nn.Module`` when a Python object outside
# of the module holds a reference to parameters of the module. If an ``nn.Module``
# modifies any of its parameters out of place, the object holding references to
# the parameters will not see the change. A classic example of this is the
# optimizer, which holds a reference to the parameters of the ``nn.Module``.
# This leads to a silent correctness issue where ``optimizer.step()`` will
# run without error but the weights of the ``nn.Module`` will not be updated.

mod = torch.nn.Linear(1, 2, bias=False)
optimizer = torch.optim.SGD(mod.parameters())
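# A sketch of the elided demonstration (the exact replacement is an assumption):
# replace the parameter out of place, so the optimizer's reference goes stale.
mod.weight = torch.nn.Parameter(2 * mod.weight.detach())
print(f"weight in module: {mod.weight}")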
@@ -52,28 +52,28 @@
print(f"weight in optimizer: {optimizer.param_groups[0]['params']}")

################################################################################
# ``nn.Module.to()`` and related methods
# --------------------------------------
# This includes methods that change the device of the module (such as ``nn.Module.cpu()``),
# methods that change the ``dtype`` of the module (such as ``nn.Module.float()``),
# as well as methods that allow the module to be materialized
# (such as ``nn.Module.to_empty()``).
#
# At first glance, it might be non-intuitive that these methods are able to
# modify the parameters of the module in place. The existing approach has been
# to use a nasty hack dating back to the first days of PyTorch.
#
# Notably, the existing approach does not work in these cases:
#
# * when using ``__torch_dispatch__`` subclasses
# * when ``param`` and ``new_param`` do not have the same Python ``type()``
# * for tensors with special C++ representations (such as sparse tensors and ``XLA`` tensors)
#
# In the following part of this recipe, we will define a toy ``__torch_dispatch__``
# subclass ``MyQuantizedLinearWeight`` that represents quantized linear weights.
# This subclass will be used for illustration purposes throughout the rest of
# the tutorial. For brevity, we omit most of the ``__torch_dispatch__``
# implementation.
aten = torch.ops.aten

class MyQuantizedLinearWeight(torch.Tensor):
@@ -108,10 +108,10 @@ def __torch_dispatch__(cls, func, types, args, kwargs):
        raise NotImplementedError(f"Unsupported function {func}")

#################################################################################
# Let us create an ``nn.Linear`` layer of ``dtype`` ``torch.float32`` where the weight is
# a ``MyQuantizedLinearWeight`` and try to convert it to ``torch.bfloat16``.
# Observe that the weight's ``dtype`` changes as expected. However, the ``dtype``
# of the subclass' payload (``elem``) does not change.

m = nn.Linear(3, 5, dtype=torch.float32)
m.weight = torch.nn.Parameter(MyQuantizedLinearWeight(m.weight, 0.5))
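# A sketch of the elided conversion step. The attribute name ``elem`` is taken
# from the prose above; the exact print statements are assumptions.
m.bfloat16()
print(f"m.weight.dtype: {m.weight.dtype}")
print(f"m.weight.elem.dtype: {m.weight.elem.dtype}")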
@@ -123,12 +123,12 @@ def __torch_dispatch__(cls, func, types, args, kwargs):
print(f"m.bias.dtype: {m.bias.dtype}")

################################################################################
# To this end, we introduce a global config
# ``torch.__future__.set_swap_module_params_on_conversion`` that will use
# ``swap_tensors`` to swap the parameters of the module while preserving
# references, in place of ``.data`` setting. When this config is set,
# ``swap_tensors`` will be used during the conversion, which ensures that
# the ``dtype`` of the payload is properly converted.

torch.__future__.set_swap_module_params_on_conversion(True)
m = nn.Linear(3, 5, dtype=torch.float32)
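# A sketch of the elided check, assumed to mirror the earlier snippet: with the
# config enabled, the payload's ``dtype`` now converts along with the wrapper.
m.weight = torch.nn.Parameter(MyQuantizedLinearWeight(m.weight, 0.5))
m.bfloat16()
print(f"m.weight.dtype: {m.weight.dtype}")
print(f"m.weight.elem.dtype: {m.weight.elem.dtype}")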
@@ -144,42 +144,42 @@ def __torch_dispatch__(cls, func, types, args, kwargs):
144144
################################################################################
145145
# ``nn.Module.load_state_dict()``
146146
# --------------------------------
147-
# Depending on the value of the ``assign`` keyword argument passed
148-
# to ``load_state_dict()``, there are two ways to load the ``state_dict``:
147+
# ``load_state_dict()``에 μ „λ‹¬λœ ``assign`` ν‚€μ›Œλ“œ 인수의 값에 따라,
148+
# ``state_dict``λ₯Ό λ‘œλ“œν•˜λŠ” 두 가지 방법이 μžˆμŠ΅λ‹ˆλ‹€:
149149
#
150-
# * ``assign=False``: preserves the properties of ``module.param`` and only takes the values
151-
# from ``state_dict['param_name']``
152-
# * ``assign=True``: preserves the properties and values of ``state_dict['param_name']``.
150+
# * ``assign=False``: ``module.param``의 속성을 λ³΄μ‘΄ν•˜κ³ , ``state_dict['param_name']``의
151+
# κ°’λ§Œ κ°€μ Έμ˜΅λ‹ˆλ‹€.
152+
# * ``assign=True``: ``state_dict['param_name']``의 속성과 값을 λͺ¨λ‘ λ³΄μ‘΄ν•©λ‹ˆλ‹€.
153153
#
154154
#
155-
# Previously, these were implemented with in-place ``copy_`` and ``__setattr__`` respectively.
156-
# With the existing implementation, each approach had its own limitations -- ``assign=False``
157-
# imposes the constraint that the type of the parameter in the ``state_dict`` must
158-
# be the same as the type of the parameter in the module while ``assign=True`` imposes
159-
# the constraint that anything that holds references to the module's parameters must
160-
# be initialized after ``nn.Module.load_state_dict()``.
155+
# μ΄μ „μ—λŠ” 각각 μ œμžλ¦¬μ—μ„œ ``copy_``와 ``__setattr__``둜 κ΅¬ν˜„λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
156+
# κΈ°μ‘΄ κ΅¬ν˜„μ—μ„œλŠ” 각각의 μ ‘κ·Ό 방식에 κ³ μœ ν•œ μ œν•œ 사항이 μžˆμ—ˆμŠ΅λ‹ˆλ‹€ -- ``assign=False``λŠ”
157+
# ``state_dict``의 λ§€κ°œλ³€μˆ˜ νƒ€μž…μ΄
158+
# λͺ¨λ“ˆμ˜ λ§€κ°œλ³€μˆ˜ νƒ€μž…κ³Ό 동일해야 ν•œλ‹€λŠ” μ œμ•½μ„ λΆ€κ³Όν•˜λŠ” 반면, ``assign=True``λŠ”
159+
# λͺ¨λ“ˆμ˜ λ§€κ°œλ³€μˆ˜μ— λŒ€ν•œ μ°Έμ‘°λ₯Ό λ³΄μœ ν•˜λŠ” λͺ¨λ“  것이
160+
# ``nn.Module.load_state_dict()`` 이후에 μ΄ˆκΈ°ν™”λ˜μ–΄μ•Ό ν•œλ‹€λŠ” μ œμ•½μ„ λΆ€κ³Όν•©λ‹ˆλ‹€.
161161
#
162-
# Now, we address both constraints by adding a ``swap_tensors`` path to ``load_state_dict()``
163-
# and introducing a new extension point ``torch.Tensor.module_load(self, other, assign=False)``.
164-
# When the ``swap_tensors`` path is enabled via the ``__future__`` mentioned above,
165-
# we can use a ``__torch_function__`` handler for ``module_load`` to apply a
166-
# custom transformation to the value in the ``state_dict``. The result of this
167-
# transformation will be swapped with the parameter in the module.
162+
# 이제 μš°λ¦¬λŠ” ``load_state_dict()``에 ``swap_tensors`` 경둜λ₯Ό μΆ”κ°€ν•˜μ—¬ 두 가지 μ œμ•½μ„ ν•΄κ²°ν•©λ‹ˆλ‹€.
163+
# 그리고 μƒˆλ‘œμš΄ ν™•μž₯ 포인트 ``torch.Tensor.module_load(self, other, assign=False)``λ₯Ό λ„μž…ν•©λ‹ˆλ‹€.
164+
# μœ„μ—μ„œ μ–ΈκΈ‰ν•œ ``__future__``λ₯Ό 톡해 ``swap_tensors`` κ²½λ‘œκ°€ ν™œμ„±ν™”λ˜λ©΄,
165+
# ``module_load``에 λŒ€ν•œ ``__torch_function__`` ν•Έλ“€λŸ¬λ₯Ό μ‚¬μš©ν•˜μ—¬
166+
# ``state_dict``의 값에 μ‚¬μš©μž μ •μ˜ λ³€ν™˜μ„ μ μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. 이 λ³€ν™˜μ˜ κ²°κ³ΌλŠ”
167+
# λͺ¨λ“ˆμ˜ λ§€κ°œλ³€μˆ˜μ™€ κ΅μ²΄λ©λ‹ˆλ‹€.
168168
#
169-
# In the following example, we will use the ``MyQuantizedLinearWeight`` subclass
170-
# defined above to illustrate how we can use these features to apply a
171-
# custom quantization scheme to the weights of a linear layer when
172-
# loading the ``state_dict``.
169+
# λ‹€μŒ μ˜ˆμ œμ—μ„œλŠ” ``MyQuantizedLinearWeight`` μ„œλΈŒν΄λž˜μŠ€λ₯Ό μ‚¬μš©ν•˜μ—¬
170+
# μœ„μ—μ„œ μ •μ˜λœ κΈ°λŠ₯을 μ‚¬μš©ν•˜μ—¬
171+
# μ„ ν˜• λ ˆμ΄μ–΄μ˜ κ°€μ€‘μΉ˜μ— μ‚¬μš©μž μ •μ˜ μ–‘μžν™” 방식을 μ μš©ν•˜λŠ” 방법을 λ³΄μ—¬μ€λ‹ˆλ‹€.
172+
# ``state_dict``λ₯Ό λ‘œλ“œν•  λ•Œ.
173173
#
174-
# Recall that the ``__torch_function__`` handler for ``module_load`` will be
175-
# invoked if either ``self`` or ``other`` (in this case ``param`` or
176-
# ``state_dict[param_key]``) are ``MyQuantizedLinearWeight`` subclasses.
174+
# ``module_load``에 λŒ€ν•œ ``__torch_function__`` ν•Έλ“€λŸ¬λŠ” ν˜ΈμΆœλ©λ‹ˆλ‹€.
175+
# ``self`` λ˜λŠ” ``other`` (이 경우 ``param`` λ˜λŠ”
176+
# ``state_dict[param_key]``)κ°€ ``MyQuantizedLinearWeight`` μ„œλΈŒν΄λž˜μŠ€μΈ 경우.
177177
#
178-
# Assume that we expect the ``state_dict`` to contain plain tensors and the
179-
# module to contain ``MyQuantizedLinearWeight`` parameters where we want the
180-
# tensors in the ``state_dict`` to be transformed into the subclass. Then we
181-
# can define a ``__torch_function__`` handler for ``torch.Tensor.module_load``
182-
# as such:
178+
# ``state_dict``κ°€ 일반 ν…μ„œλ₯Ό ν¬ν•¨ν•˜κ³  μžˆλ‹€κ³  κ°€μ •ν•˜κ³ ,
179+
# λͺ¨λ“ˆμ΄ ``MyQuantizedLinearWeight`` νŒŒλΌλ―Έν„°λ₯Ό ν¬ν•¨ν•˜κ³  있으며,
180+
# ``state_dict``의 ν…μ„œκ°€ μ„œλΈŒν΄λž˜μŠ€λ‘œ λ³€ν™˜λ˜κΈ°λ₯Ό μ›ν•©λ‹ˆλ‹€. 그럼,
181+
# μš°λ¦¬λŠ” ``torch.Tensor.module_load``에 λŒ€ν•œ ``__torch_function__`` ν•Έλ“€λŸ¬λ₯Ό λ‹€μŒκ³Ό 같이 μ •μ˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€:
182+
# λ‹€μŒκ³Ό 같이:
183183

184184
@classmethod
185185
def custom_torch_function(cls, func, types, args=(), kwargs=None):
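    # Sketch of the elided handler body (an assumption, not the recipe's exact
    # code): intercept ``module_load`` and wrap the incoming plain tensor in the
    # subclass, reusing the destination's scale; defer everything else to the
    # default behavior.
    kwargs = {} if kwargs is None else kwargs
    if func is torch.Tensor.module_load:
        dest, src = args[0], args[1]
        # ``scale`` is assumed to be the attribute set by the subclass constructor
        return MyQuantizedLinearWeight(src, dest.scale)
    else:
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)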
@@ -196,9 +196,9 @@ def custom_torch_function(cls, func, types, args=(), kwargs=None):
MyQuantizedLinearWeight.__torch_function__ = custom_torch_function

#################################################################################
# First, let us create a skeleton of a model on the meta device to avoid
# materializing storages. We convert all weights in the modules to
# ``MyQuantizedLinearWeight`` subclasses while leaving biases intact.

def fn(m):
    if isinstance(m, nn.Linear):
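        # Sketch of the elided body (assumed): swap the weight for the quantized
        # subclass while leaving the bias as a plain parameter.
        m.weight = torch.nn.Parameter(MyQuantizedLinearWeight(m.weight, 0.5))

# Build the skeleton on the meta device; the layer sizes here are assumptions.
with torch.device("meta"):
    m = nn.Sequential(nn.Linear(3, 5), nn.Linear(5, 3))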
@@ -212,9 +212,9 @@ def fn(m):
m.apply(fn)

#################################################################################
# We can then load the ``state_dict``. Observe that we use ``assign=True`` because
# for biases, we want to preserve the properties of the tensor in the ``state_dict``
# (for example, we do not want the bias to be on the ``meta`` device after loading).

torch.__future__.set_swap_module_params_on_conversion(True)
print(f"Before: id(weight)={id(m.weight)}, id(bias)={id(m.bias)}")
@@ -226,16 +226,16 @@ def fn(m):
print(f"m.state_dict() after load_state_dict():\n {m.state_dict()}")

#################################################################################
# The above is a toy example of how we can use the new extension point in
# ``nn.Module.load_state_dict()``. One can also imagine alternate scenarios, such
# as when we have tensor subclasses in the ``state_dict`` and plain ``nn.Parameters``/
# tensors in the module, or when both are tensor subclasses. Based on the use
# case, we can define the ``__torch_function__`` handler for ``module_load``
# to apply the transforms as needed.
#
# Conclusion
# ----------
# In this recipe, we learned about ``swap_tensors``, the importance
# of preserving references for parameters in ``nn.Module``, as well as how to
# use the two new extension points that are gated by
# ``torch.__future__.set_swap_module_params_on_conversion``.
