NEWS.md (8 additions & 0 deletions)

@@ -2,6 +2,14 @@
See also [github's page](https://github.com/FluxML/Flux.jl/releases) for a complete list of PRs merged before each release.

+## v0.15.0
+* Recurrent layers have undergone a complete redesign in [PR 2500](https://github.com/FluxML/Flux.jl/pull/2500).
+* `RNN`, `LSTM`, and `GRU` no longer store the hidden state internally. Instead, they now take the previous state as input and return the updated state as output.
+* These layers (`RNN`, `LSTM`, `GRU`) now process entire sequences at once, rather than one element at a time.
+* The `Recur` wrapper has been deprecated and removed.
+* The `reset!` function has also been removed; state management is now entirely up to the user.
+* `RNNCell`, `LSTMCell`, and `GRUCell` are now exported and provide functionality for single time-step processing.
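
For concreteness, here is a minimal sketch of driving the redesigned layers, assuming the stateless interface described in the notes above (the `3 => 5` sizes, the features × length × batch layout, and the zero initial state are illustrative choices; see PR 2500 for the authoritative signatures):

```julia
using Flux

# Sequence-level API: the layer holds no hidden state; the caller passes it in.
rnn = RNN(3 => 5)
x  = rand(Float32, 3, 10, 16)   # features × sequence length × batch
h0 = zeros(Float32, 5, 16)      # initial hidden state, owned by the user
h  = rnn(x, h0)                 # hidden states for every step: 5 × 10 × 16

# Step-level API via the newly exported cell: previous state in, new state out.
cell = RNNCell(3 => 5)
h1 = cell(x[:, 1, :], h0)
h2 = cell(x[:, 2, :], h1)
```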
## v0.14.22
* Data movement between devices is now provided by [MLDataDevices.jl](https://github.com/LuxDL/MLDataDevices.jl).

docs/src/guide/models/recurrence.md (2 additions & 4 deletions)

@@ -169,14 +169,13 @@ X = [seq_1, seq_2]
Y = [y1, y2]
data = zip(X, Y)

-Flux.reset!(m)
[m(x) for x in seq_init]

opt = Flux.setup(Adam(1e-3), m)
Flux.train!(loss, m, data, opt)
```

-In this previous example, model's state is first reset with `Flux.reset!`. Then, there's a warmup that is performed over a sequence of length 1 by feeding it with `seq_init`, resulting in a warmup state. The model can then be trained for 1 epoch, where 2 batches are provided (`seq_1` and `seq_2`) and all the timesteps outputs are considered for the loss.
+A warmup is first performed over a sequence of length 1 by feeding the model `seq_init`, resulting in a warmup state. The model can then be trained for 1 epoch, where 2 batches are provided (`seq_1` and `seq_2`) and all timestep outputs are considered in the loss.
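
The `loss` passed to `Flux.train!` is defined earlier in the guide; a sketch consistent with the call above, assuming one model output per timestep and a mean-squared error, might look like:

```julia
using Flux

# Accumulate the error over every timestep of the sequence; the model's
# internal state threads through the steps as each element is fed in.
loss(m, xs, ys) = sum(Flux.mse(m(x), y) for (x, y) in zip(xs, ys))
```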
In this scenario, it is important to note that a single continuous sequence is considered. Since the model state is not reset between the 2 batches, the state of the model flows through the batches, which only makes sense in the context where `seq_1` is the continuation of `seq_init` and so on.

@@ -187,7 +186,7 @@ x = [rand(Float32, 2, 4) for i = 1:3]
y = [rand(Float32, 1, 4) for i = 1:3]
```
-That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(batch[1])`, would still represent `x1 -> y1` in our diagram and returns the first word output, but now for each of the 4 independent sentences (second dimension of the input matrix). We do not need to use `Flux.reset!(m)` here; each sentence in the batch will output in its own "column", and the outputs of the different sentences won't mix.
+That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(batch[1])` would still represent `x1 -> y1` in our diagram and return the first word's output, but now for each of the 4 independent sentences (second dimension of the input matrix). Each sentence in the batch will output in its own "column", and the outputs of the different sentences won't mix.
To illustrate, we go through an example of batching with our implementation of `rnn_cell`. The implementation doesn't need to change; the batching comes for "free" from the way Julia does broadcasting and the rules of matrix multiplication.
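
The full walk-through lies outside this hunk; as a sketch of why batching is free, assume the `rnn_cell` defined earlier in the guide (the weight names `Wxh`, `Whh`, `b` follow that definition):

```julia
output_size, input_size = 5, 2
Wxh = randn(Float32, output_size, input_size)
Whh = randn(Float32, output_size, output_size)
b   = randn(Float32, output_size)

# One RNN step. Nothing here is batch-aware: matrix multiplication and
# broadcasting extend it from a single sample to a whole batch for free.
function rnn_cell(h, x)
    h = tanh.(Wxh * x .+ Whh * h .+ b)
    return h, h
end

batch = [rand(Float32, 2, 4) for _ in 1:3]  # 3 timesteps, 4 sentences each
h0 = zeros(Float32, 5, 4)                   # one hidden column per sentence
h1, y1 = rnn_cell(h0, batch[1])             # y1 is 5 × 4: columns never mix
h2, y2 = rnn_cell(h1, batch[2])
```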
@@ -223,7 +222,6 @@ In many situations, such as when dealing with a language model, the sentences in