
Commit 88a4ba4

news
1 parent 982cc0d commit 88a4ba4

4 files changed, +11 -7 lines changed

NEWS.md

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@
See also [github's page](https://github.com/FluxML/Flux.jl/releases) for a complete list of PRs merged before each release.

+## v0.15.0
+* Recurrent layers have undergone a complete redesign in [PR 2500](https://github.com/FluxML/Flux.jl/pull/2500).
+* `RNN`, `LSTM`, and `GRU` no longer store the hidden state internally. Instead, they now take the previous state as input and return the updated state as output.
+* These layers (`RNN`, `LSTM`, `GRU`) now process entire sequences at once, rather than one element at a time.
+* The `Recur` wrapper has been deprecated and removed.
+* The `reset!` function has also been removed; state management is now entirely up to the user.
+* `RNNCell`, `LSTMCell`, and `GRUCell` are now exported and provide functionality for single time-step processing.
+
## v0.14.22
* Data movement between devices is now provided by [MLDataDevices.jl](https://github.com/LuxDL/MLDataDevices.jl).
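To make the redesigned workflow concrete, here is a minimal sketch of the state-in/state-out usage the notes above describe. The call signatures and the features × time × batch layout are assumptions based on these bullet points and PR 2500, not an excerpt from the Flux documentation:

```julia
using Flux

# Single time-step processing with a newly exported cell. Assumed signature:
# the cell takes (input, previous state) and returns the updated state.
cell = RNNCell(3 => 5)
xt = rand(Float32, 3, 16)       # one time step for a batch of 16
h  = zeros(Float32, 5, 16)      # the caller owns and threads the state
h  = cell(xt, h)

# Whole-sequence processing with the redesigned layer. Assumed layout:
# input is features × time × batch; there is no internal state to reset.
rnn = RNN(3 => 5)
xs  = rand(Float32, 3, 10, 16)  # a 10-step sequence
h0  = zeros(Float32, 5, 16)
ys  = rnn(xs, h0)               # outputs for every time step
```

Because the state is returned rather than stored, carrying it across batches (or deliberately discarding it) is simply a matter of what the caller passes in on the next call.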

docs/src/guide/models/recurrence.md

Lines changed: 2 additions & 4 deletions
@@ -169,14 +169,13 @@ X = [seq_1, seq_2]
Y = [y1, y2]
data = zip(X,Y)

-Flux.reset!(m)
[m(x) for x in seq_init]

opt = Flux.setup(Adam(1e-3), m)
Flux.train!(loss, m, data, opt)
```

-In this previous example, model's state is first reset with `Flux.reset!`. Then, there's a warmup that is performed over a sequence of length 1 by feeding it with `seq_init`, resulting in a warmup state. The model can then be trained for 1 epoch, where 2 batches are provided (`seq_1` and `seq_2`) and all the timesteps outputs are considered for the loss.
+A warmup is first performed over a sequence of length 1 by feeding the model with `seq_init`, resulting in a warmup state. The model can then be trained for 1 epoch, where 2 batches are provided (`seq_1` and `seq_2`) and all the timestep outputs are considered for the loss.

In this scenario, it is important to note that a single continuous sequence is considered. Since the model state is not reset between the 2 batches, the state of the model flows through the batches, which only makes sense in the context where `seq_1` is the continuation of `seq_init` and so on.

@@ -187,7 +186,7 @@ x = [rand(Float32, 2, 4) for i = 1:3]
y = [rand(Float32, 1, 4) for i = 1:3]
```

-That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(batch[1])`, would still represent `x1 -> y1` in our diagram and returns the first word output, but now for each of the 4 independent sentences (second dimension of the input matrix). We do not need to use `Flux.reset!(m)` here; each sentence in the batch will output in its own "column", and the outputs of the different sentences won't mix.
+That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(batch[1])` would still represent `x1 -> y1` in our diagram and return the first word output, but now for each of the 4 independent sentences (second dimension of the input matrix). Each sentence in the batch will output in its own "column", and the outputs of the different sentences won't mix.

To illustrate, we go through an example of batching with our implementation of `rnn_cell`. The implementation doesn't need to change; the batching comes for "free" from the way Julia does broadcasting and the rules of matrix multiplication.
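To illustrate the batching point in the paragraph above, here is a small self-contained sketch; the weight sizes and this `rnn_cell` definition are illustrative stand-ins rather than the guide's exact code:

```julia
# An illustrative recurrent cell: the same code handles a single sample
# (x a vector) or a batch (x a matrix with one column per sample), because
# matrix multiplication keeps the columns independent.
Wxh = randn(Float32, 5, 2)
Whh = randn(Float32, 5, 5)
b   = zeros(Float32, 5)
rnn_cell(h, x) = tanh.(Whh * h .+ Wxh * x .+ b)

x  = [rand(Float32, 2, 4) for i = 1:3]   # 3 time steps, 4 samples, 2 features each
h0 = zeros(Float32, 5, 4)                # one state column per sample
h1 = rnn_cell(h0, x[1])                  # (5, 4): first-step output for all 4 sentences
h  = foldl(rnn_cell, x; init = h0)       # run the whole sequence; columns never mix
```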

@@ -223,7 +222,6 @@ In many situations, such as when dealing with a language model, the sentences in
```julia
function loss(x, y)
-  Flux.reset!(m)
  sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))
end
```
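With `reset!` removed, state handling moves into user code. As a purely hypothetical sketch, not part of this commit, the per-timestep loss above could instead thread the state explicitly; the `step(xi, h)` signature returning `(output, new_state)` and the initial state `h0` are assumptions for illustration:

```julia
using Flux

# Hypothetical stateless variant: nothing is stored in the model,
# so there is nothing to reset between batches.
function loss(step, x, y, h0)
    h = h0
    total = 0f0
    for (xi, yi) in zip(x, y)
        out, h = step(xi, h)            # assumed: (input, state) -> (output, state)
        total += Flux.mse(out, yi)
    end
    return total
end
```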

docs/src/reference/models/layers.md

Lines changed: 0 additions & 2 deletions
@@ -112,8 +112,6 @@ RNN
LSTM
GRU
GRUv3
-Flux.Recur
-Flux.reset!
```

## Normalisation & Regularisation

src/layers/show.jl

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ function _macro_big_show(ex)
      end
    end

-    # Don't show Chain(Tuple(...)), always splat that. And ignore Recur's non-trainable state:
+    # Don't show Chain(Tuple(...)), always splat that. And ignore non-trainable buffers:
    Flux._show_children(x::$ex) = _flat_children(trainable(x))
  end
end
