RNN and GRU give mutation error; LSTM gives ArgumentError about number of fields #1483
Just a hunch from the phone: try `x = [rand(5, 1) for i in 1:2]`. Flux RNNs want 2D input (size × batch). Can't think of why this would trigger mutation, but I've seen stranger things happen :)
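(For illustration, not part of the original comment: a minimal sketch of the 2D, size × batch, input shape being suggested; the layer sizes and `Float32` eltype are assumptions.)

```julia
using Flux

m = RNN(5, 3)                            # assumed sizes: 5 features in, hidden size 3
xs = [rand(Float32, 5, 1) for _ in 1:2]  # one 5×1 matrix per time step (features × batch)
ys = [m(x) for x in xs]                  # each output is a 3×1 matrix
```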
The same errors occur for me when using that 2D input.
I could reproduce the error on current master, but it's a bit puzzling, as the RNN code hasn't changed since 0.11.3 and your test script works fine with the latest releases (0.11.6, 0.11.4, ...). The Manifest on master seems to lag behind the latest release, with Zygote being 0.6.0 rather than 0.6.2 as in Flux 0.11.6. However, even after making a pkg update from master, I still get the same mutating-array error message and couldn't isolate which package is the cause. @DhairyaLGandhi Any idea if some changes in Project/Manifest may have introduced a mutating-array issue? I can't think of any other cause.
The env shouldn't have caused this.
It shouldn't, but everything seems to point in that direction. I've just made a local PR test from 0.11.6 to the master branch, and strangely the only differences are the Project/Manifest, along with the data-test. It shouldn't be the env, but as the Manifest/Project are about the only parts that have moved since 0.11.6, what else could be the cause?
Sorry for the noise, I was incorrect in my assumption that the RNN modifs were integrated in Flux 0.11.6.

```julia
x = [rand(Float32, 5) for i in 1:2]
Flux.train!(loss, Flux.params(m), x, ADAM())
```

This stems from the fix for the type-inference stability of the RNN layers and how the initial state gets initialized.
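(A quick check, not from the thread, of the pre-allocated state's eltype; the sizes are arbitrary and the field access assumes the `Recur` wrapper discussed here.)

```julia
using Flux

m = RNN(5, 3)
eltype(m.state)   # Float32, so Float64 input no longer matches the stored state
```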
Definitely the latter, although where exactly does the difference show up?
Neither, I guess. We should just follow the Julia promotion rules for the math operations involved. So ideally we shouldn't be seeing an error, or at least we should get a meaningful one.
To make Recur work fine with `Float64` input, one option is:

```julia
mutable struct Recur{T,S}
  cell::T
  state::S
end
```

However, it results in a non-inferred output type:

```julia
julia> @code_warntype m(X[1])
Variables
m::Flux.Recur{Flux.RNNCell{typeof(tanh),Array{Float32,2},Array{Float32,1},Array{Float32,2}}}
x::Array{Float32,2}
@_3::Int64
y::Any
Body::Any
1 ─ Core.NewvarNode(:(@_3))
│ Core.NewvarNode(:(y))
│ %3 = Flux.eltype(x)::Core.Compiler.Const(Float32, false)
│ %4 = Base.getproperty(m, :state)::Any
│ %5 = Flux.eltype(%4)::Any
│ %6 = (%3 == %5)::Any
└── goto #3 if not %6
2 ─ goto #4
3 ─ %9 = Base.AssertionError("Recur input elements must have the same type has its state.")::AssertionError
└── Base.throw(%9)
4 ┄ %11 = Base.getproperty(m, :cell)::Flux.RNNCell{typeof(tanh),Array{Float32,2},Array{Float32,1},Array{Float32,2}}
│ %12 = Base.getproperty(m, :state)::Any
│ %13 = (%11)(%12, x)::Tuple{Any,Any}
│ %14 = Base.indexed_iterate(%13, 1)::Core.Compiler.PartialStruct(Tuple{Any,Int64}, Any[Any, Core.Compiler.Const(2, false)])
│ %15 = Core.getfield(%14, 1)::Any
│ Base.setproperty!(m, :state, %15)
│ (@_3 = Core.getfield(%14, 2))
│ %18 = Base.indexed_iterate(%13, 2, @_3::Core.Compiler.Const(2, false))::Core.Compiler.PartialStruct(Tuple{Any,Int64}, Any[Any, Core.Compiler.Const(3, false)])
│ (y = Core.getfield(%18, 1))
└── return y
```

This doesn't affect performance on a simple Chain model, but an issue was opened about a significant impact when downstream tasks were performed: #1092

If keeping the state type parameter in Recur, adding an assertion on the input type would work, but it looks to add ~2-3% overhead:

```julia
function (m::Recur)(x::AbstractArray)
  @assert eltype(x) == eltype(m.state[1]) "Recur input elements must have the same type as its state."
  m.state, y = m.cell(m.state, x)
  return y
end
```

What would be the preferred option between dropping the type-inference stability vs. adding a type assertion? Or a third option?
What if we could error appropriately in the forward pass? Much of the error would be better understood. It would just require typing things a bit more strongly.
Just to clarify, forward pass here means the callable cells? And does stronger typing there (or the assert in Recur) risk hurting use cases like mixed precision?
So the assert should actually go away; its use is also mentioned in the docs. But yes, the stronger typing can in fact hurt mixed precision, although that would require some kernel forwarding and accumulation logic anyway, which is somewhat separate and can be handled down the road.
Would that be in line with what you refer to as the forward pass?

```julia
mutable struct Recur{T,S}
  cell::T
  state::Union{S, Tuple{S,S}}
end

function (m::Recur{T,S})(x::S) where {T,S}
  m.state, y = m.cell(m.state, x)
  return y
end
```

It seems to almost be a workable approach, but it fails when the input is a Vector while the state is a Matrix (no matching method).
Let's definitely avoid unions. Those increase the surface of issues we would face and muddy the intent with the assumptions we make.
With the hidden state of RNN and GRU being different from that of LSTM, plus the need to handle both Vector and Matrix input, I'm unfortunately short of ideas on how to articulate the type parametrization. Suggestions welcome!
Here's another proposal. It involves having the hidden state defined as a tuple for all types of recurrent cells, as opposed to a Matrix for RNN/GRU and a Tuple of Matrix for LSTM. It performs a reshape to force a 2D input, similar in spirit to what is done with the Dense layer.

```julia
mutable struct Recur{T,S}
  cell::T
  state::Tuple{Vararg{S}}
end

function (m::Recur)(x)
  m.state, y = m.cell(m.state, reshape(x, size(x,1), :))
  return y
end
```

Then, the consistency check on the input parameters is performed through the forward definition of the cells:

```julia
function (m::RNNCell{F,A,V,S})((h,), x::S) where {F,A,V,S}
  σ, Wi, Wh, b = m.σ, m.Wi, m.Wh, m.b
  h = σ.(Wi*x .+ Wh*h .+ b)
  return (h,), h
end
```

So, with the input required to match the state's type, a mismatch yields an informative error:

```julia
julia> m(X[1])
ERROR: MethodError: no method matching (::Flux.RNNCell{typeof(tanh),Array{Float32,2},Array{Float32,1},Array{Float32,2}})(::Tuple{Array{Float32,2}}, ::Array{Float64,2})
Closest candidates are:
  Any(::Any, ::S) where {F, A, V, S} at c:\github\Flux.jl\src\layers\recurrent.jl:85
```

Also, mixing types would work under this scheme. For example, if one wishes to work with Float64 input and Float32 parameters, it will work fine as long as the initial state is also initialized with Float64.
@DhairyaLGandhi Any opinion on the previous proposal? It would provide both type-inference stability and a meaningful type-mismatch error if the type of the state differs from that of the input. I could move forward with a PR.
Having the hidden state as a tuple might be fine, but we typically don't restrict the type of the input, letting Julia's promotion mechanism take care of it. Also, that might trip up mixed-precision work in the future. I'm wondering if we could deal with the reshape via dispatching on the kind of cell we are dealing with.
Maybe there's a misunderstanding, as the above proposition would support mixed precision. The constraint is only to have the hidden state be of the same type as the input, which seems reasonable from my perspective as both are essentially inputs. The parameters could well be Float32 while the input/state are Float64.
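(A small illustration, not from the thread, of the promotion behavior described above: plain Julia with arbitrary sizes, no Flux involved.)

```julia
Wi = rand(Float32, 3, 5)   # Float32 "parameters"
Wh = rand(Float32, 3, 3)
x  = rand(Float64, 5)      # Float64 input
h  = rand(Float64, 3)      # Float64 state, matching the input

eltype(tanh.(Wi*x .+ Wh*h))   # Float64: Julia promotes to the wider type
```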
Although this change makes sense and brings more consistency, I'd like to avoid another breaking change if we really don't have to. Wouldn't something like this be enough to error out on mismatched types?

```julia
function (m::RNNCell{F,A,V,<:AbstractArray{T}})(h, x::AbstractArray{T}) where {F,A,V,T}
  σ, Wi, Wh, b = m.σ, m.Wi, m.Wh, m.b
  h = σ.(Wi*x .+ Wh*h .+ b)
  return h, h
end
```

Something similar could be done for GRU and LSTM, although it would be more complicated. As a side note, the output of Dense is a vector when the input is a vector:

```julia
julia> Dense(3,2)(rand(3))
2-element Array{Float64,1}:
 1.7997510852225493
 0.6333258983290979

julia> Dense(3,2)(rand(3,1))
2×1 Array{Float64,2}:
 0.3791885941573536
 0.5453548860286661
```

so probably we should have the same behavior for recurrent layers (which is already the case, I guess).
This would effectively capture the mismatched types, but it wouldn't solve the issue of type inference.

```julia
mutable struct Recur{T,S}
  cell::T
  state::Tuple{Vararg{S}}
end
```

This requirement however comes at the cost that the type of the state cannot change after being initialized. This is why it got initialized as a Matrix. It is compatible with both Vector and Matrix inputs, however, as the output gets reshaped to match the input's dimensions:

```julia
function (m::Recur)(x)
  m.state, y = m.cell(m.state, x)
  sz = size(x)
  return reshape(y, :, sz[2:end]...)
end
```

To sum up, if it is desirable to have type inference of the Recur output (to me it is), then I see 2 options. And finally there's option 3, which is that Recur's type inference is dropped. However, I don't see a reason to go that route; at least from my usage perspective, I would only end up rewriting the RNN structs for my own usage.
Sorry to bump: is there any of the above 3 options, or another one, that you'd like to take?
I wasn't sure what was being proposed earlier. I mean that we effectively dispatch based on the type of cell, and let that kernel take care of reshaping, not dispatching on reshape itself. Sorry about the confusion.
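(A rough sketch of that idea, dispatching on the cell type and letting the cell's own forward do the reshaping; the method below is hypothetical, not Flux's actual code.)

```julia
using Flux

# hypothetical: a cell accepts Vector input by promoting it to a one-column
# matrix internally, then returns a Vector again, so Recur never reshapes
function (m::Flux.RNNCell)(h, x::AbstractVector)
  h, y = m(h, reshape(x, :, 1))
  return h, vec(y)
end
```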
It's a bit odd for the state to be a tuple, I admit.
Bumps are very welcome! Still, it is not clear to me what's wrong with what I suggested above, and what you mean by "issue of type inference". My proposal is:

```julia
mutable struct Recur{T,S}
  cell::T
  state::S
end

function (m::RNNCell{F,A,V,<:AbstractArray{T}})(h, x::AbstractArray{T}) where {F,A,V,T}
  ...
end

# this one is just a lazy guess
function (m::RNNCell{F,A,V,NTuple{2,<:AbstractArray{T}}})(h, x::AbstractArray{T}) where {F,A,V,T}
  ...
end
```

I'm not particularly fond of it but I think it should work.
And yes, having a tuple state when not needed is not ideal. Also true, though, that 2 out of 3 cells have a tuple state, so we may have the remaining one be a tuple as well for consistency. Not sure what's the best choice.
@CarloLucibello Yes, your proposal looks like a good solution. My initial doubt was concerning the parametrization of the cells. For the hidden state, the proposal to always go with tuples was initially motivated by avoiding the union of Matrix and Tuple of Matrix, but given the above comments that shouldn't be needed. As for the requirement of having a Vector output for Vector input, the reshape can be handled in the cell's forward:

```julia
function (m::RNNCell{F,A,V,<:AbstractMatrix{T}})(h, x::AbstractVecOrMat{T}) where {F,A,V,T}
  σ, Wi, Wh, b = m.σ, m.Wi, m.Wh, m.b
  h = σ.(Wi*x .+ Wh*h .+ b)
  sz = size(x)
  return h, reshape(h, :, sz[2:end]...)
end
```

Altogether, I think these will resolve the remaining friction points. If that is fine with you, I'll go with a PR.
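(A sketch of the intended input/output behavior under that change; the sizes and eltypes are assumed.)

```julia
using Flux

m = RNN(5, 3)
size(m(rand(Float32, 5)))      # (3,)    Vector in  -> Vector out
size(m(rand(Float32, 5, 16)))  # (3, 16) Matrix in  -> Matrix out
```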
1521: Fixes to Recurrent models for informative type mismatch error & output Vector for Vector input r=CarloLucibello a=jeremiedb

Minor fix to Recurrent layers to return a `Vector` with `Vector` input, return an indicative error when the eltype of the input doesn't match the eltype of the state, and fix some typos in the associated docs. As discussed in #1483.

Co-authored-by: jeremie.db <[email protected]>
Co-authored-by: jeremiedb <[email protected]>
I believe this was closed with #1521 but GH didn't automatically link the issue with the PR. @jeremiedb can reopen if I missed something.
I couldn't find another issue mentioning this, so my apologies if this has already come up. On the current master branch, 08e79c4, RNN layers give a mutation error from Zygote during the backward pass.
RNN stacktrace
GRU layers throw a similar error.
GRU stacktrace
And LSTM layers give an `ArgumentError` about not having a definite number of fields.
LSTM stacktrace
None of these errors occur in the latest release, v0.11.6, and all the tests in the current master branch also pass for me.
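(For reference, a minimal reproduction assembled from the snippets in this thread; the original report's exact script isn't included here, so the model, loss, and sizes are assumptions.)

```julia
using Flux

m = RNN(5, 3)                                   # assumed sizes
x = [rand(5, 1) for i in 1:2]                   # Float64 input, as in the report
loss(xi) = sum(m(xi))                           # assumed toy loss
Flux.train!(loss, Flux.params(m), x, ADAM())    # errors on master, fine on v0.11.6
```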