Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix in UniqueElementsEncoding #418

Merged
merged 7 commits into from
Jul 21, 2024
Merged

Conversation

kahaaga
Copy link
Member

@kahaaga kahaaga commented Jul 21, 2024

While working on the latest version of CausalityTools.jl, I encountered a weird bug that turned out to have its root in ComplexityMeasures.jl: codifying certain types of state space sets with UniqueElements didn't work. This happened because the unique(x)call errored. In turn, this happens because unique isn't implemented for AbstractStateSpaceSets.

I fix this here by using vec to get the data (which works both for regular vectors and for state space sets). This is now tested explicitly.

I also ensure that we get the correct element types for the dicts, and use an outer constructor because there's nothing special happening in it requiring it to be an inner constructor.

@kahaaga kahaaga requested a review from Datseris July 21, 2024 07:51
@kahaaga
Copy link
Member Author

kahaaga commented Jul 21, 2024

@Datseris, any idea if we can do something in StateSpaceSets.jl to make the last case below work? I haven't done a deep dive into Julia's broadcasting mechanisms yet, but it might be worth it.

Currently, we have to use vec on the state space set before constructing the UniqueElementsEncoding for encode to work when broadcasting. This isn't a problem upstream, because there I use counts (which calls vec explicitly, avoiding the issue below). However it is a bit annoying.

To illustrate, see the following test cases (which all work):

julia> x = ['a', 2, 5, 2, 5, 'a']; e = UniqueElementsEncoding(x)
UniqueElementsEncoding, with 2 fields:
 encode_dict = Dict{Any, Int64}(5 => 3, 2 => 2, 'a' => 1)
 decode_dict = Dict{Int64, Any}(2 => 2, 3 => 5, 1 => 'a')


julia> @test encode.(Ref(e), x) == [1, 2, 3, 2, 3, 1]
Test Passed

julia> y = ["a", "b", "c", "b", "a"]; ey = UniqueElementsEncoding(y)
UniqueElementsEncoding, with 2 fields:
 encode_dict = Dict("c" => 3, "b" => 2, "a" => 1)
 decode_dict = Dict(2 => "b", 3 => "c", 1 => "a")


julia> @test encode.(Ref(ey), y) == [1, 2, 3, 2, 1]
Test Passed

julia> z = vec(StateSpaceSet(y)); ez = UniqueElementsEncoding(z)
UniqueElementsEncoding, with 2 fields:
 encode_dict = Dict{SVector{1, String}, Int64}(["b"] => 2, ["c"] => 3, ["a"] => 1)
 decode_dict = Dict{Int64, SVector{1, String}}(2 => ["b"], 3 => ["c"], 1 => ["a"])


julia> @test encode.(Ref(ez), z)  == [1, 2, 3, 2, 1]
Test Passed

In contrast, using the state space set directly fails because broadcasting fails for the state space set:

julia> w = StateSpaceSet(y); ew = UniqueElementsEncoding(w)
UniqueElementsEncoding, with 2 fields:
 encode_dict = Dict{SVector{1, String}, Int64}(["b"] => 2, ["c"] => 3, ["a"] => 1)
 decode_dict = Dict{Int64, SVector{1, String}}(2 => ["b"], 3 => ["c"], 1 => ["a"])


julia> @test encode.(Ref(ew), w)  == [1, 2, 3, 2, 1]
Error During Test at REPL[166]:1
  Test threw exception
  Expression: encode.(Ref(ew), w) == [1, 2, 3, 2, 1]
  MethodError: Cannot `convert` an object of type SVector{1, String} to an object of type String

  Closest candidates are:
    convert(::Type{String}, ::Base.JuliaSyntax.Kind)
     @ Base /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-10/base/JuliaSyntax/src/kinds.jl:975
    convert(::Type{String}, ::String)
     @ Base essentials.jl:321
    convert(::Type{T}, ::T) where T<:AbstractString
     @ Base strings/basic.jl:231
    ...

  Stacktrace:
   [1] setindex!(A::Vector{String}, x::SVector{1, String}, i1::Int64)
     @ Base ./array.jl:1021
   [2] copyto!
     @ ./abstractarray.jl:946 [inlined]
   [3] _collect
     @ ./array.jl:765 [inlined]
   [4] collect
     @ ./array.jl:759 [inlined]
   [5] broadcastable
     @ ./broadcast.jl:743 [inlined]
   [6] broadcasted(::typeof(encode), ::Base.RefValue{UniqueElementsEncoding{SVector{1, String}, Int64}}, ::StateSpaceSet{1, String})
     @ Base.Broadcast ./broadcast.jl:1345
   [7] macro expansion
     @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:669 [inlined]
   [8] top-level scope
     @ REPL[166]:1
ERROR: There was an error during testing

I've added tests on the cases that work and added a TODO for the test case that doesn't work.

Copy link

codecov bot commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.99%. Comparing base (9fcf9d7) to head (2703afa).
Report is 4 commits behind head on main.

Files Patch % Lines
...coding_implementations/unique_elements_encoding.jl 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
+ Coverage   91.97%   91.99%   +0.01%     
==========================================
  Files          86       86              
  Lines        2468     2473       +5     
==========================================
+ Hits         2270     2275       +5     
  Misses        198      198              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Datseris
Copy link
Member

Currently, we have to use vec on the state space set before constructing the UniqueElementsEncoding for encode to work when broadcasting. This isn't a problem upstream, because there I use counts (which calls vec explicitly, avoiding the issue below). However it is a bit annoying.

Yeah, StateSpaceSet needs to be reworked to subtype AbstractVector, and to formally implement Julia's vector (array) interface. Broadcasting then will "just work", and the whole vec wrapping will become obsolete. I simply haven't gotten around doing that yet, because it will create the breaking change that now size(ssset) will be (length(ssset), ) instead of (length(ssset), dimension(ssset)) that it is now, and I haven't checked wherther any downstream package utilizes this assumption. None should, because dimension(ssset) should always be used for hte number of columns, but of course some code in DynamicalSystems.jl is about 5-6 years old so you never know.

The best way forwards in my opinion is to fix the source: make a v2 of SSSets.jl that makes SSSet subtype vector, and also make it so that the element type of the inner vectors is parameterized, so that we can also solve JuliaDynamics/StateSpaceSets.jl#18 at the same time.

@kahaaga
Copy link
Member Author

kahaaga commented Jul 21, 2024

The best way forwards in my opinion is to fix the source: make a v2 of SSSets.jl that makes SSSet subtype vector, and also make it so that the element type of the inner vectors is parameterized, so that we can also solve JuliaDynamics/StateSpaceSets.jl#18 at the same time.

Yep. Agreed.

Then we can just merge this in the meanwhile, since this doesn't actually break anything intrinsic to this package?

Copy link
Member

@Datseris Datseris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is fine. Regardless of when JuliaDynamics/StateSpaceSets.jl#30 happens, vec(ssset) will always work anyways.

@Datseris Datseris merged commit 0d9bc24 into main Jul 21, 2024
4 checks passed
@Datseris Datseris deleted the fix_uniqueleementsencoding branch July 21, 2024 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants