Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and throw an error whenever we get a closure with a Core.Box #133

Closed
MasonProtter opened this issue Jan 28, 2025 · 3 comments · Fixed by #141
Closed

Detect and throw an error whenever we get a closure with a Core.Box #133

MasonProtter opened this issue Jan 28, 2025 · 3 comments · Fixed by #141

Comments

@MasonProtter
Copy link
Member

MasonProtter commented Jan 28, 2025

This is one way to at least warn users that code like the code shown in #132 is incorrect. The idea would be that when a user calls

tmap(f, itr)

(or tmapreduce or whatever), we want to check if any of the fields of f are a Core.Box, which would signal that the user accidentally has caused a concurrency violation.

Here's a fun example:

julia> let
           A = 1
           res1 = tmap(1:10) do i
               A = i
               sleep(rand())
               A
           end
           res2 = tmap(1:10) do i
               local A = i
               sleep(rand())
               A
           end
           @info "Oops!" res1' res2'
       end
┌ Info: Oops!
│   res1' =1×10 adjoint(::Vector{Int64}) with eltype Int64:8  2  10  8  4  2  6  8  10  4
│   res2' =1×10 adjoint(::Vector{Int64}) with eltype Int64:1  2  3  4  5  6  7  8  9  10

The problem is that the A=1 inside the let block is making it so that inside the tmap for res1, A actually is a Core.Box and every time we assign to A, we are mutating the Core.Box as a race condition, and each time we reference A, we're making a racey read.

The Core.Box design for closures is fine for sequential programs, but is wildly incorrect for concurrenct programs. This should actually be pretty easy to detect and throw an informative error.

I don't think there is any valid use-case where you'd put a boxed closure into one of our threaded functions and not cause a data-race.

@lkdvos
Copy link

lkdvos commented Mar 18, 2025

Probably a bit late to the party, and apologies if this is too much of a "just complaining" post, but the following examples seem not super contrived, and do not cause race conditions.

let
    A = zeros(10)
    for _ in 1:10
        res = tmap(eachindex(A)) do i
            A[i] = i
            A[i]
        end
        A = zeros(10)
    end
end # error

let
    A = zeros(10)
    for _ in 1:10
        A = zeros(10)
        res = tmap(eachindex(A)) do i
            A[i] = i
            A[i]
        end
    end
end # error


let
    for _ in 1:10
        A = zeros(10)
        res = tmap(eachindex(A)) do i
            A[i] = i
            A[i]
        end
    end
end # fine

Even more obvious is this one:

let
    A = rand(10)
    for _ in 1:10
        res = tmap(eachindex(A)) do i
            A[i]
        end
        A = rand(10)
    end
end # error

I understand that there are performance implications for the boxed variables, but in some cases the performance penalty simply does not matter, the inner function can take a very long time and then this is somewhat negligible (correct me if I'm wrong, compilers can be voodoo to me sometimes :)).
Anyways, I see the value of protecting users from themselves, but this error seems fairly opinionated, as Boxed variables can appear because of a variety of reasons, and not all of them are because you are modifying an external variable.

@MasonProtter
Copy link
Member Author

MasonProtter commented Mar 18, 2025

Hi @lkdvos, thanks for the feedback. "Just complaining" is also fine. I want this package to be useful for people, and it's good for me to hear feedback on this feature / misfeature, even if I don't end up agreeing. My opinions on this are not set in stone, and if it's too problematic I am okay with making this error opt-in rather than opt-out

While I usually lean away from hyper opinionated errors, I really do think this one is such a surprising and dangerous class of errors that I really do think there's value here in throwing this error to be extra defensive.

I've opened a PR #142 to try and make cases where the boxing is not incorrect a bit better. In particular, your examples can now be fixed with the @localize macro. They'd no longer be boxed and so will also be more performant.

Perhaps I could also add a way to exempt entire modules from these error checks so that users like you can just say "leave me alone" as a semi-global toggle. Any thoughts?

@lkdvos
Copy link

lkdvos commented Mar 19, 2025

Thanks for the nice reply!

I failed to consider the let block approach completely, so I think your PR is actually a really nice addition.
I think what I was dreading the most is that if I want to insist on "I know what I'm doing - I only have myself to blame", I would have to make use of the opt-out macro. While this is not a problem itself, I was a bit worried about having compat issues with the two versions of OhMyThreads in the meantime, since that macro does not exist in 0.7 (I think?), and so switching to 0.8 would mean dropping support for 0.7.

The let block solves this problem for me, and I do agree that this is a topic where being too careful probably pays off in the long run, but throwing errors is also quite drastic.
Maybe a print-once warning could also be considered, since the performance penalty of this warning is probably negligible compared to the boxed variable, so that could be a bit of a softer blow?

For the @localize macro, I do think it looks convenient, but I tend to prefer being a bit more verbose in this context so I would probably still manually apply the let blocks.
My main reason here is that this pattern shows up in other contexts as well, since boxed variables can also lead to type-instabilities and other performance penalties, and I feel like exposing more people to these let blocks might raise awareness (as illustrated by the fact that my brain also didn't immediately make the connection).

In any case, thanks a lot for taking the time for dealing with this.
Given that I can fix my problems with let blocks, I think my use-cases are covered, and I don't mind being a bit more verbose when I want to insist on doing "unsafe" stuff so that all seems very fair.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants