
Add direct Enzyme support #476

@wsmoses

Description

AutoEnzyme should probably be specialized and not fall back to DI.

In addition to being slower in some cases, using AutoEnzyme through DI has been shown to cause errors (even segfaults) where calling Enzyme directly ran successfully.

see jump-dev/JuMP.jl#3836 (comment)

Activity

gdalle (Collaborator) commented on Oct 8, 2024

How about we try to fix bugs together in DI instead of always undermining it? We could start with this one you mention: EnzymeAD/Enzyme.jl#1942 (I managed to reproduce it with pure Enzyme, independently from DI).

I understand your concerns about Enzyme performance / correctness being misrepresented by DI. But whenever someone wants access to multiple AD backends, as is the case for most SciML libraries, DI is the natural way to go. It offers unified syntax, and it can do things that no backend on its own can do at the moment, like sparse autodiff¹.
Besides, the whole idea is to avoid duplication of efforts, so that bindings only have to be written in one place.
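
For readers who haven't used DI, "unified syntax" means a backend swap is a one-argument change. A minimal sketch, assuming ForwardDiff and Enzyme are installed (DI picks up the bindings through package extensions):

using ForwardDiff, Enzyme  # backend packages must be loaded for DI's extensions
using ADTypes: AutoEnzyme, AutoForwardDiff
import DifferentiationInterface as DI

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

# Same call, different backend: only the ADTypes object changes.
DI.gradient(f, AutoForwardDiff(), x)
DI.gradient(f, AutoEnzyme(), x)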

If you show me a meaningful example where LinearSolve fails with DI + Enzyme (I have no doubt there are plenty), I'll be happy to try and fix it. I have already altered DI's design significantly to fit Enzyme's peculiarities (e.g. by introducing Constant arguments). I can alter it again, but it would be more pleasant to work together instead of against each other.
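
(To illustrate the Constant mechanism: non-differentiated arguments are passed as contexts after the active input, and the Enzyme extension translates them into Const annotations. A sketch:)

import DifferentiationInterface as DI
using ADTypes: AutoEnzyme
using Enzyme  # loads DI's Enzyme extension

g(x, p) = sum(abs2, p .* x)

# `p` is a non-differentiated Constant context; under the hood DI marks it
# as Enzyme.Const instead of closing over it.
DI.gradient(g, AutoEnzyme(), [1.0, 2.0], DI.Constant([3.0, 4.0]))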

Footnotes

  1. I know about Spadina but it seems there is no Julia interface yet?

wsmoses (Author) commented on Oct 8, 2024

Oh for sure, and I'm not at all saying to remove DI.

It's a fantastic package that makes a lot of things easier, some of which you describe above!

But at the end of the day, we both want to make things easier for our users. From my point of view, we should help the ecosystem by writing extensions/examples for Enzyme that maximize performance and compatibility. I don't see how this is different from you opening PRs/issues on various repos asking if DI can be helpful?

In some cases, like LinearSolve.jl, that's an extension to add EnzymeRules to simplify the code being differentiated (also leading to better performance).

In other cases, where a package dispatches to an autodiff backend, it makes sense to call Enzyme directly.

The fact that the other day we saw a segfault when using AutoEnzyme in DI for a small docs example where Enzyme directly worked is rather worrying to me, and implies that presently the overhead of DI might be causing problems for Enzyme users more generally.

As a result, packages have usually adopted a dual approach, calling Enzyme.autodiff (and related functions) directly when given an AutoEnzyme or other such object, and DI.gradient (and related functions) for other, non-specialized ADTypes. This lets users get the best of both worlds: performance and compatibility when available, and general support for AD packages, as well as all the nice things like sparsity that DI provides.
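
A sketch of that dual approach (mygradient is a hypothetical name; the point is the dispatch on ADTypes):

using ADTypes
import DifferentiationInterface as DI
using Enzyme: Enzyme

# Specialized path: call Enzyme's native API when given AutoEnzyme.
function mygradient(f, ::AutoEnzyme, x)
    dx = zero(x)
    Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Duplicated(x, dx))
    return dx
end

# Generic fallback: every other backend goes through DI.
mygradient(f, backend::ADTypes.AbstractADType, x) = DI.gradient(f, backend, x)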

I'm all for fixing issues as they arise, but in the case where we have a backend which is more performant and stable, we should use it and not force the burden of debugging our packages onto users when unnecessary.

Oftentimes, someone will try something once, and if it fails or runs unnecessarily slowly, they'll just never use it again.

I get that your goal is to try to have as many packages as possible start using DI, but there's more than enough room for both DI and Enzyme extensions in the Julia autodiff ecosystem! :)

wsmoses (Author) commented on Oct 8, 2024

As for the issue you opened, seemingly after my initial comment: the MWE implies that DI is introducing type instability through the splat operator. This may result in performance slowdowns, as well as some code no longer being differentiable (as is clearly the case there).

I'll quickly look into fixing it, but historically we have been advising people to remove type instabilities (as well as unions), which here would require not using DI.

gdalle (Collaborator) commented on Oct 8, 2024

In other cases, where a package dispatches to an autodiff backend, it makes sense to call Enzyme directly.

My main message is that this doesn't always make sense. There is a convenience tradeoff between (1) using DI for everything and (2) using DI for everything except Enzyme + using Enzyme's native interface. And of course option (2) seems easy to you because you know Enzyme inside and out, but that's not the case for most users, even power users from SciML.

The fact that the other day we saw a segfault when using AutoEnzyme in DI for a small docs example where Enzyme directly worked is rather worrying to me

This was a very specific case due to JuMP's unusual requirement that arguments should be splatted into the function f(x...). In most other optimization settings, the input x is a vector or array that doesn't need to be splatted into individual numbers before being passed to f. So I wouldn't really make such a big deal out of it, but we can add a warning to that JuMP page mentioning this caveat.

Still, whenever you say "DI is problematic because it can segfault on Enzyme", it also implies "Enzyme itself is problematic because it can segfault". Indeed, the pure Enzyme MWE for splatting shows that this type instability is not handled gracefully, and just crashes the console. Sure, Enzyme provides a multi-argument way around this bug (which is inaccessible through DI's single-argument interface), but it remains an Enzyme bug because it doesn't happen with other backends.
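
(For reference, that multi-argument workaround looks roughly like this, sketched with the splatted Rosenbrock-style f from the JuMP example: each scalar argument is annotated Active, so no splatting closure is needed.)

using Enzyme

f(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# One derivative per Active argument, no closure in between:
derivs, = Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Active(2.0), Enzyme.Active(3.0))
# derivs == (∂f/∂x₁, ∂f/∂x₂)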

presently the overhead of DI might be causing problems for Enzyme users more generally.

On the other hand, DI also allows users to experiment with Enzyme at virtually no cost. Until LinearSolve switched to DI, it had no way of using Enzyme for its internal Jacobians. Now it's as simple as a backend argument switch, because the necessary bindings are already in DI.
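
(Concretely, in NonlinearSolve terms, since that is the package actually at stake below, the switch looks something like this; a sketch, and the autodiff keyword may differ across solver versions:)

using NonlinearSolve, ADTypes, Enzyme

g(u, p) = u .^ 2 .- p
prob = NonlinearProblem(g, [1.0, 2.0], 2.0)

# Internal Jacobians via ForwardDiff or Enzyme: one keyword changes.
solve(prob, NewtonRaphson(; autodiff = AutoForwardDiff()))
solve(prob, NewtonRaphson(; autodiff = AutoEnzyme()))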

As a result, packages have usually adopted a dual approach, calling Enzyme.autodiff (and related functions) directly when given an AutoEnzyme or other such object, and DI.gradient (and related functions) for other, non-specialized ADTypes.

But then what's the point of going through all the trouble of supporting Enzyme in DI, if you're just gonna go around telling people not to use it that way?
LinearSolve is a classic example of a case where using Enzyme through DI should be straightforward: array in, array out, single active argument. Do you have benchmarks or errors to show that DI is insufficient here? Not in JuMP, in this very repo.

This lets users get the best of both worlds: performance and compatibility when available, and general support for AD packages, as well as all the nice things like sparsity that DI provides.

Fair enough, so how would you handle AutoSparse{AutoEnzyme} with Enzyme's native API? Because that's a big part of what LinearSolve needs, and DI gives you that for free.
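
(For concreteness, a sketch of that sparse path, assuming SparseConnectivityTracer for sparsity detection and SparseMatrixColorings for coloring:)

using ADTypes, Enzyme
import DifferentiationInterface as DI
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

sparse_backend = AutoSparse(
    AutoEnzyme();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)

g(x) = x .^ 2  # diagonal Jacobian, so a single color suffices
DI.jacobian(g, sparse_backend, rand(4))  # returns a sparse matrix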

I'm all for fixing issues as they arise, but in the case where we have a backend which is more performant and stable, we should use it and not force the burden of debugging our packages onto users when unnecessary.

Except that in doing so, we force another burden on the users: maintenance of tens of copies of essentially identical Enzyme extensions. My hope is that, if we keep putting our minds together, we could just write this extension once in DI and be good enough for most use cases.

adrhill (Contributor) commented on Oct 8, 2024

I think the last bit addressing the "maintenance of tens of copies of essentially identical Enzyme extensions" hits the nail on the head. If maintainers had unlimited time, adding individual package extensions would be great. The appeal of DI is the centralization of that maintenance burden into a single package: we keep up with breaking changes across backends on behalf of package developers, freeing up dev time.

Adding specialized package extensions on top of DI is probably a valuable approach for performance-critical hot loops. Individual package developers will have to decide whether the gained performance delta is worth the increased maintenance burden. Here, Avik and Chris will have to make that call.

At the end of the day, to advance the Julia AD ecosystem, that maintenance burden should be centralized as much as possible. DI made big steps toward Enzyme with the new argument activities.

wsmoses (Author) commented on Oct 8, 2024

In other cases, where a package dispatches to an autodiff backend, it makes sense to call Enzyme directly.

My main message is that this doesn't always make sense. There is a convenience tradeoff between (1) using DI for everything and (2) using DI for everything except Enzyme + using Enzyme's native interface. And of course option (2) seems easy to you because you know Enzyme inside and out, but that's not the case for most users, even power users from SciML.

Oh for sure, but my argument here isn't that users ought to use Enzyme directly (though that may also be wise/helpful), but that libraries which already hardcode a bunch of autodiff support should. Such libraries already have a bunch of code for various AD tools, so there is already precedent for having such code around (and of course all users benefit without writing additional code).

I guess I bucket library devs and end users in two separate categories.

Still, whenever you say "DI is problematic because it can segfault on Enzyme", it also implies "Enzyme itself is problematic because it can segfault". Indeed, the pure Enzyme MWE for splatting shows that this type instability is not handled gracefully, and just crashes the console. Sure, Enzyme provides a multi-argument way around this bug (which is inaccessible through DI's single-argument interface), but it remains an Enzyme bug because it doesn't happen with other backends.

Yeah for sure, but also if the use of DI makes it more likely to hit Enzyme issues, it's natural to just use Enzyme directly, no?

On the other hand, DI also allows users to experiment with Enzyme at virtually no cost. Until LinearSolve switched to DI, it had no way of using Enzyme for its internal Jacobians. Now it's as simple as a backend argument switch, because the necessary bindings are already in DI.

Oh definitely, and that's one of the most significant advantages of DI (both swapping out backends, and also the sparse support). I'm only suggesting here that NonlinearSolve.jl add some special cases for when something is calling dense Enzyme, to improve performance/compatibility.

But then what's the point of going through all the trouble of supporting Enzyme in DI, if you're just gonna go around telling people not to use it that way? LinearSolve is a classic example of a case where using Enzyme through DI should be straightforward: array in, array out, single active argument. Do you have benchmarks or errors to show that DI is insufficient here? Not in JuMP, in this very repo.

I think you're conflating LinearSolve and NonlinearSolve. Like I said above, "In some cases, like LinearSolve.jl, that's an extension to add EnzymeRules to simplify the code being differentiated (also leading to better performance)": some repos need an Enzyme extension for a custom rule (https://github.com/SciML/LinearSolve.jl/blob/main/ext/LinearSolveEnzymeExt.jl). This was needed to get some things differentiating at the time (things that now might work without it). My guess is that it's like a 2-5x speedup (and possibly more with threading on) with the Enzyme extension?

Fair enough, so how would you handle AutoSparse{AutoEnzyme} with Enzyme's native API? Because that's a big part of what LinearSolve needs, and DI gives you that for free.

Yeah, I wouldn't special-case this, just dense. Sparse can always call DI (unless, say, a future sparse backend wants to specialize).

gdalle (Collaborator) commented on Oct 8, 2024

I guess I bucket library devs and end users in two separate categories.

Yeah, that might be the main difference between our mindsets, because I see library devs as users of AD systems ;) But I get where you're coming from.

Yeah for sure, but also if the use of DI makes it more likely to hit Enzyme issues, it's natural to just use Enzyme directly, no?

All I'm saying is that this needs to be decided on a case by case basis and with examples, to justify the cost of additional implementations living side by side.

I think you're conflating LinearSolve and NonlinearSolve. Like I said above, "In some cases, like LinearSolve.jl, that's an extension to add EnzymeRules to simplify the code being differentiated (also leading to better performance)": some repos need an Enzyme extension for a custom rule

Sorry, yes, I meant NonlinearSolve in my remark. Of course extensions for rule systems are still warranted and essential, like those in LinearSolve. This discussion was solely focused on extensions for calling into AD, not for making stuff AD-compatible.

wsmoses (Author) commented on Oct 8, 2024

Yeah for sure, but also if the use of DI makes it more likely to hit Enzyme issues, it's natural to just use Enzyme directly, no?

All I'm saying is that this needs to be decided on a case by case basis and with examples, to justify the cost of additional implementations living side by side.

Sure, but at this point, having seen segfaults in the wild for some tiny example code (in addition to the performance issues discussed on Slack), at least my default is that there should be a separate Enzyme backend unless DI can be shown to be @code_typed-equivalent to relevant Enzyme calls.

In other words, if the function DI passes to Enzyme's autodiff utilities is equivalent to the original user function (not wrapped in a closure, unwrapping, etc. that could dramatically change how Julia compiles the code, add type instabilities, or drop aliasing info), then defaulting to no Enzyme extension is reasonable. If there is indirection, it should default to using Enzyme directly, since that indirection, like above, can frequently be the source of Enzyme failing to differentiate a function.

The reason I'd push this is because Enzyme has a defined scope of Julia code that it works on. Adding indirection can cause Julia to compile code outside of that scope, causing crashes (like above), in addition to all the perf issues. I agree in the long term it would be nice for Enzyme to handle all code always, but otherwise that's equivalent to asking for significantly more feature dev for something which is already supported natively.
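
To make the two shapes concrete, here is a sketch (variable names made up): the first call hands Enzyme an anonymous closure type capturing data, the second hands it the user's function directly with the captured data annotated instead.

using Enzyme
using LinearAlgebra: dot

mul(a, b) = dot(a, b)
a, x, dx = [3.0, 4.0], [2.0, 3.0], zeros(2)

# Indirect: Enzyme sees a closure type capturing `a`.
closure = b -> mul(a, b)
Enzyme.autodiff(Enzyme.Reverse, closure, Enzyme.Active, Enzyme.Duplicated(x, dx))

# Direct: Enzyme sees `mul` itself, with `a` marked inactive via Const.
fill!(dx, 0.0)
Enzyme.autodiff(Enzyme.Reverse, mul, Enzyme.Active, Enzyme.Const(a), Enzyme.Duplicated(x, dx))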

Our goal was explicitly to start with a small but well-defined set of code (e.g. originally just code you could use in a @cuda kernel from GPUCompiler), and do exceptionally well on it. This lets us organically grow an ecosystem of users who can use Enzyme without being stuck trying to "boil the ocean" before getting anything done. We've been growing that scope with time, but again, if something is likely to cause code to move outside of it, I'd recommend someone use the version which works (and open an issue).

gdalle (Collaborator) commented on Oct 8, 2024

my default is that there should be a separate Enzyme backend unless DI can be shown to be @code_typed-equivalent to relevant Enzyme calls.

Let me rephrase that the way I see it:

"Given infinite developer, money and time resources, provided everyone is perfectly at ease with autodiff in general and Enzyme in particular, and assuming that optimal performance in every single case is worth more than convenience, conciseness and maintainability, then there should be a separate Enzyme backend unless DI can be shown to be @code_typed-equivalent to relevant Enzyme calls."

But in the current situation of the Julia ecosystem, I think this is a completely unreasonable request, and it would be essentially equivalent to

const DifferentiationInterface = Enzyme

Most users and package devs don't even need a fraction of what you're demanding for DI + Enzyme to be useful.


I'm going to stop the conversation here because it makes me sad and I admire your work too much to keep fighting. In any case, it's up to package developers to decide what they want to do with the tools that we offer.

wsmoses (Author) commented on Oct 8, 2024

Most users and package devs don't even need a fraction of what you're demanding for DI + Enzyme to be useful.

But that's exactly my point!

I'm just trying to propose a threshold that would indicate where an extension would be useful or not, and catch the relevant usage and performance bugs.

To be clear, my biggest concern here is not performance, but code which fails with DI yet works with Enzyme directly (performance is good too, but not crashing should be the first priority).

My argument is that if extra indirection is added, it probably presents issues (and those should be catchable), but in most cases there is no such indirection.

So let's see how it works as a rule of thumb on some samples (again, not sure if it's the best rule of thumb, but it's what I could think of just now):

using ADTypes
import DifferentiationInterface as DI
using LinearAlgebra: dot  # `mul` below calls `dot`

# Note I modified Enzyme.autodiff with @noinline so we could see if and how the call occurred (without it getting inlined into an llvmcall, at which point the info is lost)
# Specifically
#  @noinline function autodiff(
#     rmode::ReverseMode{ReturnPrimal,RuntimeActivity,RABI,Holomorphic,ErrIfFuncWritten},
#     f::FA,
#     ::Type{A},
#     args::Vararg{Annotation,Nargs},
#
# and 
# 
# @noinline function autodiff(
#     ::ForwardMode{ReturnPrimal,RABI,ErrIfFuncWritten,RuntimeActivity},
#     f::FA,
#     ::Type{A},
#     args::Vararg{Annotation,Nargs},

using Enzyme: Enzyme

x = [2.0, 3.0]

a = [3.0, 4.0]

function mul(a, b)
    dot(a, b)
end

function grad(f::F, a, x) where F
    DI.gradient(f, AutoEnzyme(), x, DI.Constant(a))
end

julia> @code_typed grad(mul, a, x)
# 
# CodeInfo(
# 1 ── %1  = Base.arraysize(x, 1)::Int64
# │    %2  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Float64}, svec(Any, Int64), 0, :(:ccall), Vector{Float64}, :(%1), :(%1)))::Vector{Float64}
# │    %3  = Base.arraysize(%2, 1)::Int64
# │    %4  = Base.slt_int(%3, 0)::Bool
# │    %5  = Core.ifelse(%4, 0, %3)::Int64
# │    %6  = Base.slt_int(%5, 1)::Bool
# └───       goto #3 if not %6
# 2 ──       goto #4
# 3 ──       goto #4
# 4 ┄─ %10 = φ (#2 => true, #3 => false)::Bool
# │    %11 = φ (#3 => 1)::Int64
# │    %12 = φ (#3 => 1)::Int64
# │    %13 = Base.not_int(%10)::Bool
# └───       goto #10 if not %13
# 5 ┄─ %15 = φ (#4 => %11, #9 => %23)::Int64
# │    %16 = φ (#4 => %12, #9 => %24)::Int64
# │          Base.arrayset(false, %2, 0.0, %15)::Vector{Float64}
# │    %18 = (%16 === %5)::Bool
# └───       goto #7 if not %18
# 6 ──       goto #8
# 7 ── %21 = Base.add_int(%16, 1)::Int64
# └───       goto #8
# 8 ┄─ %23 = φ (#7 => %21)::Int64
# │    %24 = φ (#7 => %21)::Int64
# │    %25 = φ (#6 => true, #7 => false)::Bool
# │    %26 = Base.not_int(%25)::Bool
# └───       goto #10 if not %26
# 9 ──       goto #5
# 10 ┄       goto #11
# 11 ─       goto #12
# 12 ─       goto #13
# 13 ─ %32 = %new(Duplicated{Vector{Float64}}, x, %2)::Duplicated{Vector{Float64}}
# │    %33 = %new(Const{Vector{Float64}}, a)::Const{Vector{Float64}}
# │          invoke Enzyme.autodiff($(QuoteNode(ReverseMode{false, false, FFIABI, false, true}()))::ReverseMode{false, false, FFIABI, false, true}, $(QuoteNode(Const{typeof(mul)}(mul)))::Const{typeof(mul)}, Active::Type{Active}, %32::Duplicated{Vector{Float64}}, %33::Const{Vector{Float64}})::Tuple{Tuple{Nothing, Nothing}}
# └───       goto #14
# 14 ─       return %2
# ) => Vector{Float64}

We can see that mul was directly sent to an Enzyme call, so this should be fine as a rule of thumb.

Let's look at the multi-arg case from earlier (which is fixed above by DI's Constant support), to confirm that its absence would trigger the rule of thumb.

function grad2(mul, a, x)
    DI.gradient(Base.Fix1(mul, a), AutoEnzyme(), x)
end

julia> @code_typed grad2(mul, a, x)
# CodeInfo(
# 1 ── %1  = %new(Base.Fix1{typeof(mul), Vector{Float64}}, mul, a)::Base.Fix1{typeof(mul), Vector{Float64}}
# │    %2  = Base.arraysize(x, 1)::Int64
# │    %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Float64}, svec(Any, Int64), 0, :(:ccall), Vector{Float64}, :(%2), :(%2)))::Vector{Float64}
# │    %4  = Base.arraysize(%3, 1)::Int64
# │    %5  = Base.slt_int(%4, 0)::Bool
# │    %6  = Core.ifelse(%5, 0, %4)::Int64
# │    %7  = Base.slt_int(%6, 1)::Bool
# └───       goto #3 if not %7
# 2 ──       goto #4
# 3 ──       goto #4
# 4 ┄─ %11 = φ (#2 => true, #3 => false)::Bool
# │    %12 = φ (#3 => 1)::Int64
# │    %13 = φ (#3 => 1)::Int64
# │    %14 = Base.not_int(%11)::Bool
# └───       goto #10 if not %14
# 5 ┄─ %16 = φ (#4 => %12, #9 => %24)::Int64
# │    %17 = φ (#4 => %13, #9 => %25)::Int64
# │          Base.arrayset(false, %3, 0.0, %16)::Vector{Float64}
# │    %19 = (%17 === %6)::Bool
# └───       goto #7 if not %19
# 6 ──       goto #8
# 7 ── %22 = Base.add_int(%17, 1)::Int64
# └───       goto #8
# 8 ┄─ %24 = φ (#7 => %22)::Int64
# │    %25 = φ (#7 => %22)::Int64
# │    %26 = φ (#6 => true, #7 => false)::Bool
# │    %27 = Base.not_int(%26)::Bool
# └───       goto #10 if not %27
# 9 ──       goto #5
# 10 ┄       goto #11
# 11 ─       goto #12
# 12 ─       goto #13
# 13 ─ %33 = %new(Duplicated{Vector{Float64}}, x, %3)::Duplicated{Vector{Float64}}
# │    %34 = %new(Const{Base.Fix1{typeof(mul), Vector{Float64}}}, %1)::Const{Base.Fix1{typeof(mul), Vector{Float64}}}
# │          invoke Enzyme.autodiff($(QuoteNode(ReverseMode{false, false, FFIABI, false, true}()))::ReverseMode{false, false, FFIABI, false, true}, %34::Const{Base.Fix1{typeof(mul), Vector{Float64}}}, Active::Type{Active}, %33::Duplicated{Vector{Float64}})::Tuple{Tuple{Nothing}}
# └───       goto #14
# 14 ─       return %3
# ) => Vector{Float64}
# 

We can see that this doesn't directly forward mul into the call, instead passing in Base.Fix1{typeof(mul), Vector{Float64}}, which might cause problems under our rule of thumb. Thus, libraries that require Fix1/etc. to use DI probably need an Enzyme Ext.

Let's try the Hessian case that was causing issues:

f(x::T...) where {T} = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
f_nosplat(x::AbstractVector) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

function h1(x)
    backend = AutoEnzyme()
    DI.hessian(f_nosplat, backend, x)  # works
end

function h2(x)
    backend = AutoEnzyme()
    DI.hessian(splat(f), backend, x)  # segfaults
end

julia> @code_typed h1(x)
CodeInfo(
1 ─ %1 = invoke DifferentiationInterface._prepare_hessian_aux($(QuoteNode(Val{16}()))::Val{16}, f_nosplat::typeof(f_nosplat), $(QuoteNode(AutoEnzyme()))::AutoEnzyme{Nothing, Nothing}, x::Vector{Float64})::DifferentiationInterface.HVPGradientHessianPrep{16, NTuple{16, Vector{Float64}}, NTuple{16, Vector{Float64}}, DifferentiationInterface.ForwardOverReverseHVPPrep{DifferentiationInterface.var"#inner_gradient#46"{typeof(f_nosplat), AutoEnzyme{Nothing, Nothing}, DifferentiationInterface.Rewrap{0, Tuple{}}}, DifferentiationInterface.NoPushforwardPrep}, DifferentiationInterfaceEnzymeExt.EnzymeGradientPrep{Vector{Float64}}}
│   %2 = invoke DifferentiationInterface.hessian(f_nosplat::typeof(f_nosplat), %1::DifferentiationInterface.HVPGradientHessianPrep{16, NTuple{16, Vector{Float64}}, NTuple{16, Vector{Float64}}, DifferentiationInterface.ForwardOverReverseHVPPrep{DifferentiationInterface.var"#inner_gradient#46"{typeof(f_nosplat), AutoEnzyme{Nothing, Nothing}, DifferentiationInterface.Rewrap{0, Tuple{}}}, DifferentiationInterface.NoPushforwardPrep}, DifferentiationInterfaceEnzymeExt.EnzymeGradientPrep{Vector{Float64}}}, $(QuoteNode(AutoEnzyme()))::AutoEnzyme{Nothing, Nothing}, x::Vector{Float64})::Matrix{Float64}
└──      return %2
) => Matrix{Float64}

Okay, this isn't terribly helpful, so let's recurse a bit into the inner calls:

function h1(f::F, x, dx) where F
    @inline DI.hvp(f, AutoEnzyme(), x, dx)
end

@code_typed h1(f_nosplat, x, (x,))

CodeInfo(
1 ── %1  = Base.getfield(dx, 1, true)::Vector{Float64}
│    %2  = %new(Duplicated{Vector{Float64}}, x, %1)::Duplicated{Vector{Float64}}
│    %3  = invoke Enzyme.autodiff($(QuoteNode(ForwardMode{false, FFIABI, true, false}()))::ForwardMode{false, FFIABI, true, false}, $(QuoteNode(Const{DifferentiationInterface.var"#inner_gradient#46"{typeof(f_nosplat), AutoEnzyme{Nothing, Nothing}, DifferentiationInterface.Rewrap{0, Tuple{}}}}(DifferentiationInterface.var"#inner_gradient#46"{typeof(f_nosplat), AutoEnzyme{Nothing, Nothing}, DifferentiationInterface.Rewrap{0, Tuple{}}}(f_nosplat, AutoEnzyme(), DifferentiationInterface.Rewrap{0, Tuple{}}(())))))::Const{DifferentiationInterface.var"#inner_gradient#46"{typeof(f_nosplat), AutoEnzyme{Nothing, Nothing}, DifferentiationInterface.Rewrap{0, Tuple{}}}}, Duplicated{Vector{Float64}}::Type{Duplicated{Vector{Float64}}}, %2::Duplicated{Vector{Float64}})::@NamedTuple{1::Vector{Float64}}
└───       goto #3
2 ──       nothing::Nothing
3 ┄─ %6  = Base.getfield(%3, 1)::Vector{Float64}
└───       goto #4
4 ──       goto #5
5 ──       goto #6
6 ──       goto #7
7 ── %11 = Core.tuple(%6)::Tuple{Vector{Float64}}
└───       goto #8
8 ──       goto #9
9 ──       goto #10
10 ─       return %11
) => Tuple{Vector{Float64}}

This indicates that the wrapper of DI.inner_gradient could be an issue, meriting further investigation.

It looks like this is a closure capturing the function argument, which might actually cause problems for user code. Let's reimplement from DI: https://github.com/gdalle/DifferentiationInterface.jl/blob/efb5acf1f3df51cd5f85edc7fb693703a4cd5bf0/DifferentiationInterface/src/second_order/hvp.jl#L91

In particular, let's recurse a bit more to debug and try to fix.

# copying from DI
#     function inner_gradient(_x, unannotated_contexts...)
#        annotated_contexts = rewrap(unannotated_contexts...)
#        return gradient(f, nested(inner(backend)), _x, annotated_contexts...)
#     end


function h1(f::F, x, dx) where F
    prep = @inline DI.prepare_hvp(f, AutoEnzyme(), x, (x,))
    @inline DI.hvp(f, prep, AutoEnzyme(), x, dx)
end

@code_typed h1(f_nosplat, x, (x,))

# ...
# much more inlining and debugging later
# 

function my_inner_gradient(_x, f::F) where F
    return DI.gradient(f, AutoEnzyme(), _x)
end

function my_hvp(f::F, _x, dx) where F
    return (DI.pushforward(my_inner_gradient, DI.NoPushforwardPrep(), AutoEnzyme(), _x, dx, DI.Constant(f)),)
end

function my_h1(x, dx)
    my_hvp(f_nosplat, x, dx)
end
@code_typed my_h1(x, (x,))

# CodeInfo(
# 1 ─ %1  = Base.getfield(dx, 1, true)::Vector{Float64}
# │   %2  = %new(Duplicated{Vector{Float64}}, x, %1)::Duplicated{Vector{Float64}}
# │   %3  = invoke Enzyme.autodiff($(QuoteNode(ForwardMode{false, FFIABI, true, false}()))::ForwardMode{false, FFIABI, true, false}, $(QuoteNode(Const{typeof(my_inner_gradient)}(my_inner_gradient)))::Const{typeof(my_inner_gradient)}, Duplicated{Vector{Float64}}::Type{Duplicated{Vector{Float64}}}, %2::Duplicated{Vector{Float64}}, $(QuoteNode(Const{typeof(f_nosplat)}(f_nosplat)))::Const{typeof(f_nosplat)})::@NamedTuple{1::Vector{Float64}}
# └──       goto #3
# 2 ─       nothing::Nothing
# 3 ┄ %6  = Base.getfield(%3, 1)::Vector{Float64}
# └──       goto #4
# 4 ─       goto #5
# 5 ─       goto #6
# 6 ─       goto #7
# 7 ─ %11 = Core.tuple(%6)::Tuple{Vector{Float64}}
# └──       goto #8
# 8 ─ %13 = Core.tuple(%11)::Tuple{Tuple{Vector{Float64}}}
# └──       goto #9
# 9 ─       return %13
# ) => Tuple{Tuple{Vector{Float64}}}

Great, with our fix it looks like we're directly passing in functions without creating closures/indirection! This is probably good to go! (I opened an issue on DI to fix this: JuliaDiff/DifferentiationInterface.jl#555.)

Let's check out my_h2:

function my_h2(x, dx)
    my_hvp(splat(f), x, dx)
end
@code_typed my_h2(x, (x,))
# CodeInfo(
# 1 ─ %1  = Base.getfield(dx, 1, true)::Vector{Float64}
# │   %2  = %new(Duplicated{Vector{Float64}}, x, %1)::Duplicated{Vector{Float64}}
# │   %3  = invoke Enzyme.autodiff($(QuoteNode(ForwardMode{false, FFIABI, true, false}()))::ForwardMode{false, FFIABI, true, false}, $(QuoteNode(Const{typeof(my_inner_gradient)}(my_inner_gradient)))::Const{typeof(my_inner_gradient)}, Duplicated{Vector{Float64}}::Type{Duplicated{Vector{Float64}}}, %2::Duplicated{Vector{Float64}}, $(QuoteNode(Const{Base.Splat{typeof(f)}}(splat(f))))::Const{Base.Splat{typeof(f)}})::@NamedTuple{1::Vector{Float64}}
# └──       goto #3
# 2 ─       nothing::Nothing
# 3 ┄ %6  = Base.getfield(%3, 1)::Vector{Float64}
# └──       goto #4
# 4 ─       goto #5
# 5 ─       goto #6
# 6 ─       goto #7
# 7 ─ %11 = Core.tuple(%6)::Tuple{Vector{Float64}}
# └──       goto #8
# 8 ─ %13 = Core.tuple(%11)::Tuple{Tuple{Vector{Float64}}}
# └──       goto #9
# 9 ─       return %13
# ) => Tuple{Tuple{Vector{Float64}}}
# 

Yeah, so it looks like we're sending a closure Const{Base.Splat{typeof(f)}}(splat(f)) into autodiff, so this could (and does) cause more problems.

gdalle (Collaborator) commented on Oct 8, 2024

Thank you for this analysis, it will be helpful to improve DI!

wsmoses (Author) commented on Oct 8, 2024

To be clear, this is the @code_typed analysis I mentioned above as the back-of-envelope threshold for whether an Enzyme Ext makes sense in addition to DI.

Obviously there will be mitigating factors in either direction (code re-use, extreme performance/compat needs), but this is what I'm suggesting be done, in a way that generalizes to the types of arguments/functions intended for use. If it passes the smoke test, an Enzyme Ext probably isn't necessary; if it doesn't, one probably is.

However, as seen in the past few days, even seemingly simple use cases can fail this (including, as shown above, DI internals creating a closure they don't need). Thus my personal default is that an extension probably is necessary until this analysis is performed (though the analysis would probably pass for most users, who have simpler cases).

I am in no way trying to undermine DI. I am, however, trying to keep users from having broken (first priority) or slow (second priority) code with the default setup.

gdalle (Collaborator) commented on Oct 8, 2024

To be clear, this is the @code_typed analysis I mentioned above as the back-of-envelope threshold for whether an Enzyme Ext makes sense in addition to DI.

I had interpreted your requirement as "DI should generate the exact same @code_typed as Enzyme, down to the semicolon", which would obviously be impossible. But if it's only about passing the right function and argument annotations to the Enzyme call without a closure, that seems much more reasonable indeed.

I am in no way trying to undermine DI. I am, however, trying to keep users from having broken (first priority) or slow (second priority) code with the default setup.

Thank you for clarifying. I think in the end we are mostly in agreement, except on the default recommendation:

  • Mine is to start with DI, and if Enzyme problems are detected, add an Enzyme extension.
  • Yours is to start with an Enzyme extension, and if the comparison with DI is good enough, remove it afterwards.

Both are understandable from our respective standpoints, I think they will simply be appealing to different categories of devs and that's okay.

gdalle (Collaborator) commented on Oct 8, 2024

And I really appreciate you taking the time to dive into DI internals and debug, the same way I try to do it for Enzyme. I do think we can end up with something pretty decent with some more elbow grease.

33 remaining items
