AutoEnzyme should probably be specialized and not fall back to DI.
In addition to being slower in some cases, it's been shown to cause errors (even segfaults) when using AutoEnzyme in DI whereas using Enzyme directly ran successfully.
See jump-dev/JuMP.jl#3836 (comment).
gdalle commented on Oct 8, 2024
How about we try to fix bugs together in DI instead of always undermining it? We could start with this one you mention: EnzymeAD/Enzyme.jl#1942 (I managed to reproduce it with pure Enzyme, independently from DI).
I understand your concerns about Enzyme performance / correctness being misrepresented by DI. But whenever someone wants access to multiple AD backends, as is the case for most SciML libraries, DI is the natural way to go. It offers unified syntax, and it can do things that no backend on its own can do at the moment, like sparse autodiff [1].
Besides, the whole idea is to avoid duplication of efforts, so that bindings only have to be written in one place.
If you show me a meaningful example where LinearSolve fails with DI + Enzyme (I have no doubt there are plenty), I'll be happy to try and fix it. I have already altered DI's design significantly to fit Enzyme's peculiarities (e.g. by introducing Constant arguments). I can alter it again, but it would be more pleasant to work together instead of against each other.
Footnotes
[1] I know about Spadina but it seems there is no Julia interface yet?
wsmoses commented on Oct 8, 2024
Oh for sure, and I'm not at all saying to remove DI.
It's a fantastic package that makes a lot of things easier, some of which you describe above!
But also at the end of the day we both want to make things easier for our users. From my point of view, we should help the ecosystem by writing extensions/examples for Enzyme that maximize performance and compatibility. I don't see how this is different from you opening up PRs/issues on various repos asking if DI can be helpful?
In some cases, like LinearSolve.jl, that's an extension to add EnzymeRules to simplify the code being differentiated (also leading to better performance).
In other cases, where a package dispatches to an autodiff backend, it makes sense to call Enzyme directly.
The fact that the other day we saw a segfault when using AutoEnzyme in DI for a small docs example where Enzyme directly worked is rather worrying to me and implies that presently the overhead of DI might be making problems for Enzyme users more generically.
As a result, packages have been usually adopting a dual approach, calling Enzyme.autodiff/related directly when given an AutoEnzyme or other object, and DI.gradient/related for other non-specialized ADTypes. This lets users get the best of both worlds, performance and compatibility when available, and general support for AD packages, as well as all the nice things like sparsity that DI provides.
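The dual approach described above could look roughly like the following. This is only a sketch: `my_gradient` is a hypothetical library-internal helper, the Enzyme method assumes a scalar-valued `f` on a dense floating-point vector, and it ignores any mode stored in the `AutoEnzyme` object.

```julia
using ADTypes: AbstractADType, AutoEnzyme
import DifferentiationInterface as DI
import Enzyme

# Generic fallback: any ADTypes backend goes through DifferentiationInterface.
my_gradient(f, backend::AbstractADType, x) = DI.gradient(f, backend, x)

# Specialization (hypothetical): call Enzyme directly for dense float vectors.
# Assumes `f` returns a scalar; `dx` is the shadow that accumulates the gradient.
function my_gradient(f, ::AutoEnzyme, x::AbstractVector{<:AbstractFloat})
    dx = zero(x)
    Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Duplicated(x, dx))
    return dx
end

sumsq(x) = sum(abs2, x)
grad = my_gradient(sumsq, AutoEnzyme(), [1.0, 2.0])
```

Any other backend, or any input the specialized method does not cover, still lands on the DI fallback, so sparse or wrapped backends keep working unchanged.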
I'm all up for fixing issues as they arise, but also in the case where we have a backend which is more performant and stable, we should use it and not force the burden of debugging our packages on users, when unnecessary.
Oftentimes, someone will try something once, and if it fails or runs unnecessarily slowly they'll just never use things again.
I get that your goal is to try to have as many packages as possible start using DI, but there's more than enough room for both DI and Enzyme extensions in the Julia autodiff ecosystem! :)
wsmoses commented on Oct 8, 2024
As for the issue you opened seemingly after my initial comment, the MWE implies that DI is introducing type instability through the splat operator. This may result in performance slowdowns as well as some code no longer being able to be differentiated (as is clearly the case there).
I'll quickly look into fixing it, but also historically we have been advising people to remove type instabilities (as well as unions), which here would require not using DI.
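To make the splatting concern concrete, here is a small illustration in plain Julia, with no AD package involved. The function `f` is a hypothetical stand-in for a JuMP-style objective; the point is only that a vector's length is not part of its type, so the argument list produced by splatting it has no concrete type at compile time.

```julia
# Toy objective in the splatted style (hypothetical stand-in).
f(x...) = sum(abs2, x)

v = [1.0, 2.0, 3.0]   # length is a runtime value
t = (1.0, 2.0, 3.0)   # length is part of the type

# Splatting a tuple: the argument types are fully known statically.
tuple_args_concrete = isconcretetype(typeof(t))

# Splatting a vector: the best static description of the arguments is
# "some number of Float64s", which is not a concrete type.
vector_args_concrete = isconcretetype(Tuple{Vararg{Float64}})

# Both calls compute the same value; only the static information differs.
same_value = f(v...) == f(t...)
```

Whether this becomes a problem in practice depends on the backend and on how the call is compiled, but it is the kind of instability the advice above is about.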
gdalle commented on Oct 8, 2024
My main message is that this doesn't always make sense. There is a convenience tradeoff between (1) using DI for everything and (2) using DI for everything except Enzyme + using Enzyme's native interface. And of course option (2) seems easy to you because you know Enzyme inside and out, but that's not the case for most users, even power users from SciML.
This was a very specific case due to JuMP's unusual requirement that arguments should be splatted into the function f(x...). In most other optimization settings, the input x is a vector or array that doesn't need to be splatted into individual numbers before being passed to f. So I wouldn't really make such a big deal out of it, but we can add a warning to that JuMP page mentioning this caveat.
Still, whenever you say "DI is problematic because it can segfault on Enzyme", it also implies "Enzyme itself is problematic because it can segfault". Indeed, the pure Enzyme MWE for splatting shows that this type instability is not handled gracefully, and just crashes the console. Sure, Enzyme provides a multi-argument way around this bug (which is inaccessible through DI's single-argument interface), but it remains an Enzyme bug because it doesn't happen with other backends.
On the other hand, DI also allows users to experiment with Enzyme at virtually no cost. Until LinearSolve switched to DI, it had no way of using Enzyme for its internal Jacobians. Now it's as simple as a backend argument switch, because the necessary bindings are already in DI.
But then what's the point of going through all the trouble of supporting Enzyme in DI, if you're just gonna go around telling people not to use it that way?
LinearSolve is a classic example of a case where using Enzyme through DI should be straightforward: array in, array out, single active argument. Do you have benchmarks or errors to show that DI is insufficient here? Not in JuMP, in this very repo.
Fair enough, so how would you handle AutoSparse{AutoEnzyme} with Enzyme's native API? Because that's a big part of what LinearSolve needs, and DI gives you that for free.
Except that in doing so, we force another burden on the users: maintenance of tens of copies of essentially identical Enzyme extensions. My hope is that, if we keep putting our minds together, we could just write this extension once in DI and be good enough for most use cases.
adrhill commented on Oct 8, 2024
I think the last bit addressing the "maintenance of tens of copies of essentially identical Enzyme extensions" hits the nail on the head. If maintainers had unlimited time, adding individual package extensions would be great. The appeal of DI is the centralization of that maintenance burden into a single package. We keep up with any breaking changes in any backends for package developers, freeing up dev time.
Adding specialized package extensions on top of DI is probably a valuable approach for performance critical hot loops. Individual package developers will have to decide whether the gained performance-delta is worth the increased maintenance burden. Here, Avik and Chris will have to make that call.
At the end of the day, to advance the Julia AD ecosystem, that maintenance burden should be centralized as much as possible. DI made big steps toward Enzyme with the new argument activities.
wsmoses commented on Oct 8, 2024
Oh for sure, but my argument here isn't that users ought to use Enzyme directly (though that also may be wise/helpful), but that libraries that already hardcode a bunch of autodiff support should. Such libraries already have a bunch of code for various AD tools, so there is already precedent for having such code around (and of course users all benefit without writing additional code).
I guess I bucket library devs and end users in two separate categories.
Yeah for sure, but also if the use of DI makes it more likely to hit Enzyme issues, it's natural to just use Enzyme directly, no?
Oh definitely and that's one of the most significant advantages of DI (both swapping out backends, and also the sparse support). I'm only suggesting here that NonlinearSolve.jl add some special cases when something is calling dense Enzyme to improve performance/compatibility.
I think you're conflating LinearSolve and NonlinearSolve. Like I said above "In some cases, like LinearSolve.jl, that's an extension to add EnzymeRules to simplify the code being differentiated (also leading to better performance)", some repos need an Enzyme extension for a custom rule (https://github.com/SciML/LinearSolve.jl/blob/main/ext/LinearSolveEnzymeExt.jl). This was needed to get some things differentiating at the time (that now might work without). My guess is that it's like 2-5x (and possibly more with threading on) with the Enzyme extension?
Yeah I wouldn't special case this, just dense. Sparse can always call DI (unless say a future sparse backend wants to specialize).
gdalle commented on Oct 8, 2024
Yeah that might be the main difference between our mindsets, because I see library devs as users of AD systems ;) But I get where you come from.
All I'm saying is that this needs to be decided on a case by case basis and with examples, to justify the cost of additional implementations living side by side.
Sorry yes I meant NonlinearSolve in my remark. Of course extensions for rule systems are still warranted and essential, like those in LinearSolve. This discussion was solely focused on extensions for calling into AD, not for making stuff AD-compatible.
wsmoses commented on Oct 8, 2024
Sure, but at this point having seen segfaults in the wild for some tiny example code (in addition to the performance issues discussed on Slack), at least my default is that there should be a separate Enzyme backend unless DI can be shown to be @code_typed equivalent to relevant Enzyme calls.
In other words, if the function DI passes to Enzyme autodiff utilities is equivalent to the original user function (and not wrapped in a closure, unwrapping, etc. that could dramatically change how Julia compiles the code, add type instabilities, drop alias info, etc.), then defaulting to not having an Enzyme extension is reasonable. If there is an indirection, it should default to using Enzyme directly (since that indirection, like above, can frequently be the source of Enzyme failing to differentiate a function).
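A toy version of this check, in plain Julia, might look as follows. All names here are hypothetical: `received` stands in for a backend entry point and simply reports the type of the callable it was handed, so one can see whether a wrapper forwards the user's function untouched or substitutes something else.

```julia
# Hypothetical stand-in for a backend entry point such as an autodiff call:
# it only reports the type of the callable it receives.
received(f, x) = typeof(f)

# Wrapper A forwards the user's function untouched.
forward_direct(f, x) = received(f, x)

# Wrapper B inserts a closure, so the backend sees a different callable.
forward_closure(f, x) = received(y -> f(y), x)

user_f(x) = x^2

passes_original_direct  = forward_direct(user_f, 1.0) === typeof(user_f)
passes_original_closure = forward_closure(user_f, 1.0) === typeof(user_f)
```

In real code the same question would be answered by reading the output of @code_typed on the wrapper call and looking at what is passed to the backend, rather than by instrumenting it like this.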
The reason I'd push this is because Enzyme has a defined scope of Julia code that it works on. Adding indirection can cause Julia to compile code outside of that scope, causing crashes (like above), in addition to all the perf issues. I agree in the long term it would be nice for Enzyme to handle all code always, but otherwise that's equivalent to asking for significantly more feature dev for something which is already supported natively.
Our goal was explicitly to start with a small but well defined set of code (e.g. originally just code you could use in a @cuda kernel from GPUCompiler), and do exceptionally well on it. This lets us organically grow an ecosystem of users who can use Enzyme without being stuck trying to "boil the ocean" before anything is done. We've been growing that scope with time, but again if something is likely to cause code to move outside of it, I'd recommend someone to use the version which works (and open an issue).
gdalle commented on Oct 8, 2024
Let me rephrase that the way I see it:
"Given infinite developer, money and time resources, provided everyone is perfectly at ease with autodiff in general and Enzyme in particular, and assuming that optimal performance in every single case is worth more than convenience, conciseness and maintainability, then there should be a separate Enzyme backend unless DI can be shown to be @code_typed-equivalent to relevant Enzyme calls."
But in the current situation of the Julia ecosystem, I think this is a completely unreasonable request, and it would be essentially equivalent to
Most users and package devs don't even need a fraction of what you're demanding for DI + Enzyme to be useful.
I'm going to stop the conversation here because it makes me sad and I admire your work too much to keep fighting. In any case, it's up to package developers to decide what they want to do with the tools that we offer.
wsmoses commented on Oct 8, 2024
But that's exactly my point!
I'm just trying to propose a threshold that would indicate where an extension would be useful or not, and catch the relevant usage and performance bugs.
To be clear, my biggest concern here is not performance -- but code which fails with DI but works with Enzyme directly (performance is good too but not crashing should be the first priority).
My argument is that if the extra indirection is added, it probably presents issues (and should be catchable). But in most cases this isn't the case.
So let's see how it works as a rule of thumb on some samples (again not sure if it's the best rule of thumb, but it's what I could think of just now):
We can see that mul was directly sent to an Enzyme call so this should be fine as a rule of thumb.
Let's look at the multi arg case earlier (which is fixed above by DI support), to confirm that its absence would trigger the rule of thumb.
We can see that this doesn't directly forward mul into the call, instead passing in Base.Fix1{typeof(mul), Vector{Float64}}, and might cause problems under our rule of thumb. Thus, libraries that require Fix1/etc to use DI probably need an Enzyme Ext.
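For readers unfamiliar with Base.Fix1, the following plain-Julia sketch shows what such a wrapper is. The two-argument `mul` defined here is a hypothetical stand-in, not the function from the thread; the point is that fixing the first argument yields a new callable object whose type differs from typeof(mul) and which carries the fixed array as a field.

```julia
# Hypothetical stand-in for a two-argument function like `mul`.
mul(a, b) = sum(a .* b)

a = [1.0, 2.0]
b = [3.0, 4.0]

# Fixing the first argument produces a single-argument callable.
g = Base.Fix1(mul, a)

is_fix1       = g isa Base.Fix1{typeof(mul), Vector{Float64}}
same_result   = g(b) == mul(a, b)
same_callable = typeof(g) === typeof(mul)

# The wrapper stores the original function and the captured array.
holds_function = g.f === mul
holds_array    = g.x === a
```

A backend given `g` therefore differentiates through a struct holding `a` rather than through `mul` with `a` as an explicit, separately annotated argument.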
Let's try the Hessian case causing issues.
Okay, this isn't terribly helpful, so let's recurse in a bit to the inner calls.
This indicates that the wrapper of DI.inner_gradient could be an issue, meriting further investigation.
It looks like this is a closure capturing the function argument which actually might provide problems for user code. Let's reimplement from DI: https://github.com/gdalle/DifferentiationInterface.jl/blob/efb5acf1f3df51cd5f85edc7fb693703a4cd5bf0/DifferentiationInterface/src/second_order/hvp.jl#L91
In particular, let's recurse a bit more to debug and try to fix.
Great, with our fix it looks like we're directly passing in functions without creating closures/indirection! This is probably good to go! (Opened an issue on DI to fix this: JuliaDiff/DifferentiationInterface.jl#555)
Let's check out my_h2
Yeah so it looks like we're sending in a closure Const{Base.Splat{typeof(f)}}(splat(f)) to autodiff, so this could (and does) cause more problems.
gdalle commented on Oct 8, 2024
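As a minimal plain-Julia illustration of what a splat wrapper does: Base.splat turns a multi-argument function into a single-argument function over a collection, which is how a single-argument interface can adapt an `f(x...)`-style function. The two-argument `f` below is a hypothetical example, not the one from the thread.

```julia
# Hypothetical two-argument function.
f(a, b) = a * b

# Base.splat(f) is a callable that applies f to the elements of its argument.
s = Base.splat(f)

same_result   = s((2.0, 3.0)) == f(2.0, 3.0)
from_vector   = s([2.0, 3.0]) == f(2.0, 3.0)   # argument count only known at runtime
same_callable = typeof(s) === typeof(f)
```

The backend then sees the wrapper rather than `f` itself, and when the argument is a vector the number of splatted values is not known statically.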
Thank you for this analysis, it will be helpful to improve DI!
wsmoses commented on Oct 8, 2024
To be clear, this is the code_typed analysis I mentioned above as the back of envelope threshold for whether an Enzyme Ext makes sense in addition to DI.
Obviously there will be mitigating factors in either direction (code re-use, extreme performance/compat), but this is what I'm suggesting be done in a way that generalizes to the types of arguments/functions intended for use. If it passes the smoke test, an Enzyme Ext probably isn't necessary; if not, it probably is.
However, as seen in the past few days, even seemingly simple use cases can fail this (including DI internally, shown above not needing the closure). Thus my personal default is that it probably is necessary to have an extension until this analysis is performed (which would probably pass for most users, who have simpler cases).
I am in no way trying to undermine DI. I am, however, trying to keep users from having broken (first priority) or slow (second priority) code with the default setup.
gdalle commented on Oct 8, 2024
I had interpreted your requirement as "DI should generate the exact same @code_typed as Enzyme to the semicolon", which will obviously be impossible. But if it's only about passing the right function and argument annotations to the Enzyme call without a closure, that seems much more reasonable indeed.
Thank you for clarifying. I think in the end we are mostly in agreement, except on the default recommendation:
Both are understandable from our respective standpoints, I think they will simply be appealing to different categories of devs and that's okay.
gdalle commented on Oct 8, 2024
And I really appreciate you taking the time to dive into DI internals and debug, the same way I try to do it for Enzyme. I do think we can end up with something pretty decent with some more elbow grease.