-
-
Couldn't load subscription status.
- Fork 197
fix: AOT interop with managed .NET runtimes #1392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding the diff to make the other implementations work @jpnurmi. Since this deviates significantly, we should document the change as clearly as possible (for our internal use) to ensure understanding that we have shifted the focus to the current AOT signal/exception interface or adapt the tests to cover the relevant area.
Co-authored-by: Mischan Toosarani-Hausberger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can isolate the SIGABRT in case of an unhandled exception on AOT/Mono as well.
| SENTRY_DEBUG("runtime converted the signal to a managed " | ||
| "exception, we do not handle the signal"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is absolutely correct, but the only side-effect currently visible is for the logging toggle. Similarly, to how we "leave" the signal handler before chaining, we must also re-enable logging immediately after "leaving" and disable it again before re-entering, because if it were a managed code exception, we want logging to remain enabled.
We can also move the entire sig_slot assignment down below the CHAIN_AT_START code, to make the path dependencies more obvious.
However, I think both have a lower priority than figuring out the signaling sequence of both runtimes and how we can align them.
|
I'm trying to fix the scenario where Mono's signal handler detects a managed exception, it modifies the context to transfer execution to a managed exception handler, and then execution returns to Sentry Native's signal handler. In this case, Sentry Native needs to detect that Mono wanted execution to continue, and abort crash handling. In case of a real native crash, though, if we invoke Mono's signal handler first and Mono's native crash handling decides to call |
Isn't that what you're trying to do here all along? Or is there yet another difference when you use pure Mono?
Were you able to observe this? Because this only happens when |
Btw, if it is the latter, then this is also the reason why I suggested that CLR JIT support can be dropped altogether. When I started this project (which was over a year ago), the primary goal was to determine how much the handler interaction between the various runtime implementations converges. I started with CLR JIT as a baseline. However, if we primarily have downstream usage for another implementation that diverges entirely in signal handling, then we can either drop the current implementation or add another handler strategy. |
I tried creating a test case using |
No wait, it's the newly added IP/SP check that prevents native crash handling, too. 🤦 How the heck do we distinguish between these....... |
I was wary of checking ucontext modifications along the signal chain. I didn't have the time to review the implementation, but I can. |
Try, as a first step, to switch the order of the handler chain for Mono (and drop your current IP/SP check or even the CHAIN_AT_FIRST strategy entirely). The way it seems to be operating makes more sense if their handlers get installed last. In "old" Mono, there were managed-language side functions that could (un)install signal handlers at specific points (which could be controlled from the sentry-dotnet around the native SDK initialization) to control the chain being: rather than what we have now: Not sure if they are still exposed in the dotnet/runtime mono fork, but we can certainly try to achieve something similar. Then we would have their handler first and might not need an alternative strategy inside our handler; maybe not even for CLR (but one step at a time). |
|
Swapping the order of the signal handlers would work. I was able to confirm the theory on Linux, even though I had to patch Mono to either
to make it possible to swap the order in either managed or native code, respectively. However, that's just Linux, which is not relevant for Sentry .NET on Android or iOS. The problem is that there's no such type as |
We should do this in the Native SDK, similar to how we can change the invocation sequence in the handler; we can construct the signal chain up to a point during the setup of the signal handlers (rather than at signal-handling time). I can follow up on this topic next week. |
|
The main purpose of debugging on Linux was just to understand Mono's behavior. 🙂 Anyway,
I have also temporarily hacked This assumes locally built and published (
Furthermore, I reverted this old change and switched With local builds and all above changes temporarily combined in |
They are irrelevant for sentry-dotnet or .NET on Android, and there are no tests checking if they even work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see that the adaptation works downstream ❤️
Please move the log flush (Flush logs in a crash-safe...) below the point where we call sentry__page_allocator_enable() but outside the #ifdef because that should happen on all platforms. There is no reason to flush the logs if we don't know there is a terminal signal to handle, and it also requires the allocator, which isn't safe before we enable the page-allocator.
Otherwise, I would either document why we hide the behavior for managed exceptions that aren't handled in C# code now (since this has a more limited scope than "all managed exceptions" and essentially hides behavior) or adapt the test handling accordingly.
|
P.S. These tests were only executed in Release mode, but I've prepared a patch to make them execute in both Debug and Release: For what it's worth, they did seem to pass locally on both |
Yes, I have seen this. Release is more critical because that is where most problems appear. However, it is sensible to track the behavior on Debug too, so we don't surprise users with changing behavior during development.
Perfection 💯
Agreed, and also because we want to track changes in behavior or have feedback when we add or extend the test dimensions. |
|
The only thing left now is to add the |
Co-authored-by: Mischan Toosarani-Hausberger <[email protected]>
Well, this is interesting. Unwinding the chained sentry-native/src/backends/sentry_backend_inproc.c Lines 497 to 502 in 516c150
Not sure how to debug this, because running in a debugger changes the behavior. 🤨 |
|
Not sure if skipping this fallback in case of chained signal handlers is the right thing to do, but it helps avoid the crash... |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1392 +/- ##
==========================================
- Coverage 83.48% 83.35% -0.14%
==========================================
Files 58 58
Lines 9648 9660 +12
Branches 1511 1512 +1
==========================================
- Hits 8055 8052 -3
- Misses 1439 1451 +12
- Partials 154 157 +3 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fair to exclude the non-user-context backtrace fallback. It is questionable whether that stack trace, had it not crashed, would provide any helpful information. I would comment on why you don't do it.
|
Looking forward to making this available as an opt-in for starters: Thanks so much for the help and guidance! ❤️ |
When chaining signal handlers in AOT mode, detect whether the .NET runtime converts a signal to a managed exception and transfers execution to the managed exception handler. In this case, Sentry Native should abort crash handling because the exception is caught and handled in managed code.
See also:
Note
Detect .NET runtime converting signals to managed exceptions and skip native crash handling; add JIT/AOT tests and changelog entry.
get_stack_pointerandget_instruction_pointerfor multiple architectures to read fromucontext_t.CHAIN_AT_START, compare IP/SP before/after invoking prior handler; if changed, treat as managed exception and abort native handling.run_jit_*) and add AOT runners (run_aot_*), including AOT publish and execution.Program.csto use null-forgivings!and conditionally rethrow viaargs("managed-exception").test_aot_signals_inprocand rename JIT test; adjust skip reasons/messages.Written by Cursor Bugbot for commit 060b18b. This will update automatically on new commits. Configure here.