Skip to content

Bump to dotnet/java-interop/main@d3d3a1bf #9921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jonpryor
Copy link
Member

@jonpryor jonpryor commented Mar 14, 2025

Fixes: #9862

Changes: dotnet/java-interop@8221b7d...d3d3a1b

Issue #9862 is an observed "race condition" around
Android.App.Application subclass creation. Two instances of
AndroidApp were created, one from the "normal" app startup:

at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method)
at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25)
at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316)

and another from an androidx.work.WorkerFactory:

at mono.android.TypeManager.n_activate(Native Method)
at mono.android.TypeManager.Activate(TypeManager.java:7)
at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23)
at java.lang.reflect.Constructor.newInstance0(Native Method)
at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95)

However, what was odd about this "race condition" was that the
second instance created would reliably win!

Further investigation suggested that this was less of a
"race condition" and more a bug in AndroidValueManager, wherein when
"Replaceable" instances were created, an existing instance would
always be replaced, even if the new instance was also Replaceable!
This feels bananas; yes, Replaceable should be replaceable, but the
expectation was that it would be replaced by non-Replaceable
instances, not just any instance that came along later.

dotnet/java-interop@d3d3a1bf updates
JniRuntimeJniValueManagerContract to add a new
CreatePeer_ReplaceableDoesNotReplace() test to codify the desired
semantic that Replaceable instances do not replace Replaceable
instances.

Update AndroidValueManager so that the new
CreatePeer_ReplaceableDoesNotReplace() test passes. It turns out
that the existing code within TypeManager.CreateInstance():

var result = CreateProxy (type, handle, transfer);
if (Runtime.IsGCUserPeer (result.PeerReference.Handle)) {
  result.SetJniManagedPeerState (JniManagedPeerStates.Replaceable | JniManagedPeerStates.Activatable);
}
return result;

sets .Replaceable too late, as during execution of the activation
constructor the instance thinks it isn't replaceable, and thus
creation of a new instance via the activation constructor will
replace an already existing replaceable instance; it's not until
after the constructor finished executing that we set .Replaceable.

Address this by updating CreatePeer() to split up instance creation:

  1. Create an un-constructed instance using
    RuntimeHelpers.GetUninitializedObject(Type).

  2. Set .Replaceable on the instance from (1)

  3. Then invoke the activation constructor.

This allows AddPeer() to reliably determine that an instance is
.Replaceable, and thus not replace a .Replaceable instance with
another .Replaceable instance.

dotnet/java-interop@d3d3a1bf does similar things with
JniRuntime.JniValueManager. Update ManagedValueManager to
override the new TryConstructPeer() method, as TryCreatePeer()
has been removed.

Context: dotnet/java-interop#1323
Context: #9862 (comment)

Does It Build™?

(The expectation is that it *does* build -- only unit tests are changed
in dotnet/java-interop#1323 -- but that the new
`JniRuntimeJniValueManagerContract.cs.CreatePeer_ReplaceableDoesNotReplace()`
test will fail.)`
@jonpryor jonpryor force-pushed the dev/jonp/jonp-try-ji-1323 branch from e67229e to d6e0e7d Compare April 4, 2025 19:13
@jonpryor
Copy link
Member Author

jonpryor commented Apr 4, 2025

So I expected JniRuntimeJniValueManagerContract.CreatePeer_ReplaceableDoesNotReplace() to fail, but it failed for the wrong reason:

Expected peer1.JniManagedPeerState to have .Replaceable, but was None.

Further investigation was because we weren't hitting this line:

result.SetJniManagedPeerState (JniManagedPeerStates.Replaceable | JniManagedPeerStates.Activatable);

because Runtime.IsGCUserPeer() returned false, because AnotherJavaInterfaceImpl.java does not implement GCUserPeerable.

Okay, fair.

What I no longer remember -- and thus need to dig around for -- is why we're "protect" setting .Replacable with IsGCUserPeer(). Offhand, this feels like something we could always do?

@jonpryor jonpryor marked this pull request as ready for review April 15, 2025 16:04
@jonpryor jonpryor changed the title Try dotnet/java-interop#1323 Bump to dotnet/java-interop/main@d3d3a1bf Apr 15, 2025
Remove the `Console.WriteLine()`s from `AndroidRuntime.cs` and `TypeManager.cs`.

Add an `@(AndroidEnvironment)` to `Mono.Android.NET-Tests.csproj` to
set `debug.mono.log=default,gref+`, so that on the next run we can get
*some* gref logging to help diagnose an apparent crash:

    JNI ERROR (app bug): accessed deleted Global 0x3056
We're seeing a recurrent crash on CI, which is still inexplicable:

	E droid.NET_Test: JNI ERROR (app bug): accessed deleted Global 0x3056
	F droid.NET_Test: java_vm_ext.cc:570] JNI DETECTED ERROR IN APPLICATION: use of deleted global reference 0x3056

The obvious thing to do to track this down is to enable GREF logging,
which makes the error disappear.  (Figures.)

However, attempting to enable GREF logging found *other* issues:

The GREF log couldn't be created (!):

	 W monodroid: Failed to create directory '/data/user/0/Mono.Android.NET_Tests/files/.__override__/arm64-v8a'. No such file or directory
	 …
	 E monodroid: fopen failed for file /data/user/0/Mono.Android.NET_Tests/files/.__override__/arm64-v8a/grefs.txt: No such file or directory

The apparent cause for this is that the ABI is now part of the path,
`…/.__override__/arm64-v8a/grefs.txt` and not `…/.__override__/grefs.txt`.
Additionally, `create_public_directory()` was a straight `mkdir()` call,
which *does not* create intermediate directories.
Update `create_public_directory()` to use `create_directory()` instead,
which *does* create intermediate directories.  This allows `grefs.txt`
to be created.

*Next*, I started getting a *bizarre* failure within `MoarThreadingTests()`:

	I NUnit   : 1) Java.InteropTests.JnienvTest.MoarThreadingTests (Mono.Android.NET-Tests)
	I NUnit   :   No exception should be thrown [t2]! Got: System.ObjectDisposedException: Cannot access disposed object with JniIdentityHashCode=158880748.
	I NUnit   : Object name: 'Java.Lang.Integer'.
	I NUnit   :    at Java.Interop.JniPeerMembers.AssertSelf(IJavaPeerable self)
	I NUnit   :    at Java.Interop.JniPeerMembers.JniInstanceMethods.InvokeAbstractInt32Method(String encodedMember, IJavaPeerable self, JniArgumentValue* parameters)
	I NUnit   :    at Java.Lang.Integer.IntValue()
	I NUnit   :    at Java.Lang.Integer.System.IConvertible.ToInt32(IFormatProvider provider)
	I NUnit   :    at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
	I NUnit   :    at Android.Runtime.JNIEnv.GetObjectArray(IntPtr array_ptr, Type[] element_types)
	I NUnit   :    at Java.InteropTests.JnienvTest.<>c__DisplayClass26_0.<MoarThreadingTests>b__1()
	I NUnit   :   Expected: null
	I NUnit   :   But was:  <System.ObjectDisposedException: Cannot access disposed object with JniIdentityHashCode=158880748.

Relevant code from `JNIEnv.GetObjectArray()`:

	for (int i = 0; i < cnt; i++) {
		Type? targetType	= (element_types != null && i < element_types.Length) ? element_types [i] : null;
		object? value    = converter ((targetType == null || targetType.IsValueType) ? null : targetType,
				array_ptr, i);

		ret [i] = value;
		ret [i] = targetType == null || targetType.IsInstanceOfType (value)
			? value
			: Convert.ChangeType (value, targetType, CultureInfo.InvariantCulture);
	}

It's *conceivable* that `value` is finalized *during* the
`Convert.ChangeType()` call (…really?! I'm grasping at straws here!),
which could explain the `ObjectDisposedException` being thrown.
(This doesn't *actually* make sense, but it's the best I can come up with.)
Add a `GC.KeepAlive()` call to keep the object alive until after
`Convert.ChangeType()` is called:

	ret [i] = targetType == null || targetType.IsInstanceOfType (value)
		? value
		: Convert.ChangeType (value, targetType, CultureInfo.InvariantCulture);
	GC.KeepAlive (value);

This "fixed" the `ObjectDisposedException` issue, only to raise a new,
different, and *not* fixed issue:

	I NUnit   : 1) Java.InteropTests.JnienvTest.MoarThreadingTests (Mono.Android.NET-Tests)
	I NUnit   :   No exception should be thrown [t2]! Got: System.ArgumentException: Handle must be valid. (Parameter 'instance')
	I NUnit   :    at Java.Interop.JniEnvironment.InstanceMethods.CallIntMethod(JniObjectReference instance, JniMethodInfo method, JniArgumentValue* args) in /Volumes/Xamarin-Work/src/dotnet/android/external/Java.Interop/src/Java.Interop/obj/Debug/net9.0/JniEnvironment.g.cs:line 20191
	I NUnit   :    at Java.Interop.JniPeerMembers.JniInstanceMethods.InvokeAbstractInt32Method(String encodedMember, IJavaPeerable self, JniArgumentValue* parameters) in /Volumes/Xamarin-Work/src/dotnet/android/external/Java.Interop/src/Java.Interop/Java.Interop/JniPeerMembers.JniInstanceMethods_Invoke.cs:line 492
	I NUnit   :    at Java.Lang.Integer.IntValue() in /Volumes/Xamarin-Work/src/dotnet/android/src/Mono.Android/obj/Debug/net10.0/android-36/mcw/Java.Lang.Integer.cs:line 354
	I NUnit   :    at Java.Lang.Integer.System.IConvertible.ToInt32(IFormatProvider provider) in /Volumes/Xamarin-Work/src/dotnet/android/src/Mono.Android/Java.Lang/Integer.cs:line 58
	I NUnit   :    at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
	I NUnit   :    at Android.Runtime.JNIEnv.GetObjectArray(IntPtr array_ptr, Type[] element_types) in /Volumes/Xamarin-Work/src/dotnet/android/src/Mono.Android/Android.Runtime/JNIEnv.cs:line 1070
	I NUnit   :    at Java.InteropTests.JnienvTest.<>c__DisplayClass26_0.<MoarThreadingTests>b__1() in /Volumes/Xamarin-Work/src/dotnet/android/tests/Mono.Android-Tests/Mono.Android-Tests/Java.Interop/JnienvTest.cs:line 390

which makes *no* sense, and I'm starting to have flashbacks to
issue #9039, which was a GC bug fixed in dotnet/runtime#112825.
(Don't we have that fix?!  Perhaps not.)

Next, the contents of `grefs.txt` occasionally looked "wrong".
Turns out, `WriteGlobalReferenceLine()` doesn't consistently append
a newline to the message, which impacts e.g. the `Created …` message:

	Created PeerReference=0x3c06/G IdentityHashCode=0x2e29bd Instance=0xbe0aab36 Instance.Type=Android.OS.Bundle, Java.Type=android/os/Bundle+g+ grefc 19 gwrefc 0 obj-handle 0x706f711035/L -> new-handle 0x3d06/G from thread '<null>'(1)

Update `WriteGlobalReferenceLine()` to always append a newline
to the message.

Finally, update `env.txt` to always set debug.mono.debug=1.
Filenames and line numbers in stack traces are handy!
@jonpryor
Copy link
Member Author

The story so far…?

dotnet/java-interop@d3d3a1bf added several unit tests, one of which is CreatePeer_ReplaceableDoesNotReplace(), which was the unit test for #9862.

CI for this PR is observing crashes in tests/Mono.Android-Tests, which don't entirely make sense.

For testing, PR #10047 is a "similar" bump that also bumps to dotnet/java-interop@d3d3a1bf and fixes ManagedValueManager so that it builds, but does not contain fixes for CreatePeer_ReplaceableDoesNotReplace(). It shows two meaningful results:

  1. CreatePeer_ReplaceableDoesNotReplace() fails, which is expected, and
  2. It does not crash.

This suggests that the crash isn't caused by dotnet/java-interop@d3d3a1bf (yay), and may be "random" based on history.

Next step: copy over the remaining changes in this PR into #10047. Maybe it'll pass the tests?

jonpryor added a commit that referenced this pull request Apr 17, 2025
Context: #9921 (comment)

Do tests pass without crashing?

If the tests *do* crash, next step is to *revert* changes to
`AndroidRuntime.cs`.  This should re-introduce failures within
`CreatePeer_ReplaceableDoesNotReplace()`, but would narrow down
what may be in play.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible race condition in proxy object creation
1 participant