
[DRAFT] [DO NOT REVIEW] Introduce CachedSupplier for BasePersistence objects #1765


Closed

Conversation

adnanhemani
Contributor

I came across an interesting bug yesterday that we need to fix to ensure that tasks can use the BasePersistence object, as they run outside of user call contexts.

What I was trying to do:

  1. Create and run a Task which dumps some information to the persistence. To get a BasePersistence object for this, I used the following line of code: `metaStoreManagerFactory.getOrCreateSessionSupplier(CallContext.getCurrentContext().getRealmContext()).get();`
  2. The final .get() call fails with the following error:
jakarta.enterprise.context.ContextNotActiveException: RequestScoped context was not active when trying to obtain a bean instance for a client proxy...

When digging into why this was happening, I realized that, due to the Supplier's lazy loading at https://github.com/apache/polaris/blob/main/extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L100-L105, the .get() was actually using a RequestScoped realmContext bean supplied by the previously-run TokenBroker initialization (a RequestScoped object here: https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L290-L299). Since this is a relatively new addition, that may be why we haven't seen this bug before.

As Tasks run asynchronously, likely after the original request has already completed, this error actually makes sense: we should not be able to use a request-scoped bean inside a Task execution. On further inspection, though, we do not actually need realmContext for anything other than resolving the realmIdentifier once during BasePersistence initialization. As a result, we can cache the BasePersistence object using a supplier that caches the first result instead of constantly creating new objects. This also solves our issue, because the original request-scoped RealmContext bean is never used again when the Task obtains a BasePersistence object.
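The failure mode described above can be simulated without CDI. In this hypothetical sketch, `RequestScopedRealmContext` and its `requestActive` flag stand in for Quarkus' client proxy, and `IllegalStateException` stands in for `ContextNotActiveException`; the point is that the lambda captures the proxy itself, not the resolved identifier:

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a RequestScoped client proxy: it resolves the
// realm identifier only while the "request" is active.
class RequestScopedRealmContext {
    static boolean requestActive = true;

    String getRealmIdentifier() {
        if (!requestActive) {
            throw new IllegalStateException("RequestScoped context was not active");
        }
        return "realm-1";
    }
}

class LazySupplierDemo {
    // The lambda captures the proxy, so every get() re-reads the
    // request-scoped bean instead of a materialized identifier.
    static Supplier<String> poisoned(RequestScopedRealmContext ctx) {
        return () -> "persistence-for-" + ctx.getRealmIdentifier();
    }

    public static void main(String[] args) {
        Supplier<String> s = poisoned(new RequestScopedRealmContext());
        System.out.println(s.get()); // works while the request is active
        RequestScopedRealmContext.requestActive = false; // request ends
        try {
            s.get(); // a Task calling this later blows up
        } catch (IllegalStateException e) {
            System.out.println("failed as expected: " + e.getMessage());
        }
    }
}
```

A supplier that resolves and caches the identifier on its first call, while the request is still active, avoids the second, failing dereference.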

I've added a test case that shows the difference between the OOTB supplier and my preferred fix using a CachedSupplier. If there is significant concern that we cannot cache the BasePersistence object, we can materialize the RealmContext before building the supplier so that, at a minimum, the RequestScoped RealmContext object is not reused; but I'm not sure there's an easy way to test that, given that the MetastoreFactories are Quarkus ApplicationScoped objects.

Please note, this is an issue in both EclipseLink and JDBC, as their code paths here are almost identical.

Many thanks to @singhpk234 for being my debugging rubber ducky :)

@adnanhemani
Contributor Author

adnanhemani commented May 31, 2025

cc @dimas-b (as you are looking at the similar issue at #1758), @eric-maynard , @collado-mike

edit: sorry, wrong PR number

@adnanhemani
Contributor Author

cc @adutra as well as you are also looking through #1758 .

@adutra
Contributor

adutra commented Jun 2, 2025

@adnanhemani thanks for bringing my attention to this PR.

I realized that due to the Supplier's lazy-loading [...] the .get() was actually using a RequestScoped realmContext bean given by the previously-ran TokenBroker initialization

Hmm, I looked at your code snippets, but I don't see the connection between the TokenBroker bean production and the lazy loading of JdbcBasePersistenceImpl. But assuming this is happening inside a task executor thread, and the problem is RealmContext, why don't you resolve the realmId eagerly? E.g.:

  private void initializeForRealm(
      RealmContext realmContext, RootCredentialsSet rootCredentialsSet, boolean isBootstrap) {
    String realmId = realmContext.getRealmIdentifier(); // resolve realm ID eagerly
    DatasourceOperations databaseOperations = getDatasourceOperations(isBootstrap);
    sessionSupplierMap.put(
        realmId,
        () ->
            new JdbcBasePersistenceImpl(
                databaseOperations,
                secretsGenerator(() -> realmId, rootCredentialsSet),
                storageIntegrationProvider,
                realmId));

    PolarisMetaStoreManager metaStoreManager = createNewMetaStoreManager();
    metaStoreManagerMap.put(realmId, metaStoreManager);
  }

@adnanhemani
Contributor Author

@adutra thanks for taking a look :)

Hmm I looked at your code snippets but I don't see the connection between the TokenBroker bean production and the lazy loading of JdbcBasePersistenceImpl

The connection is that the TokenBroker bean is RequestScoped, and its initialization creates a BasePersistence Supplier using the realmContext from that request-scoped initialization. That Supplier is then stored in the sessionSupplierMap, and when lazy loading later invokes it, it tries to resolve the bean's (now-expired) realmContext. Is that clearer? If not, let me know which part is unclear!

But assuming that this is happening inside a task executor thread, and the problem is RealmContext, why don't you resolve the realmId eagerly?

Yes, this was my original idea, but it was hard for me to construct a test case for this type of fix. Maybe this is something you've had more experience with, but I just wasn't able to use a request-scoped realmContext bean during a test at all. Additionally, I'm not sure we gain anything from continuously re-creating JdbcBasePersistenceImpl objects; is there really a good reason to lazy-load this? If not, why not cache the object as is?

As a result, I'm proposing the CachedSupplier as our preferred way to solve this issue instead. But I'm not heavily tied to this approach if we find a better way to test the approach you suggested.
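The CachedSupplier idea can be sketched as a thread-safe memoizing wrapper (a minimal, illustrative version, not the PR's exact code; it assumes the delegate never returns null):

```java
import java.util.function.Supplier;

// Memoizing supplier: the delegate runs at most once, and every later
// call returns the cached instance. Double-checked locking keeps the
// fast path lock-free after initialization.
class CachedSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private volatile T value; // null until the first get()

    CachedSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    value = result = delegate.get();
                }
            }
        }
        return result;
    }
}
```

Because the delegate runs only once, typically while the original request is still active, the captured RealmContext is never dereferenced again from a Task thread.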

@adutra
Contributor

adutra commented Jun 2, 2025

The connection is that the TokenBroker bean is RequestScoped and it does create a BasePersistence Supplier

I still don't see any TokenBroker creating any BasePersistence anywhere in the code 🤔

@adnanhemani as it stands, this PR is IMO not mergeable: it has no clear error description, no stack trace we can investigate, no reproducer, and no test case (CachedSupplierTest is just a unit test; there is no test showing evidence of the broken behavior that the proposed changes would fix).

@adnanhemani
Contributor Author

adnanhemani commented Jun 5, 2025

@adutra - I've reproduced the issue on a branch in my fork: https://github.com/adnanhemani/polaris/tree/ahemani/show_failure_1765

You can read the full diff there, but in short I made a really simple case that creates a task when you create a catalog. The task only tries to get the BasePersistence object, which is where the call blows up due to the poisoned cache. Feel free to attach a debugger and you'll see that it is because of the lazy loading of the JdbcBasePersistenceImpl class, and that the cache poisoning happened during the creation of the TokenBroker (RequestScoped) bean.

Steps to reproduce the error using the code linked above:

  1. [This can only be reproduced using JDBC or EclipseLink.] Create a Persistence instance and set application.properties to the right set of configurations.
  2. Run: ./polaris --client-id <CLIENT_ID> --client-secret <CLIENT_SECRET> catalogs create polaris1 --storage-type FILE --default-base-location "/var/tmp/polaris1/" (you must try this
  3. Wait for the Task to execute. It will fail and retry until it runs out of retries altogether, then log that the task could not be completed successfully. A stack trace will also be visible there.

You can then apply this PR on top of that code, retry these steps, and see that the issue no longer occurs.

More on how the TokenBroker creates the poisoned cache:

  1. tokenBrokerFactory.apply(realmContext): https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L289. Note this is a RequestScoped bean - and so is realmContext.
  2. createTokenBroker(realmContext): https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L53
  3. metaStoreManagerFactory.getOrCreateMetaStoreManager(realmContext): https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L65-L66
  4. initializeForRealm(realmContext, null, false);: https://github.com/apache/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L177

And that call is where the sessionSupplierMap stores the poisoned lambda that creates JdbcBasePersistenceImpl. At no point in this call chain was realmContext replaced with a materialized version of the realmIdentifier, which is why a RequestScoped bean made its way into the sessionSupplierMap.

Again, your suggestion above to materialize the realmContext (perhaps from the tokenBroker itself) would solve this issue. But I have no idea how to write a test ensuring something like this cannot happen again. If you have an idea, I'm glad to switch to that approach.

@adnanhemani
Contributor Author

A decision on this PR is blocking #1844. In order to unblock the testing on that PR, I've included this change on that PR as well.

@@ -100,7 +101,12 @@ private void initializeForRealm(
final StoreType backingStore = createBackingStore(diagnostics);
sessionSupplierMap.put(
realmContext.getRealmIdentifier(),
() -> createMetaStoreSession(backingStore, realmContext, rootCredentialsSet, diagnostics));
new CachedSupplier<>(
Member


I'm not sure this change conforms to the behavior intended by the sessionSupplierMap.
Currently, each call to the supplier yields a new instance; this PR changes the behavior to provide a lazily initialized Persistence instance per realm ID.
Maybe @collado-mike or @dennishuo could chime in here.

Contributor Author


This is a good point. It makes me wonder: would my original idea of materializing the RealmContext prior to creating the Supplier also become an issue? For instance, could the RealmContext also be computed lazily in some instances?

Let me follow up with @collado-mike and @dennishuo on this.

Contributor


Yeah, each instance of the session was intended to be scoped to a single request. It seems, though, that all the current implementations are stateless, while the TransactionalPersistence interface methods imply a stateful implementation; e.g., lookupEntityInCurrentTxn assumes there is a current transaction that has already been started and will be committed at some point.

Contributor Author

@adnanhemani adnanhemani Jun 16, 2025


Got it. The way I see it, we really have two options to fix this issue:

  1. Keep this approach and change the semantics of the TransactionalPersistence interface to be stateless (either now or in the future). OR
  2. Materialize the Realm ID and create a new realmContext to pass into the supplier, breaking the dependency on the realmContext that originally came from the function signature. While this is a less invasive change, I do not have an easy way to test this behavior.

I'm leaning towards option 2 simply because it is less invasive, but is everyone else okay with there being no hard test for this few-line change? It would look something like this:

  private void initializeForRealm(
      RealmContext realmContext, RootCredentialsSet rootCredentialsSet) {
    final StoreType backingStore = createBackingStore(diagnostics);
    String materializedRealmId = realmContext.getRealmIdentifier(); // resolved eagerly
    RealmContext materializedRealmContext = () -> materializedRealmId;
    sessionSupplierMap.put(
        materializedRealmId,
        () ->
            createMetaStoreSession(
                backingStore, materializedRealmContext, rootCredentialsSet, diagnostics));

    PolarisMetaStoreManager metaStoreManager = createNewMetaStoreManager();
    metaStoreManagerMap.put(materializedRealmId, metaStoreManager);
  }
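The essential move in this snippet can be isolated: read the identifier once while the request is still active, and hand downstream lambdas a fresh context that owns only that string. A distilled sketch, using a minimal stand-in interface rather than the project's actual RealmContext type:

```java
// Minimal stand-in for org.apache.polaris.core.context.RealmContext.
interface RealmCtx {
    String getRealmIdentifier();
}

class Materialize {
    // Resolve the identifier eagerly; the returned context holds only the
    // String, so no request-scoped bean is captured by downstream lambdas.
    static RealmCtx materialize(RealmCtx requestScoped) {
        String id = requestScoped.getRealmIdentifier(); // resolved now
        return () -> id;
    }
}
```

Note that this intentionally discards everything about the original context except the identifier, which is exactly the concern raised in the next comment about custom RealmContext implementations.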

Contributor


This is a good question and it's not as straightforward as it might seem. The RealmContext interface only defines a getRealmIdentifier method, but there are concrete implementations that may carry extra information about the realm (we have our own custom impl). Simply materializing the RealmContext this way could break functionality if the underlying Session/MetaStoreManager depends on the concrete implementation.

I think the proper long-term fix is to make the BasePersistence itself a CDI-managed bean so that the RealmContext can be injected by the context rather than us materializing it manually. That also means we have to make the task execution framework CDI-managed, which is a bigger task that we've been putting off for a while.


import java.util.function.Supplier;

public class CachedSupplier<T> implements Supplier<T> {
Member


IIRC this functionality already exists in Guava.

Contributor Author


TIL - I believe you're talking about Suppliers#memoize. Thanks! Will convert to using this if we continue with this approach!
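The contract of Suppliers#memoize can be sketched with the JDK alone (a hypothetical stand-in, not Guava's actual implementation; Guava's version is likewise thread-safe):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Stdlib stand-in with the same contract as Guava's Suppliers.memoize:
// the delegate is invoked at most once; later get() calls reuse the result.
class MemoizeDemo {
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value; // cached after the first call

            @Override
            public synchronized T get() {
                if (value == null) {
                    value = delegate.get();
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> s = memoize(() -> "session-" + calls.incrementAndGet());
        System.out.println(s.get()); // delegate runs here
        System.out.println(s.get()); // cached; delegate not re-run
    }
}
```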

@adnanhemani adnanhemani changed the title Introduce CachedSupplier for BasePersistence objects [DRAFT] [DO NOT REVIEW] Introduce CachedSupplier for BasePersistence objects Jul 2, 2025
@adnanhemani
Contributor Author

Closed in favor of #1988.

@adnanhemani adnanhemani closed this Jul 4, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Jul 4, 2025