[DRAFT] [DO NOT REVIEW] Introduce CachedSupplier for BasePersistence objects #1765
Conversation
cc @dimas-b (as you are looking at the similar issue at #1758), @eric-maynard, @collado-mike

edit: sorry, wrong PR number
@adnanhemani thanks for bringing my attention to this PR.
Hmm, I looked at your code snippets but I don't see the connection between the `TokenBroker` and this code:

```java
private void initializeForRealm(
    RealmContext realmContext, RootCredentialsSet rootCredentialsSet, boolean isBootstrap) {
  String realmId = realmContext.getRealmIdentifier(); // resolve realm ID eagerly
  DatasourceOperations databaseOperations = getDatasourceOperations(isBootstrap);
  sessionSupplierMap.put(
      realmId,
      () ->
          new JdbcBasePersistenceImpl(
              databaseOperations,
              secretsGenerator(() -> realmId, rootCredentialsSet),
              storageIntegrationProvider,
              realmId));
  PolarisMetaStoreManager metaStoreManager = createNewMetaStoreManager();
  metaStoreManagerMap.put(realmId, metaStoreManager);
}
```
@adutra thanks for taking a look :)
The connection is that the `TokenBroker` bean is `RequestScoped`, and it creates a `BasePersistence` `Supplier` object as part of the bean initialization using the request-scoped `RealmContext` bean.
Yes, this was my original idea - but it was hard for me to construct a test case for this type of fix. Maybe this is something you've had more experience with, but testing behavior that depends on a request-scoped bean was not straightforward for me. As a result, I'm promoting the CachedSupplier as our preferred way to solve this issue instead. But I'm not heavily tied to this approach if we have a better way to test the fix you suggested.
I still don't see any test for the actual issue. @adnanhemani, as it stands, this PR is imo not mergeable: it has no clear error description, no stack trace that we can investigate, no reproducer, and no test case.
@adutra - I've reproduced the issue on a branch in my fork: https://github.com/adnanhemani/polaris/tree/ahemani/show_failure_1765

You can read the full diff there, but I made a really simple case that creates a task when you create a catalog. The task only tries to get the `BasePersistence` object. Steps to reproduce the error using the code linked above: create a catalog and watch the resulting task fail when it tries to get the `BasePersistence` object.

You can then apply this PR on top of that code, retry these steps, and see that the issue no longer occurs.

More on how the `TokenBroker` creates the poisoned cache: the request-scoped `TokenBroker` bean is initialized during the original request, and as part of that initialization it asks the metastore manager factory for the per-realm `BasePersistence` supplier. That call is where the request-scoped `RealmContext` gets captured by the lazily evaluated supplier. Again, your suggestion above to change this behavior by materializing the realm ID eagerly would also break that dependency.
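To make the failure mode concrete, here is a minimal, self-contained sketch (plain Java, no Quarkus or Polaris types; all names are made up) of how a lazily evaluated supplier that captures a request-scoped bean breaks once the request is over, while eagerly resolving the realm identifier does not:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class PoisonedSupplierDemo {
  interface RealmContext {
    String getRealmIdentifier();
  }

  public static void main(String[] args) {
    AtomicBoolean requestActive = new AtomicBoolean(true);

    // Stand-in for a request-scoped bean: usable only while the "request" is active,
    // similar to how a CDI client proxy fails outside its scope.
    RealmContext requestScoped = () -> {
      if (!requestActive.get()) {
        throw new IllegalStateException("request context is no longer active");
      }
      return "realm-1";
    };

    // Poisoned: the supplier captures the bean and only dereferences it on get().
    Supplier<String> lazy = () -> requestScoped.getRealmIdentifier();

    // Safe: the realm identifier is resolved eagerly, while the request is still active.
    String realmId = requestScoped.getRealmIdentifier();
    Supplier<String> eager = () -> realmId;

    requestActive.set(false); // the original request completes

    System.out.println(eager.get()); // prints "realm-1"
    System.out.println(lazy.get());  // throws IllegalStateException
  }
}
```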
A decision on this PR is blocking #1844. In order to unblock the testing on that PR, I've included this change on that PR as well.
```diff
@@ -100,7 +101,12 @@ private void initializeForRealm(
     final StoreType backingStore = createBackingStore(diagnostics);
     sessionSupplierMap.put(
         realmContext.getRealmIdentifier(),
-        () -> createMetaStoreSession(backingStore, realmContext, rootCredentialsSet, diagnostics));
+        new CachedSupplier<>(
```
I'm not sure this change conforms with the behavior intended by the `sessionSupplierMap`. Currently, each call to the supplier yields a new instance - this PR updates the behavior to provide a lazily initialized `Persistence` instance per realm ID.

Maybe @collado-mike or @dennishuo could chime in here.
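For illustration, here is a minimal, self-contained sketch (plain Java, no Polaris types; all names are made up) of the two supplier semantics being contrasted here: a plain supplier that builds a new object on every `get()`, versus a caching supplier that lazily builds one object and then reuses it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SupplierSemanticsDemo {
  public static void main(String[] args) {
    AtomicInteger counter = new AtomicInteger();

    // Current behavior: every get() creates a fresh instance.
    Supplier<String> perCall = () -> "session-" + counter.incrementAndGet();
    System.out.println(perCall.get()); // session-1
    System.out.println(perCall.get()); // session-2

    // Behavior under this PR: the first get() creates the instance, later calls reuse it.
    Supplier<String> cached = new Supplier<String>() {
      private String value;

      @Override
      public synchronized String get() {
        if (value == null) {
          value = "session-" + counter.incrementAndGet();
        }
        return value;
      }
    };
    System.out.println(cached.get()); // session-3
    System.out.println(cached.get()); // still session-3
  }
}
```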
This is a good point. This makes me wonder - would my original idea of materializing the `RealmContext` prior to the creation of the `Supplier` also become an issue? For instance, would it be possible that the `RealmContext` is also computed lazily in some instances?

Let me follow up with @collado-mike and @dennishuo on this.
Yeah, each instance of the session was intended to be scoped to a single request. Though it seems that all the current implementations are stateless, the `TransactionalPersistence` interface methods kind of imply a stateful implementation - e.g., `lookupEntityInCurrentTxn` assumes that there is a current transaction that has already been started and will be committed at some point.
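To make that concern concrete, here is a hypothetical sketch (not the actual Polaris `TransactionalPersistence` API; class and method names are made up) of a stateful session: if one lazily created instance were cached per realm and shared across requests, the transaction state held inside the instance would be shared too.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateful session, for illustration only: a "current transaction"
// lives inside the instance, so sharing the instance shares the transaction state.
public class StatefulSessionSketch {
  private Map<String, String> currentTxn; // null when no transaction is open

  public void beginTxn() {
    currentTxn = new HashMap<>();
  }

  // Analogous in spirit to lookupEntityInCurrentTxn: only meaningful between beginTxn and commitTxn.
  public String lookupInCurrentTxn(String key) {
    if (currentTxn == null) {
      throw new IllegalStateException("no transaction in progress");
    }
    return currentTxn.get(key);
  }

  public void commitTxn() {
    currentTxn = null;
  }

  public static void main(String[] args) {
    StatefulSessionSketch session = new StatefulSessionSketch();
    session.beginTxn();
    session.lookupInCurrentTxn("entity-1"); // fine: a transaction is open on this instance
    session.commitTxn();
    // If this same instance were reused by another request or task, that caller would
    // observe (or clobber) whatever transaction state this instance currently holds.
  }
}
```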
Got it, so the way I see it is we really have 2 options to fix this issue:

1. Keep this approach and change the semantics of the `TransactionalPersistence` interface to be stateless (either now or in the future). OR
2. Take the approach of materializing the realm ID and creating a new `realmContext` to pass into the supplier, which breaks the dependency on the `realmContext` that originally came from the function signature. While this is a less invasive change, I do not have an easy way to test this behavior.

I'm leaning towards option 2 simply because it is less invasive - but is everyone else okay without there being hard testing for this few-line change? It would look something like this:
```java
private void initializeForRealm(
    RealmContext realmContext, RootCredentialsSet rootCredentialsSet) {
  final StoreType backingStore = createBackingStore(diagnostics);
  String materializedRealmId = realmContext.getRealmIdentifier();
  RealmContext materializedRealmContext = () -> materializedRealmId;
  sessionSupplierMap.put(
      realmContext.getRealmIdentifier(),
      () ->
          createMetaStoreSession(
              backingStore, materializedRealmContext, rootCredentialsSet, diagnostics));
  PolarisMetaStoreManager metaStoreManager = createNewMetaStoreManager();
  metaStoreManagerMap.put(realmContext.getRealmIdentifier(), metaStoreManager);
}
```
This is a good question and it's not as straightforward as it might seem. The `RealmContext` interface only defines a `getName` method, but there are concrete implementations that may contain extra information about the realm (we have our own custom impl). Simply materializing the `RealmContext` in this way could break functionality if the underlying Session/MetaStoreManager depend on the concrete implementation.

I think the proper long-term fix is to make the `BasePersistence` itself a CDI-managed bean so that the `RealmContext` can be injected by the context rather than us materializing it manually. It also means we have to make the task execution framework CDI-managed, which is a bigger task that we've been putting off for a while.
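As a rough illustration of that long-term direction (a hypothetical sketch, not the actual Polaris wiring; the producer class, scope choice, and factory method here are assumptions), a CDI-managed persistence bean could receive the `RealmContext` from the container instead of a manually materialized copy:

```java
import jakarta.enterprise.context.RequestScoped;
import jakarta.enterprise.inject.Produces;

// Hypothetical producer: CDI injects the active RealmContext into the producer
// method, so the persistence instance is built from the container-managed context.
public class PersistenceProducers {

  @Produces
  @RequestScoped
  public BasePersistence basePersistence(RealmContext realmContext) {
    // createMetaStoreSession(...) stands in for whatever realm-specific factory applies.
    return createMetaStoreSession(realmContext);
  }

  private BasePersistence createMetaStoreSession(RealmContext realmContext) {
    // ... build the realm-specific persistence here ...
    throw new UnsupportedOperationException("sketch only");
  }
}
```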
```java
import java.util.function.Supplier;

public class CachedSupplier<T> implements Supplier<T> {
```
IIRC this functionality already exists in Guava.
TIL - I believe you're talking about `Suppliers#memoize`. Thanks! Will convert to using this if we continue with this approach!
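For reference, a small usage sketch of the Guava utility mentioned above (assumes Guava is on the classpath; the wrapped value is just a placeholder): `Suppliers.memoize` invokes the delegate at most once and returns the cached result on subsequent calls.

```java
import com.google.common.base.Suppliers;
import java.util.function.Supplier;

public class MemoizeExample {
  public static void main(String[] args) {
    // The delegate runs only on the first get(); later calls return the cached value.
    Supplier<String> cached = Suppliers.memoize(() -> {
      System.out.println("creating the expensive object...");
      return "expensive-object";
    });

    System.out.println(cached.get()); // prints "creating the expensive object..." then the value
    System.out.println(cached.get()); // returns the cached value; the delegate is not invoked again
  }
}
```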
Closed in favor of #1988.
I came across an interesting bug yesterday that we need to fix to ensure that tasks can use the `BasePersistence` object, as they run outside of user call contexts.

What I was trying to do:

```java
metaStoreManagerFactory
    .getOrCreateSessionSupplier(CallContext.getCurrentContext().getRealmContext())
    .get();
```

This failed on the `.get()` call. When digging deeper into why this is happening, I realized that due to the Supplier's lazy loading at https://github.com/apache/polaris/blob/main/extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L100-L105, the `.get()` was actually using a `RequestScoped` `realmContext` bean given by the previously-run `TokenBroker` initialization (which is a `RequestScoped` object here: https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L290-L299). Given this is a relatively new addition, this may be why we haven't seen this bug previously.

As Tasks run asynchronously, likely after the original request has already completed, this error actually makes sense - we should not be able to use a request-scoped bean inside of a Task execution. But upon further looking, we do not actually need `realmContext` for anything other than resolving the `realmIdentifier` once during the `BasePersistence` object initialization. As a result, we can cache the `BasePersistence` object using a supplier that caches the original result instead of constantly making new objects. This also solves our issue, as the original request-scoped `RealmContext` bean will not be used again during the Task's call to get a `BasePersistence` object.

I've added a test case that shows the difference between the OOTB supplier and my ideal way to solve this problem using a CachedSupplier. If there is significant concern that we cannot cache the `BasePersistence` object, we can materialize the `RealmContext` object prior to the supplier so that, at a minimum, the `RequestScoped` `RealmContext` object is not being used - but I'm not sure if there's an easy way to test this, given that the MetastoreFactories are Quarkus `ApplicationScoped` objects.

Please note, this is an issue in both EclipseLink and JDBC, as they have almost identical code paths here.
Many thanks to @singhpk234 for being my debugging rubber ducky :)