@jsinovassin
Contributor

jsinovassin commented Feb 6, 2026

PR Title format:
https://issues.apache.org/jira/browse/UNOMI-930

Please add a meaningful description for your change here

This pull request refines the startup message display logic in the BundleWatcherImpl class to ensure thread safety and prevent duplicate messages. The main changes involve introducing a synchronization mechanism and restructuring the startup completion checks.

Thread safety and startup logic improvements:

  • Added a startupMessageLock object and synchronized the startup message display block to prevent concurrent threads from displaying the startup message multiple times.

@sergehuber
Contributor

It seems the tests are failing because we need to update CodeQL, see https://github.blog/changelog/2025-01-10-code-scanning-codeql-action-v2-is-now-deprecated/

@sergehuber
Contributor

OK, I found why the tests were failing: we need to update the CodeQL JavaScript version. I've done that in the master branch, so you should merge that change into your branch so that the checks pass.

private List<ServerInfo> serverInfos = new ArrayList<>();

// Lock object to synchronize startup message display
private final Object startupMessageLock = new Object();

startupMessageAlreadyDisplayed should be declared volatile here, because it is read outside the synchronized block (line 290) as part of a double-checked locking pattern. Without volatile, the Java Memory Model does not guarantee that a write made inside the lock ever becomes visible to an unsynchronized read, so a thread could keep seeing false after another thread has set the flag to true. In practice, stale reads are rare on strongly-ordered architectures like x86 (which is why the bug may not show up during development), but they are more likely on weakly-ordered architectures like ARM, which are commonly used in cloud/container deployments. Declaring the field volatile establishes a happens-before relationship between each write and every subsequent read of the field, which is what makes the outer check reliable.
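To make the suggestion concrete, here is a minimal, self-contained sketch of the double-checked pattern with a volatile flag. It is not the actual BundleWatcherImpl code: the class name `StartupMessageGuard`, the method `displayStartupMessageOnce()`, the `displayCount` counter (which stands in for printing the message) and the `runConcurrently` helper are all hypothetical names made up for illustration. Only `startupMessageLock` and `startupMessageAlreadyDisplayed` come from the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartupMessageGuard {

    // Lock object from the PR; guards the "display once" section.
    private final Object startupMessageLock = new Object();

    // volatile so the unsynchronized outer read sees writes made inside the lock.
    private volatile boolean startupMessageAlreadyDisplayed = false;

    // Hypothetical stand-in for actually printing the startup message.
    private final AtomicInteger displayCount = new AtomicInteger();

    public void displayStartupMessageOnce() {
        // Outer check: cheap fast path that avoids taking the lock once the message is out.
        if (startupMessageAlreadyDisplayed) {
            return;
        }
        synchronized (startupMessageLock) {
            // Inner check: another thread may have displayed the message while we waited.
            if (startupMessageAlreadyDisplayed) {
                return;
            }
            displayCount.incrementAndGet();
            startupMessageAlreadyDisplayed = true;
        }
    }

    public int displayCount() {
        return displayCount.get();
    }

    // Hypothetical helper: releases several threads at once and reports how often the message was shown.
    public static int runConcurrently(int threadCount) throws InterruptedException {
        StartupMessageGuard guard = new StartupMessageGuard();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    guard.displayStartupMessageOnce();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return guard.displayCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("displayed " + runConcurrently(8) + " time(s)");
    }
}
```

Note that the inner check under the lock is what prevents duplicate messages; the volatile modifier only makes the lock-free outer check trustworthy.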

}

private void destroyScheduler() {
scheduledFuture.cancel(true);

cancel(true) should be changed to cancel(false) here. To see why, trace the call path:

  1. checkStartupComplete() calls startScheduler(getBundleCheckTask()), which schedules a recurring task via scheduleWithFixedDelay.
  2. When that task fires, it runs getBundleCheckTask().run(), which calls back into checkStartupComplete().
  3. checkStartupComplete() now starts with destroyScheduler(), which calls scheduledFuture.cancel(true).
  4. At this point, the thread calling cancel(true) is the thread running the scheduled task — it is cancelling itself.

The true argument means "interrupt if running", which sets the current thread's interrupt flag. The thread then continues executing the rest of checkStartupComplete() with its interrupt flag set. Any subsequent interruptible operation in that same execution — in particular, NIO-based log appenders writing to a FileChannel — may then throw ClosedByInterruptException, which permanently closes the channel and can break logging for the rest of the process lifetime. This is hard to reproduce because it depends on timing and on which logging backend and appenders are configured.
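The channel-closing behavior can be reproduced in isolation. The sketch below is not taken from any logging library; the class `InterruptedWriteDemo` and its method are hypothetical, and the temporary file merely stands in for a log file. It sets the interrupt flag on the current thread and then writes to a FileChannel, which is documented to close the channel and throw ClosedByInterruptException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptedWriteDemo {

    // Returns true if a write attempted with the interrupt flag set closed the channel.
    public static boolean channelClosedByInterruptedWrite() throws IOException {
        Path file = Files.createTempFile("interrupted-write", ".log");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // Simulate what cancel(true) does to the thread that is running the task.
            Thread.currentThread().interrupt();
            try {
                channel.write(ByteBuffer.wrap("log line\n".getBytes(StandardCharsets.UTF_8)));
                return false; // the write unexpectedly succeeded
            } catch (ClosedByInterruptException e) {
                // The channel is now closed; later writes through it would fail as well.
                return !channel.isOpen();
            }
        } finally {
            Thread.interrupted(); // clear the flag so the caller is not affected
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("channel closed by interrupt: " + channelClosedByInterruptedWrite());
    }
}
```

Whether a given logging setup is affected depends on whether its appenders write through interruptible channels, which is an assumption that has to be checked per configuration.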

Using cancel(false) prevents future scheduled executions without interrupting the current one, which is the intended behavior here.
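The difference between the two arguments can be observed with a small standalone sketch. It does not use the Unomi classes: `SelfCancelDemo` and `interruptFlagAfterSelfCancel` are hypothetical names, and the task body is a stand-in for checkStartupComplete() calling destroyScheduler(). The task cancels its own ScheduledFuture from inside the scheduled run and then reports whether the interrupt flag is set on the thread executing it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class SelfCancelDemo {

    // Returns the interrupt flag seen by a scheduled task right after it cancels itself.
    public static boolean interruptFlagAfterSelfCancel(boolean mayInterruptIfRunning)
            throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicReference<ScheduledFuture<?>> futureRef = new AtomicReference<>();
        CountDownLatch futurePublished = new CountDownLatch(1);
        CountDownLatch taskFinished = new CountDownLatch(1);
        AtomicBoolean flagObserved = new AtomicBoolean();

        Runnable task = () -> {
            try {
                // Wait until the future returned by scheduleWithFixedDelay has been stored.
                futurePublished.await();
                // The task cancels itself, as in the call path described above.
                futureRef.get().cancel(mayInterruptIfRunning);
                flagObserved.set(Thread.currentThread().isInterrupted());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                taskFinished.countDown();
            }
        };

        futureRef.set(executor.scheduleWithFixedDelay(task, 0, 10, TimeUnit.MILLISECONDS));
        futurePublished.countDown();
        try {
            if (!taskFinished.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("scheduled task did not run");
            }
            return flagObserved.get();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cancel(true)  -> interrupted: " + interruptFlagAfterSelfCancel(true));
        System.out.println("cancel(false) -> interrupted: " + interruptFlagAfterSelfCancel(false));
    }
}
```

In both cases the periodic task is not rescheduled after the cancel call; only cancel(true) leaves the running thread with its interrupt flag set for the remainder of that execution.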

@sergehuber
Contributor

Hi @jsinovassin, thanks for the PR.

I found (with some AI help I must admit :)) some tricky bugs. Let me know if you need any clarifications.

Best regards,
Serge...
