Skip to content

fix(tests): prevent HashedWheelTimer leak in integration tests#97

Draft
dkasimovskiy wants to merge 2 commits into
masterfrom
fix/box-integration-tests
Draft

fix(tests): prevent HashedWheelTimer leak in integration tests#97
dkasimovskiy wants to merge 2 commits into
masterfrom
fix/box-integration-tests

Conversation

@dkasimovskiy

Copy link
Copy Markdown
Contributor

The integration test base classes in tarantool-core, tarantool-schema, tarantool-jackson-mapping, and tarantool-client created a static HashedWheelTimer that was never released, producing a Netty leak warning ('LEAK: HashedWheelTimer.release() was not called before it's garbage-collected') for every test class on every CI run.

  • Promote timerService and factory to static (shared across tests in a class for efficiency) and register a JVM shutdown hook that calls timerService.stop() and cancels any pending timeouts at JVM exit. A shutdown hook is used instead of @afterall because the static fields in BaseTest are per-declaring-class and would otherwise be stopped after the first subclass, breaking subsequent test classes.
  • IProtoClientTest.testTimeoutCancel was reassigning the shared static timer; refactored to use a local HashedWheelTimer and ConnectionFactory so the cancellation behavior is still exercised without affecting other tests.
  • IProtoClientWatchersTest.checkTTVersion was throwing NPE when the TARANTOOL_VERSION environment variable was not set (the case for local runs; CI sets it). Fall back to the actual container version when the env var is unset.

I haven't forgotten about:

  • Tests
  • Changelog
  • Documentation
    • JavaDoc was written
  • Commit messages comply with the guideline
  • Cleanup the code for review. See checklist

Related issues:

The integration test base classes in tarantool-core, tarantool-schema,
tarantool-jackson-mapping, and tarantool-client created a static
HashedWheelTimer that was never released, producing a Netty leak warning
('LEAK: HashedWheelTimer.release() was not called before it's
garbage-collected') for every test class on every CI run.

- Promote timerService and factory to static (shared across tests in a
  class for efficiency) and register a JVM shutdown hook that calls
  timerService.stop() and cancels any pending timeouts at JVM exit. A
  shutdown hook is used instead of @afterall because the static fields in
  BaseTest are per-declaring-class and would otherwise be stopped after
  the first subclass, breaking subsequent test classes.
- IProtoClientTest.testTimeoutCancel was reassigning the shared static
  timer; refactored to use a local HashedWheelTimer and ConnectionFactory
  so the cancellation behavior is still exercised without affecting other
  tests.
- IProtoClientWatchersTest.checkTTVersion was throwing NPE when the
  TARANTOOL_VERSION environment variable was not set (the case for local
  runs; CI sets it). Fall back to the actual container version when the
  env var is unset.
Synchronize PoolEntry state mutations and add volatile visibility to
fix intermittent failures in ConnectionPoolReconnectsTest.

PoolEntry changes:
- Make connectFuture, lastHeartbeatEvent, heartbeatTask, reconnectTask,
  isHeartbeatStarted, isLocked volatile to ensure cross-thread visibility
  between netty IO threads and the main thread.
- Add AtomicBoolean isShutdown for idempotent shutdown() so the close
  listener fires only once even with overlapping close events.
- Synchronize lock(), unlock(), close(), shutdown(), internalConnect(),
  stopReconnectTask() to serialize state transitions.
- Make connectAfter() cancel the previous reconnect task atomically and
  increment the reconnecting counter only when no task is pending,
  preventing duplicate reconnect scheduling.
- Add double-check for connectFuture in internalConnect() to return the
  in-flight future instead of starting a new connect, eliminating the
  shutdown() vs internalConnect() race that could leave the pool with a
  closed client behind a fresh future.
- Wrap state mutations in handleConnectError() with synchronized block.

IProtoClientPoolImpl changes:
- Synchronize forEach() on connectionPoolLock to prevent
  ConcurrentModificationException when called concurrently with setGroups().

Verified: ConnectionPoolReconnectsTest passes 5/5 runs locally.
ConnectionPoolTest (12 tests) and 24 unit tests remain green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant