[automatic failover] Implement wait on healthCheck results during client init #4207

atakavci · 2025-07-19T11:36:06Z

This PR builds on the changes in #4204.

Summary of changes in this PR:

- Added initialization synchronization with health monitoring: StatusTracker tracks initial health results, manages event-driven waits for health checks, and prevents event processing during startup.
- Improved health check registration and fallback: listeners are registered before clusters, and the status defaults to HEALTHY when no strategy is configured.
- Enhanced the failback mechanism for better visibility: made periodicFailbackCheck testable and improved thread naming.
- Added forced active cluster with duration control: allows temporarily forcing a cluster for a limited time.
- Enhanced test utilities and helper methods: centralized health status change triggering and better test support.
- Added graceful degradation with proper error handling: throws exceptions when all clusters are unhealthy, with better timeout messages.
- Downgraded the logback version for compatibility with the slf4j version.
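The wait on initial health check results can be illustrated with plain `java.util.concurrent`. This is a minimal sketch, not the PR's StatusTracker: the names `SketchStatus`, `InitialHealthWaitSketch`, `onStatusChange` and `waitForFirstResult` are hypothetical, and `IllegalStateException` stands in for whatever exception the real code throws on timeout or interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for health status values; UNKNOWN is the default before any check completes.
enum SketchStatus { UNKNOWN, HEALTHY, UNHEALTHY }

// Hypothetical sketch: blocks a caller until the first non-UNKNOWN status arrives or a timeout elapses.
final class InitialHealthWaitSketch {
  private final AtomicReference<SketchStatus> status = new AtomicReference<>(SketchStatus.UNKNOWN);
  private final CountDownLatch firstResult = new CountDownLatch(1);

  // Called by the health check thread whenever a new result is available.
  void onStatusChange(SketchStatus newStatus) {
    status.set(newStatus);
    if (newStatus != SketchStatus.UNKNOWN) {
      firstResult.countDown(); // releases any thread blocked in waitForFirstResult
    }
  }

  // Event-driven wait: returns the latest known status once the first result has arrived.
  SketchStatus waitForFirstResult(long timeoutMillis) {
    try {
      if (!firstResult.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
        throw new IllegalStateException(
            "Timed out after " + timeoutMillis + " ms waiting for the initial health check result");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
      throw new IllegalStateException("Interrupted while waiting for the initial health check result", e);
    }
    return status.get();
  }
}
```

A latch avoids polling: the initializing thread sleeps until the health check thread reports a result, and the timeout bounds how long client construction can block.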

Commits essential to this one are:

- HealthStatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy - fix failing tests impacted from weighted clusters

- add echo to CommandObjects and UnifiedJedis - improve StrategySupplier by accepting JedisClientConfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop NoOpStrategy - add unit & integration tests for health check

- clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests

Co-authored-by: Copilot <[email protected]>

- replace failback enabled with failbacksupported in client - fix formatting - set defaults

- fix failing tests

- introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed

… results during construction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and ordering of events - fix impacted unit & integration tests

- fix formatting

Copilot

Pull Request Overview

This PR implements automatic failover with wait on health check results during client initialization. It introduces status tracking to wait for initial health check results and adds significant new functionality for health monitoring and automatic cluster management.

- Adds comprehensive health check system with StatusTracker for waiting on initial health check results
- Implements weighted cluster selection and automatic failback mechanisms with grace periods
- Replaces index-based cluster management with endpoint-based approach for better flexibility
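Weighted selection combined with a grace period might look roughly like the sketch below. It assumes that "weighted" means the eligible cluster with the highest weight wins and that a cluster marked unhealthy stays out of selection until its grace period ends; `SketchCluster`, `WeightedSelectionSketch` and their members are hypothetical names, not the PR's classes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical cluster record keyed by endpoint; field names are illustrative only.
final class SketchCluster {
  final String endpoint;
  final float weight;
  volatile boolean healthy = true;
  volatile long gracePeriodEndsAtMillis = 0L; // 0 means no grace period is in effect

  SketchCluster(String endpoint, float weight) {
    this.endpoint = endpoint;
    this.weight = weight;
  }

  // Marks the cluster unhealthy and keeps it out of selection until the grace period ends.
  void markUnhealthy(long nowMillis, long gracePeriodMillis) {
    healthy = false;
    gracePeriodEndsAtMillis = nowMillis + gracePeriodMillis;
  }

  // Eligible only when healthy again AND the grace period has elapsed.
  boolean isEligible(long nowMillis) {
    return healthy && nowMillis >= gracePeriodEndsAtMillis;
  }
}

final class WeightedSelectionSketch {
  // Returns the eligible cluster with the highest weight, or empty if none is eligible.
  static Optional<SketchCluster> select(List<SketchCluster> clusters, long nowMillis) {
    return clusters.stream()
        .filter(c -> c.isEligible(nowMillis))
        .max(Comparator.comparingDouble(c -> c.weight));
  }
}
```

An empty result corresponds to the "all clusters unhealthy" case, where a caller would raise an exception rather than pick a cluster. Passing the clock value in as a parameter keeps the grace period logic testable without sleeping.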

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| MultiClusterPooledConnectionProvider.java | Core provider changes - adds health status management, weight-based selection, and failback scheduling |
| StatusTracker.java | New class for event-driven health status waiting during initialization |
| HealthStatusManager.java | New health status management system with listener support |
| HealthCheck.java | Health check execution with timeout and callback mechanisms |
| EchoStrategy.java | Default health check strategy using Redis ECHO command |
| MultiClusterClientConfig.java | Configuration updates for weights, health checks, and failback settings |
| Various test files | Comprehensive test coverage for new health check and failback functionality |
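For readers unfamiliar with the "timeout and callback" pattern mentioned for HealthCheck.java, here is a minimal sketch using only the JDK. `TimedHealthCheckSketch` and `runOnce` are hypothetical names, and the probe is an arbitrary `Callable`; in an ECHO-based strategy the probe would presumably send ECHO and compare the reply, but that part is not shown and is an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of a single health check run with a timeout and a result callback.
final class TimedHealthCheckSketch {
  // Runs the probe once; reports true to the callback only if it returns true within the timeout.
  static void runOnce(Callable<Boolean> probe, long timeoutMillis, Consumer<Boolean> onResult) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<Boolean> future = executor.submit(probe);
    boolean healthy;
    try {
      healthy = Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      healthy = false;
    } catch (Exception e) {
      healthy = false; // timeout or probe failure both count as unhealthy
    } finally {
      future.cancel(true); // no-op if the probe already finished
      executor.shutdownNow();
    }
    onResult.accept(healthy);
  }
}
```

Treating a timeout the same as a failed probe keeps the callback contract simple: listeners only ever see a healthy or unhealthy result.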

Comments suppressed due to low confidence (1)

src/test/java/redis/clients/jedis/mcf/StatusTrackerTest.java:178

The test expects a JedisConnectionException but doesn't verify the specific exception type or message. Should use assertThrows() for better exception verification.

                fail("Should have thrown JedisConnectionException due to interrupt");

src/main/java/redis/clients/jedis/mcf/StatusTracker.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

src/main/java/redis/clients/jedis/mcf/HealthCheck.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java

atakavci and others added 15 commits June 27, 2025 19:13

- weighted cluster selection

8a9f876

- HealthStatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy - fix failing tests impacted from weighted clusters

- fix naming

df66b1e

clean up and mark override methods

13757f5

fix link in javadoc

ef5d83a

fix formatting

a15fc64

- fix double registered listeners in healthstatusmgr

cf38240

- clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests

Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java

c2fb34c

Co-authored-by: Copilot <[email protected]>

- add remove endpoints

ade866d

- replace cluster disabled with failbackCandidate

ca3378d

- replace failback enabled with failbacksupported in client - fix formatting - set defaults

- remove failback candidate

ddcec73

- fix failing tests

- fix remove logic

c1b6d5f

- fix failing tests

- periodic failback checks

ff16330

- introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed

- introduce forceActiveCluster by duration

975ab78

- fix formatting

atakavci requested review from uglide, ggivo, a-TODO-rov and Copilot July 19, 2025 11:36

atakavci self-assigned this Jul 19, 2025

atakavci added the feature label Jul 19, 2025

Copilot AI reviewed Jul 19, 2025

- fix failing tests by waiting on clusters to get healthy

405101e

a-TODO-rov reviewed Jul 23, 2025

src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java (outdated, resolved)

atakavci added 6 commits July 23, 2025 16:05

- fix failing scenario test

607c66d

- downgrade logback version for slf4j compatibility - increase timeouts for faultInjector

- addressing reviews and feedback

aaac8f7

- fix formatting

2ffffef

- fix formatting

e6e1121

- get rid of the queue and event ordering for healthstatus change in …

b8d4e87

…MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests

- replace use of reflection with helper methods

1ae7219

- fix failing tests due to method name change

atakavci mentioned this pull request Jul 31, 2025

[automatic failover] Introduce fast failover mode #4220

Open
