
[automatic failover] Implement wait on healthCheck results during client init #4207



Open · wants to merge 22 commits into base: feature/automatic-failover


atakavci (Contributor) commented Jul 19, 2025

This PR builds on the changes in #4204.

Summary of changes in this PR:

  • Added initialization synchronization with health monitoring - StatusTracker tracks initial health results, manages event-driven waits for health checks, and prevents event processing during startup
  • Improved health check registration and fallback - registers listeners before clusters, and defaults to HEALTHY when no health check strategy is configured
  • Enhanced failback mechanism for better visibility - made periodicFailbackCheck testable and improved thread naming
  • Added forced active cluster with duration control - allows a cluster to be forced active temporarily, for a limited duration
  • Enhanced test utilities and helper methods - centralized health status change triggering and better test support
  • Added graceful degradation with proper error handling - throws an exception when all clusters are unhealthy, and gives clearer timeout messages
  • Downgraded logback version - for compatibility with the slf4j version in use
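To make the first bullet concrete, here is a minimal sketch of an event-driven wait on the first health result for an endpoint. It is not the StatusTracker code from this PR: `InitialHealthWaiter`, `HealthState`, `onStatusChange`, and `awaitStatus` are hypothetical names, and `IllegalStateException` stands in for whatever exception the provider actually throws on timeout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical status values; UNKNOWN plays the role of "no health result yet".
enum HealthState { UNKNOWN, HEALTHY, UNHEALTHY }

// Hypothetical sketch (not the PR's StatusTracker): lets the constructing thread
// block until the first health result for an endpoint arrives, or a deadline passes.
class InitialHealthWaiter {
    private final Object lock = new Object();
    private final Map<String, HealthState> statuses = new HashMap<>(); // guarded by lock

    // Invoked by the health-check listener whenever a new result is reported.
    void onStatusChange(String endpoint, HealthState newState) {
        synchronized (lock) {
            statuses.put(endpoint, newState);
            lock.notifyAll(); // wake threads blocked in awaitStatus
        }
    }

    // Returns the first known (non-UNKNOWN) status, or throws once the timeout expires.
    HealthState awaitStatus(String endpoint, long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            while (true) {
                HealthState current = statuses.getOrDefault(endpoint, HealthState.UNKNOWN);
                if (current != HealthState.UNKNOWN) {
                    return current;
                }
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new IllegalStateException(
                        "Timed out waiting for initial health check result for " + endpoint);
                }
                // Releases the lock while waiting; the loop re-checks after any wake-up.
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
        }
    }
}
```

Because the waiting thread is woken by the status-change callback rather than by polling, construction can return as soon as the first result arrives, while the deadline bounds how long a silent endpoint can delay it.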

Commits essential to this one:

atakavci and others added 15 commits June 27, 2025 19:13
- HealthStatus manager with initial listener and registration logic
- pluggable health check strategy introduced; draft strategies are NoOpStrategy, EchoStrategy, LagAwareStrategy
- fix failing tests impacted from weighted clusters
- add echo to CommandObjects and UnifiedJedis
- improve StrategySupplier by accepting JedisClientConfig
- adapt EchoStrategy to StrategySupplier; it now handles connection creation by accepting an endpoint and JedisClientConfig
- make healthchecks disabled by default
- drop noOpStrategy
- add unit & integration tests for health check
- clear redundant catch
- replace failover options and drop FailoverOptions class
- remove forced_unhealthy from healthstatus
- fix failback check
- add disabled flag to cluster
- update/fix related tests
- replace failback enabled with failbacksupported in client
- fix formatting
- set defaults
- fix failing tests
- fix failing tests
- introduce graceperiod
- fix issue when CB is forced_open and gracePeriod is completed
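The grace period introduced in these commits can be illustrated with a small sketch, assuming it means that a cluster which just became unhealthy stays ineligible for failback for a fixed duration. `GracePeriodGate` and its methods are hypothetical; passing `now` in explicitly is one way to keep such a check testable without sleeping in tests.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: keeps a cluster ineligible for failback until a grace
// period has elapsed since it was last marked unhealthy.
class GracePeriodGate {
    private final Duration gracePeriod;
    private Instant unhealthySince; // null if the cluster has not failed yet

    GracePeriodGate(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    // Records the moment the cluster's health check reported it unhealthy.
    void markUnhealthy(Instant now) {
        unhealthySince = now;
    }

    // Eligible only if healthy now and the grace period (if any) has fully elapsed.
    boolean eligibleForFailback(boolean healthyNow, Instant now) {
        if (!healthyNow) {
            return false;
        }
        return unhealthySince == null || !now.isBefore(unhealthySince.plus(gracePeriod));
    }
}
```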
… results during construction of provider

- add HealthStatus.UNKNOWN as default for Cluster
- handle status changes in order of events during initialization
- add tests for status tracker and ordering of events
- fix impacted unit & integration tests
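One way to handle status changes in arrival order while the provider is still initializing is to buffer them and replay them once initialization completes. The sketch below assumes that model; `InitAwareDispatcher` and its methods are hypothetical and not taken from the PR.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

// Hypothetical sketch: holds status-change events that arrive before initialization
// finishes, then replays them in arrival order; later events go straight to the handler.
final class InitAwareDispatcher<E> {
    private final Consumer<E> handler;
    private final ArrayDeque<E> pending = new ArrayDeque<>();
    private boolean initialized = false;

    InitAwareDispatcher(Consumer<E> handler) {
        this.handler = handler;
    }

    // Called by health-check listeners for every status change.
    synchronized void onEvent(E event) {
        if (initialized) {
            handler.accept(event);
        } else {
            pending.addLast(event); // defer until init completes, preserving order
        }
    }

    // Marks initialization done and drains buffered events first-in, first-out.
    synchronized void completeInitialization() {
        initialized = true;
        while (!pending.isEmpty()) {
            handler.accept(pending.pollFirst());
        }
    }
}
```

The handler runs while the lock is held, which keeps ordering simple in this sketch; a production version might hand events to a single-threaded executor instead.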
atakavci self-assigned this Jul 19, 2025

Copilot AI left a comment


Pull Request Overview

This PR implements waiting on health check results during client initialization, as part of automatic failover. It introduces status tracking to wait for initial health check results and adds new functionality for health monitoring and automatic cluster management.

  • Adds comprehensive health check system with StatusTracker for waiting on initial health check results
  • Implements weighted cluster selection and automatic failback mechanisms with grace periods
  • Replaces index-based cluster management with endpoint-based approach for better flexibility
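Weighted selection combined with the "throw when all clusters are unhealthy" behavior can be sketched as picking the healthy endpoint with the highest weight. This assumes a higher weight means a higher priority; `Candidate` and `WeightedSelector` are hypothetical stand-ins, not the provider's actual types.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical snapshot of a configured cluster: endpoint, weight, and last known health.
final class Candidate {
    final String endpoint;
    final float weight;
    final boolean healthy;

    Candidate(String endpoint, float weight, boolean healthy) {
        this.endpoint = endpoint;
        this.weight = weight;
        this.healthy = healthy;
    }
}

final class WeightedSelector {
    // Returns the healthy candidate with the highest weight; fails when none is healthy.
    static Candidate select(List<Candidate> candidates) {
        return candidates.stream()
                .filter(c -> c.healthy)
                .max(Comparator.comparingDouble((Candidate c) -> c.weight))
                .orElseThrow(() -> new IllegalStateException("All clusters are unhealthy"));
    }
}
```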

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| MultiClusterPooledConnectionProvider.java | Core provider changes - adds health status management, weight-based selection, and failback scheduling |
| StatusTracker.java | New class for event-driven health status waiting during initialization |
| HealthStatusManager.java | New health status management system with listener support |
| HealthCheck.java | Health check execution with timeout and callback mechanisms |
| EchoStrategy.java | Default health check strategy using Redis ECHO command |
| MultiClusterClientConfig.java | Configuration updates for weights, health checks, and failback settings |
| Various test files | Comprehensive test coverage for new health check and failback functionality |
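The timeout-and-callback pattern attributed to HealthCheck.java can be sketched with a plain `ExecutorService`. Here the probe is a generic `Callable<Boolean>` standing in for an ECHO round trip such as EchoStrategy performs; `TimedHealthCheck`, `ProbeResult`, and `runOnce` are hypothetical names, and no Jedis API is used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical outcome of a single health probe.
enum ProbeResult { HEALTHY, UNHEALTHY }

// Hypothetical sketch: runs one probe with a timeout and reports the outcome to a callback.
final class TimedHealthCheck {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    ProbeResult runOnce(Callable<Boolean> probe, long timeoutMs, Consumer<ProbeResult> callback) {
        Future<Boolean> future = executor.submit(probe);
        ProbeResult result;
        try {
            result = Boolean.TRUE.equals(future.get(timeoutMs, TimeUnit.MILLISECONDS))
                    ? ProbeResult.HEALTHY
                    : ProbeResult.UNHEALTHY;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt a probe that hangs past the timeout
            result = ProbeResult.UNHEALTHY;
        } catch (ExecutionException e) {
            result = ProbeResult.UNHEALTHY; // the probe itself threw, e.g. a connection error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result = ProbeResult.UNHEALTHY;
        }
        callback.accept(result);
        return result;
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Treating a timeout the same as a failed probe keeps a hung connection from blocking the check indefinitely; cancelling the future interrupts the stuck probe so the single worker thread is freed for the next run.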
Comments suppressed due to low confidence (1)

src/test/java/redis/clients/jedis/mcf/StatusTrackerTest.java:178

  • The test expects a JedisConnectionException but doesn't verify the exception type or message; it should use assertThrows() for more precise exception verification.
                fail("Should have thrown JedisConnectionException due to interrupt");

atakavci added 6 commits July 23, 2025 16:05
- downgrade logback version for slf4j compatibility
- increase timeouts for faultInjector
…MultiClusterPooledConnectionProvider

- add test for init and post init events
- fix failing tests
- fix failing tests due to method name change