feat: add node health status for CLUSTER SLOTS and SHARDS #4767

BorysTheDev · 2025-03-14T13:17:55Z

kostasrim · 2025-03-14T13:44:37Z

src/server/cluster/cluster_family.cc

+  auto config = GetShardInfos(cntx);
+  if (config) {
+    // we need to remove hidden replicas
+    auto shards_info = config->Unwrap();


We make so many unnecessary copies of a relatively large data structure. We copy it here by value and then another time on line 246.

And it's not only here. I was going over the cluster code and we seem to copy by value a lot for no good reason (when const& is perfectly fine on those accessors).

Yes, we waste some resources in the cluster code, but it is not important right now.

Well, there is no reason to copy by value and it's an easy fix so...

The config is constant and shouldn't be changed.

I am not objecting to that. Return const& to avoid copies then 😄
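To illustrate the point of this thread, here is a minimal sketch of the difference between a by-value accessor and a `const&` accessor. `ShardInfoSketch`, `ClusterConfigSketch`, and both accessor names are hypothetical stand-ins for illustration, not Dragonfly's actual types or signatures.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one shard entry; not Dragonfly's real definition.
struct ShardInfoSketch {
  std::string master_id;
  std::vector<std::string> replica_ids;
};

// Hypothetical stand-in for an immutable cluster config.
class ClusterConfigSketch {
 public:
  explicit ClusterConfigSketch(std::vector<ShardInfoSketch> shards)
      : shards_(std::move(shards)) {}

  // By value: every call duplicates the whole vector and all strings in it.
  std::vector<ShardInfoSketch> GetConfigCopy() const { return shards_; }

  // By const reference: no copy. The reference stays valid only while
  // this object is alive, and callers cannot modify the config through it.
  const std::vector<ShardInfoSketch>& GetConfig() const { return shards_; }

 private:
  std::vector<ShardInfoSketch> shards_;
};

// A read-only consumer needs no copy at all.
std::size_t CountReplicas(const ClusterConfigSketch& config) {
  std::size_t total = 0;
  for (const ShardInfoSketch& shard : config.GetConfig()) {
    total += shard.replica_ids.size();
  }
  return total;
}
```

A caller that really needs to mutate the data (for example, to filter replicas) can still take one explicit copy from the reference, which keeps the cost visible at the call site.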

andydunstall · 2025-03-14T13:51:59Z

src/server/cluster/cluster_family.cc

    slot_ranges += shard.slot_ranges.Size();
+    auto new_end = std::remove_if(shard.replicas.begin(), shard.replicas.end(), [](const auto& r) {
+      return r.health == NodeHealth::HIDDEN || r.health == NodeHealth::FAIL;


I don't think we should include LOADING in CLUSTER SLOTS?

We can't have clients connecting to LOADING replicas as they won't be reachable (so the request will fail)

It's just a status for compatibility with Redis.

Though the original goal for adding this node health state was to avoid clients connecting to replicas that are still syncing with the master (and that aren't reachable in Dragonfly cloud), which won't be fixed if we include LOADING replicas in CLUSTER SLOTS?

It's just a status for compatibility with Redis.

So should we mark those replicas as hidden then when they aren't yet synced with the master? (In which case we'll never use the loading state)

You control this info from the config. So if you decide that the loading state isn't needed, you can send hidden.

@BorysTheDev I think that when a replica is in the loading state, some cluster client commands should return the node and others should not. So I believe this logic should live in Dragonfly and not in the cluster manager.

If the client is using the CLUSTER SHARDS command, it should see the loading state and know not to redirect traffic to that replica.
If the client is using the CLUSTER SLOTS command, it should not see the replica while it is in the loading state.
Therefore the fix should be here: do not expose loading replicas.
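The per-command behavior proposed above could be sketched roughly as follows. The `NodeHealth` value names come from the diff and the discussion; `ReplicaSketch` and both function names are hypothetical, and the choice to hide only HIDDEN replicas from CLUSTER SHARDS is an assumption, not something this thread confirms.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Value names taken from the PR; the ordering here is an assumption.
enum class NodeHealth { FAIL, LOADING, ONLINE, HIDDEN };

// Hypothetical stand-in for a replica entry.
struct ReplicaSketch {
  std::string id;
  NodeHealth health;
};

// CLUSTER SLOTS view: drop every replica a client should not connect to.
// Taking the vector by value is deliberate here because we modify it.
std::vector<ReplicaSketch> ReplicasForSlots(std::vector<ReplicaSketch> replicas) {
  auto new_end = std::remove_if(replicas.begin(), replicas.end(), [](const ReplicaSketch& r) {
    return r.health == NodeHealth::HIDDEN || r.health == NodeHealth::FAIL ||
           r.health == NodeHealth::LOADING;
  });
  replicas.erase(new_end, replicas.end());  // remove_if only shifts elements
  return replicas;
}

// CLUSTER SHARDS view: keep LOADING so clients can see the state and avoid
// the node. Hiding only HIDDEN replicas is an assumption of this sketch.
std::vector<ReplicaSketch> ReplicasForShards(std::vector<ReplicaSketch> replicas) {
  auto new_end = std::remove_if(replicas.begin(), replicas.end(), [](const ReplicaSketch& r) {
    return r.health == NodeHealth::HIDDEN;
  });
  replicas.erase(new_end, replicas.end());
  return replicas;
}
```

Keeping the two filters in the server means the cluster manager can report the real state (LOADING) and each command decides what to expose.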

adiholden · 2025-03-16T09:32:10Z

src/server/cluster/cluster_defs.cc

+    case NodeHealth::ONLINE:
+      return "online";
+    case NodeHealth::HIDDEN:
+      DCHECK(false);  // shouldn't be used


Because we shouldn't show it; I've added it for consistency.

src/server/cluster/cluster_defs.h

src/server/cluster/cluster_family.cc

feat: add node health status for CLUSTER SLOTS and SHARDS

6afe123

BorysTheDev requested review from andydunstall and adiholden March 14, 2025 13:17

kostasrim reviewed Mar 14, 2025

andydunstall reviewed Mar 14, 2025

adiholden reviewed Mar 16, 2025

src/server/cluster/cluster_defs.h (outdated, resolved)

adiholden reviewed Mar 16, 2025

src/server/cluster/cluster_family.cc (resolved)

fix: address comments

2a7533b

BorysTheDev force-pushed the feat_add_health_status_to_cluster_shard_cmd branch from 3ac98fd to 2a7533b Compare March 16, 2025 19:01

BorysTheDev requested a review from adiholden March 17, 2025 06:26

fix: address comments

d5d2290

adiholden approved these changes Mar 17, 2025

BorysTheDev merged commit 151e40e into main Mar 17, 2025
10 checks passed

BorysTheDev deleted the feat_add_health_status_to_cluster_shard_cmd branch March 17, 2025 10:01

BorysTheDev commented Mar 14, 2025

kostasrim Mar 14, 2025

BorysTheDev Mar 16, 2025

kostasrim Mar 17, 2025

BorysTheDev Mar 17, 2025

kostasrim Mar 17, 2025

andydunstall Mar 14, 2025

BorysTheDev Mar 16, 2025

andydunstall Mar 16, 2025 • edited

BorysTheDev Mar 16, 2025 • edited

adiholden Mar 17, 2025

adiholden Mar 17, 2025

adiholden Mar 16, 2025

BorysTheDev Mar 16, 2025
