
Commit 9b912c8

Merge pull request #1979 from Danielle9897/RDoc-3179-updateClusterConfiguration
RDoc-3179 Update the Cluster Configuration
2 parents db3d92d + 33c859e commit 9b912c8


5 files changed, +405 -24 lines changed


Diff for: Documentation/4.0/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown

+17-8
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster
+and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.
 {NOTE/}
@@ -38,17 +39,25 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
+
+**For example**:
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.
-We create a database with a replication factor of 3 and define an ETL task.
+* Let us assume a five-node cluster with servers A, B, C, D, E.
+We create a database with a replication factor of 3 and define an ETL task.
 
 * The newly created database will be distributed automatically to three of the cluster nodes.
-Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),
-and the cluster decides that node C is the responsible for performing the ETL task.
+Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),
+and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+Initially:
+* After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,
+the observer moves node C to rehab mode, allowing time for recovery.
+* The ETL task fails over to another available node in the Database Group.
+
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another node in the Database Group as a last resort.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node.
-Meanwhile the ETL task will failover to be performed by another available node from the Database Group.
 {NOTE/}
 
 ## Related articles
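The failover sequence in the updated example can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions, not RavenDB's implementation: `NodeState`, `observer_action`, and `etl_responsible_node` are hypothetical names; the two thresholds stand in for `Cluster.TimeBeforeMovingToRehabInSec` and `Cluster.TimeBeforeAddingReplicaInSec`; both are assumed to be measured from the moment the node became unreachable; and the ETL task is assumed to move to the first healthy member in group order.

```python
from enum import Enum
from typing import Dict, List, Optional


class NodeState(Enum):
    # Hypothetical states of one Database Group member, as seen by the observer.
    MEMBER = "member"          # reachable and serving the database
    REHAB = "rehab"            # moved to rehab, given time to recover
    REPLICA_ADDED = "replica"  # observer has started replicating the database elsewhere


def observer_action(seconds_offline: float,
                    time_before_rehab: float,
                    time_before_adding_replica: float) -> NodeState:
    """Map how long a node has been unreachable to the observer's reaction.

    Assumption: both thresholds count from the moment the node became
    unreachable. time_before_rehab stands in for
    Cluster.TimeBeforeMovingToRehabInSec and time_before_adding_replica for
    Cluster.TimeBeforeAddingReplicaInSec.
    """
    if seconds_offline >= time_before_adding_replica:
        return NodeState.REPLICA_ADDED
    if seconds_offline >= time_before_rehab:
        return NodeState.REHAB
    return NodeState.MEMBER


def etl_responsible_node(group: List[str],
                         preferred: str,
                         states: Dict[str, NodeState]) -> Optional[str]:
    """Hypothetical failover rule: keep the preferred node while it is a healthy
    member, otherwise pick the first other healthy member in group order."""
    if states[preferred] is NodeState.MEMBER:
        return preferred
    for node in group:
        if node != preferred and states[node] is NodeState.MEMBER:
            return node
    return None  # no healthy member left to run the ETL task


if __name__ == "__main__":
    # Illustrative thresholds only; check the configuration page for the real defaults.
    rehab, add_replica = 60, 900
    for t in (30, 120, 1000):
        print(t, observer_action(t, rehab, add_replica).value)
    # Database group [B, C, E] with node C responsible for the ETL task.
    states = {"B": NodeState.MEMBER,
              "C": observer_action(120, rehab, add_replica),
              "E": NodeState.MEMBER}
    print(etl_responsible_node(["B", "C", "E"], "C", states))
```

Run as a script, it prints `30 member`, `120 rehab`, `1000 replica`, and then `B`, because node C has moved to rehab and B is the first remaining healthy member.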

Diff for: Documentation/4.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown

+17-8
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster
+and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.
 {NOTE/}
@@ -37,15 +38,23 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.
-We create a database with a replication factor of 3 and define an ETL task.
+**For example**:
 
+* Let us assume a five-node cluster with servers A, B, C, D, E.
+We create a database with a replication factor of 3 and define an ETL task.
+
 * The newly created database will be distributed automatically to three of the cluster nodes.
-Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),
-and the cluster decides that node C is the responsible for performing the ETL task.
+Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),
+and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+Initially:
+* After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,
+the observer moves node C to rehab mode, allowing time for recovery.
+* The ETL task fails over to another available node in the Database Group.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node.
-Meanwhile the ETL task will failover to be performed by another available node from the Database Group.
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another node in the Database Group as a last resort.
+
 {NOTE/}

Diff for: Documentation/5.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown

+17-8
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster
+and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.
 {NOTE/}
@@ -37,15 +38,23 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
+
+**For example**:
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.
-We create a database with a replication factor of 3 and define an ETL task.
+* Let us assume a five-node cluster with servers A, B, C, D, E.
+We create a database with a replication factor of 3 and define an ETL task.
 
 * The newly created database will be distributed automatically to three of the cluster nodes.
-Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),
-and the cluster decides that node C is the responsible for performing the ETL task.
+Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),
+and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+Initially:
+* After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,
+the observer moves node C to rehab mode, allowing time for recovery.
+* The ETL task fails over to another available node in the Database Group.
+
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another node in the Database Group as a last resort.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node.
-Meanwhile the ETL task will failover to be performed by another available node from the Database Group.
 {NOTE/}
@@ -0,0 +1,73 @@
+# Cluster Observer
+---
+
+{NOTE: }
+
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster
+and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
+
+* This observer is always running on the [Leader](../../../server/clustering/rachis/cluster-topology#leader) node.
+
+* In this page:
+* [Operation flow](../../../server/clustering/distribution/cluster-observer#operation-flow)
+* [Interacting with the Cluster Observer](../../../server/clustering/distribution/cluster-observer#interacting-with-the-cluster-observer)
+
+{NOTE/}
+
+---
+
+{PANEL: Operation flow}
+
+* To maintain the Replication Factor, every newly elected [Leader](../../../server/clustering/rachis/cluster-topology#leader) starts measuring the health of each node
+by creating dedicated maintenance TCP connections to all other nodes in the cluster.
+
+* Each node reports the current status of _all_ its databases at intervals of [500 milliseconds](../../../server/configuration/cluster-configuration#cluster.workersampleperiodinms) (by default).
+The `Cluster Observer` consumes those reports every [1000 milliseconds](../../../server/configuration/cluster-configuration#cluster.supervisorsampleperiodinms) (by default).
+
+* Upon a **node failure**, the [Dynamic Database Distribution](../../../server/clustering/distribution/distributed-database#dynamic-database-distribution) sequence
+will take place in order to ensure that the `Replication Factor` does not change.
+
+{NOTE: }
+
+**For example**:
+
+* Let us assume a five-node cluster with servers A, B, C, D, E.
+We create a database with a replication factor of 3 and define an ETL task.
+
+* The newly created database will be distributed automatically to three of the cluster nodes.
+Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),
+and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+Initially:
+* After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,
+the observer moves node C to rehab mode, allowing time for recovery.
+* The ETL task fails over to another available node in the Database Group.
+
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration,
+the observer begins replicating the database to another node in the Database Group as a last resort.
+
+{NOTE/}
+
+{WARNING: }
+
+**Note**:
+
+* The _Cluster Observer_ stores its information **in memory**, so when the `Leader` loses leadership,
+the collected reports of the _Cluster Observer_ and its decision log are lost.
+
+{WARNING/}
+
+{PANEL/}
+
+{PANEL: Interacting with the Cluster Observer}
+
+You can interact with the `Cluster Observer` using the following REST API calls:
+
+| URL | Method | Query Params | Description |
+|-------------------------------------|---------|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `/admin/cluster/observer/suspend` | POST | value=[`bool`] | Setting `false` will suspend the _Cluster Observer_ operation for the current [Leader term](../../../studio/cluster/cluster-view#cluster-nodes-states-&-types-flow). |
+| `/admin/cluster/observer/decisions` | GET | | Fetch the log of the recent decisions made by the cluster observer. |
+| `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
+
+{PANEL/}
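The three endpoints in the table above can be called with any HTTP client. The Python sketch below only builds the requests with the standard library. The function names and the server address `http://localhost:8080` are hypothetical, encoding `value` as lowercase `true`/`false` is an assumption, and a secured server would also need authentication (for example a client certificate supplied through an `ssl.SSLContext`), which is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request


def _base(server_url: str) -> str:
    # Avoid a double slash when the caller passes a trailing "/".
    return server_url.rstrip("/")


def suspend_observer_request(server_url: str, value: bool) -> Request:
    # POST /admin/cluster/observer/suspend?value=<bool>
    # The value is forwarded as-is; its meaning is described in the table above.
    # Assumption: the server expects lowercase "true"/"false".
    query = urlencode({"value": "true" if value else "false"})
    return Request(f"{_base(server_url)}/admin/cluster/observer/suspend?{query}",
                   method="POST")


def observer_decisions_request(server_url: str) -> Request:
    # GET /admin/cluster/observer/decisions: recent decisions of the observer.
    return Request(f"{_base(server_url)}/admin/cluster/observer/decisions",
                   method="GET")


def maintenance_stats_request(server_url: str) -> Request:
    # GET /admin/cluster/maintenance-stats: latest reports collected by the observer.
    return Request(f"{_base(server_url)}/admin/cluster/maintenance-stats",
                   method="GET")


if __name__ == "__main__":
    url = "http://localhost:8080"  # hypothetical server address
    for req in (suspend_observer_request(url + "/", True),
                observer_decisions_request(url),
                maintenance_stats_request(url)):
        print(req.get_method(), req.full_url)
    # Sending a request requires a running, reachable server, e.g.:
    #   import json, urllib.request
    #   with urllib.request.urlopen(observer_decisions_request(url)) as resp:
    #       print(json.load(resp))
```

Because the observer runs only on the Leader and keeps its reports and decision log in memory, the decisions and maintenance stats are presumably only meaningful when fetched from the current Leader.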
