Merge pull request #1979 from Danielle9897/RDoc-3179-updateClusterConfiguration

reebhub · web-flow · commit 9b912c850e74 · 2025-03-05T13:42:56.000+02:00
RDoc-3179 Update the Cluster Configuration
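
The failover sequence described in the updated example (rehab first, a new replica only as a last resort) can be sketched as a small timeline model. This is an illustration only, not RavenDB code: `ObserverView` and `observe` are hypothetical names, the threshold values in the usage loop are placeholders rather than documented defaults, and the sketch assumes both timers are measured from the moment node C becomes unreachable. ETL failover is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObserverView:
    """Hypothetical summary of what the observer has decided about node C."""
    node_state: str            # "member" or "rehab"
    replica_being_added: bool  # True once a replacement replica is started


def observe(seconds_unreachable: float,
            rehab_after_sec: float,
            add_replica_after_sec: float) -> ObserverView:
    """Model the two-stage reaction to an unreachable node.

    rehab_after_sec stands in for Cluster.TimeBeforeMovingToRehabInSec and
    add_replica_after_sec for Cluster.TimeBeforeAddingReplicaInSec.
    Assumption: both are counted from the start of the outage.
    """
    in_rehab = seconds_unreachable >= rehab_after_sec
    # A replacement replica is only considered once the node is already in rehab.
    add_replica = in_rehab and seconds_unreachable >= add_replica_after_sec
    return ObserverView("rehab" if in_rehab else "member", add_replica)


if __name__ == "__main__":
    # Placeholder thresholds chosen for illustration only.
    for downtime in (10, 120, 1000):
        print(downtime, observe(downtime, rehab_after_sec=60, add_replica_after_sec=900))
```

Keeping the two thresholds as explicit parameters mirrors the fact that they are separate configuration keys, so a short outage only moves the node to rehab and never triggers a new replica.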
diff --git a/Documentation/4.0/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown b/Documentation/4.0/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.  
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster  
+  and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.  
 {NOTE/}
@@ -38,17 +39,25 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
+
+**For example**:
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.  
-  We create a database with a replication factor of 3 and define an ETL task.  
+* Let us assume a five-node cluster with servers A, B, C, D, E.  
+  We create a database with a replication factor of 3 and define an ETL task.
 
 * The newly created database will be distributed automatically to three of the cluster nodes.  
-  Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),  
-  and the cluster decides that node C is the responsible for performing the ETL task.  
+  Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),  
+  and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+  Initially:
+    * After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,  
+      the observer moves node C to rehab mode, allowing time for recovery.
+    * The ETL task fails over to another available node in the Database Group.
+
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another available cluster node outside the Database Group as a last resort.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node. 
-  Meanwhile the ETL task will failover to be performed by another available node from the Database Group.  
 {NOTE/}
 
 ## Related articles 
diff --git a/Documentation/4.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown b/Documentation/4.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.  
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster  
+  and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.  
 {NOTE/}
@@ -37,15 +38,23 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.  
-  We create a database with a replication factor of 3 and define an ETL task.  
+**For example**:  
 
+* Let us assume a five-node cluster with servers A, B, C, D, E.  
+  We create a database with a replication factor of 3 and define an ETL task.
+ 
 * The newly created database will be distributed automatically to three of the cluster nodes.  
-  Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),  
-  and the cluster decides that node C is the responsible for performing the ETL task.  
+  Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),  
+  and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+  Initially:
+  * After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,  
+    the observer moves node C to rehab mode, allowing time for recovery.
+  * The ETL task fails over to another available node in the Database Group.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node. 
-  Meanwhile the ETL task will failover to be performed by another available node from the Database Group.  
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another available cluster node outside the Database Group as a last resort.
+ 
 {NOTE/}
diff --git a/Documentation/5.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown b/Documentation/5.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown
@@ -3,7 +3,8 @@
 
 {NOTE: }
 
-* The primary goal of the `Cluster Observer` is to maintain the [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor) of each database in the cluster.  
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster  
+  and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
 
 * This observer is always running on the Leader node.  
 {NOTE/}
@@ -37,15 +38,23 @@ The _Cluster Observer_ stores its information **in memory**, so when the `Leader
 | `/admin/cluster/maintenance-stats` | GET | | Fetch the latest reports of the _Cluster Observer_ |
 {PANEL/}
 
-{NOTE: For Example}
+{NOTE: }
+
+**For example**:
 
-* Let us assume a five node cluster, with servers A, B, C, D, E.  
-  We create a database with a replication factor of 3 and define an ETL task.  
+* Let us assume a five-node cluster with servers A, B, C, D, E.  
+  We create a database with a replication factor of 3 and define an ETL task.
 
 * The newly created database will be distributed automatically to three of the cluster nodes.  
-  Let's assume it is distributed to B, C and E (So the database group is [B,C,E]),  
-  and the cluster decides that node C is the responsible for performing the ETL task.  
+  Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),  
+  and the cluster decides that node C is responsible for performing the ETL task.
+
+* If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+  Initially:
+    * After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,  
+      the observer moves node C to rehab mode, allowing time for recovery.
+    * The ETL task fails over to another available node in the Database Group.
+
+* If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration, the observer begins replicating the database to another available cluster node outside the Database Group as a last resort.
 
-* If node C goes offline or is not reachable, the Observer will notice it and relocate the database from node C to another available node. 
-  Meanwhile the ETL task will failover to be performed by another available node from the Database Group.  
 {NOTE/}
diff --git a/Documentation/6.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown b/Documentation/6.2/Raven.Documentation.Pages/server/clustering/distribution/cluster-observer.markdown
@@ -0,0 +1,73 @@
+# Cluster Observer
+---
+
+{NOTE: }
+ 
+* The primary goal of the **Cluster Observer** is to monitor the health of each database in the cluster  
+  and adjust its topology to maintain the desired [Replication Factor](../../../server/clustering/distribution/distributed-database#replication-factor).
+
+* This observer is always running on the [Leader](../../../server/clustering/rachis/cluster-topology#leader) node.
+
+* In this page:
+  * [Operation flow](../../../server/clustering/distribution/cluster-observer#operation-flow)
+  * [Interacting with the Cluster Observer](../../../server/clustering/distribution/cluster-observer#interacting-with-the-cluster-observer)
+
+{NOTE/}
+
+---
+
+{PANEL: Operation flow}
+
+* To maintain the Replication Factor, every newly elected [Leader](../../../server/clustering/rachis/cluster-topology#leader) starts measuring the health of each node 
+  by creating dedicated maintenance TCP connections to all other nodes in the cluster.  
+
+* Each node reports the current status of _all_ its databases at intervals of [500 milliseconds](../../../server/configuration/cluster-configuration#cluster.workersampleperiodinms) (by default).  
+  The `Cluster Observer` consumes those reports every [1000 milliseconds](../../../server/configuration/cluster-configuration#cluster.supervisorsampleperiodinms) (by default).  
+
+* Upon a **node failure**, the [Dynamic Database Distribution](../../../server/clustering/distribution/distributed-database#dynamic-database-distribution) sequence
+  will take place in order to ensure that the `Replication Factor` does not change.  
+
+    {NOTE: }
+    
+    **For example**:  
+    
+    * Let us assume a five-node cluster with servers A, B, C, D, E.  
+      We create a database with a replication factor of 3 and define an ETL task.
+    
+    * The newly created database will be distributed automatically to three of the cluster nodes.  
+      Let's assume it is distributed to B, C, and E (so the database group is [B,C,E]),  
+      and the cluster decides that node C is responsible for performing the ETL task.
+ 
+    * If node C goes offline or becomes unreachable, the Cluster Observer detects the issue.
+      Initially:
+      * After the duration specified in the [Cluster.TimeBeforeMovingToRehabInSec](../../../server/configuration/cluster-configuration#cluster.timebeforemovingtorehabinsec) configuration,  
+        the observer moves node C to rehab mode, allowing time for recovery.
+      * The ETL task fails over to another available node in the Database Group.
+   
+    * If node C remains offline beyond the period specified in the [Cluster.TimeBeforeAddingReplicaInSec](../../../server/configuration/cluster-configuration#cluster.timebeforeaddingreplicainsec) configuration,
+      the observer begins replicating the database to another available cluster node outside the Database Group as a last resort.
+    
+    {NOTE/}
+
+    {WARNING: }
+
+    **Note**:  
+
+    * The _Cluster Observer_ stores its information **in memory**, so when the `Leader` loses leadership,  
+      the collected reports of the _Cluster Observer_ and its decision log are lost.
+
+    {WARNING/}
+
+{PANEL/}
+
+{PANEL: Interacting with the Cluster Observer}
+
+You can interact with the `Cluster Observer` using the following REST API calls:  
+
+| URL                                 | Method  | Query Params   | Description                                                                                                                                                          |
+|-------------------------------------|---------|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `/admin/cluster/observer/suspend`   | POST    | value=[`bool`] | Setting `false` will suspend the _Cluster Observer_ operation for the current [Leader term](../../../studio/cluster/cluster-view#cluster-nodes-states-&-types-flow). |
+| `/admin/cluster/observer/decisions` | GET     |                | Fetch the log of the recent decisions made by the cluster observer.                                                                                                  |
+| `/admin/cluster/maintenance-stats`  | GET     |                | Fetch the latest reports of the _Cluster Observer_                                                                                                                   |
+
+{PANEL/}
diff --git a/Documentation/6.2/Raven.Documentation.Pages/server/configuration/cluster-configuration.markdown b/Documentation/6.2/Raven.Documentation.Pages/server/configuration/cluster-configuration.markdown