DOC-12485 prevent bucket from running out of space #3811

ggray-cb · 2025-05-30T18:44:27Z

This doc PR covers the Morpheus feature that lets users set a threshold to prevent the data storage path from becoming full (MB-59113).

It also addresses several other issues in the areas of the documentation that were being updated anyhow:

DOC-12778 Data Settings guidance for reader/writer threads change to 'disk i/o optimized' needs to be revised
New alert added by MB-58882 and MB-57062, which weren't labeled with needs-doc, so they were not called to our attention.
New alert added by MB-65138 (Alert when there is an items count mismatch in an index and its replica). Someone tried to alert docs about it, but the label was mistyped as "need-doc", so we didn't see it on our dashboard.

Main changes in this PR, with links to the preview site (see here for username/password for the site):

Added a What's New entry.
Storage Properties: lots of editing to bring it up to doc standards. Added a new section (Filesystem Free Space and Usage Limits) to cover the new default alert and the ability to limit disk use.
Updated the Available Alerts section of the Alerts page to add the new default disk use percent alert. Also added the alerts for stuck rebalance and index replica divergence.
In the Data Settings section of the General page, revised to meet doc standards. Added documentation on the checkbox to enable the data limit. Also revised the guidance on when to use the Disk i/o optimized setting, as requested by DOC-12778.
Set Data Disk Use Limits: new page for the new REST API endpoint to change disk usage limit settings.
Setting Alerts: added the limit entry for maxDataDiskUsedPerc to set the default warning disk usage threshold. Also added entries for the stuck rebalance thresholds for the alert added by MB-58882 and MB-57062.

* Initial pass on Storage Properties to bring up to doc standards.

* Added some coverage for alerts that were adding without alerting the doc team: rebalance timeouts and an index issue that I haven't dug into.

anuthan · 2025-06-02T23:04:03Z

Thanks @ggray-cb, I glanced over it and have one minor comment.
@Peter-Searby, could you do the review? I only glanced over it. Thanks.

anuthan · 2025-06-02T23:01:22Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

-Items written to disk are always written in compressed form.
-Based on bucket configuration, items may be maintained in compressed form in memory also.
-See xref:buckets-memory-and-storage/compression.adoc[Compression] for information.
+Disk access does not interrupt most client interactions.


"Disk access does not interrupt most client interactions." We should probably get rid of this line. Durable writes, which are client operations, require a flush to disk.

Peter-Searby

Given the number of changes to the kv/storage sections, perhaps it would be worth having someone from one or both of those teams review this?

Peter-Searby · 2025-06-03T10:57:59Z

modules/introduction/partials/new-features-80.adoc

+ You can configure Couchbase Server to prevent writes to buckets from consuming all of the disk space in a node.
+ You set a minimum amount of space every node must have free in the filesystem used by the data service.
+ If the node's has less free space than this limit, Couchbase Server prevents  writes to buckets.
+ Even if you do not set this limit, Couchbase Server now  alerts you when a node starts to run out of disk space.


This suggests that there wasn't already an alert for this, which there is: https://docs.couchbase.com/server/current/manage/manage-settings/configure-alerts.html#:~:text=Disk%20space%20used%20for%20persistent%20storage%20has%20reached%20at%20least%2090%25%20of%20capacity
The new alert has a lower threshold and is specific to the data disk.

FYI for @ggray-cb
@Peter-Searby is referring to the last sentence (on line 115) -- maybe it should say something like "Even if you do not configure the prevention of writes when the limit is reached, Couchbase Server alerts you when the disk usage is within 10% of the limit for data service mutations. By default, the limit is 85%, so the alerts will begin around 75%. You cannot disable the alerts, as the intent of this notice is to provide a warning in case you need to reserve disk space for recovery operations if a disk storage needs to be changed."

In the 8.0 UI, Settings > Alerts, there is a new alert listed:
Disk usage is within 10% of maximum for data service mutations

So, this feature lets you change the default limit at which alerting begins, but the alerting is always enabled (you cannot disable it). What the user can enable or disable is whether Couchbase Server prevents writes by returning an EBucketDiskSpace error to the client when the limit has been reached.

Note: The intent of this feature is to reserve disk space for recovery and rebalance when a disk storage needs to be changed.

Peter-Searby · 2025-06-03T11:13:32Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

-See xref:buckets-memory-and-storage/buckets.adoc[Buckets] for information.
+Couchbase Server compresses the data it writes to disk.
+Compression reduces the amount of disk space used which can help reduce costs.
+It also makes the backup and restore procedures easier.


I'm not sure what compression has to do with the ease of backup and restore. Does this just mean speed/performance?

Peter-Searby · 2025-06-03T11:15:28Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc


-For illustrations of how Couchbase Server saves new and updates existing Couchbase-bucket items, thereby employing both memory and storage resources, see xref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage].
+To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, seexref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage].


Suggested change

To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, seexref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage].

To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, see xref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage].

Peter-Searby · 2025-06-03T11:33:04Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

+You can control the number of reader and writer threads. 
+In the Couchbase Server Web Console, you can have Couchbase Server automatically choose a default value or a value that optimizes disk I/O. 
+You can also manually set the number of threads per node to a value between 1 and 64.
+Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores.


Suggested change

Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores.

Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a larger number of cores.

Peter-Searby · 2025-06-03T11:34:20Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

+You can also manually set the number of threads per node to a value between 1 and 64.
+Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores.
+Increasing the number of writer threads helps optimize durable writes.
+For more information, see xref:learn:data/durability.adoc[Durability].


This feels like it applies to the whole paragraph when it is really only relevant to the prior sentence. Perhaps it could be phrased better?

Peter-Searby · 2025-06-03T15:44:40Z

modules/manage/pages/manage-settings/general-settings.adoc

-Left-clicking on the *Advanced Data Settings* tab displays radio buttons for *Reader Thread Settings* and *Writer Thread Settings*:
+The *Reader Thread Settings* and *Writer Thread Settings* options let you control the number of threads the Data Service uses on each node to read and write data.
+Allocating more threads can improve performance.
+In particular, adding more writer threads can improve durable write performance,.


Suggested change

In particular, adding more writer threads can improve durable write performance,.

In particular, adding more writer threads can improve durable write performance.

Peter-Searby · 2025-06-03T16:20:30Z

modules/rest-api/pages/disk-usage-limits.adoc

+[[get-privs]]
+=== Required Privileges
+
+You must have at least on one of the following roles:


Suggested change

You must have at least on one of the following roles:

You must have at least one of the following roles:

Peter-Searby · 2025-06-03T16:21:22Z

modules/rest-api/pages/disk-usage-limits.adoc

+[source,bash]
+----
+curl -u Administrator:password \
+     -X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq


Suggested change

-X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq

-X GET 'http://127.0.0.1:8091/settings/resourceManagement' | jq

Peter-Searby · 2025-06-03T16:24:09Z

modules/rest-api/pages/rest-cluster-email-notifications.adoc

@@ -173,6 +184,12 @@ NOTE: If the node exceeds 90% of the available system connections, then please c

 * `memcachedUserConnectionWarningThreshold`. Trigger the `xref:manage:manage-settings/configure-alerts.adoc#memcached-alert[memcached_connections]` alert if the number of `user` connections in use exceeds the given percentage of connections available. (E.g., if this value is set to `90`, the system will trigger an alert if the number of user connections  used by the data service exceeds 90% of the available connections.)

+* `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV`.
+Sets the timeout threshold for an index rebalance and a data operation to be considered stuck.


Suggested change

Sets the timeout threshold for an index rebalance and a data operation to be considered stuck.

Sets the timeout threshold for a data or index service rebalance to make no identified progress to be considered stuck.

Peter-Searby · 2025-06-04T11:27:17Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc


-For all information on using the REST API for compaction, see the xref:rest-api:compaction-rest-api.adoc[Compaction API].
+You can enable a feature to have Couchbase Server stop writing to the Data Service storage path when it reaches a certain percentage of disk usage.


Suggested change

You can enable a feature to have Couchbase Server stop writing to the Data Service storage path when it reaches a certain percentage of disk usage.

You can enable a feature to have Couchbase Server Data Service stop writing to the Data Service storage path when it reaches a certain percentage of disk usage.

It's also worth noting that this storage path may be on the same disk as other data, which may still be written to.

Peter-Searby · 2025-06-04T11:42:03Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

+You can also perform compaction manually on a specific bucket.
+For information about performing manual compaction with the command line, see xref:cli:cbcli/couchbase-cli-bucket-compact.adoc[bucket-compact].
+
+For all information about using the REST API for compaction, see the xref:rest-api:compaction-rest-api.adoc[Compaction API].

 == Disk I/O Priority


Assuming this is the Bucket Priority (in the UI), my understanding is that this doesn't actually do anything. I'm not too sure why we've kept the config around, but it would be worth getting confirmation from KV how this should be documented (they look to be planning on cleaning this up in Ponyo: https://jira.issues.couchbase.com/browse/MB-66579)

@owend74 Do we need to change anything from what is in the current documentation for "Disk I/O Priority" for the 8.0 documentation, in light of MB-66579? Since MB-66579 is still open, I'm thinking that we should just leave things the way they are for now and clean up the item from the documentation and the UI in Totoro or Ponyo (based on how MB-66579 resolves). Please advise.

Peter-Searby · 2025-06-04T11:49:15Z

modules/manage/pages/manage-settings/configure-alerts.adoc

@@ -202,17 +202,30 @@ The size of the change history may need to be increased.
 For information, on establishing change-history size, see xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets].
 | `history_size_warning`

-| Low Indexer Residence Percentage
+| Approaching Indexer low resident percentage
 | Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`.


Suggested change

| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`.

| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`%.

Peter-Searby · 2025-06-04T11:53:21Z

modules/manage/pages/manage-settings/general-settings.adoc


-A high thread-allocation may improve performance on systems whose hardware-resources are commensurately supportive (for example, where the number of CPU cores is high).
-In particular, a high number of _writer_ threads on such systems may significantly optimize the performance of _durable writes_: see xref:learn:data/durability.adoc[Durability], for information.
+*Prevent writes to buckets when storage becomes <number>% full*  controls whether Couchbase Server prevents the filesystem containing the data path from becoming full.


"whether Couchbase Server prevents the filesystem containing the data path from becoming full."
This is too strongly worded. We can't prevent the filesystem from becoming full, so let's be careful not to imply that we can.

Agree with @Peter-Searby. All Couchbase Server does is have the Data Service stop its own writes by returning errors, to try to prevent the filesystem containing the data path from becoming full.

However, the disk specified by the data disk path can still become full if the user has used the same path for other services, if the user has put the storage paths of multiple services (different paths) on the same filesystem (the paths are subdirectories of the same filesystem), or if something outside of Couchbase writes to the same filesystem as the data disk path.
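
One way to check for the shared-filesystem case described above is to compare the device IDs of the storage paths. The following is a minimal Python sketch, not part of the PR. The example paths are hypothetical, and comparing `st_dev` is an assumption that holds for ordinary POSIX mounts but may not hold for every storage setup.

```python
import os


def share_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID.

    os.stat().st_dev identifies the device that holds a file, so equal
    values suggest the two paths compete for the same free space.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical data and index paths; substitute the paths that are
    # configured on your own node.
    data_path = "/opt/couchbase/var/lib/couchbase/data"
    index_path = "/opt/couchbase/var/lib/couchbase/index"
    if os.path.exists(data_path) and os.path.exists(index_path):
        print(share_filesystem(data_path, index_path))
```

If the function returns True for the data path and another service's path, writes from that other service can still fill the filesystem even when Data Service writes are being rejected.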

Peter-Searby · 2025-06-04T12:01:06Z

modules/rest-api/pages/rest-cluster-email-notifications.adoc

+This alert warns you that the disk is becoming full.
+It occurs even if data disk usage limits are not enabled.
+The value must be an integer between `1` and `100`, which is the percentage of disk space used.
+It defaults to `90`.


It actually defaults to 75%. Also, if the data disk limit is enabled, then it will ignore the configured threshold and use 10% less than the enforcement threshold.
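
The rule in this comment can be sketched as a small Python function. This is an illustration of the behavior as described in the thread, not the server's implementation. The function and parameter names are hypothetical, and "10% less" is read as 10 percentage points, which matches the 85 and 75 defaults mentioned elsewhere in this review.

```python
def effective_alert_threshold(limit_enabled: bool,
                              enforcement_limit: int,
                              configured_threshold: int = 75) -> int:
    """Return the disk-used percentage at which the data disk alert fires.

    Assumption from the review comment: when the data disk limit is
    enabled, the configured alert threshold is ignored and the alert
    fires 10 percentage points below the enforcement limit.
    """
    if limit_enabled:
        return enforcement_limit - 10
    return configured_threshold
```

For example, with the limit enabled at 95, the alert would fire at 85 regardless of the configured `maxDataDiskUsedPerc` value.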

hyunjuV · 2025-06-17T01:11:38Z

modules/manage/assets/images/manage-settings/data-settings.png

Unfortunately, you'll need to update this picture again, since the UI change associated with the Reader/Writer Thread Settings had not been complete -- see MB-65204.

Also, the Data Reader and Writer Thread settings documentation will need to be reviewed by Shivani Gupta (PM), Sarath Lakshman (storage eng), and Jim Walker (KV eng) ... so, might be better to make the Data Reader and Writer changes separately...

hyunjuV · 2025-06-18T08:17:05Z

modules/rest-api/pages/rest-cluster-email-notifications.adoc

@@ -152,6 +154,15 @@ See xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets], for inf
 Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, which is the value of `lowIndexerResidentPerc`.
 The default value is `10`.

+* `maxDataDiskUsedPerc`. 
+The percentage of disk space used that will trigger an alert on the filesystem containing the data service, index service, or the `ns_log` or `audit_log`  storage paths.


The percentage of disk space used that will trigger an alert on the filesystem containing the data service storage path.

The maxDataDiskUsedPerc only applies to the data service storage path, so the sentence should mention only the data service storage path. If you don't specify custom paths for data, indexes, eventing, and analytics when you initialize a cluster or a node, it's not easy to tell what goes into each individual storage path, since they all end up in a default location. The ns_log does not get put in the data service storage path. Not sure what audit_log is ... but if it's the audit.log, then that gets put into a location specified by the customer, so it shouldn't be mentioned here either.
cc @Peter-Searby

hyunjuV · 2025-06-18T08:26:46Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

+
+* Data Service
+* Index Service
+* `ns_log` and `audit_log` paths`


This comment is for lines 128 through 132:
By default, Couchbase Server alerts you if the filesystem containing the Data Service storage path becomes more than 75% full (10% before the limit, which is set to 85% by default).

The Index Service on line 131 and ns_log and audit_log paths on line 132 should be removed.

The maxDataDiskUsedPerc only applies to the data service storage path, so the sentence should mention only the data service storage path. If you don't specify custom paths for data, indexes, eventing, and analytics when you initialize a cluster or a node, it's not easy to tell what goes into each individual storage path, since they all end up in a default location. The ns_log does not get put in the data service storage path. Not sure what audit_log is ... but if it's the audit.log, then that gets put into a location specified by the customer, so it shouldn't be mentioned here either.

I think that you are getting maxDiskUsedPerc (value 90) and maxDataDiskUsedPerc (value 75) mixed up -- on line 134 below, you have maxDiskUsedPerc when you mean to have maxDataDiskUsedPerc.

cc @Peter-Searby

hyunjuV · 2025-06-18T08:41:41Z

modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

+* `ns_log` and `audit_log` paths`
+
+See xref:manage:manage-settings/configure-alerts.adoc[] for more information about alerts.
+You can change how full the disk becomes before triggering this alert by changing the  xref:rest-api:rest-cluster-email-notifications.adoc#maxdatadiskusedperc[maxDiskUsedPerc] alert limit.


On line 135, you have maxDiskUsedPerc showing in the text when you mean maxDataDiskUsedPerc

% curl -s -X GET http://localhost:8091/settings/alerts/limits -u Administrator:password | jq .
{
"certExpirationDays": 30,
"historyWarningThreshold": 90,
"lowIndexerResidentPerc": 10,
"maxDataDiskUsedPerc": 75, <---- this is one you want
"maxDiskUsedPerc": 90, <---- getting it mixed up with this
"maxIndexerRamPerc": 75,
"maxOverheadPerc": 50,
"memcachedSystemConnectionWarningThreshold": 90,
"memcachedUserConnectionWarningThreshold": 90,
"memoryCriticalThreshold": 90,
"memoryNoticeThreshold": -1,
"memoryWarningThreshold": 85,
"stuckRebalanceThresholdIndex": 1800,
"stuckRebalanceThresholdKV": 1800
}
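
To make the distinction between the two keys concrete, here is a minimal Python sketch that reads them from a trimmed copy of the output above. The function name and the labels in the returned dictionary are hypothetical. The only facts taken from this thread are the key names, their values, and that `maxDataDiskUsedPerc` applies to the data service storage path.

```python
import json

# Trimmed copy of the /settings/alerts/limits output quoted above.
SAMPLE_LIMITS = json.loads("""
{
  "maxDataDiskUsedPerc": 75,
  "maxDiskUsedPerc": 90
}
""")


def disk_alert_limits(limits: dict) -> dict:
    """Separate the two similarly named disk alert limits."""
    return {
        # New limit: applies to the data service storage path only.
        "data_service_storage_path": limits["maxDataDiskUsedPerc"],
        # Pre-existing limit for the persistent storage disk alert.
        "persistent_storage": limits["maxDiskUsedPerc"],
    }


if __name__ == "__main__":
    print(disk_alert_limits(SAMPLE_LIMITS))
```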

hyunjuV · 2025-06-18T21:19:10Z

modules/manage/pages/manage-settings/configure-alerts.adoc

+| The used disk space on the a filesystem containing the Data Service storage path is within 10% of the configured limit. 
+This limit is set either through the Advanced Data Settings in the Couchbase Server Web Console, or by using the `/settings/resourceManagement` REST API endpoint.
+See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.
+| `disk_guardrail`


This is a minor point, but I'm confused as to where this name disk_guardrail is coming from -- since that word isn't used anywhere else in the documentation or the REST API in reference to this feature.

cc @Peter-Searby

% curl -s -X GET http://localhost:8091/settings/resourceManagement -u Administrator:password | jq .
{
"diskUsage": {
"enabled": false,
"maximum": 85
}
}

Note: I do appreciate that this feature is being documented thoroughly since the resource name is pretty generic.
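
As a rough illustration of how the `/settings/resourceManagement` response quoted above relates to the behavior discussed in this review, the following Python sketch decides whether Data Service writes would be rejected at a given usage level. It is a model of the described behavior, not the server's logic. The function name is hypothetical, and treating the limit as reached at "greater than or equal to" is an assumption.

```python
import json

# Response body quoted in the comment above.
RESPONSE = json.loads('{"diskUsage": {"enabled": false, "maximum": 85}}')


def writes_rejected(settings: dict, used_percent: float) -> bool:
    """Return True if Data Service writes would be rejected.

    Assumes writes are rejected only when the limit is enabled and the
    filesystem holding the data path has reached the configured maximum.
    """
    disk_usage = settings["diskUsage"]
    return bool(disk_usage["enabled"]) and used_percent >= disk_usage["maximum"]


if __name__ == "__main__":
    # With the default response the limit is disabled, so this prints False.
    print(writes_rejected(RESPONSE, 99.0))
```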

ggray-cb added 5 commits May 28, 2025 11:46

* Added entry to what's new.

8f0d840

* Initial pass on Storage Properties to bring up to doc standards.

* Completed draft for prevent disk from running out of space.

f923219

* Added some coverage for alerts that were adding without alerting the doc team: rebalance timeouts and an index issue that I haven't dug into.

Updating what's new link

cf954eb

Changed anchor in What's New

8143e97

Fixing link text.

e4256fa

ggray-cb requested review from Peter-Searby, anuthan and hyunjuV May 30, 2025 18:44

anuthan reviewed Jun 2, 2025

Peter-Searby reviewed Jun 3, 2025

Peter-Searby reviewed Jun 4, 2025

ggray-cb mentioned this pull request Jun 6, 2025

DOC-12489 magma default storage engine #3813

Open

hyunjuV reviewed Jun 17, 2025

hyunjuV reviewed Jun 18, 2025
