DOC-12485 prevent bucket from running out of space #3811
base: release/8.0
* Initial pass on Storage Properties to bring up to doc standards.
* Added some coverage for alerts that were added without alerting the doc team: rebalance timeouts and an index issue that I haven't dug into.
Thanks @ggray-cb. I glanced over it and have one minor comment.
Items written to disk are always written in compressed form. | ||
Based on bucket configuration, items may be maintained in compressed form in memory also. | ||
See xref:buckets-memory-and-storage/compression.adoc[Compression] for information. | ||
Disk access does not interrupt most client interactions. |
"Disk access does not interrupt most client interactions." We should probably get rid of this line. Durable writes, which are client operations, require a flush to disk.
Given the number of changes to the KV/storage sections, it may be worth having someone from one or both of those teams review this.
You can configure Couchbase Server to prevent writes to buckets from consuming all of the disk space in a node. | ||
You set a minimum amount of space every node must have free in the filesystem used by the data service. | ||
If the node's has less free space than this limit, Couchbase Server prevents writes to buckets. | ||
Even if you do not set this limit, Couchbase Server now alerts you when a node starts to run out of disk space. |
This suggests that there wasn't already an alert for this, which there is: https://docs.couchbase.com/server/current/manage/manage-settings/configure-alerts.html#:~:text=Disk%20space%20used%20for%20persistent%20storage%20has%20reached%20at%20least%2090%25%20of%20capacity
The new alert has a lower threshold and is specific to the data disk.
FYI for @ggray-cb
@Peter-Searby is referring to the last sentence (on line 115) -- maybe it should say something like "Even if you do not configure the prevention of writes when the limit is reached, Couchbase Server alerts you when the disk usage is within 10% of the limit for data service mutations. By default, the limit is 85%, so the alerts will begin around 75%. You cannot disable the alerts, as the intent of this notice is to provide a warning in case you need to reserve disk space for recovery operations if a disk storage needs to be changed."
In the 8.0 UI, Settings > Alerts, there is a new alert listed:
Disk usage is within 10% of maximum for data service mutations
So, this feature allows you to change the default limit for when the alerting begins -- but the alerting is always enabled (cannot disable). What the user can enable/disable is whether or not the Couchbase Server prevents writes by returning an error (EBucketDiskSpace error) to the client when the limit has been reached.
Note: The intent of this feature is to reserve disk space for recovery and rebalance when a disk storage needs to be changed.
See xref:buckets-memory-and-storage/buckets.adoc[Buckets] for information. | ||
Couchbase Server compresses the data it writes to disk. | ||
Compression reduces the amount of disk space used which can help reduce costs. | ||
It also makes the backup and restore procedures easier. |
I'm not sure what compression has to do with the ease of backup and restore. Does this just mean speed/performance?
|
||
For illustrations of how Couchbase Server saves new and updates existing Couchbase-bucket items, thereby employing both memory and storage resources, see xref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage]. | ||
To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, seexref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage]. |
To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, seexref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage]. | |
To see how Couchbase Server saves new items and updates existing items in Couchbase buckets, using both memory and storage, see xref:buckets-memory-and-storage/memory-and-storage.adoc[Memory and Storage]. |
You can control the number of reader and writer threads. | ||
In the Couchbase Server Web Console, you can have Couchbase Server automatically choose a default value or a value that optimizes disk I/O. | ||
You can also manually set the number of threads per node to a value between 1 and 64. | ||
Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores. |
Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores. | |
Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a larger number of cores. |
You can also manually set the number of threads per node to a value between 1 and 64. | ||
Using a higher number of threads may improve performance if your hardware supports it, such as when your CPU has a high large of cores. | ||
Increasing the number of writer threads helps optimize durable writes. | ||
For more information, see xref:learn:data/durability.adoc[Durability]. |
This reads as if it applies to the whole paragraph, when it is really only relevant to the prior sentence. Perhaps it could be phrased better?
Left-clicking on the *Advanced Data Settings* tab displays radio buttons for *Reader Thread Settings* and *Writer Thread Settings*: | ||
The *Reader Thread Settings* and *Writer Thread Settings* options let you control the number of threads the Data Service uses on each node to read and write data. | ||
Allocating more threads can improve performance. | ||
In particular, adding more writer threads can improve durable write performance,. |
In particular, adding more writer threads can improve durable write performance,. | |
In particular, adding more writer threads can improve durable write performance. |
[[get-privs]] | ||
=== Required Privileges | ||
|
||
You must have at least on one of the following roles: |
You must have at least on one of the following roles: | |
You must have at least one of the following roles: |
[source,bash] | ||
---- | ||
curl -u Administrator:password \ | ||
-X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq |
-X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq | |
-X GET 'http://127.0.0.1:8091/settings/resourceManagement' | jq |
@@ -173,6 +184,12 @@ NOTE: If the node exceeds 90% of the available system connections, then please c | |||
|
|||
* `memcachedUserConnectionWarningThreshold`. Trigger the `xref:manage:manage-settings/configure-alerts.adoc#memcached-alert[memcached_connections]` alert if the number of `user` connections in use exceeds the given percentage of connections available. (E.g., if this value is set to `90`, the system will trigger an alert if the number of user connections used by the data service exceeds 90% of the available connections.) | |||
* `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV`. | |||
Sets the timeout threshold for an index rebalance and a data operation to be considered stuck. |
Sets the timeout threshold for an index rebalance and a data operation to be considered stuck. | |
Sets the timeout threshold for a data or index service rebalance to make no identified progress to be considered stuck. |
|
||
For all information on using the REST API for compaction, see the xref:rest-api:compaction-rest-api.adoc[Compaction API]. | ||
You can enable a feature to have Couchbase Server stop writing to the Data Service storage path when it reaches a certain percentage of disk usage. |
You can enable a feature to have Couchbase Server stop writing to the Data Service storage path when it reaches a certain percentage of disk usage. | |
You can enable a feature to have Couchbase Server Data Service stop writing to the Data Service storage path when it reaches a certain percentage of disk usage. |
It is also worth noting that this storage path may be on the same disk as other data, which may still be written to.
You can also perform compaction manually on a specific bucket. | ||
For information about performing manual compaction with the command line, see xref:cli:cbcli/couchbase-cli-bucket-compact.adoc[bucket-compact]. | ||
|
||
For all information about using the REST API for compaction, see the xref:rest-api:compaction-rest-api.adoc[Compaction API]. | ||
|
||
== Disk I/O Priority |
Assuming this is the Bucket Priority (in the UI), my understanding is that this doesn't actually do anything. I'm not too sure why we've kept the config around, but it would be worth getting confirmation from KV how this should be documented (they look to be planning on cleaning this up in Ponyo: https://jira.issues.couchbase.com/browse/MB-66579)
@owend74 Do we need to change anything in the current 8.0 documentation for "Disk I/O Priority" in light of MB-66579? Since MB-66579 is still open, I'm thinking that we should leave things as they are for now and clean up the item in the documentation and the UI in Totoro or Ponyo (based on how MB-66579 resolves). Please advise.
@@ -202,17 +202,30 @@ The size of the change history may need to be increased. | |||
For information, on establishing change-history size, see xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets]. | |||
| `history_size_warning` | |||
|
|||
| Low Indexer Residence Percentage | |||
| Approaching Indexer low resident percentage | |||
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`. |
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`. | |
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`%. |
|
||
A high thread-allocation may improve performance on systems whose hardware-resources are commensurately supportive (for example, where the number of CPU cores is high). | ||
In particular, a high number of _writer_ threads on such systems may significantly optimize the performance of _durable writes_: see xref:learn:data/durability.adoc[Durability], for information. | ||
*Prevent writes to buckets when storage becomes <number>% full* controls whether Couchbase Server prevents the filesystem containing the data path from becoming full. |
"whether Couchbase Server prevents the filesystem containing the data path from becoming full."
This is too strongly worded. We can't prevent the filesystem from becoming full, so let's be careful not to imply that we can.
Agree with @Peter-Searby. All Couchbase Server does is have the Data Service reject Data Service writes by returning errors, to try to prevent the file system containing the data path from becoming full.
However, the disk specified by the data disk path can still become full if the user has used the same path for other services, or if the user has put multiple services storage paths (different paths) on the same file system (the paths are subdirectories of the same file system), or if the user has other things outside of Couchbase writing to the same file system as the data disk path file system.
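One way to check whether two storage paths share a filesystem, as described above, is to compare the mount points that `df -P` reports. The following is a minimal sketch, assuming POSIX `df` and `awk` and mount points without spaces. The helper names `mount_point`, `same_filesystem`, and `used_percent` are hypothetical and not part of Couchbase Server.

```shell
#!/bin/sh
# mount_point PATH: print the mount point of the filesystem that holds PATH.
# Assumption: neither the device name nor the mount point contains spaces.
mount_point() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# same_filesystem PATH_A PATH_B: succeed when both paths report the same mount point.
same_filesystem() {
    [ "$(mount_point "$1")" = "$(mount_point "$2")" ]
}

# used_percent PATH: print the used capacity of PATH's filesystem as an integer.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, with `DATA_PATH` and `INDEX_PATH` set to a node's data and index storage paths (hypothetical variables), `same_filesystem "$DATA_PATH" "$INDEX_PATH" && echo "shared filesystem"` reports whether index writes can still fill the filesystem after Data Service writes are rejected.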
This alert warns you that the disk is becoming full. | ||
It occurs even if data disk usage limits are not enabled. | ||
The value must be an integer between `1` and `100`, which is the percentage of disk space used. | ||
It defaults to `90`. |
It actually defaults to 75%. Also, if the data disk limit is enabled, then it will ignore the configured threshold and use 10% less than the enforcement threshold.
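The interaction described in this comment can be sketched as a small shell helper. This is an illustration only, not server code: the function names `effective_alert_threshold` and `disk_state` are hypothetical, the inputs mirror `diskUsage.enabled`, `diskUsage.maximum`, and `maxDataDiskUsedPerc` from the REST output quoted in this thread, and the inclusive (`>=`) comparisons are an assumption that the thread does not confirm.

```shell
#!/bin/sh
# effective_alert_threshold ENABLED MAXIMUM CONFIGURED
#   ENABLED    - "true" or "false" (mirrors diskUsage.enabled)
#   MAXIMUM    - enforcement threshold in percent (mirrors diskUsage.maximum)
#   CONFIGURED - alert limit in percent (mirrors maxDataDiskUsedPerc)
# When enforcement is enabled, the alert fires 10 points below the maximum
# and the configured alert limit is ignored.
effective_alert_threshold() {
    if [ "$1" = "true" ]; then
        echo $(( $2 - 10 ))
    else
        echo "$3"
    fi
}

# disk_state USED ENABLED MAXIMUM CONFIGURED
# Prints "writes-rejected", "alert", or "ok" for a used-disk percentage.
# Assumption: thresholds are inclusive (>=).
disk_state() {
    threshold=$(effective_alert_threshold "$2" "$3" "$4")
    if [ "$2" = "true" ] && [ "$1" -ge "$3" ]; then
        echo "writes-rejected"
    elif [ "$1" -ge "$threshold" ]; then
        echo "alert"
    else
        echo "ok"
    fi
}

disk_state 80 false 85 75   # prints "alert": enforcement off, 80 >= 75
disk_state 90 true 85 75    # prints "writes-rejected": enforcement on, 90 >= 85
```

With the defaults quoted in this thread (maximum 85, alert limit 75), both branches of `effective_alert_threshold` return 75, which is why the difference only shows up once either value is changed.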
Unfortunately, you'll need to update this picture again, since the UI change associated with the Reader/Writer Thread Settings had not been complete -- see MB-65204.
Also, the Data Reader and Writer Thread settings documentation will need to be reviewed by Shivani Gupta (PM), Sarath Lakshman (storage eng), and Jim Walker (KV eng) ... so, might be better to make the Data Reader and Writer changes separately...
@@ -152,6 +154,15 @@ See xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets], for inf | |||
Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, which is the value of `lowIndexerResidentPerc`. | |||
The default value is `10`. | |||
* `maxDataDiskUsedPerc`. | |||
The percentage of disk space used that will trigger an alert on the filesystem containing the data service, index service, or the `ns_log` or `audit_log` storage paths. |
The percentage of disk space used that will trigger an alert on the filesystem containing the data service storage path.
The `maxDataDiskUsedPerc` only applies to the data service storage path. So, the sentence should just say the data service storage path. If you don't specify a custom path for data, indexes, eventing, analytics when you initialize a cluster or a node, then, it's not easy to tell what goes into each individual storage paths since they then all end up in a default location. The ns_log does not get put in the data service storage path. Not sure what audit_log is ... but if it's the audit.log, then, that gets put into a location specified by the customer, so it shouldn't be mentioned here either.
cc @Peter-Searby
|
||
* Data Service | ||
* Index Service | ||
* `ns_log` and `audit_log` paths` |
This comment is for lines 128 through 132:
By default, Couchbase Server alerts you if the filesystem containing the Data Service storage path becomes more than 75% full (10% before the limit, which is set to 85% by default).
The Index Service on line 131 and the `ns_log` and `audit_log` paths on line 132 should be removed.
The maxDataDiskUsedPerc only applies to the data service storage path. So, the sentence should just say the data service storage path. If you don't specify a custom path for data, indexes, eventing, analytics when you initialize a cluster or a node, then, it's not easy to tell what goes into each individual storage paths since they then all end up in a default location. The ns_log does not get put in the data service storage path. Not sure what audit_log is ... but if it's the audit.log, then, that gets put into a location specified by the customer, so it shouldn't be mentioned here either.
I think that you are getting maxDiskUsedPerc (value 90) and maxDataDiskUsedPerc (value 75) mixed up -- on line 134 below, you have maxDiskUsedPerc when you mean to have maxDataDiskUsedPerc.
* `ns_log` and `audit_log` paths` | ||
|
||
See xref:manage:manage-settings/configure-alerts.adoc[] for more information about alerts. | ||
You can change how full the disk becomes before triggering this alert by changing the xref:rest-api:rest-cluster-email-notifications.adoc#maxdatadiskusedperc[maxDiskUsedPerc] alert limit. |
On line 135, you have maxDiskUsedPerc showing in the text when you mean maxDataDiskUsedPerc
% curl -s -X GET http://localhost:8091/settings/alerts/limits -u Administrator:password | jq .
{
"certExpirationDays": 30,
"historyWarningThreshold": 90,
"lowIndexerResidentPerc": 10,
"maxDataDiskUsedPerc": 75, <---- this is one you want
"maxDiskUsedPerc": 90, <---- getting it mixed up with this
"maxIndexerRamPerc": 75,
"maxOverheadPerc": 50,
"memcachedSystemConnectionWarningThreshold": 90,
"memcachedUserConnectionWarningThreshold": 90,
"memoryCriticalThreshold": 90,
"memoryNoticeThreshold": -1,
"memoryWarningThreshold": 85,
"stuckRebalanceThresholdIndex": 1800,
"stuckRebalanceThresholdKV": 1800
}
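To keep the two similarly named limits apart, it can help to read each one by its exact key. In practice `jq` (as used in the command above) is the natural tool, for example `jq .maxDataDiskUsedPerc`. The sketch below avoids that dependency with `tr` and `sed`, and assumes a flat JSON object with integer values. The helper name `get_limit` and the `limits_json` variable are hypothetical, and the sample is trimmed from the response above.

```shell
#!/bin/sh
# Sample trimmed from the /settings/alerts/limits response quoted above.
limits_json='{
  "maxDataDiskUsedPerc": 75,
  "maxDiskUsedPerc": 90
}'

# get_limit KEY: print the integer stored under KEY in $limits_json.
# Assumption: a flat JSON object whose values are integers; use jq for anything else.
get_limit() {
    printf '%s\n' "$limits_json" |
        tr ',{}' '\n\n\n' |
        sed -n "s/^[[:space:]]*\"$1\":[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p"
}

get_limit maxDataDiskUsedPerc   # prints 75 (data service storage path alert)
get_limit maxDiskUsedPerc       # prints 90 (the limit it was mixed up with)
```

Because the pattern anchors on the opening quote of the key, `maxDiskUsedPerc` does not accidentally match `maxDataDiskUsedPerc`.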
| The used disk space on the a filesystem containing the Data Service storage path is within 10% of the configured limit. | ||
This limit is set either through the Advanced Data Settings in the Couchbase Server Web Console, or by using the `/settings/resourceManagement` REST API endpoint. | ||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information. | ||
| `disk_guardrail` |
This is a minor point, but I'm confused as to where this name `disk_guardrail` is coming from -- since that word isn't used anywhere else in the documentation or the REST API in reference to this feature.
% curl -s -X GET http://localhost:8091/settings/resourceManagement -u Administrator:password | jq .
{
"diskUsage": {
"enabled": false,
"maximum": 85
}
}
Note: I do appreciate that this feature is being documented thoroughly since the resource name is pretty generic.
This doc PR covers the Morpheus feature that lets users set a threshold to prevent the data storage path from becoming full (MB-59113).
It also addresses several other issues in the areas of the documentation that were being updated anyhow:
Main changes in this PR, with links to the preview site (see here for username/password for the site):