ECH: Move nodes off allocator doc updated #1619
base: main
@kunisen, about the comment you shared:
The headings are already in a Q&A format, but I wasn't sure that was the right approach, and I wanted to double-check it with other docs folks. I agree that if we keep the headings in this Q&A format, a "Frequently Asked Questions" heading would make total sense, but maybe we should rewrite the headings in a different format. cc: @shainaraskas, what do you think?
Left a few small comments, but other than that LGTM!
This isn't really in our style (reasoning) and could be reworked:

- A couple of them should be removed (e.g. the support CTA) or integrated into the doc ("Could such a system maintenance be avoided or skipped?" should just be introductory information about why this happens and its inevitability).
- Some could be pulled into an "Availability during system maintenance" section and perhaps "Data loss risk for non-HA deployments".
- Some could be reworded ("How can I be notified when a node is changed?" > "Notifications for moved or changed nodes" [more task-based]).

I do think that if we want to keep these together, they need a heading of their own so they're not nested below "Possible causes and impact".
Co-authored-by: Stef Nestor <[email protected]>
@shainaraskas: I'll rework this to avoid the FAQ style while keeping all the key points we want to communicate to users. Thanks a lot for your feedback!
Thanks for being patient and for all the help! 🙏

[1] I made a bunch of updates based on internal ticket comments: https://github.com/elastic/support-tech-lead/issues/1576#issuecomment-2948156720. Here's the preview:

[2] @eedugon I totally get what you and @shainaraskas said above in #1619 (comment). Please feel free to make any updates from a docs perspective based on your writing standards. I still added [...]. Again, please feel free to make your changes, even including removing that one.

[3] Also, I believe it's technically clear now, so there's no need to discuss anything further internally. But if anything is still technically unclear, or unclear regarding expectations, let's still discuss it internally :)
@shainaraskas: I've worked on your suggestions and removed the FAQ style. I'm pretty happy with the outcome and the final sections and sub-sections; let me know your thoughts. I also updated some minor paragraphs and added a couple of introductory sentences that felt needed (mainly in [...]). The content is 90% similar to the KB article, but I think it reads better, and it's organized by topic rather than by question. @kunisen, please share your thoughts too!
Thanks @eedugon, looks nice from my side: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1619/troubleshoot/monitoring/node-moves-outages#why-data-loss-can-occur-even-with-multiple-zones. I'm still a bit unfamiliar with this non-FAQ approach, but let's try it. Some small things:

- WDYT about saying "Service availability during node vacate"?
- WDYT about saying "Data loss risk without replica shards"?

If you're good with it, then I'm good to merge :)
@kunisen, very good suggestions. Next time, feel free to add them directly in the code (as [...]). I've applied the changes, thanks a lot!
Thanks @eedugon, indeed I will use suggestions next time. 🙏 @shainaraskas, could you kindly double-check whether we're good to go? Then I think we should be ready to merge :)
**What is the impact?**
This document explains the "`Move nodes off of allocator...`" message that appears on the [activity page](../../deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) in {{ech}} deployments, helping you understand its meaning, implications, and what to expect.
Suggest splitting this apart so the error message is in its own code block and the full text is present. Just put [allocatorname] or something similar as a placeholder.
During routine system maintenance, having replicas and multiple availability zones keeps service interruptions to a minimum. When nodes are vacated, as long as your deployment is highly available, all search and indexing requests are expected to keep working, at reduced capacity, until the node is back to normal.
I don't think this screenshot adds value if we share the entire error message on the page. It's also very small and hard to read, so I'd prefer to skip it.
::::{admonition} Availability zones and performance
Increasing the number of zones should not be used to add more resources. The concept of zones is meant for High Availability (2 zones) and Fault Tolerance (3 zones), but neither will work if the cluster relies on the resources from those zones to be operational.

The recommendation is to **scale up the resources within a single zone until the cluster can take the full load (add some buffer to be prepared for a peak of requests)**, then scale out by adding additional zones depending on your requirements: 2 zones for High Availability, 3 zones for Fault Tolerance.
:::: |
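To make the sizing guidance in the admonition concrete, here is a minimal Python sketch. It assumes load and capacity can be expressed in a single abstract unit (for example, requests per second); the function name `zone_sizing_check` and the 20% default buffer are hypothetical choices for illustration, not part of {{ech}} or any Elastic API.

```python
def zone_sizing_check(peak_load: float, capacity_per_zone: float, zones: int,
                      buffer: float = 0.2) -> tuple[bool, float]:
    """Hypothetical sizing check based on the zone guidance above.

    Returns (single_zone_ok, remaining_capacity):
    - single_zone_ok: True if one zone alone can carry the peak load plus
      the buffer (20% headroom by default, an assumed value).
    - remaining_capacity: capacity left while one zone is vacated.
    """
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # Scale up within a single zone first: one zone should cover the full peak.
    single_zone_ok = capacity_per_zone >= peak_load * (1 + buffer)
    # Capacity that stays online while one zone's nodes are moved off.
    remaining_capacity = capacity_per_zone * (zones - 1)
    return single_zone_ok, remaining_capacity


# Each zone sized for the full peak: vacating one zone is safe.
print(zone_sizing_check(peak_load=100, capacity_per_zone=130, zones=2))  # (True, 130)

# Zones added for extra resources: no single zone covers the peak.
print(zone_sizing_check(peak_load=100, capacity_per_zone=60, zones=2))  # (False, 60)
```

The second call illustrates why adding zones for resources backfires: with two undersized zones, vacating one leaves 60 units to serve a 100-unit peak.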
Be careful about repeating info that's documented elsewhere. This concept is something we should probably use a snippet for.
You should also generally avoid bold text and brackets. Also, "high availability" and "fault tolerance" don't need Title Case.
"The recommendation is" is not an ideal sentence structure. Try "You should [blank]".
1. Enable [Stack monitoring](/deploy-manage/monitor/stack-monitoring/ece-ech-stack-monitoring.md#enable-logging-and-monitoring-steps) (logs and metrics) on your deployment. Only metrics collection is required for these notifications to work.

In the deployment used as the destination of Stack monitoring:
This needs to be integrated into the list of steps. This should be step 2, and steps 2-4 should be made its children.
Co-authored-by: shainaraskas <[email protected]>
As described in #1527, this PR promotes a knowledge article into our existing docs, at the request of @kunisen and the support team.
Preview:
Changes:
Links to existing KB:
Closes #1527