-
Notifications
You must be signed in to change notification settings - Fork 562
Add basic (single) pipeline bucket stats. #9599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic (single) pipeline bucket stats. #9599
Conversation
Signed-off-by: Dave Welsch <[email protected]>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review. |
@jainankitk Could you please review this PR? Thanks! |
@dwelsch-esi Could you remove the info related to these aggregations from the pipeline aggregation index page? |
Not sure which info you're referring to here? I removed the old pipeline aggregation page (_aggregations/pipeline-agg.md) in PR#9598. Its replacement, _aggregations/pipeline/index.md, cites these aggregations as examples of sibling type pipeline aggregations, and sum_buckets is used in an example illustrating the buckets_path syntax. Do you want me to remove these examples for some reason? |
@dwelsch-esi I initially missed that you replaced the pipeline-agg with pipeine/index in #9598. Please ignore. The link checker is failing because #9598 is not merged yet. |
_aggregations/pipeline/avg-bucket.md
Outdated
parent: Pipeline aggregations | ||
nav_order: 10 | ||
redirect_from: | ||
- /query-dsl/aggregations/pipeline-agg#avg_bucket-sum_bucket-min_bucket-max_bucket/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming this redirect is part of sibling-aggregations
? Currently the indexed path is https://docs.opensearch.org/docs/latest/aggregations/pipeline/index/#sibling-aggregations
. This does not really look consistent with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only needed for moved pages. Since this is a new page, removing the redirect.
_aggregations/pipeline/avg-bucket.md
Outdated
| Parameter | Required/Optional | Data type | Description | | ||
| :-- | :-- | :-- | :-- | | ||
| `buckets_path` | Required | String | The path of the aggregation buckets to be aggregated. See [Pipeline aggregations]({{site.url}}{{site.baseurl}}/aggregations/pipeline/index#pipeline-aggregation-syntax). | | ||
| `gap_policy` | Optional | String | The policy to apply to missing data. Valid values are `skip`, `insert_zeros`, and `keep_values`. Default is `skip`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While skip
and insert_zeros
seem intuitive, I am wondering what keep_values
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep_values
is not a valid parameter. Removing.
_aggregations/pipeline/avg-bucket.md
Outdated
|
||
## Example | ||
|
||
The following example creates a date histogram with a one-month interval from the OpenSearch Dashboards e-commerce sample data. The `sum` sub-aggregation calculates the sum of all bytes for each month. Finally, the `avg_bucket` aggregation calculates the average number of bytes per month from these sums: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
average bytes per month
reads better instead of average number of bytes per month
sum of bytes
reads better instead of sum of all bytes
_aggregations/pipeline/max-bucket.md
Outdated
|
||
## Example response | ||
|
||
The aggregation returns the maximum number of bytes from the monthly buckets: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it is reasonably intuitive, should explicitly specify what keys indicate. For example - it is array because more than one key might have same value as max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an extended description in both max and min bucket pages.
Signed-off-by: Fanit Kolchina <[email protected]>
@jainankitk Thank you for the review! I addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!
_aggregations/pipeline/max-bucket.md
Outdated
|
||
## Example response | ||
|
||
The `max_bucket` aggregation returns the maximum value from a specified metric across multiple buckets. In this example, it calculates the maximum number of bytes per month from the `sum_of_bytes` metric inside `visits_per_month`. The `value` field shows the maximum value found across all buckets. The `keys` array contains the bucket keys where this maximum value was observed. It's an array because more than one bucket can have the same maximum value. In such cases, all matching bucket keys are included. This ensures the result is accurate even if multiple time periods (or terms) tied for the maximum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4th sentence: Something like "The keys
array contains the keys of the buckets in which this maximum value was observed"? Last sentence: "This ensures that the result is accurate even if multiple time periods (or terms) have the same maximum value"?
_aggregations/pipeline/min-bucket.md
Outdated
|
||
## Example response | ||
|
||
The `max_bucket` aggregation returns the minimum value from a specified metric across multiple buckets. In this example, it calculates the minimum number of bytes per month from the `sum_of_bytes` metric inside `visits_per_month`. The `value` field shows the minimum value found across all buckets. The `keys` array contains the bucket keys where this minimum value was observed. It's an array because more than one bucket can have the same minimum value. In such cases, all matching bucket keys are included. This ensures the result is accurate even if multiple time periods (or terms) tied for the minimum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in previous file re: rephrasing
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
* Add basic (single) pipeline bucket stats: avg, sum, min, max. Signed-off-by: Dave Welsch <[email protected]> * Doc review Signed-off-by: Fanit Kolchina <[email protected]> * Apply suggestions from code review Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]> --------- Signed-off-by: Dave Welsch <[email protected]> Signed-off-by: Fanit Kolchina <[email protected]> Signed-off-by: kolchfa-aws <[email protected]> Co-authored-by: Fanit Kolchina <[email protected]> Co-authored-by: kolchfa-aws <[email protected]> Co-authored-by: Nathan Bower <[email protected]> (cherry picked from commit 0795814) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
Add:
These pages are almost identical. Example is reused; only the stat is change.
Dependent on addition of the /_aggregations/pipeline directory. See #9598.
Issues Resolved
Version
Frontend features
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.