Skip to content

Add basic (single) pipeline bucket stats. #9599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 21, 2025

Conversation

dwelsch-esi
Copy link
Contributor

Description

Add:

  • avg_bucket
  • min_bucket
  • max_bucket
  • sum_bucket

These pages are almost identical. Example is reused; only the stat is change.

Dependent on addition of the /_aggregations/pipeline directory. See #9598.

Issues Resolved

Version

Frontend features

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@kolchfa-aws
Copy link
Collaborator

@jainankitk Could you please review this PR? Thanks!

@kolchfa-aws
Copy link
Collaborator

@dwelsch-esi Could you remove the info related to these aggregations from the pipeline aggregation index page?

@dwelsch-esi
Copy link
Contributor Author

dwelsch-esi commented Apr 11, 2025

@dwelsch-esi Could you remove the info related to these aggregations from the pipeline aggregation index page?

@kolchfa-aws

Not sure which info you're referring to here?

I removed the old pipeline aggregation page (_aggregations/pipeline-agg.md) in PR#9598. Its replacement, _aggregations/pipeline/index.md, cites these aggregations as examples of sibling type pipeline aggregations, and sum_buckets is used in an example illustrating the buckets_path syntax. Do you want me to remove these examples for some reason?

@kolchfa-aws
Copy link
Collaborator

@dwelsch-esi I initially missed that you replaced the pipeline-agg with pipeine/index in #9598. Please ignore. The link checker is failing because #9598 is not merged yet.

parent: Pipeline aggregations
nav_order: 10
redirect_from:
- /query-dsl/aggregations/pipeline-agg#avg_bucket-sum_bucket-min_bucket-max_bucket/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming this redirect is part of sibling-aggregations? Currently the indexed path is https://docs.opensearch.org/docs/latest/aggregations/pipeline/index/#sibling-aggregations. This does not really look consistent with it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only needed for moved pages. Since this is a new page, removing the redirect.

| Parameter | Required/Optional | Data type | Description |
| :-- | :-- | :-- | :-- |
| `buckets_path` | Required | String | The path of the aggregation buckets to be aggregated. See [Pipeline aggregations]({{site.url}}{{site.baseurl}}/aggregations/pipeline/index#pipeline-aggregation-syntax). |
| `gap_policy` | Optional | String | The policy to apply to missing data. Valid values are `skip`, `insert_zeros`, and `keep_values`. Default is `skip`. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While skip and insert_zeros seem intuitive, I am wondering what keep_values mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep_values is not a valid parameter. Removing.


## Example

The following example creates a date histogram with a one-month interval from the OpenSearch Dashboards e-commerce sample data. The `sum` sub-aggregation calculates the sum of all bytes for each month. Finally, the `avg_bucket` aggregation calculates the average number of bytes per month from these sums:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

average bytes per month reads better instead of average number of bytes per month

sum of bytes reads better instead of sum of all bytes


## Example response

The aggregation returns the maximum number of bytes from the monthly buckets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is reasonably intuitive, should explicitly specify what keys indicate. For example - it is array because more than one key might have same value as max

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an extended description in both max and min bucket pages.

@kolchfa-aws
Copy link
Collaborator

@jainankitk Thank you for the review! I addressed your comments.

Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!


## Example response

The `max_bucket` aggregation returns the maximum value from a specified metric across multiple buckets. In this example, it calculates the maximum number of bytes per month from the `sum_of_bytes` metric inside `visits_per_month`. The `value` field shows the maximum value found across all buckets. The `keys` array contains the bucket keys where this maximum value was observed. It's an array because more than one bucket can have the same maximum value. In such cases, all matching bucket keys are included. This ensures the result is accurate even if multiple time periods (or terms) tied for the maximum:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4th sentence: Something like "The keys array contains the keys of the buckets in which this maximum value was observed"? Last sentence: "This ensures that the result is accurate even if multiple time periods (or terms) have the same maximum value"?


## Example response

The `max_bucket` aggregation returns the minimum value from a specified metric across multiple buckets. In this example, it calculates the minimum number of bytes per month from the `sum_of_bytes` metric inside `visits_per_month`. The `value` field shows the minimum value found across all buckets. The `keys` array contains the bucket keys where this minimum value was observed. It's an array because more than one bucket can have the same minimum value. In such cases, all matching bucket keys are included. This ensures the result is accurate even if multiple time periods (or terms) tied for the minimum:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in previous file re: rephrasing

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws kolchfa-aws merged commit 0795814 into opensearch-project:main May 21, 2025
5 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 21, 2025
* Add basic (single) pipeline bucket stats: avg, sum, min, max.

Signed-off-by: Dave Welsch <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Dave Welsch <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit 0795814)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants