Skip to content

Add bucket_sort pipeline aggregation. #9605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 21, 2025

Conversation

dwelsch-esi
Copy link
Contributor

Description

Add bucket_sort pipeline aggregation.

Issues Resolved

Version

Frontend features

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dwelsch-esi and others added 3 commits April 10, 2025 16:00
…ve aggregations/pipeline-agg.md; Add aggregations/pipeline/index.md. Individual aggregation files added in other PRs.

Signed-off-by: Dave Welsch <[email protected]>
Copy link

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@kolchfa-aws
Copy link
Collaborator

@sandeshkr419 Could you please review this PR? Thanks!

Copy link
Contributor

@sandeshkr419 sandeshkr419 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Please rebase & get rid of unwanted pipeline-agg.md changes.

@@ -1,1260 +0,0 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since quite a few weeks have passed since you have opened the PR (apologies for the delay in reviewing it), I think we had already migrated this page to: https://github.com/opensearch-project/documentation-website/blob/main/_aggregations/pipeline/index.md - Is that correct?

If yes, let's get rid of this change from the PR?


The `bucket_sort` aggregation is a parent aggregation that sorts buckets of a previous aggregation.

You can specify several sort fields together with a sort order for each. You can sort each bucket based on its key, count, or its sub-aggregations. You can also truncate the buckets by setting `from` and `size` parameters, or use the `from` or `size` parameters without `sort` to truncate the buckets without sorting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this explanation slightly redundant at the last line and had to re-read to understand it. Suggestive change:


For bucket sort aggregations, you can define multiple sort fields with individual sort orders. Each bucket can be sorted by their key, count, or its sub-aggregations. Additionally, you can use the from and size parameters to truncate results, with or without applying a sort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!

| :-- | :-- | :-- | :-- |
| `gap_policy` | Optional | String | The policy to apply to missing data. Valid values are `skip`, `insert_zeros`, and `keep_values`. Default is `skip`. |
| `sort` | Optional | String | A list of fields to sort. See [Sort results](https://opensearch.org/docs/latest/search-plugins/searching-data/sort/). |
| `from` | Optional | String | The index of the first result to return. Indexing starts at `0`. Must be a non-negative integer. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also specify the default is 0. I can't phrase it appropriately I guess. Something like this maybe:


The index of the first result to return. Indexing starts at default value 0. Must be a non-negative integer.

"sort": [
{ "total_bytes": { "order": "desc" } }
],
"size": 1
Copy link
Contributor

@sandeshkr419 sandeshkr419 May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include from parameter as well in the request? And ofcourse, update the response accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included in the truncation example.

@github-project-automation github-project-automation bot moved this from In-Review to In Progress in Performance Roadmap May 21, 2025
@kolchfa-aws
Copy link
Collaborator

@sandeshkr419 Thanks for reviewing! I addressed your comments.

Copy link
Contributor

@sandeshkr419 sandeshkr419 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick revert on comments! LGTM

@kolchfa-aws kolchfa-aws merged commit daf7a90 into opensearch-project:main May 21, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap May 21, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 21, 2025
* Refactor pipeline aggregations to match other aggregation types. Remove aggregations/pipeline-agg.md; Add aggregations/pipeline/index.md. Individual aggregation files added in other PRs.

Signed-off-by: Dave Welsch <[email protected]>

* Add bucket_sort pipeline aggregation.

Signed-off-by: Dave Welsch <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Add more details and a link

Signed-off-by: Fanit Kolchina <[email protected]>

* Remove unnecessary redirect

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Dave Welsch <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
(cherry picked from commit daf7a90)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolchfa-aws Please see my comments and let me know if you have any questions. Thanks!


## Example

The following example creates a date histogram with a one-month interval from the OpenSearch Dashboards e-commerce sample data. The `sum` subaggregation calculates the sum of all bytes for each month. Finally, the aggregation sorts the buckets in descending order of number of bytes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"of number of bytes" => "by number of bytes"?


## Example response

The aggregation reorders the buckets descending order of total bytes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The aggregation reorders the buckets in descending order by total number of bytes"?


## Example: Truncating the results

To truncate the results, provide the `from` and/or `size` parameters. The following example performs the same sort, but returns two buckets, starting with the second bucket:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd sentence: Delete the comma after "sort". I would also add "operation" after "sort".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave as is. Perform a sort is an established phrase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants