-
Notifications
You must be signed in to change notification settings - Fork 563
Add bucket_sort pipeline aggregation. #9605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bucket_sort pipeline aggregation. #9605
Conversation
…ve aggregations/pipeline-agg.md; Add aggregations/pipeline/index.md. Individual aggregation files added in other PRs. Signed-off-by: Dave Welsch <[email protected]>
Signed-off-by: Dave Welsch <[email protected]>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review. |
@sandeshkr419 Could you please review this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Please rebase & get rid of unwanted pipeline-agg.md
changes.
_aggregations/pipeline-agg.md
Outdated
@@ -1,1260 +0,0 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since quite a few weeks have passed since you have opened the PR (apologies for the delay in reviewing it), I think we had already migrated this page to: https://github.com/opensearch-project/documentation-website/blob/main/_aggregations/pipeline/index.md - Is that correct?
If yes, let's get rid of this change from the PR?
|
||
The `bucket_sort` aggregation is a parent aggregation that sorts buckets of a previous aggregation. | ||
|
||
You can specify several sort fields together with a sort order for each. You can sort each bucket based on its key, count, or its sub-aggregations. You can also truncate the buckets by setting `from` and `size` parameters, or use the `from` or `size` parameters without `sort` to truncate the buckets without sorting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this explanation slightly redundant at the last line and had to re-read to understand it. Suggestive change:
For bucket sort aggregations, you can define multiple sort fields with individual sort orders. Each bucket can be sorted by their key, count, or its sub-aggregations. Additionally, you can use the from
and size
parameters to truncate results, with or without applying a sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion!
| :-- | :-- | :-- | :-- | | ||
| `gap_policy` | Optional | String | The policy to apply to missing data. Valid values are `skip`, `insert_zeros`, and `keep_values`. Default is `skip`. | | ||
| `sort` | Optional | String | A list of fields to sort. See [Sort results](https://opensearch.org/docs/latest/search-plugins/searching-data/sort/). | | ||
| `from` | Optional | String | The index of the first result to return. Indexing starts at `0`. Must be a non-negative integer. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also specify the default is 0
. I can't phrase it appropriately I guess. Something like this maybe:
The index of the first result to return. Indexing starts at default value 0. Must be a non-negative integer.
"sort": [ | ||
{ "total_bytes": { "order": "desc" } } | ||
], | ||
"size": 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include from
parameter as well in the request? And ofcourse, update the response accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included in the truncation example.
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@sandeshkr419 Thanks for reviewing! I addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick revert on comments! LGTM
* Refactor pipeline aggregations to match other aggregation types. Remove aggregations/pipeline-agg.md; Add aggregations/pipeline/index.md. Individual aggregation files added in other PRs. Signed-off-by: Dave Welsch <[email protected]> * Add bucket_sort pipeline aggregation. Signed-off-by: Dave Welsch <[email protected]> * Doc review Signed-off-by: Fanit Kolchina <[email protected]> * Add more details and a link Signed-off-by: Fanit Kolchina <[email protected]> * Remove unnecessary redirect Signed-off-by: Fanit Kolchina <[email protected]> --------- Signed-off-by: Dave Welsch <[email protected]> Signed-off-by: Fanit Kolchina <[email protected]> Co-authored-by: Fanit Kolchina <[email protected]> (cherry picked from commit daf7a90) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kolchfa-aws Please see my comments and let me know if you have any questions. Thanks!
|
||
## Example | ||
|
||
The following example creates a date histogram with a one-month interval from the OpenSearch Dashboards e-commerce sample data. The `sum` subaggregation calculates the sum of all bytes for each month. Finally, the aggregation sorts the buckets in descending order of number of bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"of number of bytes" => "by number of bytes"?
|
||
## Example response | ||
|
||
The aggregation reorders the buckets descending order of total bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The aggregation reorders the buckets in descending order by total number of bytes"?
|
||
## Example: Truncating the results | ||
|
||
To truncate the results, provide the `from` and/or `size` parameters. The following example performs the same sort, but returns two buckets, starting with the second bucket: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2nd sentence: Delete the comma after "sort". I would also add "operation" after "sort".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to leave as is. Perform a sort is an established phrase.
Description
Add bucket_sort pipeline aggregation.
Issues Resolved
Version
Frontend features
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.