Commit 8f50da5

Add sorting/ordering
1 parent 2e192df commit 8f50da5

2 files changed

+182
-1
lines changed

300_Aggregations/50_sorting_ordering.asciidoc
@@ -0,0 +1,179 @@
1+
2+
=== Sorting multi-value buckets
3+
4+
Multi-value buckets, such as `terms`, `histogram`, and `date_histogram`,
5+
dynamically produce many buckets. How does Elasticsearch decide the order in
6+
which these buckets are presented to the user?
7+
8+
By default, `terms` buckets are ordered by `doc_count` in descending order, while
9+
`histogram` and `date_histogram` buckets are ordered by key in ascending order.
10+
Sorting by `doc_count` is a good default when we want the buckets that rank highest by some criterion, such as frequency.
11+
12+
But sometimes you'll want to modify this sort order, and there are a few ways to
13+
do it depending on the bucket.
14+
15+
==== Intrinsic sorts
16+
17+
These sort modes are intrinsic to the bucket: they operate on data that the bucket
18+
generates, such as `doc_count`. They share the same syntax but differ slightly
19+
depending on the bucket being used.
20+
21+
Let's perform a `terms` aggregation but sort by `doc_count` ascending:
22+
23+
[source,js]
24+
--------------------------------------------------
25+
GET /cars/transactions/_search?search_type=count
26+
{
27+
    "aggs" : {
28+
        "colors" : {
29+
            "terms" : {
30+
                "field" : "color",
31+
                "order": {
32+
                    "_count" : "asc" <1>
33+
                }
34+
            }
35+
        }
36+
    }
37+
}
38+
--------------------------------------------------
39+
// SENSE: 300_Aggregations/50_sorting_ordering.json
40+
<1> Using the `_count` keyword, we can sort by `doc_count` ascending
41+
42+
We introduce an "order" object into the aggregation, which allows us to sort on
43+
one of several values:
44+
45+
- `_count`: Sort by document count. Works with `terms`, `histogram`, `date_histogram`
46+
- `_term`: Sort by the string value of a term alphabetically. Works only with `terms`
47+
- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
48+
Works only with `histogram` and `date_histogram`
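To make the three keywords concrete, here is a minimal client-side sketch of what each ordering does to a list of buckets. It only imitates the ordering locally: the `sortBuckets` helper and the sample buckets are hypothetical, not part of Elasticsearch, and the way ties are broken here is not meant to mirror the server.

```javascript
// Hypothetical buckets, shaped like the "buckets" array in an aggregation response.
const termBuckets = [
  { key: "red", doc_count: 4 },
  { key: "green", doc_count: 1 },
  { key: "blue", doc_count: 2 },
];
const priceBuckets = [
  { key: 20000, doc_count: 4 },
  { key: 0, doc_count: 3 },
  { key: 40000, doc_count: 1 },
];

// Illustrative helper (not an Elasticsearch API): order buckets the way an
// "order" object such as { "_count": "asc" } asks for.
function sortBuckets(buckets, order) {
  const [[keyword, direction]] = Object.entries(order);
  const sign = direction === "desc" ? -1 : 1;
  const compare = {
    _count: (a, b) => a.doc_count - b.doc_count,                    // document count
    _term: (a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0), // string key
    _key: (a, b) => a.key - b.key,                                  // numeric key
  }[keyword];
  if (!compare) throw new Error(`unsupported order keyword: ${keyword}`);
  // Copy before sorting so the caller's array is left untouched.
  return [...buckets].sort((a, b) => sign * compare(a, b));
}

const byCountAsc = sortBuckets(termBuckets, { _count: "asc" }).map((b) => b.key);
const byTermAsc = sortBuckets(termBuckets, { _term: "asc" }).map((b) => b.key);
const byKeyDesc = sortBuckets(priceBuckets, { _key: "desc" }).map((b) => b.key);
```

In this sketch `byCountAsc` puts the smallest bucket first, `byTermAsc` is alphabetical, and `byKeyDesc` lists the highest price range first.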
49+
50+
==== Sorting by a metric
51+
52+
Often, you'll find yourself wanting to sort based on a metric's calculated value.
53+
For our car sales analytics dashboard, we may want to build a bar chart of
54+
sales by car color, but order the bars by the average price ascending.
55+
56+
We can do this by adding a metric to our bucket, then referencing that
57+
metric from the "order" parameter:
58+
59+
[source,js]
60+
--------------------------------------------------
61+
GET /cars/transactions/_search?search_type=count
62+
{
63+
    "aggs" : {
64+
        "colors" : {
65+
            "terms" : {
66+
                "field" : "color",
67+
                "order": {
68+
                    "avg_price" : "asc" <2>
69+
                }
70+
            },
71+
            "aggs": {
72+
                "avg_price": {
73+
                    "avg": {"field": "price"} <1>
74+
                }
75+
            }
76+
        }
77+
    }
78+
}
79+
--------------------------------------------------
80+
// SENSE: 300_Aggregations/50_sorting_ordering.json
81+
<1> The average price is calculated for each bucket
82+
<2> Then the buckets are ordered by the calculated average in ascending order
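The effect of this request can be imitated locally. The sketch below groups a handful of made-up transactions by color, computes an `avg_price` metric for each bucket, and sorts the buckets by that metric in ascending order. The sample data and the `colorsByAvgPrice` helper are assumptions for illustration only; Elasticsearch performs the equivalent work on the server.

```javascript
// Hypothetical sample documents; only the field names come from the example above.
const transactions = [
  { color: "red", price: 10000 },
  { color: "red", price: 20000 },
  { color: "green", price: 30000 },
  { color: "blue", price: 15000 },
  { color: "blue", price: 25000 },
];

// Illustrative helper: bucket by color, compute the avg_price metric per
// bucket, then order the buckets by that metric in ascending order.
function colorsByAvgPrice(docs) {
  const groups = new Map();
  for (const { color, price } of docs) {
    const group = groups.get(color) || { sum: 0, count: 0 };
    group.sum += price;
    group.count += 1;
    groups.set(color, group);
  }
  return [...groups]
    .map(([key, { sum, count }]) => ({
      key,
      doc_count: count,
      // Single-value metrics are exposed under a "value" property.
      avg_price: { value: sum / count },
    }))
    .sort((a, b) => a.avg_price.value - b.avg_price.value);
}

const colorsOrdered = colorsByAvgPrice(transactions);
```

With this data the red bucket (average 15000) comes first, then blue (20000), then green (30000), regardless of how many documents each bucket holds.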
83+
84+
This lets you override the sort order with any metric, simply by referencing
85+
the name of the metric. Some metrics, however, emit multiple values. The
86+
`extended_stats` metric is a good example: it provides several individual
87+
metrics.
88+
89+
[NOTE]
90+
.Applicable buckets
91+
====
92+
Metric-based sorting works with `terms`, `histogram`, and `date_histogram`.
93+
====
94+
95+
If you want to sort on a multi-value metric, use the fully qualified
96+
dot path:
97+
98+
[source,js]
99+
--------------------------------------------------
100+
GET /cars/transactions/_search?search_type=count
101+
{
102+
    "aggs" : {
103+
        "colors" : {
104+
            "terms" : {
105+
                "field" : "color",
106+
                "order": {
107+
                    "stats.variance" : "asc" <1>
108+
                }
109+
            },
110+
            "aggs": {
111+
                "stats": {
112+
                    "extended_stats": {"field": "price"}
113+
                }
114+
            }
115+
        }
116+
    }
117+
}
118+
--------------------------------------------------
119+
// SENSE: 300_Aggregations/50_sorting_ordering.json
120+
<1> Using dot notation, we can sort on the metric we are interested in
121+
122+
In this example we are sorting on the variance of each bucket, so that colors
123+
with the least variance in price will appear before those that have more variance.
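As a rough illustration of how a dot path selects one value out of a multi-value metric, the sketch below computes a few `extended_stats`-style numbers per bucket and sorts on `stats.variance`. The `extendedStats` and `resolveMetric` helpers and the prices are hypothetical, and the variance here is the population variance, which is an assumption about how the server computes it.

```javascript
// Illustrative stand-in for a multi-value metric (population variance assumed).
function extendedStats(values) {
  const count = values.length;
  const sum = values.reduce((acc, v) => acc + v, 0);
  const avg = sum / count;
  const variance = values.reduce((acc, v) => acc + (v - avg) ** 2, 0) / count;
  return {
    count,
    min: Math.min(...values),
    max: Math.max(...values),
    avg,
    sum,
    variance,
    std_deviation: Math.sqrt(variance),
  };
}

// Resolve "metric" or "metric.property" against a bucket object.
function resolveMetric(bucket, path) {
  const [metricName, property] = path.split(".");
  const metric = bucket[metricName];
  // Without a property, fall back to the single "value" of the metric.
  return property === undefined ? metric.value : metric[property];
}

// Hypothetical buckets with made-up prices.
const statBuckets = [
  { key: "red", stats: extendedStats([10000, 20000, 80000]) },
  { key: "green", stats: extendedStats([12000, 30000]) },
  { key: "blue", stats: extendedStats([15000, 15000]) },
];

const byVariance = [...statBuckets]
  .sort((a, b) => resolveMetric(a, "stats.variance") - resolveMetric(b, "stats.variance"))
  .map((b) => b.key);
```

Here blue has identical prices (variance 0) and sorts first, while red has the widest spread and sorts last.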
124+
125+
==== Sorting based on "deep" metrics
126+
127+
In the prior examples, the metric was a direct child of the bucket. An average
128+
price was calculated for each term. It is possible to sort on "deeper" metrics,
129+
which are grandchildren or great-grandchildren of the bucket, with some limitations.
130+
131+
You can define a path to a deeper, nested metric by separating the aggregation names with a right angle bracket (`>`), like
132+
so: `my_bucket>another_bucket>metric`
133+
134+
The caveat is that each nested bucket in the path must be a single-value bucket.
135+
A `filter` bucket produces a single bucket: all documents that match the
136+
filtering criteria. Multi-value buckets (such as `terms`) generate many
137+
dynamic buckets, which makes it impossible to specify a deterministic path.
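The single-value rule can be expressed as a small check over the sub-aggregation definitions. The following is only a sketch of the idea, not how Elasticsearch validates requests: `parseOrderPath`, `validateOrderPath`, and the sample definitions are hypothetical, the set of single-value types is taken from the prose, and only the `aggs` key (not `aggregations`) is handled.

```javascript
// Bucket types the text treats as single-value (assumption taken from the prose).
const SINGLE_BUCKET_TYPES = new Set(["filter", "global"]);

// Split "agg>agg>metric.property" into its parts.
function parseOrderPath(path) {
  const segments = path.split(">");
  const [metric, property = null] = segments.pop().split(".");
  return { buckets: segments, metric, property };
}

// Walk the sub-aggregation definitions and reject any path that passes
// through a multi-value bucket.
function validateOrderPath(subAggs, path) {
  const { buckets, metric } = parseOrderPath(path);
  let current = subAggs;
  for (const name of buckets) {
    const agg = current[name];
    if (!agg) throw new Error(`unknown aggregation: ${name}`);
    // The aggregation type is the one key that is not the nested "aggs".
    const type = Object.keys(agg).find((k) => k !== "aggs");
    if (!SINGLE_BUCKET_TYPES.has(type)) {
      throw new Error(`"${name}" is a multi-value bucket (${type})`);
    }
    current = agg.aggs || {};
  }
  if (!current[metric]) throw new Error(`unknown metric: ${metric}`);
  return true;
}

// Hypothetical sub-aggregations of the bucket being sorted.
const subAggs = {
  my_bucket: {
    filter: { term: { color: "red" } },
    aggs: {
      another_bucket: {
        filter: { range: { price: { gte: 10000 } } },
        aggs: { metric: { avg: { field: "price" } } },
      },
    },
  },
  by_color: {
    terms: { field: "color" },
    aggs: { metric: { avg: { field: "price" } } },
  },
};

const deepPathOk = validateOrderPath(subAggs, "my_bucket>another_bucket>metric");
```

A path such as `by_color>metric` is rejected by this check, because `by_color` is a `terms` bucket and there is no single bucket to follow.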
138+
139+
Currently there are only two single-value buckets: `filter` and `global`. As
140+
a quick example, let's build a histogram of car prices, but order the buckets
141+
by the variance in price of red and green (but not blue) cars in each price range.
142+
143+
[source,js]
144+
--------------------------------------------------
145+
GET /cars/transactions/_search?search_type=count
146+
{
147+
    "aggs" : {
148+
        "colors" : {
149+
            "histogram" : {
150+
                "field" : "price",
151+
                "interval": 20000,
152+
                "order": {
153+
                    "red_green_cars>stats.variance" : "asc" <1>
154+
                }
155+
            },
156+
            "aggs": {
157+
                "red_green_cars": {
158+
                    "filter": { "terms": {"color": ["red", "green"]}}, <2>
159+
                    "aggs": {
160+
                        "stats": {"extended_stats": {"field" : "price"}} <3>
161+
                    }
162+
                }
163+
            }
164+
        }
165+
    }
166+
}
167+
--------------------------------------------------
168+
// SENSE: 300_Aggregations/50_sorting_ordering.json
169+
<1> Sort the buckets generated by the histogram according to the variance of a nested metric
170+
<2> Because we are using a single-value `filter`, we can use nested sorting
171+
<3> Sort on the stats generated by this metric
172+
173+
In this example, you can see that we are accessing a nested metric. The `stats`
174+
metric is a child of `red_green_cars`, which is in turn a child of `colors`. To
175+
sort on that metric, we define the path as `"red_green_cars>stats.variance"`.
176+
This is allowed because the `filter` bucket is a single-value bucket.
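To see how such a path maps onto the data, the sketch below follows `red_green_cars>stats.variance` through response-shaped histogram buckets and sorts on the value it finds. The `resolveOrderPath` helper and the numbers are made up for illustration; they are not real query results.

```javascript
// Follow "bucket>bucket>metric.property" from a parent bucket down to a number.
function resolveOrderPath(bucket, path) {
  const segments = path.split(">");
  const [metricName, property] = segments.pop().split(".");
  let node = bucket;
  // Each intermediate segment names a single-value bucket, so there is
  // exactly one child object to step into.
  for (const name of segments) node = node[name];
  const metric = node[metricName];
  return property === undefined ? metric.value : metric[property];
}

// Hypothetical histogram buckets; the variance values are invented.
const histogramBuckets = [
  { key: 0, doc_count: 3, red_green_cars: { doc_count: 2, stats: { variance: 2500000 } } },
  { key: 20000, doc_count: 4, red_green_cars: { doc_count: 1, stats: { variance: 0 } } },
  { key: 80000, doc_count: 2, red_green_cars: { doc_count: 2, stats: { variance: 1000000 } } },
];

const orderPath = "red_green_cars>stats.variance";
const rangesByVariance = [...histogramBuckets]
  .sort((a, b) => resolveOrderPath(a, orderPath) - resolveOrderPath(b, orderPath))
  .map((b) => b.key);
```

The price ranges come back ordered by the variance inside the `red_green_cars` filter rather than by their key, so the 20000 range is listed first in this example.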
177+
178+
179+

303_Making_Graphs.asciidoc

+3-1
@@ -6,4 +6,6 @@ include::300_Aggregations/35_date_histogram.asciidoc[]
66

77
include::300_Aggregations/40_scope.asciidoc[]
88

9-
include::300_Aggregations/45_filtering.asciidoc[]
9+
include::300_Aggregations/45_filtering.asciidoc[]
10+
11+
include::300_Aggregations/50_sorting_ordering.asciidoc[]
