|
| 1 | + |
| 2 | +=== Sorting multi-value buckets |
| 3 | + |
| 4 | +Multi-value buckets -- such as `terms`, `histogram`, and `date_histogram` -- |
| 5 | +dynamically produce many buckets. How does Elasticsearch decide the order in |
| 6 | +which these buckets are presented to the user? |
| 7 | + |
| 8 | +By default, `terms` buckets are ordered by `doc_count` in descending order. This |
| 9 | +is a good default because we often want to find the buckets that maximize some |
| 10 | +criterion: price, population, frequency. |
| 11 | + |
| 12 | +But sometimes you'll want to modify this sort order, and there are a few ways to |
| 13 | +do it depending on the bucket. |
| 14 | + |
| 15 | +==== Intrinsic sorts |
| 16 | + |
| 17 | +These sort modes are "intrinsic" to the bucket: they operate on data that the |
| 18 | +bucket itself generates, such as `doc_count`. They share the same syntax but differ |
| 19 | +slightly depending on the bucket being used. |
| 20 | + |
| 21 | +Let's perform a `terms` aggregation but sort by `doc_count` ascending: |
| 22 | + |
| 23 | +[source,js] |
| 24 | +-------------------------------------------------- |
| 25 | +GET /cars/transactions/_search?search_type=count |
| 26 | +{ |
| 27 | + "aggs" : { |
| 28 | + "colors" : { |
| 29 | + "terms" : { |
| 30 | + "field" : "color", |
| 31 | + "order": { |
| 32 | + "_count" : "asc" <1> |
| 33 | + } |
| 34 | + } |
| 35 | + } |
| 36 | + } |
| 37 | +} |
| 38 | +-------------------------------------------------- |
| 39 | +// SENSE: 300_Aggregations/50_sorting_ordering.json |
| 40 | +<1> Using the `_count` keyword, we can sort by `doc_count` ascending |
| 41 | + |
| 42 | +We introduce an `order` object into the aggregation, which allows us to sort on |
| 43 | +one of several values: |
| 44 | + |
| 45 | +- `_count`: Sort by document count. Works with `terms`, `histogram`, and `date_histogram`. |
| 46 | +- `_term`: Sort alphabetically by the string value of a term. Works only with `terms`. |
| 47 | +- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`). |
| 48 | +Works only with `histogram` and `date_histogram`. |
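
The other two intrinsic sorts follow the same pattern as `_count`. The requests below are a sketch rather than part of the original example set: they reuse the `cars` index and the `color` and `price` fields used elsewhere in this section, while the aggregation name `price`, the `interval` of 20000, and the sort directions are chosen purely for illustration.

```js
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color",
                "order": {
                    "_term" : "asc"
                }
            }
        }
    }
}

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "price" : {
            "histogram" : {
                "field" : "price",
                "interval": 20000,
                "order": {
                    "_key" : "desc"
                }
            }
        }
    }
}
```

The first request returns the color buckets in alphabetical order; the second returns the price-range buckets from the highest key to the lowest.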
| 49 | + |
| 50 | +==== Sorting by a metric |
| 51 | + |
| 52 | +Often, you'll want to sort based on a metric's calculated value. |
| 53 | +For our car sales analytics dashboard, we may want to build a bar chart of |
| 54 | +sales by car color, but order the bars by average price, ascending. |
| 55 | + |
| 56 | +We can do this by adding a metric to our bucket, then referencing that |
| 57 | +metric from the "order" parameter: |
| 58 | + |
| 59 | +[source,js] |
| 60 | +-------------------------------------------------- |
| 61 | +GET /cars/transactions/_search?search_type=count |
| 62 | +{ |
| 63 | + "aggs" : { |
| 64 | + "colors" : { |
| 65 | + "terms" : { |
| 66 | + "field" : "color", |
| 67 | + "order": { |
| 68 | + "avg_price" : "asc" <2> |
| 69 | + } |
| 70 | + }, |
| 71 | + "aggs": { |
| 72 | + "avg_price": { |
| 73 | + "avg": {"field": "price"} <1> |
| 74 | + } |
| 75 | + } |
| 76 | + } |
| 77 | + } |
| 78 | +} |
| 79 | +-------------------------------------------------- |
| 80 | +// SENSE: 300_Aggregations/50_sorting_ordering.json |
| 81 | +<1> The average price is calculated for each bucket |
| 82 | +<2> Then the buckets are ordered by the calculated average in ascending order |
| 83 | + |
| 84 | +This lets you override the sort order with any metric, simply by referencing |
| 85 | +the name of the metric. Some metrics, however, emit multiple values. The |
| 86 | +`extended_stats` metric is a good example: it provides several individual |
| 87 | +values, such as the variance. |
| 88 | + |
| 89 | +[NOTE] |
| 90 | +.Applicable buckets |
| 91 | +==== |
| 92 | +Metric-based sorting works with `terms`, `histogram`, and `date_histogram`. |
| 93 | +==== |
| 94 | + |
| 95 | +To sort on a multi-value metric, use the fully qualified dot path to the value |
| 96 | +you are interested in: |
| 97 | + |
| 98 | +[source,js] |
| 99 | +-------------------------------------------------- |
| 100 | +GET /cars/transactions/_search?search_type=count |
| 101 | +{ |
| 102 | + "aggs" : { |
| 103 | + "colors" : { |
| 104 | + "terms" : { |
| 105 | + "field" : "color", |
| 106 | + "order": { |
| 107 | + "stats.variance" : "asc" <1> |
| 108 | + } |
| 109 | + }, |
| 110 | + "aggs": { |
| 111 | + "stats": { |
| 112 | + "extended_stats": {"field": "price"} |
| 113 | + } |
| 114 | + } |
| 115 | + } |
| 116 | + } |
| 117 | +} |
| 118 | +-------------------------------------------------- |
| 119 | +// SENSE: 300_Aggregations/50_sorting_ordering.json |
| 120 | +<1> Using dot notation, we can sort on the metric we are interested in |
| 121 | + |
| 122 | +In this example we are sorting on the variance of each bucket, so that colors |
| 123 | +with the least variance in price will appear before those that have more variance. |
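
The same dot path works for the other values that `extended_stats` emits, and for the other buckets that support metric-based sorting. As a sketch (the aggregation name `price_ranges`, the `interval`, and the choice of `std_deviation` are assumptions made for illustration), a `histogram` could be ordered so that the price ranges with the widest spread come first:

```js
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "price_ranges" : {
            "histogram" : {
                "field" : "price",
                "interval": 20000,
                "order": {
                    "stats.std_deviation" : "desc"
                }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}
```

Only the part after the dot changes; the part before it is still the name given to the metric inside the bucket.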
| 124 | + |
| 125 | +==== Sorting based on "deep" metrics |
| 126 | + |
| 127 | +In the prior examples, the metric was a direct child of the bucket: an average |
| 128 | +price was calculated for each term. It is also possible to sort on "deeper" metrics, |
| 129 | +which are grandchildren or great-grandchildren of the bucket, with some limitations. |
| 130 | + |
| 131 | +You can define a path to a deeper, nested metric using angle brackets (`>`), like |
| 132 | +so: `my_bucket>another_bucket>metric`. |
| 133 | + |
| 134 | +The caveat is that each nested bucket in the path must be a "single-value" bucket. |
| 135 | +A `filter` bucket produces a single bucket: all documents that match the |
| 136 | +filtering criteria. Multi-value buckets (such as `terms`) generate many |
| 137 | +dynamic buckets, which makes it impossible to specify a deterministic path. |
| 138 | + |
| 139 | +Currently there are only two single-value buckets: `filter` and `global`. As |
| 140 | +a quick example, let's build a histogram of car prices, but order the buckets |
| 141 | +by the variance in price of red and green (but not blue) cars in each price range. |
| 142 | + |
| 143 | +[source,js] |
| 144 | +-------------------------------------------------- |
| 145 | +GET /cars/transactions/_search?search_type=count |
| 146 | +{ |
| 147 | + "aggs" : { |
| 148 | + "colors" : { |
| 149 | + "histogram" : { |
| 150 | + "field" : "price", |
| 151 | + "interval": 20000, |
| 152 | + "order": { |
| 153 | + "red_green_cars>stats.variance" : "asc" <1> |
| 154 | + } |
| 155 | + }, |
| 156 | + "aggs": { |
| 157 | + "red_green_cars": { |
| 158 | + "filter": { "terms": {"color": ["red", "green"]}}, <2> |
| 159 | + "aggs": { |
| 160 | + "stats": {"extended_stats": {"field" : "price"}} <3> |
| 161 | + } |
| 162 | + } |
| 163 | + } |
| 164 | + } |
| 165 | + } |
| 166 | +} |
| 167 | +-------------------------------------------------- |
| 168 | +// SENSE: 300_Aggregations/50_sorting_ordering.json |
| 169 | +<1> Sort the buckets generated by the histogram according to the variance of a nested metric |
| 170 | +<2> Because we are using a single-value `filter`, we can use nested sorting |
| 171 | +<3> Sort on the stats generated by this metric |
| 172 | + |
| 173 | +In this example, you can see that we are accessing a nested metric. The `stats` |
| 174 | +metric is a child of `red_green_cars`, which is in turn a child of `colors`. To |
| 175 | +sort on that metric, we define the path as `"red_green_cars>stats.variance"`. |
| 176 | +This is allowed because the `filter` bucket is a single-value bucket. |
| 177 | + |
| 178 | + |
| 179 | + |