Commit 2e192df

Rearrange file/dir structure
1 parent 9986c1f commit 2e192df

12 files changed

+335
-313
lines changed

300_Aggregations.asciidoc

-21
This file was deleted.
+93
@@ -0,0 +1,93 @@
1+
2+
=== Adding a metric to the mix
3+
4+
The previous example told us how many documents were in each bucket, which is
5+
useful. But often, our applications require more sophisticated _metrics_ about
6+
the documents. For example, what is the average price of cars in each bucket?
7+
8+
// "nesting"-> need to tell Elasticsearch which metrics to calculate, and on which fields.
9+
To get this information, we need to start nesting metrics inside of the buckets.
10+
Metrics will calculate some kind of mathematical statistic based on the values
11+
in the documents residing within a particular bucket.
12+
13+
Let's go ahead and add an `avg` (average) metric to our car example:
14+
15+
[source,js]
16+
--------------------------------------------------
17+
GET /cars/transactions/_search?search_type=count
18+
{
19+
"aggs": {
20+
"colors": {
21+
"terms": {
22+
"field": "color"
23+
},
24+
"aggs": { <1>
25+
"avg_price": { <2>
26+
"avg": {
27+
"field": "price" <3>
28+
}
29+
}
30+
}
31+
}
32+
}
33+
}
34+
--------------------------------------------------
35+
// SENSE: 300_Aggregations/20_basic_example.json
36+
<1> We add a new `aggs` level to hold the metric
37+
<2> We then give the metric a name: "avg_price"
38+
<3> And finally define it as an `avg` metric over the "price" field
39+
40+
As you can see, we took the previous example and tacked on a new `aggs` level.
41+
This new aggregation level allows us to nest the `avg` metric inside the
42+
`terms` bucket. Effectively, this means we will generate an average for each
43+
color.
44+
45+
Just like the "colors" example, we need to name our metric ("avg_price") so we
46+
can retrieve the values later. Finally, we specify the metric itself (`avg`)
47+
and what field we want the average to be calculated on (`price`).
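
To make the nesting concrete, here is a rough Python sketch of what a `terms` bucket with a nested `avg` metric computes. It only illustrates the idea and says nothing about how Elasticsearch executes the query. The sample documents and the `terms_with_avg` helper are assumptions, chosen so that the per-color averages agree with the figures quoted in this chapter.

```python
from collections import defaultdict

# Hypothetical sample documents (an assumption for illustration only).
cars = [
    {"color": "red", "make": "honda", "price": 10000},
    {"color": "red", "make": "honda", "price": 20000},
    {"color": "red", "make": "honda", "price": 20000},
    {"color": "red", "make": "bmw", "price": 80000},
    {"color": "blue", "make": "toyota", "price": 15000},
    {"color": "blue", "make": "ford", "price": 25000},
    {"color": "green", "make": "ford", "price": 30000},
    {"color": "green", "make": "toyota", "price": 12000},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Group docs by bucket_field, then average metric_field in each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg_price": sum(values) / len(values)}
        for key, values in groups.items()
    }


result = terms_with_avg(cars, "color", "price")
for color, stats in result.items():
    print(color, stats)
```

The bucket decides which documents belong together; the metric is then evaluated once per bucket, over only the documents in that bucket.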
48+
49+
// Delete this para
50+
The response is, not surprisingly, nearly identical to the previous response...except
51+
there is now a new "avg_price" element added to each color bucket:
52+
53+
[source,js]
54+
--------------------------------------------------
55+
{
56+
...
57+
"aggregations": {
58+
"colors": {
59+
"buckets": [
60+
{
61+
"key": "red",
62+
"doc_count": 4,
63+
"avg_price": { <1>
64+
"value": 32500
65+
}
66+
},
67+
{
68+
"key": "blue",
69+
"doc_count": 2,
70+
"avg_price": {
71+
"value": 20000
72+
}
73+
},
74+
{
75+
"key": "green",
76+
"doc_count": 2,
77+
"avg_price": {
78+
"value": 21000
79+
}
80+
}
81+
]
82+
}
83+
}
84+
...
85+
}
86+
--------------------------------------------------
87+
<1> New "avg_price" element in response
88+
89+
// Would love to have a graph under each example showing how the data can be displayed (later, i know)
90+
Although the response has changed minimally, the data we get out of it has grown
91+
substantially. Before, we knew there were four red cars. Now we know that the
92+
average price of red cars is $32,500. This is something that you can plug directly
93+
into reports or graphs.
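
As a small sketch of what "plugging into a report" might look like, the Python snippet here flattens the `colors` buckets into rows. The `response` dict is copied from the example response (with the elided parts left out), and the `color_rows` helper is a hypothetical name, not part of any client library.

```python
# The aggregation part of the example response, as a Python dict.
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {"key": "red", "doc_count": 4, "avg_price": {"value": 32500}},
                {"key": "blue", "doc_count": 2, "avg_price": {"value": 20000}},
                {"key": "green", "doc_count": 2, "avg_price": {"value": 21000}},
            ]
        }
    }
}


def color_rows(resp):
    """Return one (color, doc_count, avg_price) row per color bucket."""
    buckets = resp["aggregations"]["colors"]["buckets"]
    return [(b["key"], b["doc_count"], b["avg_price"]["value"]) for b in buckets]


rows = color_rows(response)
for color, count, avg in rows:
    print(f"{color:<6} {count:>2} cars  avg ${avg:,.0f}")
```

Because the metric is looked up by the name we gave it (`avg_price`), the code that reads the response stays stable even if more metrics are added later.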
+101
@@ -0,0 +1,101 @@
1+
2+
=== Buckets inside of buckets
3+
4+
The true power of aggregations becomes apparent once you start playing with
5+
different nesting schemes. In the previous examples, we saw how you could nest
6+
a metric inside a bucket, which is already quite powerful.
7+
8+
But the really exciting analytics come from nesting buckets inside _other buckets_.
9+
This time, we want to find out the distribution of car manufacturers for each
10+
color:
11+
12+
13+
[source,js]
14+
--------------------------------------------------
15+
GET /cars/transactions/_search?search_type=count
16+
{
17+
"aggs": {
18+
"colors": {
19+
"terms": {
20+
"field": "color"
21+
},
22+
"aggs": {
23+
"avg_price": { <1>
24+
"avg": {
25+
"field": "price"
26+
}
27+
},
28+
"make": { <2>
29+
"terms": {
30+
"field": "make" <3>
31+
}
32+
}
33+
}
34+
}
35+
}
36+
}
37+
--------------------------------------------------
38+
// SENSE: 300_Aggregations/20_basic_example.json
39+
<1> Notice that we can leave the previous "avg_price" metric in place
40+
<2> Another aggregation named "make" is added to the "colors" bucket
41+
<3> This aggregation is a `terms` bucket and will generate unique buckets for
42+
each car make
43+
44+
A few interesting things happened here. First, you'll notice that the previous
45+
"avg_price" metric is left entirely intact. Each "level" of an aggregation can
46+
have many metrics or buckets. The "avg_price" metric tells us the average price
47+
for each car color. This is independent of other buckets and metrics which
48+
are also being built.
49+
50+
This is very important for your application, since there are often many related,
51+
but entirely distinct, metrics which you need to collect. Aggregations allow
52+
you to collect all of them in a single pass over the data.
53+
54+
The other important thing to note is that the aggregation we added, "make", is
55+
a `terms` bucket (nested inside the "colors" `terms` bucket). This means we will
56+
generate a (color, make) tuple for every unique combination in your dataset.
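
The following Python sketch mimics, very loosely, how several independent results can be gathered in one pass: a per-color average and a per-color breakdown by make. It is a conceptual illustration only, not a description of Elasticsearch internals. The sample documents are hypothetical; the red cars follow the figures quoted in this chapter, while the makes of the blue and green cars are assumed.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents (an assumption for illustration only).
cars = [
    {"color": "red", "make": "honda", "price": 10000},
    {"color": "red", "make": "honda", "price": 20000},
    {"color": "red", "make": "honda", "price": 20000},
    {"color": "red", "make": "bmw", "price": 80000},
    {"color": "blue", "make": "toyota", "price": 15000},
    {"color": "blue", "make": "ford", "price": 25000},
    {"color": "green", "make": "ford", "price": 30000},
    {"color": "green", "make": "toyota", "price": 12000},
]

price_sum = defaultdict(int)       # color -> running total of prices
doc_count = Counter()              # color -> number of documents
make_count = defaultdict(Counter)  # color -> make -> number of documents

for car in cars:  # a single pass feeds every accumulator
    color = car["color"]
    price_sum[color] += car["price"]
    doc_count[color] += 1
    make_count[color][car["make"]] += 1

avg_price = {color: price_sum[color] / doc_count[color] for color in doc_count}

# Every (color, make) combination that actually occurs in the data.
pairs = sorted(
    (color, make) for color, makes in make_count.items() for make in makes
)
print(avg_price)
print(pairs)
```

Only combinations that occur in the data produce a pair, which mirrors the way the nested `terms` bucket creates a "make" bucket only for makes found within a given color.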
57+
58+
Let's take a look at the response (truncated for brevity, since it is now
59+
growing quite long):
60+
61+
62+
[source,js]
63+
--------------------------------------------------
64+
{
65+
...
66+
"aggregations": {
67+
"colors": {
68+
"buckets": [
69+
{
70+
"key": "red",
71+
"doc_count": 4,
72+
"make": { <1>
73+
"buckets": [
74+
{
75+
"key": "honda", <2>
76+
"doc_count": 3
77+
},
78+
{
79+
"key": "bmw",
80+
"doc_count": 1
81+
}
82+
]
83+
},
84+
"avg_price": {
85+
"value": 32500 <3>
86+
}
87+
},
88+
89+
...
90+
}
91+
--------------------------------------------------
92+
<1> Our new aggregation is nested under each color bucket, as expected
93+
<2> We now see a breakdown of car makes for each color
94+
<3> Finally, you can see that our previous "avg_price" metric is still intact
95+
96+
The response tells us:
97+
98+
- There are four red cars
99+
- The average price of a red car is $32,500
100+
- Three of the red cars are made by Honda, and one is a BMW
101+
- Similar analytics are generated for other colors and makes
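
The statements in this list can be read straight off the nested structure. Here is a minimal Python sketch that walks the response and yields one row per (color, make) pair. The `response` dict contains only the red bucket, because the example response is truncated; `make_breakdown` is a hypothetical helper name.

```python
# The visible part of the example response (other colors are elided there).
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": {
                        "buckets": [
                            {"key": "honda", "doc_count": 3},
                            {"key": "bmw", "doc_count": 1},
                        ]
                    },
                    "avg_price": {"value": 32500},
                }
            ]
        }
    }
}


def make_breakdown(resp):
    """Return (color, make, doc_count) for every nested make bucket."""
    rows = []
    for color in resp["aggregations"]["colors"]["buckets"]:
        for make in color["make"]["buckets"]:
            rows.append((color["key"], make["key"], make["doc_count"]))
    return rows


breakdown = make_breakdown(response)
for color, make, count in breakdown:
    print(f"{count} {color} car(s) made by {make}")
```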
+97
@@ -0,0 +1,97 @@
1+
2+
3+
==== One final modification
4+
5+
Just to drive the point home, let's make one final modification to our example
6+
before moving on to new topics. Let's add two metrics to calculate the min and
7+
max price for each make:
8+
9+
10+
[source,js]
11+
--------------------------------------------------
12+
GET /cars/transactions/_search?search_type=count
13+
{
14+
"aggs": {
15+
"colors": {
16+
"terms": {
17+
"field": "color"
18+
},
19+
"aggs": {
20+
"avg_price": { "avg": { "field": "price" }
21+
},
22+
"make" : {
23+
"terms" : {
24+
"field" : "make"
25+
},
26+
"aggs" : { <1>
27+
"min_price" : { "min": { "field": "price"} }, <2>
28+
"max_price" : { "max": { "field": "price"} } <3>
29+
}
30+
}
31+
}
32+
}
33+
}
34+
}
35+
--------------------------------------------------
36+
// SENSE: 300_Aggregations/20_basic_example.json
37+
38+
// Careful with the "no surprise", it makes it sound like you're bored :)
39+
40+
<1> No surprise...we need to add another "aggs" level for nesting
41+
<2> Then we include a `min` metric
42+
<3> And a `max` metric
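
Since every level follows the same pattern (a name, a type, and an optional nested `aggs`), request bodies like this one can be assembled programmatically. The Python sketch here builds the same body as a plain dict; `metric` and `terms` are hypothetical helper functions written for this example, not functions from an Elasticsearch client.

```python
import json


def metric(kind, field):
    """Build a single-field metric such as avg, min, or max."""
    return {kind: {"field": field}}


def terms(field, sub_aggs=None):
    """Build a terms bucket, optionally with nested aggregations."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


body = {
    "aggs": {
        "colors": terms("color", {
            "avg_price": metric("avg", "price"),
            "make": terms("make", {
                "min_price": metric("min", "price"),
                "max_price": metric("max", "price"),
            }),
        })
    }
}

# The printed JSON can be sent as the body of the search request.
print(json.dumps(body, indent=2))
```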
43+
44+
Which gives us the following output (again, truncated):
45+
46+
[source,js]
47+
--------------------------------------------------
48+
{
49+
...
50+
"aggregations": {
51+
"colors": {
52+
"buckets": [
53+
{
54+
"key": "red",
55+
"doc_count": 4,
56+
"make": {
57+
"buckets": [
58+
{
59+
"key": "honda",
60+
"doc_count": 3,
61+
"min_price": {
62+
"value": 10000 <1>
63+
},
64+
"max_price": {
65+
"value": 20000 <1>
66+
}
67+
},
68+
{
69+
"key": "bmw",
70+
"doc_count": 1,
71+
"min_price": {
72+
"value": 80000
73+
},
74+
"max_price": {
75+
"value": 80000
76+
}
77+
}
78+
]
79+
},
80+
"avg_price": {
81+
"value": 32500
82+
}
83+
},
84+
...
85+
--------------------------------------------------
86+
<1> The `min` and `max` metrics that we added now appear under each "make"
87+
88+
With those two metrics, we've expanded the information derived from this query
89+
to include:
90+
91+
// Nice, but "Similar analytics.." -> "etc."?
92+
- There are four red cars
93+
- The average price of a red car is $32,500
94+
- Three of the red cars are made by Honda, and one is a BMW
95+
- The cheapest Honda is $10,000
96+
- The most expensive Honda is $20,000
97+
- Similar analytics are generated for all other colors and makes
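
As a final sketch, the price range for each (color, make) pair can be pulled out of the response with a short Python loop. The `response` dict holds only the red bucket shown in the truncated output, and `price_ranges` is a hypothetical helper name.

```python
# The visible part of the example response (other colors are elided there).
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": {
                        "buckets": [
                            {
                                "key": "honda",
                                "doc_count": 3,
                                "min_price": {"value": 10000},
                                "max_price": {"value": 20000},
                            },
                            {
                                "key": "bmw",
                                "doc_count": 1,
                                "min_price": {"value": 80000},
                                "max_price": {"value": 80000},
                            },
                        ]
                    },
                    "avg_price": {"value": 32500},
                }
            ]
        }
    }
}


def price_ranges(resp):
    """Map (color, make) to its (min_price, max_price) pair."""
    ranges = {}
    for color in resp["aggregations"]["colors"]["buckets"]:
        for make in color["make"]["buckets"]:
            key = (color["key"], make["key"])
            ranges[key] = (make["min_price"]["value"], make["max_price"]["value"])
    return ranges


ranges = price_ranges(response)
for (color, make), (low, high) in ranges.items():
    print(f"{color} {make}: ${low:,} to ${high:,}")
```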
