// "the following chapters"... obvious. just delete the 2nd sentence, and merge with next para
Up until this point, this book has been dedicated to search. The following
chapters deal with aggregations, an entirely different set of functionality
built into Elasticsearch.

With search, we have a query and we wish to find a subset of documents which
match the query. We are looking for the proverbial needle(s) in the
haystack.

// perhaps "zoom out to get an overview"?
// something about "showing users the data that exists in your index, leading them to the right results"?
Aggregations take a step back. Instead of looking for individual documents,
we want to analyze and summarize our complete set of data:

// Popular manufacturers? Unusual clumps of needles in the haystack?
- How many needles are in the haystack?
- What is the average length of the needles?
- What is the median length of needles, broken down by manufacturer?
- How many needles were added to the haystack each month?
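
Each of these questions maps onto an aggregation. As a quick sketch only (the `needles` index and its `length` field are hypothetical, invented here for illustration), the average-length question could be expressed with an `avg` metric:

```js
GET /needles/_search?search_type=count
{
    "aggs": {
        "avg_length": {
            "avg": { "field": "length" }
        }
    }
}
```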

Aggregations allow us to ask sophisticated questions of our data. And yet, while
the functionality is completely different from search, it leverages the
same data structures. This means aggregations execute quickly and are
_near-realtime_, just like search.

// perhaps hadoop instead of sql? reputation for slowness
// "slice ... realtime" -> "visualize your data in realtime, allowing you to respond immediately"
This is extremely powerful for reporting and dashboards. Instead of performing
nightly "rollups" of your data (_e.g. large, batch SQL joins which
are run nightly on a cron job because they are so slow_), you can slice and dice
your data in realtime.

// Perhaps mention "not precalculated, out of date, and irrelevant"?
// Perhaps "aggs are calculated in the context of the user's search, so you're not showing them that you have 10 4 star hotels on your site, but that you have 10 4 star hotels that *match their criteria*".
Finally, aggregations operate alongside search requests, which means you can
both search/filter documents _and_ perform analytics at the same time, on the
same data, in a single request.
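
For instance, a single request might pair a query with an aggregation. This is only a sketch, borrowing the car-transactions example used later in this chapter; the specific query and field names are illustrative:

```js
GET /cars/transactions/_search
{
    "query": {
        "match": { "color": "red" }
    },
    "aggs": {
        "makes": {
            "terms": { "field": "make" }
        }
    }
}
```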

// for aggs -> for analytics?
Aggregations are so powerful that many companies have built large Elasticsearch

300_Aggregations/20_basic_example.asciidoc (24 additions & 12 deletions)

@@ -1,8 +1,9 @@
// This section feels like you're worrying too much about explaining the syntax, rather than the point of aggs. By this stage in the book, people should be used to the ES api, so I think we can assume more. I'd change the emphasis here and state that intention: we want to find out what the most popular colours are. To do that we'll use a "terms" agg, which counts up every term in the "color" field and returns the 10 most popular.
// Step two: Add a query, to show that the aggs are calculated live on the results from the user's query.
=== Aggregation Test-drive

We could spend the next few pages defining the various aggregations
and their syntax, but aggregations are truly best learned by example.
Once you learn how to think about aggregations, and how to nest them appropriately,
the syntax is fairly trivial.

@@ -38,7 +39,8 @@ Now that we have some data, let's construct our first aggregation. A car dealer
may want to know which color car sells the best. This is easily accomplished
using a simple aggregation.

// I don't think it's overwhelming, and users probably won't either... unless you mention it ;)
The syntax may look overwhelming at first, but hold on... we'll decompose the query
and discuss what each portion means. First, the aggregation:

[source,js]
@@ -48,42 +50,48 @@ GET /cars/transactions/_search?search_type=count <1>
// Add the search_type=count thing as a sidebar, so it doesn't get in the way
<1> Because we don't care about search results, we are going to use the `count`
<<search-type,`search_type`>>, which will be faster.
<2> Aggregations are placed under the top-level `"aggs"` parameter (the longer `"aggregations"`
will also work if you prefer that)
<3> We then name the aggregation whatever we want -- "colors" in this example
<4> Finally, we define a single bucket of type `terms`

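The listing itself is elided from this hunk; judging from the callouts above, the request under discussion has roughly this shape (a reconstruction for reference, not the verbatim listing):

```js
GET /cars/transactions/_search?search_type=count
{
    "aggs": {
        "colors": {
            "terms": {
                "field": "color"
            }
        }
    }
}
```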
// Meh - Point here is that aggregations are executed in the context of the search results, rather than which endpoint is used.
The first thing to notice is that aggregations are executed as a search, using the
`/_search` endpoint. As mentioned at the top of the chapter, aggregations are built
from the same data structures that power search, which means they use the same
endpoint. Aggregations are also defined as a top-level parameter, just like using
`"query"` for search.

// Delete this and make the point in the para above. It feels like you're scared to introduce the idea of context this early. I think it's OK. You don't have to explain how to change the context yet, but at least make the point that there is one.
.Can you use aggregations and queries together?
****
Absolutely! But hold that thought, we'll discuss it later in <todo>
****

// I think it is OK to assume that naming aggs is a good idea. Probably easier to make the point if you name it "popular_colors"
Next we define a name for our aggregation. This is entirely up to you...
the response will be labeled with the name you provide so that your application
can parse the results. You may also specify more than one aggregation per search
request, so giving each aggregation a unique, identifiable name is important
(we'll look at an example of this later).

Next we define the aggregation itself. For this example, we are defining
a single `terms` bucket. The `terms` bucket will dynamically create a new
bucket for every unique term it encounters. Since we are telling it to use the
"color" field, the `terms` bucket will dynamically create a new bucket for each color.

// Trim the results here. By this stage people have gone through 300 pages, so they should be familiar with what ES returns. Also, they can execute the query themselves in Sense
Let's execute that aggregation and take a look at the results:

[source,js]
@@ -120,16 +128,20 @@ Let's execute that aggregation and take a look at the results:
<1> No search hits are returned because we used the `search_type=count` param
<2> Our "colors" aggregation is returned as part of the "aggregations" field
<3> The key to each bucket corresponds to a unique term found in the "color" field
// Perhaps: We always get back the `doc_count` metric which tells us how many documents contained this term.
<4> The count of each bucket represents the number of documents with this color

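The response listing is also elided between these hunks; its general shape is roughly as follows (the `doc_count` of 4 for red comes from the surrounding text; the other buckets and counts are illustrative):

```js
{
    ...
    "aggregations": {
        "colors": {
            "buckets": [
                { "key": "red",   "doc_count": 4 },
                { "key": "blue",  "doc_count": 2 },
                { "key": "green", "doc_count": 2 }
            ]
        }
    }
}
```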
The response contains a list of buckets, each corresponding to a unique color
(red, green, etc). Each bucket also includes a count of how many documents
"fell into" that particular bucket. For example, there are four red cars.

Before we move on, there are some important yet not immediately obvious things
to point out.

// Delete the above line and make the realtime point in a para, which says that you could pipe this into a graphing library and display a dashboard showing real time trends. As soon as you sell a silver car, it'll show up in the graph. (And no need for the last sentence)
- The buckets were created dynamically. Our application had no prior knowledge about
how many colors were in the index. If you were to index a "silver" car next, a new
"silver" bucket would automatically appear in the response.
@@ -139,7 +151,7 @@ directly into graphing libraries for near real-time dashboards
- The aggregation is operating on all of the documents in your index at the moment.
This can be changed, which we will talk about <here>.