@@ -224,7 +224,10 @@ and y in [9,11], the engine does the following:

## Doc values

- Until Lucene 4.0 columns were indexed using an inverted index data structure
+ :::{rubric} Data storage prior to Lucene 4.0
+ :::
+
+ Until Lucene 4.0, columns were indexed using an inverted index data structure
that maps terms to document ids. For searching documents by terms, this approach
is effective and well-suited.
However, if we have to find field values given document id, this solution
@@ -234,12 +237,29 @@ retrieval of data, it was necessary to traverse and extract all fields that
appear in the collection of documents. This can cause memory and performance
issues if we need to extract a large amount of data.

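
The asymmetry described above can be illustrated with a minimal Python sketch. It is not Lucene's implementation: the sample data and the helper names (`build_inverted_index`, `docs_for_term`, `value_for_doc`) are hypothetical and exist only for illustration. Looking up documents by term is a single dictionary access, while recovering a value from a document id may require scanning every posting list.

```python
from collections import defaultdict

# Hypothetical sample data: document id -> value of a single field.
docs = {0: "red", 1: "blue", 2: "red", 3: "green"}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        index[docs[doc_id]].append(doc_id)
    return dict(index)


def docs_for_term(index, term):
    """Term -> documents: one dictionary lookup."""
    return index.get(term, [])


def value_for_doc(index, doc_id):
    """Document -> value: posting lists are scanned until the id is found."""
    for term, doc_ids in index.items():
        if doc_id in doc_ids:
            return term
    return None


index = build_inverted_index(docs)
print(docs_for_term(index, "red"))  # [0, 2]
print(value_for_doc(index, 3))      # green
```

Sorting or aggregating over many documents would repeat the second kind of lookup for every document, which is the cost the paragraph above refers to.
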
+ :::{rubric} What are doc values?
+ :::
+
To improve the performance of aggregations and sorting, a new data structure was
introduced, namely doc values. Doc values is a column-based data storage built
at document index time. They store all field values that are not analyzed as
strings in a compact column, making it more effective for sorting and
aggregations.

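
As a rough illustration of the column-based layout, the sketch below stores each field as one list whose position is the document id. The field names, sample rows, and the `build_columns` helper are assumptions made for this example; Lucene's actual on-disk format is considerably more involved.

```python
# Hypothetical sample data: one dict per document, list position = document id.
documents = [
    {"name": "chair", "price": 40},
    {"name": "desk", "price": 120},
    {"name": "lamp", "price": 25},
]


def build_columns(documents):
    """Store each field as one list indexed by document id."""
    columns = {}
    for doc_id, doc in enumerate(documents):
        for field, value in doc.items():
            columns.setdefault(field, [None] * len(documents))[doc_id] = value
    return columns


columns = build_columns(documents)
prices = columns["price"]

# Aggregation reads a single column and never touches the other fields.
total = sum(prices)

# Sorting orders document ids by the values of that one column.
order = sorted(range(len(prices)), key=prices.__getitem__)

print(total)  # 185
print(order)  # [2, 0, 1]
```

Because the value for a given document id sits at a known position in the column, no posting lists need to be traversed to sort or aggregate.
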
+ > Doc values are Lucene's column-stride field value storage, letting you
+ store numerics (single- or multivalued), sorted keywords (single or
+ multivalued) and binary data blobs per document.
+ These values are quite fast to access at search time, since they are
+ stored column-stride such that only the value for that one field needs
+ to be decoded per hit. This is in contrast to Lucene's stored document
+ fields, which store all field values for one document together in a
+ row-stride fashion, and are therefore relatively slow to access.
+ >
+ > -- [Document values with Apache Lucene]
+
+ :::{rubric} CrateDB's column store
+ :::
+
CrateDB implements Column Store based on doc values in Lucene. The Column Store
is created for each field in a document and generated as the following
structures for fields in the Product table:
@@ -275,3 +295,6 @@ the following:

The use of Column Store results in a small disk footprint, thanks to specialized
compression algorithms such as delta encoding, bit packing, and GCD.
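
To make the three techniques named above more concrete, here is a small Python sketch of each. It shows the general ideas only and makes no claim about how Lucene or CrateDB combine them or lay out bytes on disk; all function names and sample values are hypothetical.

```python
from functools import reduce
from math import gcd


def delta_encode(values):
    """Keep the first value, then store the difference to each predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def gcd_encode(values):
    """Subtract the minimum, then divide the offsets by their common divisor."""
    base = min(values)
    offsets = [v - base for v in values]
    divisor = reduce(gcd, offsets) or 1  # all-equal values give 0, so use 1
    return base, divisor, [o // divisor for o in offsets]


def bit_pack(values):
    """Pack small non-negative integers into one integer, fixed width each."""
    width = max(max(values).bit_length(), 1)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return width, packed


def bit_unpack(width, packed, count):
    """Extract `count` fixed-width values from a packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


print(delta_encode([1000, 1003, 1007]))  # [1000, 3, 4]

base, divisor, small = gcd_encode([100, 200, 500])
print(base, divisor, small)              # 100 100 [0, 1, 4]

width, packed = bit_pack(small)
print(width)                             # 3
```

In this example, three values that would each need at least nine bits fit into three bits apiece once the minimum and the common divisor are factored out.
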
+
+
+ [Document values with Apache Lucene]: https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene