Indexing and storage: Improve "doc values" section

amotl · amotl · commit 274f9836d820 · 2025-10-24T08:02:17.000+02:00
diff --git a/docs/feature/storage/indexing-and-storage.md b/docs/feature/storage/indexing-and-storage.md
@@ -224,7 +224,10 @@ and y in [9,11], the engine does the following:
 
 ## Doc values
 
-Until Lucene 4.0 columns were indexed using an inverted index data structure
+:::{rubric} Data storage prior to Lucene 4.0
+:::
+
+Until Lucene 4.0, columns were indexed using an inverted index data structure
 that maps terms to document ids. For searching documents by terms, this approach
 is effective and well-suited.
 However, if we have to find field values given document id, this solution
@@ -234,12 +237,29 @@ retrieval of data, it was necessary to traverse and extract all fields that
 appear in the collection of documents. This can cause memory and performance
 issues if we need to extract a large amount of data.
 
+:::{rubric} What are doc values?
+:::
+
 To improve the performance of aggregations and sorting, a new data structure was
 introduced, namely doc values. Doc values is a column-based data storage built
 at document index time. They store all field values that are not analyzed as
 strings in a compact column, making it more effective for sorting and
 aggregations.
 
+> Doc values are Lucene's column-stride field value storage, letting you
+> store numerics (single- or multivalued), sorted keywords (single or
+> multivalued) and binary data blobs per document.
+> These values are quite fast to access at search time, since they are
+> stored column-stride such that only the value for that one field needs
+> to be decoded per hit. This is in contrast to Lucene's stored document
+> fields, which store all field values for one document together in a
+> row-stride fashion, and are therefore relatively slow to access.
+>
+> -- [Document values with Apache Lucene]
+
+:::{rubric} CrateDB's column store
+:::
+
 CrateDB implements Column Store based on doc values in Lucene. The Column Store
 is created for each field in a document and generated as the following
 structures for fields in the Product table:
@@ -275,3 +295,6 @@ the following:
 
 The use of Column Store results in a small disk footprint, thanks to specialized
 compression algorithms such as delta encoding, bit packing, and GCD.
+
+
+[Document values with Apache Lucene]: https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene
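
The contrast the "Doc values" section draws between an inverted index (term to document ids) and a column-stride store (document id to value) can be illustrated with a minimal Python sketch. This is only a toy model under stated assumptions, not how Lucene or CrateDB implement either structure: the sample rows, the field names `name` and `price`, and all function names are hypothetical, and the list position stands in for the document id.

```python
from collections import defaultdict

# Hypothetical sample rows; the list position serves as the document id.
docs = [
    {"name": "almond milk", "price": 3},
    {"name": "oat milk", "price": 2},
    {"name": "almond butter", "price": 7},
]


def build_inverted_index(docs, field):
    """Map each term of `field` to the ascending list of document ids containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for term in set(doc[field].split()):
            index[term].append(doc_id)
    return dict(index)


def build_doc_values(docs, field):
    """Store the values of `field` as one column, addressed by document id."""
    return [doc[field] for doc in docs]


def terms_via_inverted_index(index, doc_id):
    """Recover the terms of one document by scanning every posting list."""
    return sorted(term for term, ids in index.items() if doc_id in ids)


name_index = build_inverted_index(docs, "name")
price_column = build_doc_values(docs, "price")

# Term -> documents: a single dictionary lookup.
print(name_index["almond"])

# Document -> terms: every posting list has to be inspected.
print(terms_via_inverted_index(name_index, 1))

# Document -> value with a column: a single positional lookup.
print(price_column[1])

# Sorting document ids by a field reads only that one column.
print(sorted(range(len(docs)), key=price_column.__getitem__))
```

In this sketch, looking up documents by term is a single dictionary access, while recovering a document's values from the inverted index means scanning all posting lists. The column turns that second access pattern into a positional read, which is the property that makes sorting and aggregations cheaper.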