Commit 33e28fe

Merge branch 'main' into faster-shard-scaling
2 parents 0c8e9ca + 98098bb commit 33e28fe

47 files changed: +1142 -143 lines changed

.github/workflows/ci.yml

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -131,7 +131,8 @@ jobs:
131131
if: always() && steps.modified.outputs.rust_src == 'true'
132132
uses: taiki-e/cache-cargo-install-action@v2
133133
with:
134-
tool: cargo-deny
134+
# 0.18 requires rustc 1.85
135+
135136
- name: cargo clippy
136137
if: always() && steps.modified.outputs.rust_src == 'true'
137138
run: cargo clippy --workspace --tests --all-features

LICENSE-3rdparty.csv

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ difflib,https://github.com/DimaKudosh/difflib,MIT,Dima Kudosh <dimakudosh@gmail.
117117
digest,https://github.com/RustCrypto/traits,MIT OR Apache-2.0,RustCrypto Developers
118118
displaydoc,https://github.com/yaahc/displaydoc,MIT OR Apache-2.0,Jane Lusby <[email protected]>
119119
downcast,https://github.com/fkoep/downcast-rs,MIT,Felix Köpge <[email protected]>
120-
downcast-rs,https://github.com/marcianx/downcast-rs,MIT OR Apache-2.0,"Ashish Myles <[email protected]>, Runji Wang <[email protected]>"
120+
downcast-rs,https://github.com/marcianx/downcast-rs,MIT OR Apache-2.0,The downcast-rs Authors
121121
dtoa,https://github.com/dtolnay/dtoa,MIT OR Apache-2.0,David Tolnay <[email protected]>
122122
dyn-clone,https://github.com/dtolnay/dyn-clone,MIT OR Apache-2.0,David Tolnay <[email protected]>
123123
ecdsa,https://github.com/RustCrypto/signatures/tree/master/ecdsa,Apache-2.0 OR MIT,RustCrypto Developers

docs/configuration/index-config.md

Lines changed: 2 additions & 2 deletions
@@ -365,7 +365,7 @@ fast:
365365
| `description` | Optional description for the field. | `None` |
366366
| `stored` | Whether value is stored in the document store | `true` |
367367
| `indexed` | Whether value is indexed | `true` |
368-
| `fast` | Whether value is stored in a fast field. The default behaviour for text in the JSON is to store the text unchanged. A normalizer can be configured via `normalizer: lowercase`. See [normalizers](#description-of-available-normalizers) for a list of available normalizers. | `true` |
368+
| `fast` | Whether value is stored in a fast field. The default behaviour for text in the JSON is to store the text unchanged. A normalizer can be configured via `normalizer: lowercase`. See [normalizers](#description-of-available-normalizers) for a list of available normalizers. | `false` |
369369
| `tokenizer` | **Only affects strings in the json object**. Name of the `Tokenizer`, choices between `raw`, `default`, `en_stem` and `chinese_compatible` | `raw` |
370370
| `record` | **Only affects strings in the json object**. Describes the amount of information indexed, choices between `basic`, `freq` and `position` | `basic` |
371371
| `expand_dots` | If true, json keys containing a `.` should be expanded. For instance, if `expand_dots` is set to true, `{"k8s.node.id": "node2"}` will be indexed as if it was `{"k8s": {"node": {"id": "node2"}}}`. The benefit is that escaping the `.` will not be required at query time. In other words, `k8s.node.id:node2` will match the document. This does not impact the way the document is stored. | `true` |
@@ -497,7 +497,7 @@ doc_mapping:
497497
  dynamic_mapping:
498498
    indexed: true
499499
    stored: true
500-
    tokenizer: default
500+
    tokenizer: raw
501501
    record: basic
502502
    expand_dots: true
503503
    fast: true

docs/configuration/node-config.md

Lines changed: 2 additions & 1 deletion
@@ -159,6 +159,7 @@ This section contains the configuration options for an indexer. The split store
159159
| `merge_concurrency` | Maximum number of merge operations that can be executed on the node at one point in time. | `(2 x num threads available) / 3` |
160160
| `enable_otlp_endpoint` | If true, enables the OpenTelemetry exporter endpoint to ingest logs and traces via the OpenTelemetry Protocol (OTLP). | `false` |
161161
| `cpu_capacity` | Advisory parameter used by the control plane. The value can be expressed in threads (e.g. `2`) or in terms of millicpus (e.g. `2000m`). The control plane will attempt to schedule indexing pipelines on the different nodes proportionally to the CPU capacity advertised by the indexer. It is NOT used as a limit. All pipelines will be scheduled regardless of whether the cluster has sufficient capacity or not. The control plane does not attempt to spread the work equally when the load is well below the `cpu_capacity`. Users who need a balanced load on all of their indexer nodes can set the `cpu_capacity` to an arbitrarily low value as long as they keep it proportional to the number of threads available. | `num threads available` |
162+
| `enable_cooperative_indexing` | Enables sharing resources more efficiently when the number of indexes actively written to is significantly higher than the number of cores. This might decrease the overall indexing throughput. | `false` |
162163

163164
Example:
164165

@@ -205,7 +206,7 @@ This section contains the configuration options for a Searcher.
205206

206207
### Searcher split cache configuration
207208

208-
This section contains the configuration options for the on disk searcher split cache.
209+
This section contains the configuration options for the on-disk searcher split cache. Files are stored in the data directory under `searcher-split-cache/`.
209210

210211
| Property | Description | Default value |
211212
| --- | --- | --- |

docs/configuration/template-config.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
---
2+
title: Index template configuration
3+
sidebar_position: 7
4+
toc_max_heading_level: 4
5+
---
6+
7+
This page describes how to configure an index template.
8+
9+
Index templates let you dynamically create indexes according to predefined rules. Templates are used automatically when documents are received on the ingest API for an index that doesn't exist.
10+
11+
The index template configuration lets you define the following parameters:
12+
- `template_id` (required)
13+
- `description`
14+
- `index_id_patterns` (required)
15+
- `index_root_uri`
16+
- `priority`
17+
18+
In addition, the following parameters can be configured and are the same as those found in the [index configuration](../configuration/index-config.md):
19+
- doc mapping (required)
20+
- indexing settings
21+
- search settings
22+
- retention policy
23+
24+
You can manage templates using the [index template API](../reference/rest-api.md#index-template-api).
25+
26+
## Config file format
27+
28+
The index template configuration format is YAML or JSON. When a key is absent from the configuration file, the default value is used.
29+
Here is a complete example:
30+
31+
```yaml
32+
version: 0.9 # File format version.
33+
34+
template_id: "hdfs-dev"
35+
36+
index_root_uri: "s3://my-bucket/hdfs-dev/"
37+
38+
description: "HDFS log management dev"
39+
40+
index_id_patterns:
41+
  - hdfs-dev-*
42+
  - hdfs-staging-*
43+
44+
priority: 100
45+
46+
doc_mapping:
47+
  mode: lenient
48+
  field_mappings:
49+
    - name: timestamp
50+
      type: datetime
51+
      input_formats:
52+
        - unix_timestamp
53+
      output_format: unix_timestamp_secs
54+
      fast_precision: seconds
55+
      fast: true
56+
    - name: severity_text
57+
      type: text
58+
      tokenizer: raw
59+
      fast:
60+
        - tokenizer: lowercase
61+
    - name: body
62+
      type: text
63+
      tokenizer: default
64+
      record: position
65+
    - name: resource
66+
      type: object
67+
      field_mappings:
68+
        - name: service
69+
          type: text
70+
          tokenizer: raw
71+
  tag_fields: ["resource.service"]
72+
  timestamp_field: timestamp
73+
  index_field_presence: true
74+
75+
search_settings:
76+
  default_search_fields: [severity_text, body]
77+
78+
retention:
79+
  period: 90 days
80+
  schedule: daily
81+
```
82+
83+
## Template ID
84+
85+
The `template_id` is a string that uniquely identifies the index template within the metastore. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (`-`), and underscores (`_`). It must start with a letter and contain at least 3 characters but no more than 255.
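The rules above can be checked with a short regular expression. The following is only a sketch of those rules as stated in the text; the function name `is_valid_template_id` is hypothetical and is not part of Quickwit.

```python
import re

# Transcription of the documented rules (an illustration, not Quickwit's code):
# starts with an ASCII letter, then letters, digits, hyphens or underscores,
# for a total length between 3 and 255 characters.
_TEMPLATE_ID_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{2,254}")


def is_valid_template_id(template_id: str) -> bool:
    """Return True if `template_id` satisfies the documented constraints."""
    return _TEMPLATE_ID_RE.fullmatch(template_id) is not None


if __name__ == "__main__":
    for candidate in ["hdfs-dev", "ab", "1-logs", "logs_2024"]:
        print(candidate, is_valid_template_id(candidate))
```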
86+
87+
## Description
88+
89+
An optional string that describes what the index template is used for.
90+
91+
## Index root uri
92+
93+
The `index_root_uri` defines where the index files (also called splits) should be stored.
94+
This parameter expects a [storage uri](storage-config#storage-uris).
95+
96+
The actual URI of the index is the path concatenation of the `index_root_uri` with the index id.
97+
98+
If `index_root_uri` is not defined, the `default_index_root_uri` from [Quickwit's node config](node-config) will be used.
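As an illustration of the resolution described above, here is a minimal sketch. The function `resolve_index_uri` is hypothetical, and the handling of trailing slashes is an assumption; the text only states that the index URI is the path concatenation of the root URI and the index ID.

```python
from typing import Optional


def resolve_index_uri(
    index_id: str,
    index_root_uri: Optional[str],
    default_index_root_uri: str,
) -> str:
    """Join the effective root URI with the index ID.

    Falls back to the node-level default when the template does not define
    `index_root_uri`. Trailing slashes are normalized (an assumption).
    """
    root = index_root_uri if index_root_uri is not None else default_index_root_uri
    return root.rstrip("/") + "/" + index_id


if __name__ == "__main__":
    print(resolve_index_uri("hdfs-dev-app", "s3://my-bucket/hdfs-dev/", "s3://default/indexes"))
    print(resolve_index_uri("hdfs-dev-app", None, "s3://default/indexes"))
```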
99+
100+
## Index ID patterns
101+
102+
`index_id_patterns` is a list of strings that define which indices should be created according to this template. Use [glob-like](https://en.wikipedia.org/wiki/Glob_(programming)) wildcard ( \* ) expressions to target indices that match a pattern: test\* or \*test or te\*t or \*test\*. You can also use negative patterns by prepending the hyphen `-` character.
103+
104+
Each pattern must obey the following rules:
105+
- It must follow the regex `^-?[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`.
106+
- It cannot contain consecutive asterisks (`*`).
107+
- If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters.
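The three rules above can be sketched as a small validator. This is an illustration rather than Quickwit's implementation: the function name is hypothetical, the hyphen inside the character class is escaped for clarity, and the minimum length is measured on the whole pattern string, including any leading `-` (an assumption, since the text does not say).

```python
import re

# Same character set as the documented regex, with the hyphen escaped.
_INDEX_ID_PATTERN_RE = re.compile(r"-?[a-zA-Z*][a-zA-Z0-9\-_.*]{0,254}")


def is_valid_index_id_pattern(pattern: str) -> bool:
    """Check one entry of `index_id_patterns` against the documented rules."""
    # Rule 1: the pattern must match the regex in full.
    if _INDEX_ID_PATTERN_RE.fullmatch(pattern) is None:
        return False
    # Rule 2: no consecutive asterisks.
    if "**" in pattern:
        return False
    # Rule 3: without a wildcard, at least 3 characters are required.
    if "*" not in pattern and len(pattern) < 3:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["hdfs-dev-*", "a**b", "ab", "a*", "-hdfs-dev-old*"]:
        print(candidate, is_valid_index_id_pattern(candidate))
```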
108+
109+
## Priority
110+
111+
When multiple templates match a new index ID, the template with the highest `priority` is used to configure the index.
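To make the selection rule concrete, here is a minimal sketch of how a template could be picked for a new index ID. The `Template` class and both functions are hypothetical. The sketch assumes that a negative pattern excludes any index ID it matches, and it uses Python's `fnmatchcase` as a stand-in for the glob matching. Tie-breaking between equal priorities is not specified in the text; here the first such template in the list wins.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Template:
    template_id: str
    index_id_patterns: List[str]
    priority: int


def template_matches(template: Template, index_id: str) -> bool:
    """True if at least one positive pattern matches and no negative one does."""
    positive = [p for p in template.index_id_patterns if not p.startswith("-")]
    negative = [p[1:] for p in template.index_id_patterns if p.startswith("-")]
    if any(fnmatchcase(index_id, p) for p in negative):
        return False
    return any(fnmatchcase(index_id, p) for p in positive)


def select_template(templates: List[Template], index_id: str) -> Optional[Template]:
    """Return the matching template with the highest priority, if any."""
    candidates = [t for t in templates if template_matches(t, index_id)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t.priority)


if __name__ == "__main__":
    templates = [
        Template("all-logs", ["logs-*"], priority=10),
        Template("dev-logs", ["logs-dev-*", "-logs-dev-tmp*"], priority=100),
    ]
    for index_id in ["logs-dev-app", "logs-prod", "logs-dev-tmp1", "metrics"]:
        chosen = select_template(templates, index_id)
        print(index_id, "->", chosen.template_id if chosen else None)
```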

docs/ingest-data/ingest-api.md

Lines changed: 5 additions & 1 deletion
@@ -26,6 +26,8 @@ version: 0.7
2626
index_id: stackoverflow-schemaless
2727
doc_mapping:
2828
  mode: dynamic
29+
  dynamic_mapping:
30+
    tokenizer: default
2931
indexing_settings:
3032
  commit_timeout_secs: 30
3133
EOF
@@ -35,6 +37,8 @@ EOF
3537
curl -XPOST -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/indexes' --data-binary @stackoverflow-schemaless-config.yaml
3638
```
3739

40+
Note that for this example, we configure the dynamic mapping to use the [default tokenizer](../configuration/index-config.md#description-of-available-tokenizers). This is necessary to enable full-text search on all text fields.
41+
3842
## Ingest data
3943

4044
Let's first download a sample of the [StackOverflow dataset](https://www.kaggle.com/stackoverflow/stacksample).
@@ -83,6 +87,6 @@ By default, both ingestion services are enabled and ingest V2 is used. You can t
8387

8488
:::note
8589

86-
These configuration drive the ingest service used both by the `api/v1/<index-id>/ingest` endpoint and the [bulk API](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint).
90+
These configurations drive the ingest service used both by the `api/v1/<index-id>/ingest` endpoint and the [bulk API](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint).
8791

8892
:::

docs/operating/data-directory.md

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,10 @@ This directory is used for caching splits that will undergo a merge operation to
5454

5555
You can [configure](../configuration/node-config#indexer-configuration) the number of splits the cache can hold with `split_store_max_num_splits` and limit the overall size in bytes of splits with `split_store_max_num_bytes`.
5656

57+
### `/searcher-split-cache` directory
58+
59+
This directory is used by searcher nodes to cache entire splits and reduce calls to the object store. It won't be created unless you set the `split_cache` fields in the [searcher configuration](../configuration/node-config.md#searcher-configuration).
60+
5761

5862
## Setting the right splits cache limits
5963

docs/overview/concepts/querying.md

Lines changed: 8 additions & 0 deletions
@@ -98,10 +98,18 @@ Search stream queries can take a huge amount of RAM. Quickwit limits the number
9898

9999
Quickwit caches data in several places to deliver a high-performance query engine.
100100

101+
In memory:
102+
101103
- Hotcache caching: A static cache that holds information about a split file's internal representation. It helps speed up the opening of a split file. Its size can be defined via the `split_footer_cache_capacity` configuration parameter.
102104
- Fast field caching: Fast fields tend to be accessed very frequently by users, especially for stream requests. They are cached in RAM, in a cache whose size can be limited by the `fast_field_cache_capacity` configuration value.
103105
- Partial request caching: In some cases, such as dashboards, very similar requests are issued with only the timestamp bounds changing. Some partial results can be cached to make these requests faster and issue fewer requests to the storage. They are cached in RAM, in a cache whose size can be limited by the `partial_request_cache_capacity` configuration value.
104106

107+
On disk:
108+
109+
- The split cache stores entire splits on disk. It can be enabled by setting the `split_cache` configuration fields. This cache can help reduce object store costs and load. Searchers populate this cache when splits are created or queried and evict them with a simple LRU strategy.
110+
111+
Learn more about cache parameters in the [searcher configuration docs](../../configuration/node-config.md#searcher-configuration).
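The LRU behaviour of the split cache can be pictured with the following sketch. It is not Quickwit's implementation: the class name, the `max_num_bytes` limit, and the choice to keep a single oversized split rather than evict it are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import List


class SplitCacheSketch:
    """Toy LRU cache keyed by split ID and bounded by a total size in bytes."""

    def __init__(self, max_num_bytes: int) -> None:
        self.max_num_bytes = max_num_bytes
        self.num_bytes = 0
        # Maps split ID to split size; least recently used entries come first.
        self._splits: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, split_id: str, size: int) -> List[str]:
        """Record that a split was created or queried; return evicted split IDs."""
        if split_id in self._splits:
            # Already cached: mark it as most recently used.
            self._splits.move_to_end(split_id)
            return []
        self._splits[split_id] = size
        self.num_bytes += size
        evicted = []
        # Evict least recently used splits until the cache fits again.
        # The newest split is always kept, even if it alone exceeds the limit.
        while self.num_bytes > self.max_num_bytes and len(self._splits) > 1:
            old_id, old_size = self._splits.popitem(last=False)
            self.num_bytes -= old_size
            evicted.append(old_id)
        return evicted

    def cached_split_ids(self) -> List[str]:
        return list(self._splits)


if __name__ == "__main__":
    cache = SplitCacheSketch(max_num_bytes=100)
    cache.touch("split-a", 40)
    cache.touch("split-b", 40)
    cache.touch("split-a", 40)  # split-a becomes the most recently used
    print(cache.touch("split-c", 40))  # evicts the least recently used split
```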
112+
105113
### Scoring
106114

107115
Quickwit supports sorting docs by their BM25 scores. To sort by score, [fieldnorms](../../configuration/index-config.md#text-type) must be enabled for the field. By default, BM25 scoring is disabled to improve query latencies, but it can be enabled by setting the `sort_by` option to `_score` in queries.

docs/reference/es_compatible_api.md

Lines changed: 1 addition & 1 deletion
@@ -749,7 +749,7 @@ The multi-target expression has the following constraints:
749749

750750
- It must follow the regex `^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`.
751751
- It cannot contain consecutive asterisks (`*`).
752-
- If it contains an asterisk (`*`), the length must be greater than or equal to 3 characters.
752+
- If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters.
753753

754754
### Examples
755755
```

docs/reference/rest-api.md

Lines changed: 116 additions & 1 deletion
@@ -96,7 +96,7 @@ The following are some constraints about the multi-target expression.
9696

9797
- It must follow the regex `^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`.
9898
- It cannot contain consecutive asterisks (`*`).
99-
- If it contains an asterisk (`*`), the length must be greater than or equal to 3 characters.
99+
- If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters.
100100

101101
#### Examples
102102
```
@@ -794,3 +794,118 @@ Get the list of delete tasks for a given `index_id`.
794794
#### Response
795795

796796
The response is an array of `DeleteTask`.
797+
798+
799+
## Index template API
800+
801+
This API manages index template resources. Templates are higher level configuration objects used to automatically create indexes according to predefined rules. See [index template configuration](../configuration/template-config.md).
802+
803+
### Create a template
804+
805+
```
806+
POST api/v1/templates
807+
```
808+
809+
#### POST payload
810+
811+
Create an index template by posting a [template configuration](../configuration/template-config.md) payload. The API accepts JSON with the header `content-type: application/json` and YAML with `content-type: application/yaml`.
812+
813+
**Example**
814+
815+
```yaml
816+
version: 0.9 # File format version.
817+
818+
template_id: "all-logs"
819+
820+
index_root_uri: "s3://my-bucket/logs/"
821+
822+
description: "All my logs"
823+
824+
index_id_patterns:
825+
  - logs-*
826+
827+
priority: 100
828+
829+
doc_mapping:
830+
  mode: dynamic
831+
  field_mappings:
832+
    - name: timestamp
833+
      type: datetime
834+
      input_formats:
835+
        - unix_timestamp
836+
      output_format: unix_timestamp_secs
837+
      fast: true
838+
  timestamp_field: timestamp
839+
```
840+
841+
#### Response
842+
843+
The created index template configuration as JSON.
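As a rough sketch, the same template could be created from Python with the standard library. The host and port (`localhost:7280`) are assumptions and should be adjusted to your node. The JSON rendering of the YAML example above, including `version` as a string, is also an assumption. The helper name `build_create_template_request` is hypothetical.

```python
import json
import urllib.request

# Assumed address of a local Quickwit node; adjust to your deployment.
BASE_URL = "http://localhost:7280"

TEMPLATE = {
    "version": "0.9",
    "template_id": "all-logs",
    "index_root_uri": "s3://my-bucket/logs/",
    "description": "All my logs",
    "index_id_patterns": ["logs-*"],
    "priority": 100,
    "doc_mapping": {
        "mode": "dynamic",
        "field_mappings": [
            {
                "name": "timestamp",
                "type": "datetime",
                "input_formats": ["unix_timestamp"],
                "output_format": "unix_timestamp_secs",
                "fast": True,
            }
        ],
        "timestamp_field": "timestamp",
    },
}


def build_create_template_request(template: dict) -> urllib.request.Request:
    """Build (without sending) the POST request that creates a template."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/templates",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_template_request(TEMPLATE)
    # Sending requires a running node; the response body is the created
    # template configuration as JSON.
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read()))
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the node.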
844+
845+
846+
### Update a template
847+
848+
```
849+
PUT api/v1/templates/<template id>
850+
```
851+
852+
#### Path variable
853+
854+
| Variable | Description |
855+
| ------------- | ------------- |
856+
| `template id` | The template id |
857+
858+
859+
#### PUT payload
860+
861+
Update an index template by sending a [template configuration](../configuration/template-config.md) payload. The API accepts JSON with the header `content-type: application/json` and YAML with `content-type: application/yaml`.
862+
863+
**Example**
864+
865+
See [create endpoint](#create-a-template).
866+
867+
#### Response
868+
869+
The updated template configuration as JSON.
870+
871+
### List the templates
872+
873+
```
874+
GET api/v1/templates
875+
```
876+
877+
#### Response
878+
879+
An array with all the existing index template configurations as JSON.
880+
881+
### Get a template
882+
883+
```
884+
GET api/v1/templates/<template id>
885+
```
886+
887+
#### Path variable
888+
889+
| Variable | Description |
890+
| ------------- | ------------- |
891+
| `template id` | The template id |
892+
893+
#### Response
894+
895+
The requested index template configuration as JSON.
896+
897+
### Delete a template
898+
899+
```
900+
DELETE api/v1/templates/<template id>
901+
```
902+
903+
#### Path variable
904+
905+
| Variable | Description |
906+
| ------------- | ------------- |
907+
| `template id` | The template id |
908+
909+
#### Response
910+
911+
Empty response.
