@@ -6,24 +6,197 @@ orphan: true
# Clustering
- :::{todo} Implement.
- About scalability through partitioning, sharding, and replication.
- Also about cross cluster replication.
+ :::{include} /_include/links.md
:::
+ :::{include} /_include/styles.html
+ :::
+
+ <style>
+ .field-list dd {
+   margin-bottom: 1em !important;
+ }
+ .field-list p {
+   margin-bottom: 0.5em;
+ }
+ </style>
+
+
+ :::::{grid}
+ :padding: 0
+
+ ::::{grid-item}
+ :class: rubric-slim
+ :columns: auto 9 9 9

+ **CrateDB provides scalability through partitioning, sharding, and replication.**

- :::{seealso}
- **Domains:**
- [](#metrics-store) •
- [](#analytics) •
- [](#industrial) •
- [](#timeseries) •
- [](#machine-learning)
+ :::{rubric} Overview
+ :::
+ CrateDB uses a shared-nothing architecture to form highly available, resilient
+ database clusters with minimal configuration effort, effectively implementing
+ a distributed SQL database.
+
+ :::{rubric} About
+ :::
+ CrateDB relies on Lucene for storage and inherits components from
+ Elasticsearch/OpenSearch for cluster consensus. The core concepts of CrateDB
+ are familiar to Elasticsearch users, because the underlying implementation is
+ the same.
- **Product:**
- [Relational Database]
+ :::{rubric} Details
:::

+ Sharding and partitioning are techniques used to distribute data evenly across
+ multiple nodes in a cluster, ensuring data scalability, availability, and
+ performance.
+
+ Replication can be applied to increase redundancy, which reduces the chance of
+ data loss, and to improve read performance.
+
+ :Sharding:
+
+   In CrateDB, tables are split into a configured number of shards. Then, the
+   shards are distributed across multiple nodes of the database cluster.
+   Each shard in CrateDB is stored in a dedicated Lucene index.
+
+   You can think of a shard as a self-contained part of a table that includes
+   both a subset of records and the corresponding indexing structures.
+
+   Choosing how many shards to use for a table requires you to think about
+   the type of data you are processing, the types of queries you are running, and
+   the type of hardware you are using.
+
+ :Partitioning:
+
+   CrateDB also supports splitting up data across another dimension with
+   partitioning.
+   Tables can be partitioned by defining partition columns.
+   You can think of a partition as a set of shards.
+
+   - Partitioned tables make queries more efficient, because only a
+     subset of the data needs to be accessed.
+   - Each partition can be backed up and restored individually, for efficient operations.
+   - You can change the number of shards for future partitions even after the
+     table has been created. This lets you start out with few shards per partition,
+     and scale up the number of shards for later partitions once traffic
+     and ingest rates increase over the lifetime of your application or system.
+
+ :Replication:
+
+   You can configure CrateDB to replicate tables. When you configure replication,
+   CrateDB will ensure that every table shard has one or more copies available
+   at all times.
+
+   Replication can also improve read performance because any increase in the
+   number of shard copies distributed across a cluster also increases the
+   opportunities for CrateDB to parallelize query execution across multiple nodes.
+
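The three settings described above map to table DDL. The following is a minimal sketch, not a recommendation: the table `sensor_readings`, its columns, and the shard and replica counts are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: name, columns, and counts are illustrative only.
CREATE TABLE sensor_readings (
    ts TIMESTAMP,
    sensor_id TEXT,
    reading DOUBLE PRECISION,
    -- Partition column derived from the timestamp: one partition per month.
    part GENERATED ALWAYS AS date_trunc('month', ts)
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY (part)
WITH (number_of_replicas = 1);

-- Raise the shard count. On a partitioned table, this is expected to apply
-- to partitions created from now on; existing partitions keep their count.
ALTER TABLE sensor_readings SET (number_of_shards = 8);

-- Keep two replica copies of every shard instead of one.
ALTER TABLE sensor_readings SET (number_of_replicas = 2);
```

Starting with a small shard count and raising it later follows the approach described under partitioning; suitable values depend on your data volume, queries, and hardware.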
+ ::::
+
+ ::::{grid-item}
+ :class: rubric-slim
+ :columns: auto 3 3 3
+
+ :::{rubric} Concepts
+ :::
+ - {ref}`crate-reference:concept-clustering`
+ - {ref}`crate-reference:concept-storage-consistency`
+ - {ref}`crate-reference:concept-resiliency`
+
+ :::{rubric} Reference Manual
+ :::
+ - {ref}`crate-reference:ddl-sharding`
+ - {ref}`crate-reference:partitioned-tables`
+ - {ref}`Partition columns <gloss-partition-column>`
+ - {ref}`crate-reference:ddl-replication`
+
+ {tags-primary}`Clustering`
+ {tags-primary}`Sharding`
+ {tags-primary}`Partitioning`
+ {tags-primary}`Replication`
+ ::::
+
+ :::::
+
+
+ ## Synopsis
+ With a monthly ingest volume of 300 GB, a table partitioned by month,
+ and six shards per partition, each shard will manage about 50 GB of data,
+ which is within the recommended size range (5 to 100 GB).
+
+ With two replicas configured, the table will store three copies of your data
+ (one primary and two replicas), in order to reduce the chance of permanent
+ data loss.
+ ```sql
+ CREATE TABLE timeseries_table (
+     ts TIMESTAMP,
+     val DOUBLE PRECISION,
+     -- Partition column: the month that each timestamp falls into.
+     part GENERATED ALWAYS AS date_trunc('month', ts)
+ )
+ CLUSTERED INTO 6 SHARDS
+ PARTITIONED BY (part)
+ WITH (number_of_replicas = 2);
+ ```
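To check how such a table is actually laid out, you can query CrateDB's system tables. This is a sketch under two assumptions: `timeseries_table` lives in the default `doc` schema and already holds data.

```sql
-- One row per partition, with its shard and replica settings.
SELECT partition_ident, "values", number_of_shards, number_of_replicas
FROM information_schema.table_partitions
WHERE table_schema = 'doc'
  AND table_name = 'timeseries_table'
ORDER BY partition_ident;

-- Size of each primary shard in GB (10^9 bytes), to compare against the
-- recommended range. sys.shards reports the size in bytes.
SELECT partition_ident,
       id,
       node['name'] AS node_name,
       size / 1000000000.0 AS size_gb
FROM sys.shards
WHERE schema_name = 'doc'
  AND table_name = 'timeseries_table'
  AND "primary" = true
ORDER BY partition_ident, id;
```

Filtering on `"primary"` leaves out replica shards, so each shard is counted once.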
+
+
+ ## Learn
+ Data with different characteristics and shapes needs different sharding and
+ partitioning strategies. Learn about the details of shard allocation, which
+ will help you choose the right strategy for your data and your most
+ prominent types of workloads.
+
+ ::::{grid} 2 2 2 2
+ :padding: 0
+
+ :::{grid-item-card}
+ :link: sharding-partitioning
+ :link-type: ref
+ :link-alt: Sharding and Partitioning
+ :padding: 3
+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
+ :class-body: sd-text-center2 sd-fs2-5
+ :class-footer: text-smaller
+ Sharding and Partitioning
+ ^^^
+ - Introduction to the concepts of sharding and partitioning.
+ - Learn how to choose a strategy that fits your needs.
+ +++
+ {material-outlined}`lightbulb;1.8em`
+ An in-depth guide on how to configure sharding and partitioning,
+ presenting best practices and examples.
+ :::
+
+ :::{grid-item-card}
+ :link: sharding-performance
+ :link-type: ref
+ :link-alt: Sharding Performance Guide
+ :padding: 3
+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
+ :class-body: sd-text-center2 sd-fs2-5
+ :class-footer: text-smaller
+ Sharding Performance Guide
+ ^^^
+ - Optimising for query performance.
+ - Optimising for ingestion performance.
+ +++
+ {material-outlined}`lightbulb;1.8em`
+ Guidelines on balancing your strategy to yield the best performance for your workloads.
+ :::
+
+ :::{grid-item-card}
+ :link: https://community.cratedb.com/t/sharding-and-partitioning-guide-for-time-series-data/737
+ :link-alt: Sharding and partitioning guide for time-series data
+ :padding: 3
+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
+ :class-body: sd-text-center2 sd-fs2-5
+ :class-footer: text-smaller
+ Sharding and partitioning guide for time-series data
+ ^^^
+ A hands-on walkthrough to help you build a sharding and partitioning
+ strategy for your time-series data.
+ +++
+ {material-outlined}`lightbulb;1.8em`
+ Includes details about partitioning, sharding, and replication, and gives
+ valuable advice on related topics.
+ :::

- [Relational Database]: https://cratedb.com/solutions/relational-database
+ ::::