Commit 533fda0

Feature / Clustering: Implement page
1 parent c354fb2 commit 533fda0

4 files changed: +197 -17 lines changed

docs/feature/cluster/index.md

Lines changed: 186 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -6,24 +6,197 @@ orphan: true
66

77
# Clustering
88

9-
:::{todo} Implement.
109

11-
About scalability through partitioning, sharding, and replication.
12-
Also about cross cluster replication.
10+
:::{include} /_include/links.md
1311
:::
12+
:::{include} /_include/styles.html
13+
:::
14+
15+
<style>
16+
.field-list dd {
17+
margin-bottom: 1em !important;
18+
}
19+
.field-list p {
20+
margin-bottom: 0.5em;
21+
}
22+
</style>
23+
24+
25+
:::::{grid}
26+
:padding: 0
27+
28+
::::{grid-item}
29+
:class: rubric-slim
30+
:columns: auto 9 9 9
1431

32+
**CrateDB provides scalability through partitioning, sharding, and replication.**
1533

16-
:::{seealso}
17-
**Domains:**
18-
[](#metrics-store)
19-
[](#analytics)
20-
[](#industrial)
21-
[](#timeseries)
22-
[](#machine-learning)
34+
:::{rubric} Overview
35+
:::
36+
CrateDB uses a shared-nothing architecture to form highly available, resilient
37+
database clusters with minimal configuration effort, effectively implementing
38+
a distributed SQL database.
39+
40+
:::{rubric} About
41+
:::
42+
CrateDB relies on Lucene for storage and inherits components from
43+
Elasticsearch/OpenSearch for cluster consensus. Fundamental concepts of CrateDB
44+
are familiar to Elasticsearch users, because the underlying implementation is the same.
2345

24-
**Product:**
25-
[Relational Database]
46+
:::{rubric} Details
2647
:::
2748

49+
Sharding and partitioning are techniques used to distribute data evenly across
50+
multiple nodes in a cluster, ensuring data scalability, availability, and
51+
performance.
52+
53+
Replication can be applied to increase redundancy, which reduces the chance of
54+
data loss, and to improve read performance.
55+
56+
:Sharding:
57+
58+
In CrateDB, tables are split into a configured number of shards. Then, the
59+
shards are distributed across multiple nodes of the database cluster.
60+
Each shard in CrateDB is stored in a dedicated Lucene index.
61+
62+
You can think of a shard as a self-contained part of a table that includes
63+
both a subset of records and the corresponding indexing structures.
64+
65+
Figuring out how many shards to use for your tables requires you to think about
66+
the type of data you are processing, the types of queries you are running, and
67+
the type of hardware you are using.
68+
69+
:Partitioning:
70+
71+
CrateDB also supports splitting up data across another dimension with
72+
partitioning.
73+
Tables can be partitioned by defining partition columns.
74+
You can think of a partition as a set of shards.
75+
76+
- Partitioned tables optimize access efficiency when querying data, because only
77+
a subset of data needs to be addressed and acquired.
78+
- Each partition can be backed up and restored individually, for efficient operations.
79+
- The number of shards can be changed after table creation, and applies to future
80+
partitions. This enables you to start out with few shards per partition,
81+
and scale up the number of shards for later partitions once traffic
82+
and ingest rates increase over the lifetime of your application or system.
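As a minimal sketch of the last point, assuming a partitioned table named `timeseries_table` (a hypothetical name at this point) and an illustrative target of 12 shards, the shard count for future partitions could be raised like this. Verify the exact behavior against the reference manual for your CrateDB version.

```sql
-- Assumption: `timeseries_table` is a partitioned table; 12 is an illustrative value.
-- Existing partitions keep their current shard count;
-- partitions created afterwards use the new setting.
ALTER TABLE timeseries_table SET (number_of_shards = 12);
```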
83+
84+
:Replication:
85+
86+
You can configure CrateDB to replicate tables. When you configure replication,
87+
CrateDB will ensure that every table shard has one or more copies available
88+
at all times.
89+
90+
Replication can also improve read performance because any increase in the
91+
number of shards distributed across a cluster also increases the
92+
opportunities for CrateDB to parallelize query execution across multiple nodes.
93+
94+
::::
95+
96+
::::{grid-item}
97+
:class: rubric-slim
98+
:columns: auto 3 3 3
99+
100+
:::{rubric} Concepts
101+
:::
102+
- {ref}`crate-reference:concept-clustering`
103+
- {ref}`crate-reference:concept-storage-consistency`
104+
- {ref}`crate-reference:concept-resiliency`
105+
106+
:::{rubric} Reference Manual
107+
:::
108+
- {ref}`crate-reference:ddl-sharding`
109+
- {ref}`crate-reference:partitioned-tables`
110+
- {ref}`Partition columns <gloss-partition-column>`
111+
- {ref}`crate-reference:ddl-replication`
112+
113+
{tags-primary}`Clustering`
114+
{tags-primary}`Sharding`
115+
{tags-primary}`Partitioning`
116+
{tags-primary}`Replication`
117+
::::
118+
119+
:::::
120+
121+
122+
## Synopsis
123+
With a monthly throughput of 300 GB, a table partitioned by month and
124+
clustered into six shards holds about 50 GB per shard (300 GB / 6), which is
125+
within the recommended size range (5 - 100 GB).
126+
127+
With two replicas per shard, the table stores three copies of your data
128+
(one primary plus two replicas), reducing the chance of permanent data loss.
129+
```sql
130+
CREATE TABLE timeseries_table (
131+
ts TIMESTAMP,
132+
val DOUBLE PRECISION,
133+
part GENERATED ALWAYS AS date_trunc('month', ts)
134+
)
135+
CLUSTERED INTO 6 SHARDS
136+
PARTITIONED BY (part)
137+
WITH (number_of_replicas = 2);
138+
```
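To check whether shards actually stay within that size range, one option is to query the `sys.shards` system table. This is only a sketch, assuming the `timeseries_table` from the example above was created in the default `doc` schema; `size` is reported in bytes, so the query converts it to GiB.

```sql
-- Sketch: list the primary shards of the example table with their size in GiB.
-- Assumption: the table lives in the default "doc" schema.
SELECT partition_ident,
       id AS shard_id,
       num_docs,
       size / 1073741824.0 AS size_gib
FROM sys.shards
WHERE schema_name = 'doc'
  AND table_name = 'timeseries_table'
  AND "primary" = TRUE
ORDER BY partition_ident, id;
```

Filtering on primary shards avoids counting replica copies twice when comparing against the per-shard size target.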
139+
140+
141+
## Learn
142+
Different data characteristics and shapes call for different sharding and
143+
partitioning strategies. Learn about the details of shard allocation to
144+
help you choose the right strategy for your data and your most
145+
prominent types of workloads.
146+
147+
::::{grid} 2 2 2 2
148+
:padding: 0
149+
150+
:::{grid-item-card}
151+
:link: sharding-partitioning
152+
:link-type: ref
153+
:link-alt: Sharding and Partitioning
154+
:padding: 3
155+
:class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
156+
:class-body: sd-text-center2 sd-fs2-5
157+
:class-footer: text-smaller
158+
Sharding and Partitioning
159+
^^^
160+
- Introduction to the concepts of sharding and partitioning.
161+
- Learn how to choose a strategy that fits your needs.
162+
+++
163+
{material-outlined}`lightbulb;1.8em`
164+
An in-depth guide on how to configure sharding and partitioning,
165+
presenting best practices and examples.
166+
:::
167+
168+
:::{grid-item-card}
169+
:link: sharding-performance
170+
:link-type: ref
171+
:link-alt: Sharding Performance Guide
172+
:padding: 3
173+
:class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
174+
:class-body: sd-text-center2 sd-fs2-5
175+
:class-footer: text-smaller
176+
Sharding Performance Guide
177+
^^^
178+
- Optimising for query performance.
179+
- Optimising for ingestion performance.
180+
+++
181+
{material-outlined}`lightbulb;1.8em`
182+
Guidelines about balancing your strategy to yield the best performance for your workloads.
183+
:::
184+
185+
:::{grid-item-card}
186+
:link: https://community.cratedb.com/t/sharding-and-partitioning-guide-for-time-series-data/737
187+
:link-alt: Sharding and partitioning guide for time-series data
188+
:padding: 3
189+
:class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
190+
:class-body: sd-text-center2 sd-fs2-5
191+
:class-footer: text-smaller
192+
Sharding and partitioning guide for time-series data
193+
^^^
194+
A hands-on walkthrough to support you with building a sharding and partitioning
195+
strategy for your time series data.
196+
+++
197+
{material-outlined}`lightbulb;1.8em`
198+
Includes details about partitioning, sharding, and replication. Offers valuable
199+
advice on related topics.
200+
:::
28201

29-
[Relational Database]: https://cratedb.com/solutions/relational-database
202+
::::

docs/feature/index.md

Lines changed: 6 additions & 0 deletions
@@ -30,8 +30,14 @@ industry-standard SQL.
3030
::::{grid-item-card} {material-outlined}`group;2em` Operational
3131
:::{toctree}
3232
:maxdepth: 1
33+
3334
connectivity/index
3435
:::
36+
:::{toctree}
37+
:maxdepth: 1
38+
39+
cluster/index
40+
:::
3541
+++
3642
CrateDB scales horizontally using a shared-nothing
3743
architecture, inherited from Elasticsearch.

docs/performance/index.rst

Lines changed: 1 addition & 1 deletion
@@ -12,6 +12,6 @@ performance tuning and sharding.
1212
.. toctree::
1313
:maxdepth: 2
1414

15-
sharding
15+
Sharding <sharding>
1616
inserts/index
1717
selects

docs/performance/sharding.rst

Lines changed: 4 additions & 3 deletions
@@ -1,8 +1,9 @@
11
.. _sharding_guide:
2+
.. _sharding-performance:
23

3-
==============
4-
Sharding Guide
5-
==============
4+
==========================
5+
Sharding Performance Guide
6+
==========================
67

78
This document is a sharding best practice guide for CrateDB.
89
