Commit 18fb775

fix broken anchors
1 parent 228809d commit 18fb775

16 files changed: +27, -28 lines


docs/integrations/data-ingestion/aws-glue/index.md
Lines changed: 1 addition & 1 deletion

@@ -104,6 +104,6 @@ job.commit()
 </TabItem>
 </Tabs>

-For more details, please visit our [Spark & JDBC documentation](/integrations/apache-spark#read-data).
+For more details, please visit our [Spark & JDBC documentation](/integrations/apache-spark/spark-jdbc#read-data).

docs/integrations/data-ingestion/data-formats/arrow-avro-orc.md
Lines changed: 3 additions & 3 deletions

@@ -48,7 +48,7 @@ FORMAT Avro;

 ### Avro and ClickHouse data types {#avro-and-clickhouse-data-types}

-Consider [data types matching](/interfaces/formats.md/#data_types-matching) when importing or exporting Avro files. Use explicit type casting to convert when loading data from Avro files:
+Consider [data types matching](/interfaces/formats/Avro#data-types-matching) when importing or exporting Avro files. Use explicit type casting to convert when loading data from Avro files:

 ```sql
 SELECT
@@ -100,7 +100,7 @@ INTO OUTFILE 'export.arrow'
 FORMAT Arrow
 ```

-Also, check [data types matching](/interfaces/formats.md/#data-types-matching-arrow) to know if any should be converted manually.
+Also, check [data types matching](/interfaces/formats/Arrow#data-types-matching) to know if any should be converted manually.

 ### Arrow data streaming {#arrow-data-streaming}

@@ -150,7 +150,7 @@ FROM INFILE 'data.orc'
 FORMAT ORC;
 ```

-Also, check [data types matching](/interfaces/formats.md/#data-types-matching-orc) as well as [additional settings](/interfaces/formats.md/#parquet-format-settings) to tune export and import.
+Also, check [data types matching](/interfaces/formats/ORC) as well as [additional settings](/interfaces/formats/Parquet#format-settings) to tune export and import.

 ## Further reading {#further-reading}

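The Avro hunk above points readers to explicit type casting when loading Avro data. A minimal sketch of that idea, assuming a hypothetical `data.avro` file and column names that are not taken from the commit:

```sql
-- Illustrative only: read an Avro file via the file() table function and
-- cast columns explicitly where Avro and ClickHouse types differ.
SELECT
    id::UInt32 AS id,                                   -- narrow an Avro long
    parseDateTimeBestEffort(created_at) AS created_at   -- Avro string -> DateTime
FROM file('data.avro', Avro)
LIMIT 10;
```
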
docs/integrations/data-ingestion/data-formats/binary.md
Lines changed: 1 addition & 1 deletion

@@ -222,7 +222,7 @@ FORMAT CapnProto
 SETTINGS format_schema = 'schema:PathStats'
 ```

-Note that we had to cast the `Date` column as `UInt32` to [match corresponding types](/interfaces/formats.md/#data_types-matching-capnproto).
+Note that we had to cast the `Date` column as `UInt32` to [match corresponding types](/interfaces/formats/CapnProto#data_types-matching-capnproto).

 ## Other formats {#other-formats}

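The cast described in that hunk could look roughly like the following sketch; only the `PathStats` schema name comes from the shown context, while the table and the columns other than `Date` are assumptions:

```sql
-- Illustrative only: the CapnProto mapping has no Date type here, so export
-- the Date column as UInt32 (days since epoch).
SELECT
    toUInt32(Date) AS Date,
    Hits
FROM path_stats
FORMAT CapnProto
SETTINGS format_schema = 'schema:PathStats';
```
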
docs/integrations/data-ingestion/data-formats/json/inference.md
Lines changed: 4 additions & 4 deletions

@@ -90,13 +90,13 @@ SETTINGS describe_compact_output = 1
 └────────────────┴─────────────────────────────────────────────────────────────────────────┘
 ```
 :::note Avoid nulls
-You can see a lot of the columns are detected as Nullable. We [do not recommend using the Nullable](/sql-reference/data-types/nullable#storage-features) type when not absolutely needed. You can use [schema_inference_make_columns_nullable](/interfaces/schema-inference#schema_inference_make_columns_nullable) to control the behavior of when Nullable is applied.
+You can see a lot of the columns are detected as Nullable. We [do not recommend using the Nullable](/sql-reference/data-types/nullable#storage-features) type when not absolutely needed. You can use [schema_inference_make_columns_nullable](/operations/settings/formats#schema_inference_make_columns_nullable) to control the behavior of when Nullable is applied.
 :::

 We can see that most columns have automatically been detected as `String`, with `update_date` column correctly detected as a `Date`. The `versions` column has been created as an `Array(Tuple(created String, version String))` to store a list of objects, with `authors_parsed` being defined as `Array(Array(String))` for nested arrays.

 :::note Controlling type detection
-The auto-detection of dates and datetimes can be controlled through the settings [`input_format_try_infer_dates`](/interfaces/schema-inference#input_format_try_infer_dates) and [`input_format_try_infer_datetimes`](/interfaces/schema-inference#input_format_try_infer_datetimes) respectively (both enabled by default). The inference of objects as tuples is controlled by the setting [`input_format_json_try_infer_named_tuples_from_objects`](/operations/settings/formats#input_format_json_try_infer_named_tuples_from_objects). Other settings which control schema inference for JSON, such as the auto-detection of numbers, can be found [here](/interfaces/schema-inference#text-formats).
+The auto-detection of dates and datetimes can be controlled through the settings [`input_format_try_infer_dates`](/operations/settings/formats#input_format_try_infer_dates) and [`input_format_try_infer_datetimes`](/operations/settings/formats#input_format_try_infer_datetimes) respectively (both enabled by default). The inference of objects as tuples is controlled by the setting [`input_format_json_try_infer_named_tuples_from_objects`](/operations/settings/formats#input_format_json_try_infer_named_tuples_from_objects). Other settings which control schema inference for JSON, such as the auto-detection of numbers, can be found [here](/interfaces/schema-inference#text-formats).
 :::

 ## Querying JSON {#querying-json}
@@ -183,7 +183,7 @@ ORDER BY update_date
 SETTINGS index_granularity = 8192
 ```

-The above is the correct schema for this data. Schema inference is based on sampling the data and reading the data row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/interfaces/schema-inference#input_format_max_rows_to_read_for_schema_inferenceinput_format_max_bytes_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/interfaces/schema-inference#input_format_max_rows_to_read_for_schema_inferenceinput_format_max_bytes_to_read_for_schema_inference) (32MB by default). In the event detection is not correct, users can provide hints as described [here](/interfaces/schema-inference#schema_inference_hints).
+The above is the correct schema for this data. Schema inference is based on sampling the data and reading the data row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/interfaces/schema-inference#input_format_max_rows_to_read_for_schema_inferenceinput_format_max_bytes_to_read_for_schema_inference) (32MB by default). In the event detection is not correct, users can provide hints as described [here](/interfaces/schema-inference#schema_inference_hints).

 ### Creating tables from snippets {#creating-tables-from-snippets}

@@ -272,7 +272,7 @@ FORMAT PrettyJSONEachRow

 ## Handling errors {#handling-errors}

-Sometimes, you might have bad data. For example, specific columns that do not have the right type or an improperly formatted JSON. For this, you can use the setting [`input_format_allow_errors_ratio`](/operations/settings/formats#input_format_allow_errors_ratio) to allow a certain number of rows to be ignored if the data is triggering insert errors. Additionally, [hints](/interfaces/schema-inference#schema_inference_hints) can be provided to assist inference.
+Sometimes, you might have bad data. For example, specific columns that do not have the right type or an improperly formatted JSON. For this, you can use the setting [`input_format_allow_errors_ratio`](/operations/settings/formats#input_format_allow_errors_ratio) to allow a certain number of rows to be ignored if the data is triggering insert errors. Additionally, [hints](/operations/settings/formats#schema_inference_hints) can be provided to assist inference.

 ## Further reading {#further-reading}

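For the schema-inference settings whose anchors were fixed above, a minimal sketch of applying them while inspecting an inferred schema, assuming a hypothetical `sample.json` file:

```sql
-- Illustrative only: inspect the schema ClickHouse infers for a JSON file,
-- disable Nullable wrapping and cap the number of rows sampled.
DESCRIBE file('sample.json', JSONEachRow)
SETTINGS schema_inference_make_columns_nullable = 0,
         input_format_max_rows_to_read_for_schema_inference = 1000;
```
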
docs/integrations/data-ingestion/data-formats/json/schema.md
Lines changed: 1 addition & 1 deletion

@@ -508,7 +508,7 @@ SELECT JSONExtractString(tags, 'holidays') as holidays FROM people
 1 row in set. Elapsed: 0.002 sec.
 ```

-Notice how the functions require both a reference to the `String` column `tags` and a path in the JSON to extract. Nested paths require functions to be nested e.g. `JSONExtractUInt(JSONExtractString(tags, 'car'), 'year')` which extracts the column `tags.car.year`. The extraction of nested paths can be simplified through the functions [JSON_QUERY](/sql-reference/functions/json-functions.md/#json_queryjson-path) AND [JSON_VALUE](/sql-reference/functions/json-functions.md/#json_valuejson-path).
+Notice how the functions require both a reference to the `String` column `tags` and a path in the JSON to extract. Nested paths require functions to be nested e.g. `JSONExtractUInt(JSONExtractString(tags, 'car'), 'year')` which extracts the column `tags.car.year`. The extraction of nested paths can be simplified through the functions [JSON_QUERY](/sql-reference/functions/json-functions#json_query) AND [JSON_VALUE](/sql-reference/functions/json-functions#json_value).

 Consider the extreme case with the `arxiv` dataset where we consider the entire body to be a `String`.

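To make the comparison in that hunk concrete, a short sketch of nested `JSONExtract*` calls next to the equivalent `JSON_VALUE` path, reusing the `people` table and `tags` column from the shown context:

```sql
-- Illustrative only: nested JSONExtract* calls vs. a JSONPath expression.
SELECT
    JSONExtractUInt(JSONExtractString(tags, 'car'), 'year') AS year_nested,
    JSON_VALUE(tags, '$.car.year')                          AS year_path
FROM people;
```
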
docs/integrations/data-ingestion/data-formats/parquet.md
Lines changed: 2 additions & 2 deletions

@@ -125,7 +125,7 @@ DESCRIBE TABLE imported_from_parquet;
 └──────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
 ```

-By default, ClickHouse is strict with column names, types, and values. But sometimes, we can skip nonexistent columns or unsupported values during import. This can be managed with [Parquet settings](/interfaces/formats.md/#parquet-format-settings).
+By default, ClickHouse is strict with column names, types, and values. But sometimes, we can skip nonexistent columns or unsupported values during import. This can be managed with [Parquet settings](/interfaces/formats/Parquet#format-settings).


 ## Exporting to Parquet format {#exporting-to-parquet-format}
@@ -146,7 +146,7 @@ FORMAT Parquet
 This will create the `export.parquet` file in a working directory.

 ## ClickHouse and Parquet data types {#clickhouse-and-parquet-data-types}
-ClickHouse and Parquet data types are mostly identical but still [differ a bit](/interfaces/formats.md/#data-types-matching-parquet). For example, ClickHouse will export `DateTime` type as a Parquets' `int64`. If we then import that back to ClickHouse, we're going to see numbers ([time.parquet file](assets/time.parquet)):
+ClickHouse and Parquet data types are mostly identical but still [differ a bit](/interfaces/formats/Parquet#data-types-matching-parquet). For example, ClickHouse will export `DateTime` type as a Parquets' `int64`. If we then import that back to ClickHouse, we're going to see numbers ([time.parquet file](assets/time.parquet)):

 ```sql
 SELECT * FROM file('time.parquet', Parquet);

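Relatedly, a sketch of casting such an exported `DateTime` back when re-reading the Parquet file; the column name `time` is an assumption, not shown in the hunk:

```sql
-- Illustrative only: DateTime comes back as int64 from Parquet, so cast it.
SELECT toDateTime(time) AS time
FROM file('time.parquet', Parquet);
```
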
docs/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md
Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ Using JDBC requires the ClickHouse JDBC bridge, so you will need to use `clickho

 **Overview:** The <a href="https://github.com/ClickHouse/clickhouse-jdbc-bridge" target="_blank">ClickHouse JDBC Bridge</a> in combination with the [jdbc table function](/sql-reference/table-functions/jdbc.md) or the [JDBC table engine](/engines/table-engines/integrations/jdbc.md) allows ClickHouse to access data from any external data source for which a <a href="https://en.wikipedia.org/wiki/JDBC_driver" target="_blank">JDBC driver</a> is available:
 <img src={require('./images/jdbc-01.png').default} class="image" alt="ClickHouse JDBC Bridge"/>
-This is handy when there is no native built-in [integration engine](/engines/table-engines/index.md#integration-engines-integration-engines), table function, or external dictionary for the external data source available, but a JDBC driver for the data source exists.
+This is handy when there is no native built-in [integration engine](/engines/table-engines/integrations), table function, or external dictionary for the external data source available, but a JDBC driver for the data source exists.

 You can use the ClickHouse JDBC Bridge for both reads and writes. And in parallel for multiple external data sources, e.g. you can run distributed queries on ClickHouse across multiple external and internal data sources in real time.

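For the jdbc table function referenced in that hunk, a minimal sketch of querying an external source through the bridge; the data source name, schema, and table are assumptions:

```sql
-- Illustrative only: query an external database through the JDBC bridge
-- using the jdbc table function, given a configured data source 'mysql8'.
SELECT *
FROM jdbc('mysql8', 'shop', 'orders')
LIMIT 10;
```
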
docs/integrations/data-ingestion/insert-local-files.md
Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ ENGINE = MergeTree
 ORDER BY toYYYYMMDD(timestamp)
 ```

-3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions/#lower-lcase). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions/#extractallhaystack-pattern). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
+3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractall). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:

 ```bash
 clickhouse-client \

docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md
Lines changed: 1 addition & 1 deletion

@@ -137,7 +137,7 @@ The following additional parameters are relevant to using the HTTP Sink with Cli
 * `ssl.enabled` - set to true if using SSL.
 * `connection.user` - username for ClickHouse.
 * `connection.password` - password for ClickHouse.
-* `batch.max.size` - The number of rows to send in a single batch. Ensure this set is to an appropriately large number. Per ClickHouse [recommendations](../../../../concepts/why-clickhouse-is-so-fast.md#performance-when-inserting-data) a value of 1000 is should be considered a minimum.
+* `batch.max.size` - The number of rows to send in a single batch. Ensure this set is to an appropriately large number. Per ClickHouse [recommendations](/sql-reference/statements/insert-into#performance-considerations) a value of 1000 should be considered a minimum.
 * `tasks.max` - The HTTP Sink connector supports running one or more tasks. This can be used to increase performance. Along with batch size this represents your primary means of improving performance.
 * `key.converter` - set according to the types of your keys.
 * `value.converter` - set based on the type of data on your topic. This data does not need a schema. The format here must be consistent with the FORMAT specified in the parameter `http.api.url`. The simplest here is to use JSON and the org.apache.kafka.connect.json.JsonConverter converter. Treating the value as a string, via the converter org.apache.kafka.connect.storage.StringConverter, is also possible - although this will require the user to extract a value in the insert statement using functions. [Avro format](../../../../interfaces/formats.md#data-format-avro) is also supported in ClickHouse if using the io.confluent.connect.avro.AvroConverter converter.

docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md
Lines changed: 1 addition & 1 deletion

@@ -55,7 +55,7 @@ The following parameters are relevant to using the JDBC connector with ClickHous
 * `_connection.url_` - this should take the form of `jdbc:clickhouse://&lt;clickhouse host>:&lt;clickhouse http port>/&lt;target database>`
 * `connection.user` - a user with write access to the target database
 * `table.name.format`- ClickHouse table to insert data. This must exist.
-* `batch.size` - The number of rows to send in a single batch. Ensure this set is to an appropriately large number. Per ClickHouse [recommendations](../../../concepts/why-clickhouse-is-so-fast.md#performance-when-inserting-data) a value of 1000 should be considered a minimum.
+* `batch.size` - The number of rows to send in a single batch. Ensure this set is to an appropriately large number. Per ClickHouse [recommendations](/sql-reference/statements/insert-into#performance-considerations) a value of 1000 should be considered a minimum.
 * `tasks.max` - The JDBC Sink connector supports running one or more tasks. This can be used to increase performance. Along with batch size this represents your primary means of improving performance.
 * `value.converter.schemas.enable` - Set to false if using a schema registry, true if you embed your schemas in the messages.
 * `value.converter` - Set according to your datatype e.g. for JSON, `io.confluent.connect.json.JsonSchemaConverter`.
