Commit b89e89c

add FAQs about collation for JDBC connections (#20848)
1 parent af3e688 commit b89e89c

3 files changed

+88
-1
lines changed

develop/dev-guide-sample-application-java-jdbc.md

+15-1
@@ -14,9 +14,23 @@ In this tutorial, you can learn how to use TiDB and JDBC to accomplish the follo
1414
- Connect to your TiDB cluster using JDBC.
1515
- Build and run your application. Optionally, you can find [sample code snippets](#sample-code-snippets) for basic CRUD operations.
1616

17+
<CustomContent platform="tidb">
18+
1719
> **Note:**
1820
>
19-
> This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
21+
> - This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
22+
> - Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the collation used in a JDBC connection depends on the JDBC driver version. For more information, see [Collation used in JDBC connections](/faq/sql-faq.md#collation-used-in-jdbc-connections).
23+
24+
</CustomContent>
25+
26+
<CustomContent platform="tidb-cloud">
27+
28+
> **Note:**
29+
>
30+
> - This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
31+
> - Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the collation used in a JDBC connection depends on the JDBC driver version. For more information, see [Collation used in JDBC connections](https://docs.pingcap.com/tidb/stable/sql-faq#collation-used-in-jdbc-connections).
32+
33+
</CustomContent>
2034

2135
## Prerequisites
2236

faq/sql-faq.md

+67
@@ -337,6 +337,73 @@ Whether your cluster is a new cluster or an upgraded cluster from an earlier ver
337337
- If the owner does not exist, try manually triggering owner election with: `curl -X POST http://{TiDBIP}:10080/ddl/owner/resign`.
338338
- If the owner exists, export the Goroutine stack and check for the possible stuck location.
339339

340+
## Collation used in JDBC connections
341+
342+
This section lists questions related to collations used in JDBC connections. For information about character sets and collations supported by TiDB, see [Character Set and Collation](/character-set-and-collation.md).
343+
344+
### What collation is used in a JDBC connection when `connectionCollation` is not configured in the JDBC URL?
345+
346+
When `connectionCollation` is not configured in the JDBC URL, there are two scenarios:
347+
348+
**Scenario 1**: Neither `connectionCollation` nor `characterEncoding` is configured in the JDBC URL
349+
350+
- For Connector/J 8.0.25 and earlier versions, the JDBC driver attempts to use the server's default character set. Because the default character set of TiDB is `utf8mb4`, the driver uses `utf8mb4_bin` as the connection collation.
351+
- For Connector/J 8.0.26 and later versions, the JDBC driver uses the `utf8mb4` character set and automatically selects the collation based on the return value of `SELECT VERSION()`.
352+
353+
    - When the return value is less than `8.0.1`, the driver uses `utf8mb4_general_ci` as the connection collation. TiDB follows the driver and uses `utf8mb4_general_ci` as the collation.
354+
    - When the return value is greater than or equal to `8.0.1`, the driver uses `utf8mb4_0900_ai_ci` as the connection collation. TiDB v7.4.0 and later versions follow the driver and use `utf8mb4_0900_ai_ci` as the collation, while TiDB versions earlier than v7.4.0 fall back to using the default collation `utf8mb4_bin` because the `utf8mb4_0900_ai_ci` collation is not supported in these versions.
355+
356+
**Scenario 2**: `characterEncoding=utf8` is configured in the JDBC URL but `connectionCollation` is not configured. The JDBC driver uses the `utf8mb4` character set according to the mapping rules. The collation is determined according to the rules described in scenario 1.
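
The rules in the two scenarios above can be summarized as a small decision function. The following is a minimal sketch that only models the rules as described in this section. It is not the Connector/J or TiDB implementation. The class name, method names, and the sample version strings are hypothetical and chosen for illustration.

```java
import java.util.Arrays;

final class JdbcCollationRules {

    /** Compares dotted numeric versions such as "8.0.26" and "8.0.1". */
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Keeps only the leading numeric part, for example "8.0.11-suffix" -> [8, 0, 11]. */
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        return Arrays.stream(numeric.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /**
     * Returns the collation expected when connectionCollation is not configured
     * and characterEncoding is either not configured or maps to utf8mb4.
     *
     * @param driverVersion Connector/J version, for example "8.0.33"
     * @param serverVersion return value of SELECT VERSION()
     * @param tidbVersion   TiDB version without the "v" prefix, for example "7.5.0"
     */
    static String expectedCollation(String driverVersion, String serverVersion, String tidbVersion) {
        if (compareVersions(driverVersion, "8.0.26") < 0) {
            // Connector/J 8.0.25 and earlier: the server default is used.
            return "utf8mb4_bin";
        }
        if (compareVersions(serverVersion, "8.0.1") < 0) {
            return "utf8mb4_general_ci";
        }
        // utf8mb4_0900_ai_ci is supported starting from TiDB v7.4.0;
        // earlier versions fall back to utf8mb4_bin.
        return compareVersions(tidbVersion, "7.4.0") >= 0 ? "utf8mb4_0900_ai_ci" : "utf8mb4_bin";
    }

    public static void main(String[] args) {
        // The version strings below are illustrative inputs only.
        System.out.println(expectedCollation("8.0.25", "8.0.11-TiDB-v7.5.0", "7.5.0"));
        System.out.println(expectedCollation("8.0.33", "5.7.25-TiDB-v6.5.0", "6.5.0"));
        System.out.println(expectedCollation("8.0.33", "8.0.11-TiDB-v7.5.0", "7.5.0"));
    }
}
```

To confirm the collation that a real connection ends up with, you can query the `collation_connection` variable from that connection instead of relying on a model like this one.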
357+
358+
### How to handle collation changes after upgrading TiDB?
359+
360+
In TiDB versions earlier than v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the TiDB [`collation_connection`](/system-variables.md#collation_connection) variable defaults to the `utf8mb4_bin` collation.
361+
362+
Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the value of the [`collation_connection`](/system-variables.md#collation_connection) variable depends on the JDBC driver version. For example, for Connector/J 8.0.26 and later versions, the JDBC driver defaults to the `utf8mb4` character set and uses `utf8mb4_0900_ai_ci` as the connection collation. TiDB follows the driver, and the [`collation_connection`](/system-variables.md#collation_connection) variable uses the `utf8mb4_0900_ai_ci` collation. For more information, see [Collation used in JDBC connections](#what-collation-is-used-in-a-jdbc-connection-when-connectioncollation-is-not-configured-in-the-jdbc-url).
363+
364+
When upgrading from an earlier version to v7.4 or later (for example, from v6.5 to v7.5), if you need to maintain the `collation_connection` as `utf8mb4_bin` for JDBC connections, it is recommended to configure the `connectionCollation` parameter in the JDBC URL.
365+
366+
The following is a common JDBC URL configuration in TiDB v6.5:
367+
368+
```
spring.datasource.url=jdbc:mysql://{TiDBIP}:{TiDBPort}/{DBName}?characterEncoding=UTF-8&useSSL=false&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=1000&useConfigs=maxPerformance&rewriteBatchedStatements=true&defaultFetchSize=-2147483648&allowMultiQueries=true
```
371+
372+
After upgrading to TiDB v7.5 or a later version, it is recommended to configure the `connectionCollation` parameter in the JDBC URL:
373+
374+
```
spring.datasource.url=jdbc:mysql://{TiDBIP}:{TiDBPort}/{DBName}?characterEncoding=UTF-8&connectionCollation=utf8mb4_bin&useSSL=false&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=1000&useConfigs=maxPerformance&rewriteBatchedStatements=true&defaultFetchSize=-2147483648&allowMultiQueries=true
```
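
If the JDBC URL is assembled in application code rather than in a properties file, the same change can be applied programmatically. The following is a minimal sketch, assuming the URL uses the usual `?key=value&key=value` query form. The class name, the `withConnectionCollation` helper, and the host, port, and database in the sample URL are hypothetical.

```java
final class JdbcUrlCollation {

    /**
     * Returns the URL unchanged if connectionCollation is already present;
     * otherwise appends connectionCollation=<collation> to the query string.
     */
    static String withConnectionCollation(String jdbcUrl, String collation) {
        int queryStart = jdbcUrl.indexOf('?');
        if (queryStart >= 0) {
            String query = jdbcUrl.substring(queryStart + 1);
            for (String pair : query.split("&")) {
                String key = pair.split("=", 2)[0];
                if (key.equals("connectionCollation")) {
                    return jdbcUrl; // Respect an explicit setting.
                }
            }
        }
        String separator;
        if (queryStart < 0) {
            separator = "?";
        } else if (jdbcUrl.endsWith("?") || jdbcUrl.endsWith("&")) {
            separator = "";
        } else {
            separator = "&";
        }
        return jdbcUrl + separator + "connectionCollation=" + collation;
    }

    public static void main(String[] args) {
        // Placeholder endpoint for illustration only.
        String before = "jdbc:mysql://127.0.0.1:4000/test?characterEncoding=UTF-8&useSSL=false";
        System.out.println(withConnectionCollation(before, "utf8mb4_bin"));
        // prints jdbc:mysql://127.0.0.1:4000/test?characterEncoding=UTF-8&useSSL=false&connectionCollation=utf8mb4_bin
    }
}
```

The helper leaves an existing `connectionCollation` value untouched, so a URL that already pins a collation is not silently overridden.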
377+
378+
### What are the differences between the `utf8mb4_bin` and `utf8mb4_0900_ai_ci` collations?
379+
380+
| Collation | Case-sensitive | Ignore trailing spaces | Accent-sensitive | Comparison method |
381+
|----------------------|----------------|------------------|--------------|------------------------|
382+
| `utf8mb4_bin` | Yes | Yes | Yes | Compare binary values |
383+
| `utf8mb4_0900_ai_ci` | No | No | No | Use Unicode sorting algorithm |
384+
385+
For example:
386+
387+
```sql
-- utf8mb4_bin is case-sensitive
SELECT 'apple' = 'Apple' COLLATE utf8mb4_bin; -- Returns 0 (FALSE)

-- utf8mb4_0900_ai_ci is case-insensitive
SELECT 'apple' = 'Apple' COLLATE utf8mb4_0900_ai_ci; -- Returns 1 (TRUE)

-- utf8mb4_bin ignores trailing spaces
SELECT 'Apple ' = 'Apple' COLLATE utf8mb4_bin; -- Returns 1 (TRUE)

-- utf8mb4_0900_ai_ci does not ignore trailing spaces
SELECT 'Apple ' = 'Apple' COLLATE utf8mb4_0900_ai_ci; -- Returns 0 (FALSE)

-- utf8mb4_bin is accent-sensitive
SELECT 'café' = 'cafe' COLLATE utf8mb4_bin; -- Returns 0 (FALSE)

-- utf8mb4_0900_ai_ci is accent-insensitive
SELECT 'café' = 'cafe' COLLATE utf8mb4_0900_ai_ci; -- Returns 1 (TRUE)
```
406+
340407
## SQL optimization
341408
342409
### TiDB execution plan description

faq/upgrade-faq.md

+6
@@ -36,6 +36,12 @@ It is not recommended to upgrade TiDB using the binary. Instead, it is recommend
3636

3737
This section lists some FAQs and their solutions after you upgrade TiDB.
3838

39+
### The collation in JDBC connections changes after upgrading TiDB
40+
41+
When upgrading from an earlier version to v7.4 or later, if the `connectionCollation` is not configured, and the `characterEncoding` is either not configured or configured as `UTF-8` in the JDBC URL, the default collation in your JDBC connections might change from `utf8mb4_bin` to `utf8mb4_0900_ai_ci` after upgrading. If you need to maintain the collation as `utf8mb4_bin`, configure `connectionCollation=utf8mb4_bin` in the JDBC URL.
42+
43+
For more information, see [Collation used in JDBC connections](/faq/sql-faq.md#collation-used-in-jdbc-connections).
44+
3945
### The character set (charset) errors when executing DDL operations
4046

4147
In v2.1.0 and earlier versions (including all versions of v2.0), the character set of TiDB is UTF-8 by default. But starting from v2.1.1, the default character set has been changed into UTF8MB4.
