---
title: Migrating from MySQL with Kafka Connect
sidebar_label: 'MySQL → Databend: Kafka Connect'
---

> **Capabilities**: CDC, Incremental, Full Load

This tutorial shows how to build a real-time data pipeline from MySQL to Databend using Kafka Connect.

## Overview

Kafka Connect is a tool for streaming data between Apache Kafka and other systems reliably and at scale. It simplifies building real-time data pipelines by standardizing data movement in and out of Kafka. For a MySQL to Databend migration, Kafka Connect provides a pipeline that enables:

- Real-time data synchronization from MySQL to Databend
- Automatic schema evolution and table creation
- Capture of both newly inserted rows and updates to existing rows

The migration pipeline consists of two main components:

- **MySQL JDBC Source Connector**: Reads data from MySQL and publishes it to Kafka topics
- **Databend Sink Connector**: Consumes data from Kafka topics and writes it to Databend

## Prerequisites

- MySQL database with data you want to migrate
- Apache Kafka installed ([Kafka quickstart guide](https://kafka.apache.org/quickstart))
- Databend instance running
- Basic knowledge of SQL and command line

## Step 1: Set Up Kafka Connect

Kafka Connect supports two execution modes: Standalone and Distributed. For this tutorial, we'll use Standalone mode, which is simpler for testing.

### Configure Kafka Connect

Create a basic worker configuration file `connect-standalone.properties` in your Kafka `config` directory:

```properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
```
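
Kafka Connect needs a running Kafka broker at the `bootstrap.servers` address before it starts. The commands below are a minimal sketch based on the Kafka quickstart, assuming a ZooKeeper-based installation and that you run them from the Kafka installation directory; adjust paths for your setup.

```shell
# Start ZooKeeper and the Kafka broker in the background (quickstart defaults)
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties

# Confirm the broker is reachable on localhost:9092 by listing topics
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
```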

## Step 2: Configure MySQL Source Connector

### Install Required Components

1. Download the [Kafka Connect JDBC](https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc) plugin from Confluent Hub and copy the JAR files from its `lib` folder into your Kafka `libs` directory

2. Download the [MySQL JDBC Driver](https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.32/) and copy the JAR file to the same `libs` directory (see the example commands below)
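
As a rough sketch, assuming Kafka is installed at `/opt/kafka` and the Confluent Hub download is the standard ZIP archive (paths and archive names are examples to adapt):

```shell
export KAFKA_HOME=/opt/kafka   # example install location

# Kafka Connect JDBC plugin: unzip the Confluent Hub archive and copy its bundled JARs
unzip confluentinc-kafka-connect-jdbc-*.zip
cp confluentinc-kafka-connect-jdbc-*/lib/*.jar "$KAFKA_HOME/libs/"

# MySQL JDBC driver (8.0.32 is the version linked above)
curl -O https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.32/mysql-connector-j-8.0.32.jar
cp mysql-connector-j-8.0.32.jar "$KAFKA_HOME/libs/"
```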

### Create MySQL Source Configuration

Create a file `mysql-source.properties` in your Kafka `config` directory with the following content:

```properties
name=mysql-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1

# Connection settings
connection.url=jdbc:mysql://localhost:3306/your_database?useSSL=false
connection.user=your_username
connection.password=your_password

# Table selection and topic mapping
# Each whitelisted table is published to the topic <topic.prefix><table name>,
# so your_table ends up in the topic mysql_your_table
table.whitelist=your_table
topic.prefix=mysql_

# Sync mode configuration
mode=incrementing
incrementing.column.name=id

# Polling frequency
poll.interval.ms=5000
```

Replace the following values with your actual MySQL configuration:
- `your_database`: Your MySQL database name
- `your_username`: MySQL username
- `your_password`: MySQL password
- `your_table`: The table you want to migrate

With these settings, rows from `your_table` are published to the Kafka topic `mysql_your_table` (`topic.prefix` plus the table name), which is also the name of the table the sink connector will create in Databend. You can sanity-check the connection details with the command below.
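
A quick way to confirm the credentials and table name before starting the connector, sketched with the standard `mysql` client (substitute your own values):

```shell
# Should print the row count of the table you plan to migrate
mysql -h 127.0.0.1 -P 3306 -u your_username -p your_database \
  -e "SELECT COUNT(*) FROM your_table;"
```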

### Sync Modes

The MySQL Source Connector supports three synchronization modes:

1. **Incrementing Mode**: Best for tables with an auto-incrementing ID column
   ```properties
   mode=incrementing
   incrementing.column.name=id
   ```

2. **Timestamp Mode**: Best for capturing both inserts and updates
   ```properties
   mode=timestamp
   timestamp.column.name=updated_at
   ```

3. **Timestamp+Incrementing Mode**: Most reliable for capturing all inserts and updates (an example table layout follows this list)
   ```properties
   mode=timestamp+incrementing
   incrementing.column.name=id
   timestamp.column.name=updated_at
   ```
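
Timestamp-based modes need a column that MySQL refreshes on every change. The DDL below is a hypothetical table layout suited to timestamp+incrementing mode (names such as `some_column` are placeholders):

```shell
# Create a table with an auto-incrementing id and a self-updating timestamp column
mysql -u your_username -p your_database -e "
  CREATE TABLE your_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    some_column VARCHAR(255),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
  );"
```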

## Step 3: Configure Databend Sink Connector

### Install Required Components

1. Download the [Databend Kafka Connector](https://github.com/databendcloud/databend-kafka-connect/releases) and place it in your Kafka `libs` directory

2. Download the [Databend JDBC Driver](https://central.sonatype.com/artifact/com.databend/databend-jdbc/) and copy it to your Kafka `libs` directory

### Create Databend Sink Configuration

Create a file `databend-sink.properties` in your Kafka `config` directory:

```properties
name=databend-sink
connector.class=com.databend.kafka.connect.DatabendSinkConnector

# Connection settings
connection.url=jdbc:databend://localhost:8000
connection.user=databend
connection.password=databend
connection.database=default

# Topic to table mapping (must match the topic produced by the source connector)
topics=mysql_your_table
table.name.format=${topic}

# Table management
auto.create=true
auto.evolve=true

# Write behavior
insert.mode=upsert
pk.mode=record_value
pk.fields=id
batch.size=1000
```

Adjust the Databend connection settings as needed for your environment.
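
To check that Databend is reachable at that address before starting the pipeline, you can issue a test query against its HTTP handler. This is only a sketch, assuming a local deployment listening on port 8000 with the `databend`/`databend` credentials used above:

```shell
# A successful response returns JSON containing the query result
curl -s -u databend:databend \
  -X POST "http://localhost:8000/v1/query" \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT VERSION()"}'
```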

## Step 4: Start the Migration Pipeline

Start Kafka Connect with both connector configurations:

```shell
bin/connect-standalone.sh config/connect-standalone.properties \
  config/mysql-source.properties \
  config/databend-sink.properties
```
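
Kafka Connect also exposes a REST API (port 8083 by default) that is handy for confirming both connectors are up. A quick sketch:

```shell
# List the connectors loaded by this worker
curl -s http://localhost:8083/connectors

# Show task status for each connector
curl -s http://localhost:8083/connectors/mysql-source/status
curl -s http://localhost:8083/connectors/databend-sink/status
```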

## Step 5: Verify the Migration

### Check Data Synchronization

1. **Monitor Kafka Connect Logs**

   ```shell
   tail -f /path/to/kafka/logs/connect.log
   ```

2. **Verify Data in Databend**

   Connect to your Databend instance and run:

   ```sql
   SELECT * FROM mysql_your_table LIMIT 10;
   ```

### Test Schema Evolution

If you add a new column to your MySQL table, the schema change propagates to Databend automatically (via `auto.evolve`) once new rows containing that column are captured:

1. **Add a column in MySQL**

   ```sql
   ALTER TABLE your_table ADD COLUMN new_field VARCHAR(100);
   ```

2. **Verify schema update in Databend**

   ```sql
   DESC mysql_your_table;
   ```

### Test Update Operations

To test updates, ensure you're using timestamp or timestamp+incrementing mode:

1. **Update your MySQL connector configuration**

   Edit `mysql-source.properties` to use timestamp+incrementing mode if your table has a timestamp column.

2. **Update data in MySQL**

   ```sql
   UPDATE your_table SET some_column='new value' WHERE id=1;
   ```

3. **Verify the update in Databend**

   ```sql
   SELECT * FROM mysql_your_table WHERE id=1;
   ```

## Key Features of Databend Kafka Connect

1. **Automatic Table and Column Creation**: With `auto.create` and `auto.evolve` settings, tables and columns are created automatically based on Kafka topic data

2. **Schema Support**: Supports Avro, JSON Schema, and Protobuf input data formats (requires Schema Registry)

3. **Multiple Write Modes**: Supports both `insert` and `upsert` write modes

4. **Multi-task Support**: Can run multiple tasks to improve performance

5. **High Availability**: In distributed mode, the workload is automatically balanced across workers, with dynamic scaling and fault tolerance (see the sketch below)
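
If you outgrow standalone mode, the same connectors can run on a distributed Connect cluster. This is only a sketch: distributed workers read their worker settings from `config/connect-distributed.properties`, and connectors are then submitted through the REST API as JSON rather than `.properties` files (the JSON body below is an abbreviated, hypothetical translation of `databend-sink.properties`).

```shell
# Start a distributed worker (repeat on each node of the cluster)
bin/connect-distributed.sh config/connect-distributed.properties

# Submit the sink connector to the cluster via the REST API
curl -X POST http://localhost:8083/connectors \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "databend-sink",
        "config": {
          "connector.class": "com.databend.kafka.connect.DatabendSinkConnector",
          "connection.url": "jdbc:databend://localhost:8000",
          "topics": "mysql_your_table",
          "auto.create": "true"
        }
      }'
```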

## Troubleshooting

- **Connector Not Starting**: Check Kafka Connect logs for errors
- **No Data in Databend**: Verify that the topic exists and contains data using the Kafka console consumer (see the commands below)
- **Schema Issues**: Ensure `auto.create` and `auto.evolve` are set to `true`
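
For the second point, a minimal check with the standard Kafka CLI tools, assuming the broker address and topic name used in this tutorial:

```shell
# Confirm the topic created by the source connector exists
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

# Peek at a few records to confirm data is flowing into the topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic mysql_your_table --from-beginning --max-messages 5
```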