(A Traditional Chinese version of this document is available in README-CH.md.)

Build a Lakehouse architecture with Apache Iceberg. Iceberg provides schema evolution, ACID transactions, time travel, and multi-engine compatibility, giving the data lake warehouse-level management and query capabilities.

This repository provides a deployment guide for Spark + Iceberg REST + MinIO integrated with Apache Doris. It covers setup, SQL operations, and migrating Doris tables into Iceberg without pre-creating the target schema.
- Table Format: Iceberg v1.8.1
- Compute Engine: Spark v3.5.2
- Database: Doris v2.1.1.8
- Spark
  - Handles queries and data operations
  - Relies on Iceberg REST for metadata management
  - Reads and writes data to MinIO storage
- Iceberg REST
  - Manages metadata for Iceberg tables
  - Interacts with MinIO to store Iceberg data
- MinIO
  - Object storage (similar to S3)
  - Stores Iceberg data and metadata
- Volumes
  - The final location where data is stored
  - MinIO reads and writes data through Volumes
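
Once the containers are running, the wiring described above can be sanity-checked from the host. The following is a minimal sketch, assuming the REST catalog listens on port 8181 and MinIO on port 9000 (the ports used in the Doris catalog definition later in this guide), that `curl` is installed, and that both services are reachable from where the script runs. The helper names `check_endpoint`, `rest_config_url`, and `minio_health_url` are hypothetical and introduced only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints "ok <url>" if the URL answers with a successful
# HTTP status within 5 seconds, otherwise prints "FAIL <url>" and returns 1.
check_endpoint() {
  local url="$1"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "ok $url"
  else
    echo "FAIL $url"
    return 1
  fi
}

# Iceberg REST catalog: GET /v1/config is defined by the REST catalog spec.
# The host argument defaults to localhost (an assumption).
rest_config_url() {
  echo "http://${1:-localhost}:8181/v1/config"
}

# MinIO liveness probe endpoint; same host assumption as above.
minio_health_url() {
  echo "http://${1:-localhost}:9000/minio/health/live"
}
```

A possible invocation is `check_endpoint "$(rest_config_url)" && check_endpoint "$(minio_health_url)"`. Pass a host name or IP as the first argument to the URL helpers when the services are not published on `localhost`.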
docker exec -it spark-iceberg /bin/bash

spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.local.type=hadoop \
--conf spark.sql.catalog.local.warehouse=/home/iceberg/warehouse \
--conf spark.sql.defaultCatalog=local

CREATE TABLE demo.nyc.taxis
(
vendor_id bigint,
trip_id bigint,
trip_distance float,
fare_amount double,
store_and_fwd_flag string
)
PARTITIONED BY (vendor_id);

INSERT INTO demo.nyc.taxis
VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');

SELECT * FROM demo.nyc.taxis;

Connect to Iceberg from Doris.

SELECT * FROM CATALOGS();

DROP CATALOG IF EXISTS iceberg_catalog;

Note: `aws.region` is not a valid parameter here; the statement below uses `s3.region`.

CREATE CATALOG iceberg_catalog
PROPERTIES (
"type"="iceberg",
"iceberg.catalog.type"="rest",
"uri"="http://{ICEBERG_IP}:8181",
"s3.endpoint"="http://{ICEBERG_IP}:9000",
"s3.access_key"="admin",
"s3.secret_key"="password",
"s3.region"="us-east-1"
);

SHOW DATABASES FROM iceberg_catalog;

SHOW TABLES FROM iceberg_catalog.nyc;

SELECT * FROM iceberg_catalog.nyc.taxis;

Move a Doris table to [catalog].[database].[new_table_name] without pre-creating the schema.

CREATE TABLE iceberg_catalog.database.table AS SELECT * FROM database.table;
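
The `{ICEBERG_IP}` placeholder and the `[catalog].[database].[new_table_name]` pattern above have to be filled in before the statements can run. The following is a minimal sketch that renders both statements from shell arguments. The function names, the `http://` scheme, and the use of Doris's built-in `internal` catalog to qualify the source table are assumptions made for illustration, as are the sample values in the usage note.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the CREATE CATALOG statement for a given host.
# Credentials and region mirror the values shown in this guide; the http://
# scheme is an assumption.
render_create_catalog() {
  local host="$1"
  cat <<EOF
CREATE CATALOG iceberg_catalog
PROPERTIES (
"type"="iceberg",
"iceberg.catalog.type"="rest",
"uri"="http://${host}:8181",
"s3.endpoint"="http://${host}:9000",
"s3.access_key"="admin",
"s3.secret_key"="password",
"s3.region"="us-east-1"
);
EOF
}

# Hypothetical helper: print a CTAS statement that copies a Doris table into
# the Iceberg catalog. The target schema is derived from the SELECT, so the
# target table does not need to be created beforehand.
# Arguments: source database, source table, target database, target table.
render_ctas() {
  local src_db="$1" src_table="$2" dst_db="$3" dst_table="$4"
  echo "CREATE TABLE iceberg_catalog.${dst_db}.${dst_table} AS SELECT * FROM internal.${src_db}.${src_table};"
}
```

For example, `render_ctas demo_db orders nyc orders_copy` prints a statement that copies the (hypothetical) Doris table `demo_db.orders` into `iceberg_catalog.nyc.orders_copy`. The output can be piped to any MySQL-protocol client connected to the Doris FE, such as `mysql -h <fe_host> -P 9030 -u <user>`, where 9030 is the default FE query port and the host and user are placeholders.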