Write operation converting column names to lowercase #3182

Open
swanandx opened this issue Feb 3, 2025 · 0 comments

swanandx commented Feb 3, 2025

Environment

Delta-rs version: 0.24.0

Binding: Rust

Environment:

  • Cloud provider: local filesystem
  • OS: macOS

Bug

What happened:

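The `write` call in the reproduction below fails with the following error, even though the table does have a column named `Status`:

`Error: Generic("Schema error: No field named status. Valid fields are \"?table?\".\"Status\", \"?table?\".timestamp, \"?table?\".date.")`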

What you expected to happen:

The write should succeed. delta-rs should not lowercase the column name `Status` to `status` when checking whether the column exists.
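The error text has the shape of a DataFusion schema error (`?table?` is the qualifier DataFusion uses for an unnamed table). One plausible cause, not confirmed in this issue, is that a column name reaches DataFusion's SQL parser as an unquoted identifier: by default that parser lowercases unquoted identifiers (`datafusion.sql_parser.enable_ident_normalization`), while the schema lookup that follows is case-sensitive. The std-only sketch below models that assumption with hypothetical helpers `normalize_ident`, `quote_ident`, and `resolve`; it is not delta-rs or DataFusion code.

```rust
// Hypothetical, std-only model of the suspected failure: a column name is
// treated as a SQL identifier, unquoted identifiers are lowercased, and the
// schema lookup afterwards is case-sensitive. Not delta-rs/DataFusion code.

/// Normalize an identifier the way SQL parsers commonly do: a double-quoted
/// identifier keeps its case (quotes stripped, `""` unescaped), an unquoted
/// one is ASCII-lowercased.
fn normalize_ident(ident: &str) -> String {
    if ident.len() >= 2 && ident.starts_with('"') && ident.ends_with('"') {
        ident[1..ident.len() - 1].replace("\"\"", "\"")
    } else {
        ident.to_ascii_lowercase()
    }
}

/// Quote a column name so that `normalize_ident` returns it unchanged.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Case-sensitive lookup of a (normalized) identifier among the field names.
fn resolve<'a>(fields: &'a [&'a str], ident: &str) -> Result<&'a str, String> {
    let wanted = normalize_ident(ident);
    fields
        .iter()
        .copied()
        .find(|f| *f == wanted)
        .ok_or_else(|| {
            format!("No field named {wanted}. Valid fields are {}.", fields.join(", "))
        })
}

fn main() {
    let fields = ["Status", "timestamp", "date"];
    // Passing the raw name as an unquoted identifier loses the capital S.
    println!("{:?}", resolve(&fields, "Status"));
    // prints: Err("No field named status. Valid fields are Status, timestamp, date.")

    // Quoting the name first preserves its case, so the lookup succeeds.
    println!("{:?}", resolve(&fields, &quote_ident("Status")));
    // prints: Ok("Status")
}
```

In this model, either quoting the name before it is parsed or looking the column up by name without going through SQL parsing avoids the lowercasing.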

How to reproduce it:

```rust
use deltalake::{
    arrow::record_batch::RecordBatch,
    kernel::{DataType, PrimitiveType, StructField},
    operations::{collect_sendable_stream, write::SchemaMode},
    parquet::{
        basic::{Compression, ZstdLevel},
        file::properties::WriterProperties,
    },
    writer::utils::record_batch_from_message,
    DeltaOps, DeltaTable,
};

use std::{collections::HashMap, sync::Arc};

/// Table schema: `Status` contains an uppercase letter, and `date` is a
/// generated column derived from `timestamp`.
fn get_table_columns() -> Vec<StructField> {
    vec![
        StructField::new(
            String::from("Status"),
            DataType::Primitive(PrimitiveType::String),
            true,
        ),
        StructField::new(
            String::from("timestamp"),
            DataType::Primitive(PrimitiveType::TimestampNtz),
            true,
        ),
        StructField {
            name: String::from("date"),
            data_type: DataType::DATE,
            nullable: true,
            // Generation expression for `date`, computed from `timestamp`.
            metadata: HashMap::from([(
                "delta.generationExpression".into(),
                "\"CAST(timestamp AS DATE)\"".into(),
            )]),
        },
    ]
}

/// Build a RecordBatch in the table's Arrow schema from JSON rows.
/// The rows omit `date`, the generated column.
fn get_table_batches(table: &DeltaTable) -> RecordBatch {
    let values = vec![
        serde_json::json!({"Status": "1", "timestamp": 1738236330}),
        serde_json::json!({"Status": "2", "timestamp": 1738236330}),
        serde_json::json!({"Status": "3", "timestamp": 1738236330}),
    ];

    // Convert the Delta schema to an Arrow schema for the JSON decoder.
    let arrow_schema = <deltalake::arrow::datatypes::Schema as TryFrom<
        &deltalake::kernel::StructType,
    >>::try_from(table.schema().expect("failed to get schema"))
    .expect("Failed to convert to arrow schema");
    let arrow_schema_ref = Arc::new(arrow_schema);

    record_batch_from_message(arrow_schema_ref, &values).unwrap()
}

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), deltalake::errors::DeltaTableError> {
    env_logger::init();
    // Use TABLE_URI if set, otherwise an in-memory table.
    let ops = if let Ok(table_uri) = std::env::var("TABLE_URI") {
        DeltaOps::try_from_uri(table_uri).await?
    } else {
        DeltaOps::new_in_memory()
    };

    // Create the table, partitioned by the generated `date` column.
    let mut table = ops
        .create()
        .with_columns(get_table_columns())
        .with_partition_columns(["date"])
        .with_table_name("my_table")
        .with_comment("A table to show how delta-rs works")
        .await?;

    table.load().await?;
    let writer_properties = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3).unwrap()))
        .build();
    let batch = get_table_batches(&table);
    // Per the report, this write fails with "No field named status".
    let table = DeltaOps(table)
        .write(vec![batch])
        .with_schema_mode(SchemaMode::Merge)
        // .with_partition_columns(["date"])
        .with_writer_properties(writer_properties)
        .await?;

    // Read the table back and print its contents.
    let (_table, stream) = DeltaOps(table).load().await?;
    let data: Vec<RecordBatch> = collect_sendable_stream(stream).await?;

    println!("{:?}", data);

    Ok(())
}
```
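If the error comes from column names being lowercased somewhere in the write path (an assumption; the issue only shows `Status` being looked up as `status`), then only columns whose names are not already lowercase would be affected. The sketch below is a hypothetical helper, `case_sensitive_columns`, that is not part of delta-rs; it flags such columns so they can be checked before writing. Renaming them to lowercase is an untested workaround, not one confirmed in this thread.

```rust
// Hypothetical diagnostic, not part of delta-rs: list the column names that
// would change if ASCII-lowercased, i.e. the columns a case-sensitive lookup
// would fail to find after the suspected lowercasing.
fn case_sensitive_columns<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| name.to_ascii_lowercase() != *name)
        .collect()
}

fn main() {
    // Column names from the reproduction above.
    let names = ["Status", "timestamp", "date"];
    let affected = case_sensitive_columns(&names);
    println!("{affected:?}"); // prints: ["Status"]
}
```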

swanandx added the bug label Feb 3, 2025
rtyler added the binding/rust label Feb 3, 2025
rtyler self-assigned this Feb 3, 2025