Skip to content

Some TPC-DS queries fail to get column statistics #1869

@rampage644

Description

@rampage644

Certain TPC-DS queries seemingly succeed but still generate warning in the logs:

message":"Failed to convert min values to array for column ss_sales_price

The error comes from here gttps://github.com/apache/datafusion/blob/b1deb1f376793f3d7c0d65859886fdff99016f25/datafusion/common/src/pruning.rs#L316

Some queries to reproduce:

SELECT sum(ss_quantity) FROM store_sales, store, customer_demographics, customer_address, date_dim WHERE s_store_sk = ss_store_sk AND ss_sold_date_sk = d_date_sk AND d_year = 2001 AND ((cd_demo_sk = ss_cdemo_sk AND cd_marital_status = 'M' AND cd_education_status = '4 yr Degree' AND ss_sales_price BETWEEN 100.00 AND 150.00) OR (cd_demo_sk = ss_cdemo_sk AND cd_marital_status = 'D' AND cd_education_status = '2 yr Degree' AND ss_sales_price BETWEEN 50.00 AND 100.00) OR (cd_demo_sk = ss_cdemo_sk AND cd_marital_status = 'S' AND cd_education_status = 'College' AND ss_sales_price BETWEEN 150.00 AND 200.00)) AND ((ss_addr_sk = ca_address_sk AND ca_country = 'United States' AND ca_state IN ('CO', 'OH', 'TX') AND ss_net_profit BETWEEN 0 AND 2000) OR (ss_addr_sk = ca_address_sk AND ca_country = 'United States' AND ca_state IN ('OR', 'MN', 'KY') AND ss_net_profit BETWEEN 150 AND 3000) OR (ss_addr_sk = ca_address_sk AND ca_country = 'United States' AND ca_state IN ('VA', 'CA', 'MS') AND ss_net_profit BETWEEN 50 AND 25000
SELECT SUBSTR(r.r_reason_desc, 1, 20) AS reason_prefix, avg(ws.ws_quantity) AS avg_ws_quantity, avg(wr.wr_refunded_cash) AS avg_wr_refunded_cash, avg(wr.wr_fee) AS avg_wr_fee FROM web_sales AS ws JOIN web_returns AS wr ON ws.ws_item_sk = wr.wr_item_sk AND ws.ws_order_number = wr.wr_order_number JOIN web_page AS wp ON ws.ws_web_page_sk = wp.wp_web_page_sk JOIN customer_demographics AS cd1 ON cd1.cd_demo_sk = wr.wr_refunded_cdemo_sk JOIN customer_demographics AS cd2 ON cd2.cd_demo_sk = wr.wr_returning_cdemo_sk JOIN customer_address AS ca ON ca.ca_address_sk = wr.wr_refunded_addr_sk JOIN date_dim AS d ON ws.ws_sold_date_sk = d.d_date_sk JOIN reason AS r ON r.r_reason_sk = wr.wr_reason_sk WHERE d.d_year = 2000 AND ((cd1.cd_marital_status = 'M' AND cd1.cd_marital_status = cd2.cd_marital_status AND cd1.cd_education_status = 'Advanced Degree' AND cd1.cd_education_status = cd2.cd_education_status AND ws.ws_sales_price BETWEEN 100.00 AND 150.00) OR (cd1.cd_marital_status = 'S' AND cd1.cd_marital_status = cd2.cd_marital_status AND cd1.cd_education_status = 'College' AND cd1.cd_education_status = cd2.cd_education_status AND ws.ws_sales_price BETWEEN 50.00 AND 100.00) OR (cd1.cd_marital_status = 'W' AND cd1.cd_marital_status = cd2.cd_marital_status AND cd1.cd_education_status = '2 yr Degree' AND cd1.cd_education_status = cd2.cd_education_status AND ws.ws_sales_price BETWEEN 150.00 AND 200.00)) AND ((ca.ca_country = 'United States' AND ca.ca_state IN ('IN', 'OH', 'NJ') AND ws.ws_net_profit BETWEEN 100 AND 200) OR (ca.ca_country = 'United States' AND ca.ca_state IN ('WI', 'CT', 'KY') AND ws.ws_net_profit BETWEEN 150 AND 300) OR (ca.ca_country = 'United States' AND ca.ca_state IN ('LA', 'IA', 'AR') AND ws.ws_net_profit BETWEEN 50 AND 250)) GROUP BY r.r_reason_desc ORDER BY SUBSTR(r.r_reason_desc, 1, 20), avg(ws.ws_quantity), avg(wr.wr_refunded_cash), avg(wr.wr_fee)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions