tpcbench.py add --query support to run custom query #84

Open
wants to merge 1 commit into
base: main

zhangxffff

  1. Add --query argument support to tpcbench.py so it can run a custom query against the TPC-H tables.
  2. Fix the scripts in docs/contributing.md.
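
As a rough illustration of item 1, the option could be wired in along these lines. This is a minimal sketch, not the code in this PR: `build_parser` and `select_statements` are hypothetical helpers, the `load_benchmark_queries` callback stands in for however tpcbench.py loads the standard queries, and only the `--data` and `--query` flag names and the "custom query" label are taken from the test run in this PR.

```python
import argparse
from typing import Callable, Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the two flags relevant here."""
    parser = argparse.ArgumentParser(description="benchmark runner sketch")
    parser.add_argument("--data", required=True, help="directory with the TPC-H parquet tables")
    # When given, the custom SQL replaces the standard benchmark queries.
    parser.add_argument("--query", type=str, default=None, help="custom SQL to run against the TPC-H tables")
    return parser


def select_statements(
    args: argparse.Namespace,
    load_benchmark_queries: Callable[[], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Map a query label to the SQL statements to execute.

    load_benchmark_queries is an assumed callback that returns the
    standard queries; it is only called when --query is absent.
    """
    if args.query is not None:
        # Label mirrors the "custom query" key seen in the results JSON.
        return {"custom query": [args.query]}
    return load_benchmark_queries()
```

Keeping the selection logic in a small function like this would let the rest of the benchmark loop treat a custom query the same way as a numbered TPC-H query.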

@zhangxffff
Author

Tested in a local environment:

>> RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=tpch --concurrency=2 --batch-size=8182 --worker-pool-min=10 --validate --query 'select c.c_name, sum(o.o_totalprice) as total from orders o inner join customer c on o.o_custkey = c.c_custkey group by c_name order by c_name limit 1' 
Executing custom query:  select c.c_name, sum(o.o_totalprice) as total from orders o inner join customer c on o.o_custkey = c.c_custkey group by c_name order by c_name limit 1
2025-03-12 11:30:36,038 INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
Registering table customer using path tpch/customer.parquet
Registering table lineitem using path tpch/lineitem.parquet
Registering table nation using path tpch/nation.parquet
Registering table orders using path tpch/orders.parquet
Registering table part using path tpch/part.parquet
Registering table partsupp using path tpch/partsupp.parquet
Registering table region using path tpch/region.parquet
Registering table supplier using path tpch/supplier.parquet
Writing results to datafusion-ray-tpch-1741750238210.json
statements = ['select c.c_name, sum(o.o_totalprice) as total from orders o inner join customer c on o.o_custkey = c.c_custkey group by c_name order by c_name limit 1']
executing  select c.c_name, sum(o.o_totalprice) as total from orders o inner join customer c on o.o_custkey = c.c_custkey group by c_name order by c_name limit 1
+--------------------+-----------+
| c_name             | total     |
+--------------------+-----------+
| Customer#000000001 | 587762.91 |
+--------------------+-----------+
done with query custom query
{
    "engine": "datafusion-ray",
    "benchmark": "tpch",
    "settings": {
        "concurrency": 2,
        "batch_size": 8182,
        "prefetch_buffer_size": 0,
        "partitions_per_worker": null
    },
    "data_path": "tpch",
    "queries": {
        "custom query": 0.19914793968200684
    },
    "validated": {
        "custom query": true
    }
}
benchmark complete. sleeping for 3 seconds for ray to clean up

@zhangxffff
Author

Hi @robtandy, would you mind reviewing this PR when you get a chance? This is the follow-up work we discussed in #82. Thanks for your time!
