Benchmark: Add micro-benchmark for Nested Loop Join operator #16819

Open
wants to merge 5 commits into
base: main

2010YOUY01
Contributor

Which issue does this PR close?

  • NA

Rationale for this change

The NLJ operator still has room to improve in performance and efficiency (lower memory consumption), and it has recently attracted interest from the community (cc @jonathanc-n).

Inspired by the benchmarks used by @UBarney in #16443 (comment), this PR adds a similar micro-benchmark for NLJ to the DF benchmark suite.

What changes are included in this PR?

A new micro-benchmark for NLJ in the benchmark suite (./bench.sh ...)

The queries and the query characteristics they vary can be found in the source.

The special (semi/anti/mark) joins are not included because I'm not sure what the typical workload for those joins is.

The bench runner has a validation step to ensure that each query uses NLJ in its physical plan.
Also, the optimizer currently does not reorder joins, so the execution order follows the join order in the SQL string. (I wish there were an option to explicitly enforce this behavior.)
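
The validation step can be sketched roughly as follows, assuming the physical plan has already been rendered to a string (the PR does this with `format!("{physical_plan:#?}")`). The helper name `ensure_uses_nlj` and the `String` error type are illustrative assumptions, not the PR's actual code.

```rust
/// Hypothetical helper: checks that a rendered physical plan mentions the
/// nested loop join operator. The function name and the String error type
/// are assumptions made for this sketch.
fn ensure_uses_nlj(query_name: &str, plan_string: &str) -> Result<(), String> {
    if plan_string.contains("NestedLoopJoinExec") {
        Ok(())
    } else {
        Err(format!(
            "Query {query_name} does not use NestedLoopJoinExec in its physical plan"
        ))
    }
}
```

A substring check like this is coarse, but it is enough to catch a query that silently gets planned as a different join operator.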

Are these changes tested?

I tested it locally:

Bench Run
yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh data nlj
***************************
DataFusion Benchmark Runner and Data Generator
COMMAND: data
BENCHMARK: nlj
DATA_DIR: /Users/yongting/Code/datafusion/benchmarks/data
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
NLJ benchmark does not require data generation

yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh run nlj
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: nlj
QUERY: All
DATAFUSION_DIR: /Users/yongting/Code/datafusion/benchmarks/..
BRANCH_NAME: nlj-bench
DATA_DIR: /Users/yongting/Code/datafusion/benchmarks/data
RESULTS_DIR: /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
RESULTS_FILE: /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json
Running nlj benchmark...
+ cargo run --release --bin dfbench -- nlj --iterations 5 -o /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json

Compiling ...

Running NLJ benchmarks with the following options: RunOpt {
    query_name: None,
    common: CommonOpt {
        iterations: 5,
        partitions: None,
        batch_size: None,
        mem_pool_type: "fair",
        memory_limit: None,
        sort_spill_reservation_bytes: None,
        debug: false,
    },
    output_path: Some(
        "/Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json",
    ),
}

Query q1 iteration 0 returned 100000 rows in 287.247375ms
Query q1 iteration 1 returned 100000 rows in 285.833ms
Query q1 iteration 2 returned 100000 rows in 245.063084ms
Query q1 iteration 3 returned 100000 rows in 206.90325ms
Query q1 iteration 4 returned 100000 rows in 207.072917ms
Query q2 iteration 0 returned 20000000 rows in 254.630083ms
Query q2 iteration 1 returned 20000000 rows in 246.942708ms
Query q2 iteration 2 returned 20000000 rows in 239.448709ms
Query q2 iteration 3 returned 20000000 rows in 240.270583ms
Query q2 iteration 4 returned 20000000 rows in 251.336291ms
Query q3 iteration 0 returned 90000000 rows in 446.120291ms
Query q3 iteration 1 returned 90000000 rows in 453.314375ms
Query q3 iteration 2 returned 90000000 rows in 358.530208ms
Query q3 iteration 3 returned 90000000 rows in 394.261916ms
Query q3 iteration 4 returned 90000000 rows in 453.936083ms
Query q4 iteration 0 returned 180000000 rows in 1.118616083s
Query q4 iteration 1 returned 180000000 rows in 1.037793375s
Query q4 iteration 2 returned 180000000 rows in 952.131541ms
Query q4 iteration 3 returned 180000000 rows in 962.842834ms
Query q4 iteration 4 returned 180000000 rows in 1.056383333s
Query q5 iteration 0 returned 2000000 rows in 572.229083ms
Query q5 iteration 1 returned 2000000 rows in 611.111917ms
Query q5 iteration 2 returned 2000000 rows in 836.5735ms
Query q5 iteration 3 returned 2000000 rows in 622.4575ms
Query q5 iteration 4 returned 2000000 rows in 579.447708ms
Query q6 iteration 0 returned 2000000 rows in 9.371356959s
Query q6 iteration 1 returned 2000000 rows in 6.032997291s
Query q6 iteration 2 returned 2000000 rows in 5.728677125s
Query q6 iteration 3 returned 2000000 rows in 6.046709958s
Query q6 iteration 4 returned 2000000 rows in 5.766419917s
Query q7 iteration 0 returned 2000000 rows in 790.340125ms
Query q7 iteration 1 returned 2000000 rows in 654.001709ms
Query q7 iteration 2 returned 2000000 rows in 860.251ms
Query q7 iteration 3 returned 2000000 rows in 531.644959ms
Query q7 iteration 4 returned 2000000 rows in 525.802541ms
Query q8 iteration 0 returned 2000000 rows in 9.162710916s
Query q8 iteration 1 returned 2000000 rows in 5.64653225s
Query q8 iteration 2 returned 2000000 rows in 5.505889417s
Query q8 iteration 3 returned 2000000 rows in 5.58156175s
Query q8 iteration 4 returned 2000000 rows in 5.635720625s
Query q9 iteration 0 returned 900000 rows in 875.642083ms
Query q9 iteration 1 returned 900000 rows in 655.309166ms
Query q9 iteration 2 returned 900000 rows in 653.490167ms
Query q9 iteration 3 returned 900000 rows in 655.535958ms
Query q9 iteration 4 returned 900000 rows in 655.982292ms
Query q10 iteration 0 returned 810000000 rows in 2.26567725s
Query q10 iteration 1 returned 810000000 rows in 2.690937042s
Query q10 iteration 2 returned 810000000 rows in 3.48998175s
Query q10 iteration 3 returned 810000000 rows in 3.145351041s
Query q10 iteration 4 returned 810000000 rows in 5.294884292s
+ set +x
Done

yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh compare nlj-bench nlj-bench
Comparing nlj-bench and nlj-bench
--------------------
--------------------
Benchmark nlj.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃  nlj-bench ┃  nlj-bench ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery q1    │  206.90 ms │  206.90 ms │ no change │
│ QQuery q2    │  239.45 ms │  239.45 ms │ no change │
│ QQuery q3    │  358.53 ms │  358.53 ms │ no change │
│ QQuery q4    │  952.13 ms │  952.13 ms │ no change │
│ QQuery q5    │  572.23 ms │  572.23 ms │ no change │
│ QQuery q6    │ 5728.68 ms │ 5728.68 ms │ no change │
│ QQuery q7    │  525.80 ms │  525.80 ms │ no change │
│ QQuery q8    │ 5505.89 ms │ 5505.89 ms │ no change │
│ QQuery q9    │  653.49 ms │  653.49 ms │ no change │
│ QQuery q10   │ 2265.68 ms │ 2265.68 ms │ no change │
└──────────────┴────────────┴────────────┴───────────┘
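
Judging from the numbers above, the compare table appears to report the fastest iteration of each query (for q1, 206.90 ms matches iteration 3 of the run). A minimal sketch of that reduction, assuming the per-iteration times are available as milliseconds; `fastest_ms` is a hypothetical name, not part of the benchmark scripts:

```rust
/// Returns the fastest time in milliseconds, or None for an empty slice.
/// Hypothetical helper for illustration only.
fn fastest_ms(iterations_ms: &[f64]) -> Option<f64> {
    iterations_ms.iter().copied().reduce(f64::min)
}
```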

Are there any user-facing changes?

Contributor

@jonathanc-n jonathanc-n left a comment

This is a nice start. I was thinking of adding a benchmark that runs identical queries through the execution operators for different join algorithms, which would let us compare NestedLoopJoin performance with others like HJ or SMJ.

@jonathanc-n
Contributor

Recorded existence join work at #16820

Contributor

@UBarney UBarney left a comment

@2010YOUY01 Thanks for providing such a comprehensive set of benchmark cases. It would be even better if it could also output the memory consumption for each SQL query, just like in this script.

Comment on lines 162 to 172
let query_index = match query_name {
    "q1" => 0,
    "q2" => 1,
    "q3" => 2,
    "q4" => 3,
    "q5" => 4,
    "q6" => 5,
    "q7" => 6,
    "q8" => 7,
    "q9" => 8,
    "q10" => 9,
Contributor

Perhaps we can rewrite it as follows to avoid this match and available_queries.

let query_range = match self.query {
    Some(query_id) => query_id..=query_id,
    None => IMDB_QUERY_START_ID..=IMDB_QUERY_END_ID,
};
let mut benchmark_run = BenchmarkRun::new();
for query_id in query_range {

let query_range = match self.query {
    Some(query_id) => query_id..=query_id,
    None => 1..=NLJ_QUERIES.len(),
};

for query_id in query_range {
    // ...
    let sql = NLJ_QUERIES[query_id-1];
    // ...
}
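
A slightly fuller sketch of this suggestion, with an explicit bounds check so that an out-of-range id returns an error instead of panicking on the index. The placeholder `NLJ_QUERIES` contents, the helper names, and the `String` error type are assumptions for illustration; the error text mirrors the commented-out message elsewhere in this PR.

```rust
// Placeholder query list; the real SQL strings live in the PR's source.
const NLJ_QUERIES: &[&str] = &["SELECT 1", "SELECT 2", "SELECT 3"];

/// Resolves an optional 1-based query id into the inclusive range of ids to run.
fn query_range(query: Option<usize>) -> Result<std::ops::RangeInclusive<usize>, String> {
    match query {
        // A single valid id selects just that query.
        Some(id) if (1..=NLJ_QUERIES.len()).contains(&id) => Ok(id..=id),
        // Anything else is rejected before it can be used as an index.
        Some(id) => Err(format!(
            "Query {id} not found. Available queries: 1 to {}",
            NLJ_QUERIES.len()
        )),
        // No id means "run everything".
        None => Ok(1..=NLJ_QUERIES.len()),
    }
}

/// Returns the SQL text for every selected query, in order.
fn selected_sql(query: Option<usize>) -> Result<Vec<&'static str>, String> {
    Ok(query_range(query)?.map(|id| NLJ_QUERIES[id - 1]).collect())
}
```

Since ids are 1-based and the array is 0-based, the `id - 1` conversion stays in one place.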

Contributor Author

Updated in a3f5d05

Comment on lines 193 to 196
Err(e) => {
    eprintln!("Query {query_name} failed: {e}");
    benchmark_run.write_iter(std::time::Duration::from_secs(0), 0);
}
Contributor

Should we return Err(e) here instead?

Contributor Author

Addressed in a3f5d05
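
For reference, a minimal sketch of the propagate-on-error approach discussed in this thread, assuming the query run is modelled as a closure returning a row count. The name `time_iterations` and the `String` error type are hypothetical and not taken from the PR.

```rust
use std::time::{Duration, Instant};

/// Runs `iterations` timed executions of `run_query` and stops at the first
/// failure, so no zero-duration placeholder ends up in the results.
/// Hypothetical helper; the closure stands in for executing one query.
fn time_iterations<F>(
    iterations: usize,
    mut run_query: F,
) -> Result<Vec<(Duration, usize)>, String>
where
    F: FnMut() -> Result<usize, String>,
{
    let mut results = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        // `?` returns the error to the caller instead of recording a fake timing.
        let row_count = run_query()?;
        results.push((start.elapsed(), row_count));
    }
    Ok(results)
}
```

Returning the error keeps a failed query from appearing in the results file as an implausibly fast 0 ms run.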

let physical_plan = df.create_physical_plan().await?;
let plan_string = format!("{physical_plan:#?}");

if !plan_string.contains("NestedLoopJoinExec") {
Contributor

👍👍

let start = Instant::now();
let df = ctx.sql(sql).await?;
let batches = df.collect().await?;
let elapsed = start.elapsed(); //.as_secs_f64() * 1000.0;
Contributor

What's the meaning of `//.as_secs_f64() * 1000.0;`?

Contributor Author

Removed in a3f5d05 to avoid confusion.
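
For context, the commented-out fragment looks like a leftover conversion of the elapsed `Duration` into fractional milliseconds. A small sketch of what that expression computes; the wrapper name `as_millis_f64` is an assumption for illustration:

```rust
use std::time::Duration;

/// Converts a Duration into fractional milliseconds.
/// Hypothetical wrapper around the expression from the removed comment.
fn as_millis_f64(elapsed: Duration) -> f64 {
    elapsed.as_secs_f64() * 1000.0
}
```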

@2010YOUY01
Contributor Author

> @2010YOUY01 Thanks for providing such a comprehensive set of benchmark cases. It would be even better if it could also output the memory consumption for each SQL query, just like in this script.

I tried doing internal memory profiling in Rust, but it's a bit tricky. Integrating an external script is perhaps easier. @ding-young is currently working on it.

@2010YOUY01
Contributor Author

Thank you for the review @UBarney @jonathanc-n

Comment on lines 148 to 152
// return Err(exec_datafusion_err!(
// "Query {} not found. Available queries: 1 to {}",
// query_id,
// NLJ_QUERIES.len()
// ));
Contributor

Maybe we can remove this commented-out block?

Labels
None yet
Projects
None yet

3 participants