Benchmark: Add micro-benchmark for Nested Loop Join operator #16819

2010YOUY01 · 2025-07-18T11:28:40Z

Which issue does this PR close?

NA

Rationale for this change

The NLJ operator still has room for improvement in performance and efficiency (lower memory consumption), and it has recently attracted interest from the community (cc @jonathanc-n).

Inspired by the benchmarks used by @UBarney in #16443 (comment), this PR adds a similar micro-benchmark for NLJ to the DataFusion benchmark suite.

What changes are included in this PR?

A new micro-benchmark for NLJ in the benchmark suite (./bench.sh ...)

The queries, and the query characteristics they vary, can be found in the source (benchmarks/src/nlj.rs).

The special (semi/anti/mark) joins are not included; I'm not sure what the typical workload for those joins is.

The bench runner has a validation step to ensure the queries use NLJ in the physical plan.
Also, the optimizer currently does not reorder joins, so the execution order follows the join order in the SQL string. (I wish there were an option to explicitly enforce this behavior.)
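The validation step can be pictured as a plain substring check on the rendered physical plan. The sketch below is a minimal, dependency-free illustration under stated assumptions: `ensure_uses_nlj` is a hypothetical helper name, and the plan text is passed in as a string rather than obtained from DataFusion.

```rust
/// Operator name looked for in the rendered plan text.
const NLJ_OPERATOR: &str = "NestedLoopJoinExec";

/// Hypothetical helper (not part of the PR): returns an error when the
/// rendered plan text does not mention the nested loop join operator.
fn ensure_uses_nlj(query_name: &str, plan_text: &str) -> Result<(), String> {
    if plan_text.contains(NLJ_OPERATOR) {
        Ok(())
    } else {
        Err(format!(
            "Query {query_name} does not use {NLJ_OPERATOR} in its physical plan"
        ))
    }
}
```

Failing early here keeps a benchmark run from silently measuring a different join operator if the planner picks one.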

Are these changes tested?

I tested it locally:

Bench Run

yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh data nlj
***************************
DataFusion Benchmark Runner and Data Generator
COMMAND: data
BENCHMARK: nlj
DATA_DIR: /Users/yongting/Code/datafusion/benchmarks/data
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
NLJ benchmark does not require data generation

yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh run nlj
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: nlj
QUERY: All
DATAFUSION_DIR: /Users/yongting/Code/datafusion/benchmarks/..
BRANCH_NAME: nlj-bench
DATA_DIR: /Users/yongting/Code/datafusion/benchmarks/data
RESULTS_DIR: /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
RESULTS_FILE: /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json
Running nlj benchmark...
+ cargo run --release --bin dfbench -- nlj --iterations 5 -o /Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json

Compiling ...

Running NLJ benchmarks with the following options: RunOpt {
    query_name: None,
    common: CommonOpt {
        iterations: 5,
        partitions: None,
        batch_size: None,
        mem_pool_type: "fair",
        memory_limit: None,
        sort_spill_reservation_bytes: None,
        debug: false,
    },
    output_path: Some(
        "/Users/yongting/Code/datafusion/benchmarks/results/nlj-bench/nlj.json",
    ),
}

Query q1 iteration 0 returned 100000 rows in 287.247375ms
Query q1 iteration 1 returned 100000 rows in 285.833ms
Query q1 iteration 2 returned 100000 rows in 245.063084ms
Query q1 iteration 3 returned 100000 rows in 206.90325ms
Query q1 iteration 4 returned 100000 rows in 207.072917ms
Query q2 iteration 0 returned 20000000 rows in 254.630083ms
Query q2 iteration 1 returned 20000000 rows in 246.942708ms
Query q2 iteration 2 returned 20000000 rows in 239.448709ms
Query q2 iteration 3 returned 20000000 rows in 240.270583ms
Query q2 iteration 4 returned 20000000 rows in 251.336291ms
Query q3 iteration 0 returned 90000000 rows in 446.120291ms
Query q3 iteration 1 returned 90000000 rows in 453.314375ms
Query q3 iteration 2 returned 90000000 rows in 358.530208ms
Query q3 iteration 3 returned 90000000 rows in 394.261916ms
Query q3 iteration 4 returned 90000000 rows in 453.936083ms
Query q4 iteration 0 returned 180000000 rows in 1.118616083s
Query q4 iteration 1 returned 180000000 rows in 1.037793375s
Query q4 iteration 2 returned 180000000 rows in 952.131541ms
Query q4 iteration 3 returned 180000000 rows in 962.842834ms
Query q4 iteration 4 returned 180000000 rows in 1.056383333s
Query q5 iteration 0 returned 2000000 rows in 572.229083ms
Query q5 iteration 1 returned 2000000 rows in 611.111917ms
Query q5 iteration 2 returned 2000000 rows in 836.5735ms
Query q5 iteration 3 returned 2000000 rows in 622.4575ms
Query q5 iteration 4 returned 2000000 rows in 579.447708ms
Query q6 iteration 0 returned 2000000 rows in 9.371356959s
Query q6 iteration 1 returned 2000000 rows in 6.032997291s
Query q6 iteration 2 returned 2000000 rows in 5.728677125s
Query q6 iteration 3 returned 2000000 rows in 6.046709958s
Query q6 iteration 4 returned 2000000 rows in 5.766419917s
Query q7 iteration 0 returned 2000000 rows in 790.340125ms
Query q7 iteration 1 returned 2000000 rows in 654.001709ms
Query q7 iteration 2 returned 2000000 rows in 860.251ms
Query q7 iteration 3 returned 2000000 rows in 531.644959ms
Query q7 iteration 4 returned 2000000 rows in 525.802541ms
Query q8 iteration 0 returned 2000000 rows in 9.162710916s
Query q8 iteration 1 returned 2000000 rows in 5.64653225s
Query q8 iteration 2 returned 2000000 rows in 5.505889417s
Query q8 iteration 3 returned 2000000 rows in 5.58156175s
Query q8 iteration 4 returned 2000000 rows in 5.635720625s
Query q9 iteration 0 returned 900000 rows in 875.642083ms
Query q9 iteration 1 returned 900000 rows in 655.309166ms
Query q9 iteration 2 returned 900000 rows in 653.490167ms
Query q9 iteration 3 returned 900000 rows in 655.535958ms
Query q9 iteration 4 returned 900000 rows in 655.982292ms
Query q10 iteration 0 returned 810000000 rows in 2.26567725s
Query q10 iteration 1 returned 810000000 rows in 2.690937042s
Query q10 iteration 2 returned 810000000 rows in 3.48998175s
Query q10 iteration 3 returned 810000000 rows in 3.145351041s
Query q10 iteration 4 returned 810000000 rows in 5.294884292s
+ set +x
Done

yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (nlj-bench *)> ./bench.sh compare nlj-bench nlj-bench
Comparing nlj-bench and nlj-bench
--------------------
--------------------
Benchmark nlj.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃  nlj-bench ┃  nlj-bench ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery q1    │  206.90 ms │  206.90 ms │ no change │
│ QQuery q2    │  239.45 ms │  239.45 ms │ no change │
│ QQuery q3    │  358.53 ms │  358.53 ms │ no change │
│ QQuery q4    │  952.13 ms │  952.13 ms │ no change │
│ QQuery q5    │  572.23 ms │  572.23 ms │ no change │
│ QQuery q6    │ 5728.68 ms │ 5728.68 ms │ no change │
│ QQuery q7    │  525.80 ms │  525.80 ms │ no change │
│ QQuery q8    │ 5505.89 ms │ 5505.89 ms │ no change │
│ QQuery q9    │  653.49 ms │  653.49 ms │ no change │
│ QQuery q10   │ 2265.68 ms │ 2265.68 ms │ no change │
└──────────────┴────────────┴────────────┴───────────┘
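Comparing the table with the run log, each reported figure matches the fastest of the five iterations for that query (for q1, 206.90 ms is the minimum of the listed runs). A minimal sketch of that reduction, assuming the times are already available in milliseconds; `fastest_ms` is a hypothetical helper and not the compare script's own code:

```rust
/// Returns the smallest value in `times_ms`, or `None` for an empty slice.
/// Hypothetical helper used only for illustration.
fn fastest_ms(times_ms: &[f64]) -> Option<f64> {
    times_ms.iter().copied().reduce(f64::min)
}
```

Applied to the five q1 timings from the log above, this yields 206.90325 ms, which is the value shown (rounded) in the table.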

Are there any user-facing changes?

jonathanc-n

This is a nice start. I was thinking of adding a benchmark that runs identical queries using the execution operators for different join algorithms, which would let us compare NestedLoopJoin performance to others like HJ or SMJ.
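To illustrate why identical queries make a fair comparison, here is a toy sketch in plain Rust (not DataFusion's operators): for an equality condition, a nested loop join and a hash join return the same rows, so only their cost differs. The function names and the `i64` key type are assumptions chosen for brevity.

```rust
use std::collections::HashMap;

/// Toy nested loop join: tests every (left, right) pair against `predicate`,
/// so it works for arbitrary join conditions.
fn nested_loop_join(
    left: &[i64],
    right: &[i64],
    predicate: impl Fn(i64, i64) -> bool,
) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    for &l in left {
        for &r in right {
            if predicate(l, r) {
                out.push((l, r));
            }
        }
    }
    out
}

/// Toy hash join: counts keys on the `left` (build) side, then probes with
/// `right`. It only supports equality conditions.
fn hash_join(left: &[i64], right: &[i64]) -> Vec<(i64, i64)> {
    let mut build: HashMap<i64, usize> = HashMap::new();
    for &l in left {
        *build.entry(l).or_insert(0) += 1;
    }
    let mut out = Vec::new();
    for &r in right {
        if let Some(&count) = build.get(&r) {
            // Emit one output pair per matching build-side row.
            for _ in 0..count {
                out.push((r, r));
            }
        }
    }
    out
}
```

The two functions may emit rows in a different order, so results should be sorted before being compared.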

jonathanc-n · 2025-07-18T14:20:43Z

Recorded existence join work at #16820

UBarney

@2010YOUY01 Thanks for providing such a comprehensive set of benchmark cases. It would be even better if it could also output the memory consumption for each SQL query, just like in this script.

UBarney · 2025-07-18T14:35:21Z

benchmarks/src/nlj.rs

+            let query_index = match query_name {
+                "q1" => 0,
+                "q2" => 1,
+                "q3" => 2,
+                "q4" => 3,
+                "q5" => 4,
+                "q6" => 5,
+                "q7" => 6,
+                "q8" => 7,
+                "q9" => 8,
+                "q10" => 9,


Perhaps we can rewrite it as follows to avoid this match and available_queries.

datafusion/benchmarks/src/imdb/run.rs

Lines 286 to 292 in 9ae41b1

    let query_range = match self.query {
        Some(query_id) => query_id..=query_id,
        None => IMDB_QUERY_START_ID..=IMDB_QUERY_END_ID,
    };

    let mut benchmark_run = BenchmarkRun::new();
    for query_id in query_range {

    let query_range = match self.query {
        Some(query_id) => query_id..=query_id,
        None => 1..=NLJ_QUERIES.len(),
    };
    for query_id in query_range {
        // ...
        let sql = NLJ_QUERIES[query_id - 1];
        // ...
    }
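The suggestion above can be sketched as a small standalone function. This is an illustration under stated assumptions rather than the code that landed in the PR: the `NLJ_QUERIES` contents are placeholders, `query_range` and `sql_for` are hypothetical free functions, and the bounds check is an addition that reuses the error wording from the commented-out code discussed later in this review.

```rust
use std::ops::RangeInclusive;

/// Placeholder query list; the real benchmark defines its own SQL strings.
const NLJ_QUERIES: &[&str] = &["SELECT 1", "SELECT 2", "SELECT 3"];

/// Maps an optional 1-based query id to the inclusive range of ids to run.
/// `None` selects every query; an out-of-range id is reported as an error.
fn query_range(query: Option<usize>) -> Result<RangeInclusive<usize>, String> {
    let last = NLJ_QUERIES.len();
    match query {
        Some(id) if (1..=last).contains(&id) => Ok(id..=id),
        Some(id) => Err(format!(
            "Query {id} not found. Available queries: 1 to {last}"
        )),
        None => Ok(1..=last),
    }
}

/// Looks up the SQL text for a 1-based query id (panics if out of range).
fn sql_for(query_id: usize) -> &'static str {
    NLJ_QUERIES[query_id - 1]
}
```

Using numeric ids avoids the string-to-index `match` entirely, and adding a query only requires appending to the list.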

Updated in a3f5d05

UBarney · 2025-07-18T14:38:16Z

benchmarks/src/nlj.rs

+                Err(e) => {
+                    eprintln!("Query {query_name} failed: {e}");
+                    benchmark_run.write_iter(std::time::Duration::from_secs(0), 0);
+                }


Should we return Err(e) here instead of recording a zero-duration iteration?

Addressed in a3f5d05
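A minimal sketch of the behavior asked for in this thread, assuming a generic closure stands in for executing the query: the first failure is returned to the caller instead of being written as a zero-duration result. `run_iterations` and its signature are hypothetical and use `String` errors only to stay dependency-free.

```rust
use std::time::{Duration, Instant};

/// Runs `run_query` `iterations` times and returns the elapsed time and row
/// count of each run. The first failure aborts the loop and is returned to
/// the caller rather than being recorded as a zero-duration iteration.
fn run_iterations<F>(
    iterations: usize,
    mut run_query: F,
) -> Result<Vec<(Duration, usize)>, String>
where
    F: FnMut() -> Result<usize, String>,
{
    let mut results = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        let row_count = run_query()?; // propagate the error to the caller
        results.push((start.elapsed(), row_count));
    }
    Ok(results)
}
```

Propagating the error keeps failed queries out of the results file, where a 0 ms entry could be mistaken for a very fast run.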

UBarney · 2025-07-18T14:38:51Z

benchmarks/src/nlj.rs

+        let physical_plan = df.create_physical_plan().await?;
+        let plan_string = format!("{physical_plan:#?}");
+
+        if !plan_string.contains("NestedLoopJoinExec") {


UBarney · 2025-07-18T14:43:11Z

benchmarks/src/nlj.rs

+            let start = Instant::now();
+            let df = ctx.sql(sql).await?;
+            let batches = df.collect().await?;
+            let elapsed = start.elapsed(); //.as_secs_f64() * 1000.0;


What is the meaning of the trailing comment //.as_secs_f64() * 1000.0; ?

Removed in a3f5d05 to avoid confusion.
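For context, the removed comment reads like a leftover conversion of the elapsed `Duration` into fractional milliseconds. A small sketch of that conversion next to the `Instant`-based timing pattern from the diff; `as_millis_f64` and `timed` are hypothetical helper names:

```rust
use std::time::{Duration, Instant};

/// Converts a `Duration` to fractional milliseconds.
fn as_millis_f64(elapsed: Duration) -> f64 {
    elapsed.as_secs_f64() * 1000.0
}

/// Runs `work` once and returns its result together with the elapsed time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = work();
    (value, start.elapsed())
}
```

Keeping the value as a `Duration` and converting only when formatting avoids carrying unit assumptions through the runner.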

2010YOUY01 · 2025-07-19T03:48:26Z

> @2010YOUY01 Thanks for providing such a comprehensive set of benchmark cases. It would be even better if it could also output the memory consumption for each SQL query, just like in this script.

I tried doing internal memory profiling in Rust, but it's a bit tricky. Perhaps integrating an external script is easier. @ding-young is currently working on it.

2010YOUY01 · 2025-07-19T03:48:42Z

Thank you for the review @UBarney @jonathanc-n

UBarney · 2025-07-19T08:18:56Z

benchmarks/src/nlj.rs

+                    // return Err(exec_datafusion_err!(
+                    //     "Query {} not found. Available queries: 1 to {}",
+                    //     query_id,
+                    //     NLJ_QUERIES.len()
+                    // ));


Maybe we can remove this commented-out code?

2010YOUY01 added 2 commits July 18, 2025 19:12

Micro-benchmark for NLJ

90c96f4

clippy

305826c

jonathanc-n approved these changes Jul 18, 2025

jonathanc-n mentioned this pull request Jul 18, 2025

Add Semi/Anti/Mark join types to Nested Loop Join Benchmark #16820

Open

UBarney reviewed Jul 18, 2025

2010YOUY01 added 2 commits July 19, 2025 11:41

review

a3f5d05

small clean-up

25ffcf3

UBarney approved these changes Jul 19, 2025

2010YOUY01 commented Jul 19, 2025

Update benchmarks/src/nlj.rs

5690078
