Investigate TPC-H q4 hanging when not enough memory is allocated #1523

Closed
andygrove opened this issue Mar 13, 2025 · 6 comments · Fixed by #1614

@andygrove (Member)

Describe the bug

During benchmarking, I found that TPC-H q4 "hangs" indefinitely in the sort-merge join when not much memory is allocated. I would expect the operator to be slow and spill, but it seems to be in some kind of deadlock instead, with the stats never changing except for "total time for joining".

[Screenshot: sort-merge join metrics; only "total time for joining" is increasing]

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

@andygrove added the bug label Mar 13, 2025
@andygrove added this to the 0.8.0 milestone Mar 13, 2025
@Kontinuation (Member)

The query is blocked because we don't have enough blocking threads configured for the tokio runtime.

In the merge phase, each spill file is wrapped in a stream backed by a blocking thread (see read_spill_as_stream), so we spawn at least 183 blocking threads when there are 183 spill files to merge. The default number of blocking threads is 10, which makes the query hang indefinitely.

Tuning spark.comet.blockingThreads to a higher value could resolve this problem. We may consider raising the default value of spark.comet.blockingThreads, or improving sort-merge in DataFusion so that it does not spawn so many blocking threads.
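
To make the failure mode concrete, here is a minimal, self-contained sketch (not Comet or DataFusion code; the thread counts, channel capacity, and number of "spill files" are made-up values). Each spill stream gets a dedicated spawn_blocking task that pushes batches into a bounded channel; once the blocking pool is saturated, the merge waits forever on streams whose reader tasks never get a thread:

```rust
// Illustrative sketch of the hang: more "spill streams" than blocking
// threads, each stream fed by a dedicated blocking task. This program
// intentionally hangs after printing a couple of lines.
// Assumes tokio with the "rt-multi-thread" and "sync" features enabled.
use tokio::runtime::Builder;
use tokio::sync::mpsc;

fn main() {
    let rt = Builder::new_multi_thread()
        .worker_threads(2)
        // Analogous to a small blocking-thread limit on the runtime.
        .max_blocking_threads(2)
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async {
        const NUM_SPILL_FILES: usize = 4; // more streams than blocking threads
        let mut receivers = Vec::new();

        for i in 0..NUM_SPILL_FILES {
            // Each "spill stream" is a bounded channel fed by a blocking task,
            // loosely mirroring the shape of read_spill_as_stream.
            let (tx, rx) = mpsc::channel::<String>(1);
            receivers.push(rx);
            tokio::task::spawn_blocking(move || {
                // blocking_send parks this thread once the channel is full,
                // so the thread is never returned to the blocking pool.
                for batch in 0.. {
                    if tx.blocking_send(format!("file {i}, batch {batch}")).is_err() {
                        break;
                    }
                }
            });
        }

        // A merge needs a batch from *every* stream before it can make
        // progress. Two of the four streams never get a blocking thread,
        // so this loop hangs partway through.
        for (i, rx) in receivers.iter_mut().enumerate() {
            let batch = rx.recv().await.expect("spill stream closed unexpectedly");
            println!("merge received '{batch}' from stream {i}");
        }
        println!("all streams produced a batch"); // never reached
    });
}
```

Raising max_blocking_threads above the number of spill streams (or not tying one blocking thread to each stream) lets the loop complete.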

@andygrove (Member, Author)

Thanks for debugging this @Kontinuation. Related to this, we currently create a new tokio runtime per plan. I do wonder if we should just have a global tokio runtime for the executor where we could allocate a higher number of threads that could be shared. Do you have an opinion on that?
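
For reference, a process-wide runtime could look roughly like the sketch below (illustrative only, not the actual Comet implementation; the thread counts are made-up values). The runtime is built once per executor process and every plan execution reuses it, so the blocking pool is sized once and shared:

```rust
// Sketch of a shared, lazily-initialized tokio runtime for the whole process.
// Assumes tokio with the "rt-multi-thread" feature; the numbers are examples.
use std::sync::OnceLock;
use tokio::runtime::{Builder, Runtime};

fn global_runtime() -> &'static Runtime {
    static RUNTIME: OnceLock<Runtime> = OnceLock::new();
    RUNTIME.get_or_init(|| {
        Builder::new_multi_thread()
            .worker_threads(8)
            // One larger blocking pool shared by all concurrently running plans.
            .max_blocking_threads(512)
            .enable_all()
            .build()
            .expect("failed to build global tokio runtime")
    })
}

fn main() {
    // Two "plans" executing on the same runtime instead of each building its own.
    global_runtime().block_on(async {
        println!("plan A runs on the shared runtime");
    });
    global_runtime().block_on(async {
        println!("plan B reuses the same runtime and blocking pool");
    });
}
```

One consequence of this shape is that the pool sizes are fixed when the runtime is first built.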

@andygrove (Member, Author)

I filed an issue in DataFusion: apache/datafusion#15323

@Kontinuation (Member)

I prefer reusing a global tokio runtime for running all Comet physical plans within the same process. The current runtime-per-plan approach spawns a needlessly large number of threads. We could also set a larger default for the maximum number of blocking threads, and those blocking threads would be better utilized by concurrently running queries.

Having a global tokio runtime may prevent us from re-configuring the number of worker threads and blocking threads in an active Spark context using spark.conf.set, but I don't think that is a big problem.

@andygrove (Member, Author)

Here is an old PR that switched to using a global tokio runtime. I closed the PR because I could not find a good justification for it at the time. Perhaps we should try this again and see if it helps with this issue.

#1104

@andygrove (Member, Author)

I filed #1590 for switching to a global tokio runtime
