perf: Use a global tokio runtime #1614

Merged: 3 commits, Apr 8, 2025

@andygrove andygrove commented Apr 6, 2025

Which issue does this PR close?

Closes #1590

Helps with #1523

Rationale for this change

If I configure a Spark executor with 8GB + 1GB offHeap and use the default settings for tokio threads, q4 hangs on the main branch but completes with these changes.

What changes are included in this PR?

  • Allocate one global tokio runtime per process (executor) rather than creating one runtime per query
  • Use tokio defaults for the number of threads
  • Allow tokio thread counts to be configured with environment variables
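
The pattern described in the list above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `ThreadConfig`, `parse_threads`, `from_lookup`, and `global_config` are hypothetical names, and `COMET_WORKER_THREADS` is an assumed variable name (only `COMET_BLOCKING_THREADS` appears in this conversation). A plain struct stands in for the runtime so the example has no external dependencies. With the tokio crate available, the initializer would presumably pass any `Some(n)` value to `tokio::runtime::Builder::worker_threads` or `max_blocking_threads` and leave tokio's defaults in place for `None`.

```rust
use std::sync::OnceLock;

/// Thread settings for the shared runtime.
/// `None` means "keep the runtime's own default".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadConfig {
    pub worker_threads: Option<usize>,
    pub max_blocking_threads: Option<usize>,
}

/// Parse an optional override. Missing, non-numeric, or zero values
/// yield `None`, so the caller falls back to the default.
pub fn parse_threads(raw: Option<&str>) -> Option<usize> {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
}

impl ThreadConfig {
    /// Build the config from any key lookup (the process environment in
    /// production, a closure in tests). Variable names are assumptions.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        ThreadConfig {
            worker_threads: parse_threads(lookup("COMET_WORKER_THREADS").as_deref()),
            max_blocking_threads: parse_threads(lookup("COMET_BLOCKING_THREADS").as_deref()),
        }
    }
}

/// One instance per process: initialized on first use, shared afterwards.
static CONFIG: OnceLock<ThreadConfig> = OnceLock::new();

/// Return the process-wide config, reading the environment only once.
pub fn global_config() -> &'static ThreadConfig {
    CONFIG.get_or_init(|| ThreadConfig::from_lookup(|key| std::env::var(key).ok()))
}
```

Every caller of `global_config()` gets a reference to the same value, which is the property that matters here: the expensive resource is created once per executor process instead of once per query.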

Based on an executor configured for 8 concurrent tasks and 1 core per task, the defaults for the overall process now change as follows:

Config               | Before | After
worker threads       | 32     | 8
max blocking threads | 80     | 512
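
The totals in this table can be reproduced with simple arithmetic. The sketch below assumes the "Before" figures came from one runtime per concurrent task with 4 worker threads and 10 blocking threads each (values inferred by dividing the table totals by 8 tasks, not stated in this PR). It also assumes the "After" figures reflect tokio's defaults on a machine exposing 8 cores: one worker thread per available core and a cap of 512 blocking threads. All type and function names are hypothetical.

```rust
/// Tokio's default upper limit on blocking threads per runtime.
pub const TOKIO_DEFAULT_MAX_BLOCKING_THREADS: usize = 512;

/// Process-wide thread totals for one executor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadTotals {
    pub worker: usize,
    pub max_blocking: usize,
}

/// Before: each concurrent task owns a runtime, so the limits multiply.
pub fn per_query_totals(
    concurrent_tasks: usize,
    workers_per_runtime: usize,  // assumed 4 (inferred, see lead-in)
    blocking_per_runtime: usize, // assumed 10 (inferred, see lead-in)
) -> ThreadTotals {
    ThreadTotals {
        worker: concurrent_tasks * workers_per_runtime,
        max_blocking: concurrent_tasks * blocking_per_runtime,
    }
}

/// After: a single shared runtime using tokio defaults, independent of
/// how many tasks run concurrently.
pub fn global_totals(available_cores: usize) -> ThreadTotals {
    ThreadTotals {
        worker: available_cores,
        max_blocking: TOKIO_DEFAULT_MAX_BLOCKING_THREADS,
    }
}
```

Under these assumptions, `per_query_totals(8, 4, 10)` gives 32 worker threads and 80 blocking threads, and `global_totals(8)` gives 8 and 512, matching the two columns.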

How are these changes tested?

Manual testing. I do not see any change in overall TPC-H performance, but I can now get q4 to complete with less memory than before.

Note that we cannot close #1523 until apache/datafusion#15323 is resolved.

codecov-commenter commented Apr 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.55%. Comparing base (f09f8af) to head (099edab).
Report is 127 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1614      +/-   ##
============================================
+ Coverage     56.12%   58.55%   +2.42%     
- Complexity      976     1063      +87     
============================================
  Files           119      125       +6     
  Lines         11743    12582     +839     
  Branches       2251     2374     +123     
============================================
+ Hits           6591     7367     +776     
- Misses         4012     4020       +8     
- Partials       1140     1195      +55     

@andygrove andygrove marked this pull request as ready for review April 6, 2025 17:52
andygrove (Member Author) commented

@Kontinuation @wForget could you review?

Kontinuation (Member) left a comment

LGTM. Now that we are using environment variables to configure the tokio runtime, the proper way to set them on Spark executors in a non-local setup is --conf spark.executorEnv.COMET_BLOCKING_THREADS=N.

wForget (Member) left a comment

Thank you, LGTM

andygrove (Member Author) commented

@comphead @parthchandra could I get a committer review?

parthchandra (Contributor) left a comment

lgtm

@andygrove andygrove merged commit 23dfb03 into apache:main Apr 8, 2025
78 checks passed
@andygrove andygrove deleted the global-tokio-runtime-2 branch April 8, 2025 00:20
andygrove (Member Author) commented

Thanks for the reviews @Kontinuation @wForget @parthchandra

Successfully merging this pull request may close these issues.

  • Use global tokio runtime per executor process
  • Investigate TPC-H q4 hanging when not enough memory is allocated
5 participants