perf: Add memory profiling #1702


Merged: 9 commits merged from memory-profiling into apache:main on May 5, 2025

Conversation

@andygrove (Member) commented May 1, 2025

Which issue does this PR close?

Closes #1701

Rationale for this change

We need a way to profile memory.

What changes are included in this PR?

Logging of JVM and native memory utilization on each call to execute_plan. After running TPC-H locally with 8GB executor memory and 8GB off-heap memory, I see these stats:

JVM_MEMORY: { heapUsed: 2696, heapCommitted: 4394, nonHeapUsed: 142, nonHeapCommitted: 160 }
NATIVE_MEMORY: { resident: 11531.53 }

When running with multiple iterations, I see overall memory exceeding 16GB. If this were running in a container environment, I would expect the executor to be killed due to OOM.

JVM_MEMORY: { heapUsed: 4217, heapCommitted: 7440, nonHeapUsed: 145, nonHeapCommitted: 165 }
NATIVE_MEMORY: { resident: 18406.37 }
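
The NATIVE_MEMORY resident figure comes from /proc/self/statm: the second field is resident pages, multiplied by the page size and converted to MB. A minimal stdlib-only sketch of that conversion (the `parse_statm_resident_mb` helper and the 4096-byte page size are assumptions for illustration, not the PR's actual code):

```rust
/// Hypothetical helper: given the contents of /proc/self/statm and the page
/// size in bytes, return resident memory in MB. Field 2 of statm is the
/// number of resident pages.
fn parse_statm_resident_mb(statm: &str, page_size: u64) -> Option<f64> {
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some((resident_pages * page_size) as f64 / (1024.0 * 1024.0))
}

fn main() {
    // On Linux this string would come from
    // std::fs::read_to_string("/proc/self/statm"); a sample line is used here
    // so the sketch runs anywhere.
    let sample = "100000 25600 1200 500 0 30000 0";
    let mb = parse_statm_resident_mb(sample, 4096).unwrap();
    println!("NATIVE_MEMORY: {{ resident: {:.0} }}", mb);
}
```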

How are these changes tested?

@andygrove (Member, Author):

@mbutrovich I'd like to get your feedback on this PR.

@codecov-commenter commented May 1, 2025

Codecov Report

Attention: Patch coverage is 37.50% with 10 lines in your changes missing coverage. Please review.

Project coverage is 58.69%. Comparing base (f09f8af) to head (077f5af).
Report is 177 commits behind head on main.

Files with missing lines Patch % Lines
...ain/scala/org/apache/comet/CometExecIterator.scala 23.07% 9 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1702      +/-   ##
============================================
+ Coverage     56.12%   58.69%   +2.56%     
- Complexity      976     1143     +167     
============================================
  Files           119      129      +10     
  Lines         11743    12615     +872     
  Branches       2251     2359     +108     
============================================
+ Hits           6591     7404     +813     
- Misses         4012     4038      +26     
- Partials       1140     1173      +33     

☔ View full report in Codecov by Sentry.

@@ -359,6 +365,21 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan(
// Retrieve the query
let exec_context = get_execution_context(exec_context);

// memory profiling is only available on linux
if exec_context.memory_profiling_enabled {
#[cfg(target_os = "linux")]
Contributor:

should the if statement be inside cfg?

@andygrove (Member, Author):

I did that originally, but the compiler complained about the unused variable memory_profiling_enabled on non-Linux platforms.
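
The pattern under discussion can be sketched as follows (a minimal, hypothetical `maybe_profile` function, not the PR's actual code): keeping the `if` outside the `#[cfg]` block means the flag is read on every target, so the compiler never flags `memory_profiling_enabled` as unused on non-Linux builds.

```rust
// Hypothetical sketch of the cfg-inside-if pattern from this diff.
fn maybe_profile(memory_profiling_enabled: bool) -> bool {
    // allow: `mut` is only exercised on Linux, where the cfg block survives.
    #[allow(unused_mut)]
    let mut profiled = false;
    // The flag is read on all targets, so no unused-variable warning fires.
    if memory_profiling_enabled {
        // Only this body is compiled out on non-Linux targets.
        #[cfg(target_os = "linux")]
        {
            // Linux-only profiling work (e.g. reading /proc/self/statm) goes here.
            profiled = true;
        }
    }
    profiled
}

fn main() {
    // true only when the flag is set AND the target is Linux.
    assert_eq!(maybe_profile(true), cfg!(target_os = "linux"));
    assert!(!maybe_profile(false));
}
```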

Comment on lines +144 to +149
// scalastyle:off println
println(
"JVM_MEMORY: { " +
s"heapUsed: ${mb(heap.getUsed)}, heapCommitted: ${mb(heap.getCommitted)}, " +
s"nonHeapUsed: ${mb(nonHeap.getUsed)}, nonHeapCommitted: ${mb(nonHeap.getCommitted)} " +
"}")
Contributor:

Perhaps logInfo ?
If println is preferred, we may have to add back // scalastyle:on println

@comphead (Contributor) left a comment:

lgtm thanks @andygrove

@andygrove andygrove changed the title feat: Add memory profiling perf: Add memory profiling May 2, 2025
@kazuyukitanimura (Contributor) left a comment:

pending ci

@andygrove andygrove marked this pull request as draft May 2, 2025 18:51
@andygrove (Member, Author):

Moving to draft while I experiment with jemalloc-specific logging

@andygrove (Member, Author):

I added some jemalloc specific logging:

NATIVE_MEMORY_JEMALLOC: { allocated=561605760, resident=3749654528 }
NATIVE_MEMORY_JEMALLOC: { allocated=360603312, resident=3691925504 }
NATIVE_MEMORY_JEMALLOC: { allocated=161601808, resident=3181555712 }
NATIVE_MEMORY_JEMALLOC: { allocated=14660072, resident=3086413824 }
NATIVE_MEMORY_JEMALLOC: { allocated=14271320, resident=3075936256 }

@mbutrovich does this seem useful?

@andygrove (Member, Author) commented May 2, 2025:

I'm not sure how to reconcile the different resident numbers:

JVM_MEMORY: { heapUsed: 2027, heapCommitted: 3778, nonHeapUsed: 121, nonHeapCommitted: 131 }
NATIVE_MEMORY: { resident: 4702 }
NATIVE_MEMORY_JEMALLOC: { allocated: 13, resident: 2816 }

edit: updated to use consistent MB formatting

edit: resident memory is close to heapUsed + jemalloc resident, but is this the correct interpretation?
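
The consistent-MB formatting mentioned in the edit can be sketched with a small stdlib-only helper. The `format_jemalloc_stats` function and the hard-coded byte counts are illustrative assumptions; in the PR the counts come from jemalloc's allocated/resident statistics.

```rust
/// Hypothetical helper: format jemalloc byte counts the same way as the
/// JVM_MEMORY / NATIVE_MEMORY lines, i.e. whole MB.
fn format_jemalloc_stats(allocated_bytes: u64, resident_bytes: u64) -> String {
    // Convert bytes to MB, rounding to the nearest whole number.
    let mb = |b: u64| (b as f64 / (1024.0 * 1024.0)).round() as u64;
    format!(
        "NATIVE_MEMORY_JEMALLOC: {{ allocated: {}, resident: {} }}",
        mb(allocated_bytes),
        mb(resident_bytes)
    )
}

fn main() {
    // Raw counts from the first NATIVE_MEMORY_JEMALLOC log line above,
    // reformatted as MB.
    println!("{}", format_jemalloc_stats(561_605_760, 3_749_654_528));
}
```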

@andygrove andygrove marked this pull request as ready for review May 2, 2025 19:14
Comment on lines +378 to +381
println!(
"NATIVE_MEMORY: {{ resident: {:.0} }}",
(statm.resident * page_size) as f64 / (1024.0 * 1024.0)
);
Contributor:

Suggested change:
-    println!(
-        "NATIVE_MEMORY: {{ resident: {:.0} }}",
-        (statm.resident * page_size) as f64 / (1024.0 * 1024.0)
-    );
+    // Read statistics using MIB key:
+    let allocated = allocated.read().unwrap() as f64 / (1024.0 * 1024.0);
+    let resident = resident.read().unwrap() as f64 / (1024.0 * 1024.0);
+    println!(
Contributor:

wondering should we feed the metrics to log or to plan metrics

@andygrove (Member, Author):

These metrics are for the overall process, so not specific to any particular plan. This feature just allows us to watch the executor logs and see when memory starts to approach the limit. I hope this will help us debug the cause of OOMs, although it will still primarily be a manual process.

@andygrove (Member, Author):

Thanks for the reviews @comphead @kazuyukitanimura and @mbutrovich (offline review). I'm going to go ahead and merge this and then iterate on improving it once I have used it to debug a real issue.

@andygrove andygrove merged commit af02182 into apache:main May 5, 2025
78 checks passed
@andygrove andygrove deleted the memory-profiling branch May 5, 2025 15:59
@mbutrovich (Contributor):

Not the end of the world, but this introduces warnings on macOS when using jemalloc:

| use tikv_jemalloc_ctl::{epoch, stats};
|                         ^^^^^  ^^^^^
|
= note: `#[warn(unused_imports)]` on by default
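
One way to silence that warning is to gate the import on the same `cfg` condition as the code that uses it. A hedged sketch of the pattern, using a Linux-only stdlib trait as a stand-in for the `tikv_jemalloc_ctl` import (the `inode_of_cwd` function is hypothetical, chosen only because it needs a Linux-only import):

```rust
// Stand-in for `#[cfg(target_os = "linux")] use tikv_jemalloc_ctl::{epoch, stats};`
// Gated on the same cfg as the code that needs it, so macOS builds never see
// the import and emit no unused_imports warning.
#[cfg(target_os = "linux")]
use std::os::linux::fs::MetadataExt;

/// Returns the inode of the current directory on Linux, None elsewhere.
/// The point is only that neither the import nor its use exists off-Linux.
fn inode_of_cwd() -> Option<u64> {
    #[cfg(target_os = "linux")]
    fn inner() -> Option<u64> {
        std::fs::metadata(".").ok().map(|m| m.st_ino())
    }
    #[cfg(not(target_os = "linux"))]
    fn inner() -> Option<u64> {
        None
    }
    inner()
}

fn main() {
    assert_eq!(inode_of_cwd().is_some(), cfg!(target_os = "linux"));
}
```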


Successfully merging this pull request may close these issues.

Add memory profiling / logging
5 participants