Add memory profiling / logging #1701

Closed
andygrove opened this issue Apr 30, 2025 · 3 comments · Fixed by #1702
Labels
enhancement New feature or request

Comments

@andygrove
Member

andygrove commented Apr 30, 2025

What is the problem the feature request solves?

I would like to add a config that enables memory profiling so that we can monitor JVM and native memory usage throughout the lifetime of a Spark session or job. This data should be written out in a structured file format from which we can generate charts.

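On the JVM side, we can use:

import java.lang.management.ManagementFactory

// Snapshot of JVM heap and non-heap usage (init, used, committed, max), in bytes
val memoryMXBean = ManagementFactory.getMemoryMXBean
val heap = memoryMXBean.getHeapMemoryUsage
val nonHeap = memoryMXBean.getNonHeapMemoryUsage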

On the native side, we can use the procfs crate:

use procfs::process::Process;

// statm reports sizes in pages, not bytes
let pid = std::process::id();
let process = Process::new(pid as i32).unwrap();
let statm = process.statm().unwrap();
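
For reference, here is a minimal std-only sketch of what the statm data looks like once converted to bytes. It reads `/proc/self/statm` directly instead of going through procfs, so it is Linux-only. The names `StatmBytes`, `parse_statm`, and `read_self_statm` are hypothetical, and the page size is passed in by the caller (4096 is common on x86-64 but is an assumption, not something this code detects).

```rust
use std::fs;

/// Selected fields from /proc/<pid>/statm, converted from pages to bytes.
/// (Hypothetical type for illustration; not part of procfs or Comet.)
#[derive(Debug, PartialEq)]
pub struct StatmBytes {
    pub virtual_size: u64,
    pub resident: u64,
    pub shared: u64,
}

/// Parses the first three whitespace-separated fields of a statm line
/// (size, resident, shared), each of which is a page count.
/// Returns None if a field is missing or is not an unsigned integer.
pub fn parse_statm(line: &str, page_size: u64) -> Option<StatmBytes> {
    let mut fields = line.split_whitespace().map(|f| f.parse::<u64>().ok());
    let size = fields.next()??;
    let resident = fields.next()??;
    let shared = fields.next()??;
    Some(StatmBytes {
        virtual_size: size * page_size,
        resident: resident * page_size,
        shared: shared * page_size,
    })
}

/// Reads statm for the current process. Linux only; returns None if the
/// file cannot be read or parsed.
pub fn read_self_statm(page_size: u64) -> Option<StatmBytes> {
    let text = fs::read_to_string("/proc/self/statm").ok()?;
    parse_statm(&text, page_size)
}
```

Calling `read_self_statm(4096)` on a Linux host with 4 KiB pages would give the resident set size in bytes, which is the figure most useful for comparing against JVM-reported usage.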

By logging JVM usage and overall process memory information, we can infer how much native memory is used. We can also log how much memory is reserved in the native memory pools and start to see how that aligns with actual usage.
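
As a rough sketch of that inference, one option is to subtract the JVM's committed heap and non-heap sizes from the process RSS and log the result alongside the pool reservation. Everything here is an assumption for illustration: the `MemorySample` type, its field names, and the CSV layout are hypothetical, and the subtraction is only an estimate, since committed JVM memory is not necessarily resident and the JVM has native allocations of its own that `MemoryMXBean` does not report.

```rust
/// One sample of process and JVM memory; all sizes are in bytes.
/// (Hypothetical record layout for illustration only.)
#[derive(Debug, Clone, Copy)]
pub struct MemorySample {
    pub timestamp_ms: u64,
    pub process_rss: u64,
    pub jvm_heap_committed: u64,
    pub jvm_non_heap_committed: u64,
    pub native_pool_reserved: u64,
}

/// Column names matching the order produced by `to_csv_row`.
pub const CSV_HEADER: &str = "timestamp_ms,process_rss,jvm_heap_committed,\
jvm_non_heap_committed,native_pool_reserved,estimated_native";

impl MemorySample {
    /// Estimates native memory as RSS minus JVM-committed memory.
    /// Saturates at zero because committed memory can exceed what is
    /// actually resident.
    pub fn estimated_native(&self) -> u64 {
        self.process_rss
            .saturating_sub(self.jvm_heap_committed)
            .saturating_sub(self.jvm_non_heap_committed)
    }

    /// Formats the sample as one CSV row, suitable for charting later.
    pub fn to_csv_row(&self) -> String {
        format!(
            "{},{},{},{},{},{}",
            self.timestamp_ms,
            self.process_rss,
            self.jvm_heap_committed,
            self.jvm_non_heap_committed,
            self.native_pool_reserved,
            self.estimated_native()
        )
    }
}
```

Comparing the `estimated_native` column against `native_pool_reserved` over time would show how closely pool reservations track actual usage.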

Describe the potential solution

No response

Additional context

No response

andygrove added the enhancement label Apr 30, 2025
andygrove self-assigned this Apr 30, 2025
@andygrove
Member Author

I plan on working on this

@alamb
Contributor

alamb commented May 2, 2025

It would be amazing to have memory monitoring of native code in DataFusion too. It is an important feature that is currently hard for downstream crates to implement.

@alamb
Contributor

alamb commented May 2, 2025

Related discussion in DataFusion:
