Add total memory to job info 878 #879
base: master
b62d135
1331c37
bb9447b
726d989
0c3fd4c
36ba2e1
f3b5b02
d18c69e
2404760
5c8185e
d2df78f
d7912ca
69fcb3e
3c905fb
30c9f73
304449b
cfc744e
b2f5874
892d542
@@ -871,8 +871,31 @@ def get_state(st)
   STATE_MAP.fetch(st, :undetermined)
 end

+# Parse the memory string returned by Slurm and return bytes
+def parse_memory(mem_str)
+  return nil if mem_str.nil? || mem_str.strip.empty? || !mem_str.match(/[KMGTP]/)
+
+  unit = mem_str.match(/[KMGTP]/).to_s
+  value = mem_str.match(/\d+/).to_s
+
+  return nil unless unit && value
+
+  factor = {
+    "K" => 1024,
+    "M" => 1024**2,
+    "G" => 1024**3,
+    "T" => 1024**4,
+    "P" => 1024**5
+  }
+
+  return nil unless factor[unit]
+
+  value.to_i * factor[unit]
+end
+
 # Parse hash describing Slurm job status
 def parse_job_info(v)
+
   allocated_nodes = parse_nodes(v[:node_list])
   if allocated_nodes.empty?
     if v[:scheduled_nodes] && v[:scheduled_nodes] != "(null)"
@@ -898,10 +921,23 @@ def parse_job_info(v)
     submission_time: v[:submit_time] ? Time.parse(v[:submit_time]) : nil,
     dispatch_time: (v[:start_time].nil? || v[:start_time] == "N/A") ? nil : Time.parse(v[:start_time]),
     native: v,
-    gpus: self.class.gpus_from_gres(v[:gres])
+    gpus: self.class.gpus_from_gres(v[:gres]),
+    total_memory: compute_total_memory(v, allocated_nodes)
   )
 end
+
+# Compute the total memory being used by a job
+# @return [Integer] total memory in bytes
+def compute_total_memory(v, allocated_nodes)
+  return nil unless v[:min_memory].to_s.match?(/\d+/)
+
+  min_memory = parse_memory(v[:min_memory])
+
+  return nil if min_memory.nil?
+
+  min_memory * allocated_nodes.count
+end
Comment on lines +931 to +939
I don't think this is right. Here's a related ticket on Slurm memory and what min_memory is actually reporting. Seems like we need to multiply this by how many cores have been allocated. And if the whole node is allocated, and this value is 0, then I guess 0 is the best value we can get at this time?

Yea I think

OK, I'm now seeing that even assuming it's per core is wrong. I found this job on ascend, with 100G memory and 8 cores. But the actual memory usage isn't 800G, it's just the 100G.
It's not clear to me how to find out whether this value is per core or the total.
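To make the ambiguity in this thread concrete, here is a minimal sketch of how the total would differ under the two readings of min_memory. It is not part of the PR: `total_memory_sketch` and its `per_cpu:` flag are hypothetical names, and the sketch assumes the caller already knows which reading applies, which is exactly what the thread could not determine from this value alone. The single-node count used in the example is also an assumption about the ascend job.

```ruby
# Hypothetical helper, not part of ood_core or this PR.
# min_memory_bytes: min_memory already converted to bytes (or nil)
# per_cpu: ASSUMED to be known by the caller; the thread above could not
#          determine this from the squeue value alone.
def total_memory_sketch(min_memory_bytes, node_count:, cpu_count:, per_cpu:)
  return nil if min_memory_bytes.nil?

  # Per-core reading scales with cores; per-node reading scales with nodes.
  multiplier = per_cpu ? cpu_count : node_count
  min_memory_bytes * multiplier
end

GIB = 1024**3

# The job from the thread: 100G and 8 cores, assumed here to run on one node.
as_per_node = total_memory_sketch(100 * GIB, node_count: 1, cpu_count: 8, per_cpu: false)
as_per_core = total_memory_sketch(100 * GIB, node_count: 1, cpu_count: 8, per_cpu: true)

puts as_per_node / GIB # 100, matching the observed usage
puts as_per_core / GIB # 800, the reading the thread rules out for this job
```

Under these assumptions the per-node reading reproduces the 100G observed for that job, while the per-core reading gives the 800G figure that did not match.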
 # Replace '(null)' with nil
 def handle_null_account(account)
   (account != '(null)') ? account : nil
I'm still a bit concerned about having to do this translation. I don't think anything other than M is ever given. Have you seen other values out of squeue? The docs indicate it's M.

I can only say I've seen the 'M' and 'G' in outputs, and I had a hard time figuring out what SLURM does here, to be honest. I haven't seen any 'K', and I don't know if we have any jobs large enough to see 'T' or 'P'. But I have seen both 'M' and 'G' in responses from squeue live on the OSC systems with squeue --Format=MinMemory, though it's usually only a few users.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ We use the
--noconvert
flag here so it's alwaysM
, that's why I can't see any G out of say active jobs page.ood_core/lib/ood_core/job/adapters/slurm.rb
Line 221 in be9ede9
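
If the unit translation is kept despite `--noconvert`, one option is to match the whole string with a single anchored pattern rather than searching for the digits and the unit separately. The sketch below is only an illustration under stated assumptions: `parse_memory_sketch` and `MEMORY_FACTORS` are hypothetical names, not code from the PR, and accepting a decimal value such as "1.5G" is an assumption about what squeue might print, not something confirmed in this thread.

```ruby
# Hypothetical alternative to parse_memory; names are illustrative only.
MEMORY_FACTORS = {
  "K" => 1024,
  "M" => 1024**2,
  "G" => 1024**3,
  "T" => 1024**4,
  "P" => 1024**5
}.freeze

# Returns the number of bytes as an Integer, or nil when the string is not
# exactly "<number><unit>" (surrounding whitespace is ignored).
def parse_memory_sketch(mem_str)
  match = mem_str.to_s.strip.match(/\A(\d+(?:\.\d+)?)([KMGTP])\z/)
  return nil unless match

  # String#to_r keeps decimal values exact before scaling to bytes.
  (match[1].to_r * MEMORY_FACTORS.fetch(match[2])).round
end

puts parse_memory_sketch("4000M").inspect
puts parse_memory_sketch("100G").inspect
puts parse_memory_sketch("N/A").inspect
```

Anchoring the pattern means a value with no unit, or with stray text around it, yields nil instead of a partially parsed number.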