Is there a way to know if the physical memory is exhausted? #25
I was using Gust to run an OpenMP program, and a simple timer says that 128-way parallelization is even slower than 64-way parallelization. I suspect this has to do with memory, because there is only 2 GB of memory per core when all 128 cores are in use.

My question is: is there a way to confirm my suspicion? More generally, is there a way to observe memory usage while in an interactive session? (I often use `ssh` to log into the compute node and use `top` to observe memory usage, but this approach doesn't seem to work on Gust.)

Comments
Thanks Sam. It's useful to know that SSH'ing to the node doesn't work at present; we'll need to get that working. In the meantime, you can get the maximum (high-water mark) memory usage of the job from the PBS accounting records.
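As a hedged sketch, one way to pull that number with stock PBS Pro tooling (this example is my assumption; Gust may have its own wrapper around the same data):

```bash
# Query PBS Pro for the job's accumulated peak usage; -x also shows
# finished jobs, and resources_used.mem is the high-water memory mark.
qstat -xf ${PBS_JOBID} | grep -E "resources_used\.(mem|vmem|walltime)"
```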
Unlike Casper, Gust does not have NVMe swap space, so the memory stats reported by PBS (and any tools built on them) reflect physical memory usage directly: if the job's peak approaches the node total, you were close to exhausting RAM.
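To sanity-check that from inside a job, assuming you have a shell on the node (e.g., via an interactive job), a quick look at the node itself:

```bash
# Show physical memory and swap on the node; with no NVMe swap configured,
# the Swap row should read 0, so process RSS maps directly onto RAM pressure.
free -h
```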
Cool cool, good to know.
Hey @shaomeng, one trick you can use to get a history of your resource usage is to run something like the following:

```bash
$ cat log_memory.sh
#!/usr/bin/env bash

# Optional argument: the log file name (defaults to log_memory.out).
logfile="log_memory.out"
if [[ $# -eq 1 ]]; then
    logfile=$1
fi

# Append a timestamped snapshot of this user's processes every 30 seconds.
tstart=$(date +%s)
while true; do
    echo "### $0 : " $(hostname) " : " $(date) >> ${logfile}
    echo "# elapsed (seconds): " $(($(date +%s) - ${tstart})) >> ${logfile}
    echo "# $(uptime)" >> ${logfile}
    ps -o pid,user,%cpu,%mem,rss,command -u ${USER} >> ${logfile}
    sleep 30s
done
```

Then in your PBS script, start it in the background before launching the main application:

```bash
./log_memory.sh "log-${PBS_JOBID}.out" &
./my_application
...
```

This will produce a simple listing of the CPU and memory usage of anything you have running on the node at 30-second intervals.

Have you tried going from 64 to 96 ranks, or something less aggressive than the full 128? Just to be sure there are a few cores left for OS processes, in case that contention is impacting you when you fully load the node...
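A possible way to reduce that log to a single peak figure afterwards (my sketch, not part of the original suggestion; it assumes the `ps` format above, whose fifth column is RSS in KiB):

```bash
# Scan the log for the largest per-process RSS seen; data rows are the ones
# whose first field is a numeric PID. Note: per process, not a node total.
awk '$1 ~ /^[0-9]+$/ { if ($5 + 0 > max) max = $5 }
     END { printf "peak per-process RSS: %.1f GiB\n", max / 1048576 }' log-*.out
```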
One more suggestion: if you are fully subscribing a node with OpenMP threads, you'll want to bind the threads to specific CPU cores for performance. The best way to do this is compiler dependent, but look at the `KMP_AFFINITY` environment variable for the Intel compiler, or the standard OpenMP `OMP_PROC_BIND` variable for similar functionality with other compilers; a minimal sketch follows.
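For example, what that could look like in a job script (the specific values here are illustrative assumptions, not a site recommendation):

```bash
# Intel compilers: pin threads to cores, packed tightly.
export KMP_AFFINITY="granularity=core,compact"

# Portable OpenMP (GCC, NVHPC, Cray, ...): the standard equivalents.
export OMP_PROC_BIND=close   # keep threads on their assigned places
export OMP_PLACES=cores      # one place per physical core
export OMP_NUM_THREADS=128

./my_application
```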
Thank you all for your suggestions! @benkirk reducing the number of threads doesn't seem to help here. I suspect that my program isn't fully utilizing 128 cores anyway, so there are probably almost always some idle cores left for OS processes. @roryck thanks, I'll look into thread binding.

Another thing I've noticed is that one run can be 50% slower or faster than another run, even though they're tested just minutes apart. My runs take 1-2 minutes, so this 50% difference is a pretty big surprise, and I'll need to look into it further. Thank you all!
@vanderwb Forgot to ask: what's the time resolution of this memory reporting?
@shaomeng - apologies for missing your question. It gets updated whenever the PBS accounting logs are updated, which should be every scheduler cycle, so I believe you're looking at roughly every 30-60 seconds.