Grid engine support for terabyte (T) MEMTOT output from qhost, and cpu specifications #41

Open
wants to merge 2 commits into base: master

Conversation

cooketho

The function obtainSystemConstants() in the GridEngineBatchSystem class in batchSystems/gridengine.py threw the error "ValueError: invalid literal for float(): 1.5T" when I tried to run it on a system that has 1.5T of available memory. I modified the MemoryString class to handle qhost output in the terabyte (T) range.
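As an illustration of the kind of suffix handling described above, here is a minimal sketch of a parser that accepts a terabyte suffix. It is not the actual MemoryString code from the pull request: the function name `parse_memory_string` and the table `_SUFFIX_BYTES` are hypothetical, and the sketch assumes the K/M/G/T suffixes in qhost output are powers of 1024.

```python
# Hypothetical helper, not the MemoryString class from jobTree.
# Assumption: qhost suffixes K, M, G, T denote powers of 1024.
_SUFFIX_BYTES = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
}


def parse_memory_string(text):
    """Convert a string such as '1.5T' or '512M' to a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty memory string")
    suffix = text[-1].upper()
    if suffix in _SUFFIX_BYTES:
        # Strip the unit letter before calling float(), which is the step
        # that fails with "invalid literal for float(): 1.5T" otherwise.
        return float(text[:-1]) * _SUFFIX_BYTES[suffix]
    # No recognized suffix: treat the value as plain bytes.
    return float(text)
```

With this sketch, `parse_memory_string("1.5T")` no longer raises ValueError, because the trailing unit letter is removed before the numeric conversion.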

jobTree then ran fine, but the jobs it submitted to SGE sat in the queued "qw" state indefinitely. The cause was that it requested a single processor per node via "qsub -l num_proc=1", and none of the nodes on my system has exactly one processor (they all have more). I modified the prepareQsub(cpu, mem) function to use "qsub -pe shm 1" instead. This now works on my system, but the function may need to be generalized for systems that use a parallel environment other than shm.
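One possible way to generalize the parallel environment choice is sketched below. This is only an assumption about how it could be done, not code from this pull request or from Toil: the function name `build_pe_args` and the environment variable `JOBTREE_GRIDENGINE_PE` are both hypothetical, and "shm" is used as the fallback only because it is the value that worked on the system described above.

```python
import os


def build_pe_args(cpu, parallel_env=None):
    """Return the qsub arguments that request `cpu` slots in a parallel environment.

    Hypothetical helper. The parallel environment name comes from the
    `parallel_env` argument, then from the (hypothetical) environment
    variable JOBTREE_GRIDENGINE_PE, then falls back to "shm".
    """
    if cpu is None:
        # No CPU requirement given: add nothing to the qsub command line.
        return []
    if parallel_env is None:
        parallel_env = os.environ.get("JOBTREE_GRIDENGINE_PE", "shm")
    return ["-pe", parallel_env, str(int(cpu))]
```

The returned list could then be passed to something like `qsubline.extend(...)`, so a site that uses smp rather than shm would only need to change one setting.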

@hannes-ucsc

Thank you for the pull request. jobTree is now Toil and is maintained in a different repository. We are working to integrate your changes into Toil.

@@ -69,7 +71,7 @@ def prepareQsub(cpu, mem):
                 "LD_LIBRARY_PATH=%s" % os.environ["LD_LIBRARY_PATH"]]
     reqline = list()
     if cpu is not None:
-        reqline.append("p="+str(cpu))
+        qsubline.extend(["-pe", "shm", str(int(cpu))])

Toil already switched to -pe but it uses -pe smp instead of -pe shm. Do you think they are equivalent?

Author

Hi Hannes,

I don't think "-pe shm" and "-pe smp" are equivalent, but I'm not entirely sure.

For example, on the computer I use, the results of the command qconf -spl are:
orte
shm

So I don't think I can use the smp parallel environment on this particular computer without making some changes.

I can't think of an elegant solution at the moment, but maybe you could prompt the user for the name of the appropriate parallel environment, just once, during package setup.
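As a rough sketch of an alternative to prompting, the list printed by qconf -spl could be read programmatically and checked against a preference order. The function names `parse_pe_list`, `choose_parallel_env` and `list_parallel_envs`, and the preference order ("smp", "shm"), are assumptions made for illustration; none of them come from jobTree or Toil.

```python
import subprocess


def parse_pe_list(output):
    """Split `qconf -spl` output (one name per line) into a list of names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def choose_parallel_env(available, preferred=("smp", "shm")):
    """Return the first preferred environment that exists, or None.

    The preference order is an assumption, not a Grid Engine convention.
    """
    for name in preferred:
        if name in available:
            return name
    return None


def list_parallel_envs():
    """Run `qconf -spl` and return the parallel environment names.

    Only works on a host where the Grid Engine client tools are installed.
    """
    output = subprocess.check_output(["qconf", "-spl"], universal_newlines=True)
    return parse_pe_list(output)
```

For the output shown above (orte and shm), `choose_parallel_env` would select shm. If it returns None, the setup step could still fall back to asking the user for a name.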

cheers,
Tom

On Tue, Oct 6, 2015 at 6:05 PM, Hannes Schmidt wrote:

> In batchSystems/gridengine.py, #41 (comment):
>
> Toil already switched to -pe but it uses -pe smp instead of -pe shm. Do you think they are equivalent?
>
> View it on GitHub: https://github.com/benedictpaten/jobTree/pull/41/files#r41341883

Great, thank you.
