Add Slurm support #42

Open. 27 commits to be merged into base: master

Commits
b29ce56
slurm additions - doesn't work
Jan 9, 2015
164839d
mostly functional slurm batch system
Jan 16, 2015
d52838b
turned off random errors in sort test; added JT_SLURM envs for passin…
Jan 20, 2015
0c01cb0
actually doing it correctly
Jan 20, 2015
cfdb615
dynamic batch system loading and command line option definition
Jan 22, 2015
634e9df
slurmopts successfully being passed to SlurmBatchSystem and Worker; b…
Jan 22, 2015
8ffe9bf
Adding 14.03.8 as default
ifxdeploy Jan 22, 2015
273e916
Converted default --slurm-time to string.
ifxdeploy Jan 23, 2015
3a98c34
Really actually fix the default slurm time
ifxdeploy Jan 23, 2015
25754fd
Added multiple attempts for sbatch in case of failure. Sleep time be…
ifxdeploy Mar 5, 2015
b0f4c4a
Modify processAnyNewFile to treat presence of a .new file as failure …
tsackton Mar 11, 2015
0da218b
added print out and retries for zero length pickle files
ifxdeploy Mar 11, 2015
4bbd380
Modify processAnyNewFile to treat presence of a .new file as failure …
tsackton Mar 11, 2015
d9f7dbe
Adding retry code to sacct and scancel methods
tsackton Mar 16, 2015
9484916
Adding retry code to sacct and scancel methods
tsackton Mar 16, 2015
62fddf6
capture exception; return none
ifxdeploy Apr 1, 2015
f1131f8
Merge branch 'master' of github.com:harvardinformatics/jobTree
ifxdeploy Apr 1, 2015
775f4fc
Double memory after job failure (I hope).
tsackton Apr 17, 2015
b307fc0
Added COMPLETING to the finished states and make /usr/bin/squeue the …
ifxdeploy Jun 24, 2015
f5248b5
Attempting to fix memory doubling after failed job.
tsackton Jun 28, 2015
72bf060
support for job name pattern
fasifx Feb 23, 2016
ebac036
remove quotes from --jobname pattern
fasifx Feb 23, 2016
a80e92f
/usr/local/bin/squeue
aaronk Mar 22, 2016
c17eac4
added --constraint
Oct 18, 2016
59e71fe
added --constraint
Oct 18, 2016
50e6f53
adding qos
Mar 20, 2017
b4bdaeb
adding qos
Mar 20, 2017
5 changes: 5 additions & 0 deletions .gitignore
@@ -3,3 +3,8 @@
*.a
bin/
tmp_*
.spyderproject
.project
.pydevproject
.settings*

15 changes: 15 additions & 0 deletions batchSystems/abstractBatchSystem.py
@@ -26,6 +26,21 @@ class AbstractBatchSystem:
    """An abstract (as far as python currently allows) base class
    to represent the interface the batch system must provide to the jobTree.
    """
    @classmethod
    def getDisplayNames(cls):
        """
        Array of names used to select this batch system on the command line. Returns
        None by default.
        """
        return None

    @classmethod
    def getOptionData(cls):
        """
        Returns dict for each option. Can be used to add to option parser.
        """
        return dict()

    def __init__(self, config, maxCpus, maxMemory):
        """This method must be called.
        The config object is setup by the jobTreeSetup script and
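The diff does not show what the dict returned by `getOptionData` looks like, so the following is only a minimal sketch of how such a hook could feed an option parser. It assumes each key is an option flag and each value is a dict of keyword arguments for `argparse.ArgumentParser.add_argument`. `ExampleBatchSystem`, `--example-time`, and `addBatchSystemOptions` are hypothetical names, and the actual PR may use a different layout or a different parser library.

```python
import argparse


class AbstractBatchSystem(object):
    """Stand-in for the base class in the diff (only the two classmethods)."""

    @classmethod
    def getDisplayNames(cls):
        return None

    @classmethod
    def getOptionData(cls):
        return dict()


class ExampleBatchSystem(AbstractBatchSystem):
    """Hypothetical subclass; the option below is illustrative only."""

    @classmethod
    def getDisplayNames(cls):
        return ["example"]

    @classmethod
    def getOptionData(cls):
        # Assumed layout: option flag -> keyword arguments for add_argument.
        return {
            "--example-time": {
                "default": "60",
                "help": "Time limit, kept as a string.",
            },
        }


def addBatchSystemOptions(parser, batchSystemClasses):
    """Register every option advertised by the given classes on the parser."""
    for cls in batchSystemClasses:
        for flag, kwargs in sorted(cls.getOptionData().items()):
            parser.add_argument(flag, **kwargs)
    return parser


parser = addBatchSystemOptions(
    argparse.ArgumentParser(),
    [AbstractBatchSystem, ExampleBatchSystem],
)
options = parser.parse_args(["--example-time", "120"])
```

Because the base class returns an empty dict, batch systems that define no extra options contribute nothing to the parser and need no special handling.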
6 changes: 6 additions & 0 deletions batchSystems/gridengine.py
@@ -200,6 +200,12 @@ def run(self):
class GridengineBatchSystem(AbstractBatchSystem):
    """The interface for gridengine.
    """
    @classmethod
    def getDisplayNames(cls):
        """
        Names used to select this batch system.
        """
        return ["gridengine","gridEngine"]

    def __init__(self, config, maxCpus, maxMemory):
        AbstractBatchSystem.__init__(self, config, maxCpus, maxMemory) #Call the parent constructor
8 changes: 8 additions & 0 deletions batchSystems/lsf.py
@@ -159,6 +159,14 @@ class LSFBatchSystem(AbstractBatchSystem):
    """The interface for running jobs on lsf, runs all the jobs you
    give it as they come in, but in parallel.
    """

    @classmethod
    def getDisplayNames(cls):
        """
        Names used to select this batch system.
        """
        return ["lsf","LSF"]

    def __init__(self, config, maxCpus, maxMemory):
        AbstractBatchSystem.__init__(self, config, maxCpus, maxMemory) #Call the parent constructor
        self.lsfResultsFile = getParasolResultsFileName(config.attrib["job_tree"])
8 changes: 8 additions & 0 deletions batchSystems/parasol.py
@@ -92,6 +92,14 @@ def getUpdatedJob(parasolResultsFile, outputQueue1, outputQueue2):
class ParasolBatchSystem(AbstractBatchSystem):
    """The interface for Parasol.
    """

    @classmethod
    def getDisplayNames(cls):
        """
        Names used to select this batch system.
        """
        return ["parasol"]

    def __init__(self, config, maxCpus, maxMemory):
        AbstractBatchSystem.__init__(self, config, maxCpus, maxMemory) #Call the parent constructor
        if maxMemory != sys.maxint:
7 changes: 7 additions & 0 deletions batchSystems/singleMachine.py
@@ -56,6 +56,13 @@ class SingleMachineBatchSystem(AbstractBatchSystem):
    """The interface for running jobs on a single machine, runs all the jobs you
    give it as they come in, but in parallel.
    """
    @classmethod
    def getDisplayNames(cls):
        """
        Names used to select this batch system.
        """
        return ["single_machine","singleMachine"]

    def __init__(self, config, maxCpus, maxMemory, workerFn=worker):
        AbstractBatchSystem.__init__(self, config, maxCpus, maxMemory) #Call the parent constructor
        self.jobIndex = 0
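The `getDisplayNames` classmethods added above suggest that a batch system can be picked by matching a command-line name against each class. The loading code itself is not part of the visible hunks, so the sketch below is an assumption about how such a lookup might work, not the PR's implementation. `findBatchSystemClass` and `CANDIDATES` are hypothetical names, and the classes are reduced to stubs that return the same display names as in the diff.

```python
class AbstractBatchSystem(object):
    @classmethod
    def getDisplayNames(cls):
        # Base class is not selectable by name.
        return None


class GridengineBatchSystem(AbstractBatchSystem):
    @classmethod
    def getDisplayNames(cls):
        return ["gridengine", "gridEngine"]


class LSFBatchSystem(AbstractBatchSystem):
    @classmethod
    def getDisplayNames(cls):
        return ["lsf", "LSF"]


class ParasolBatchSystem(AbstractBatchSystem):
    @classmethod
    def getDisplayNames(cls):
        return ["parasol"]


class SingleMachineBatchSystem(AbstractBatchSystem):
    @classmethod
    def getDisplayNames(cls):
        return ["single_machine", "singleMachine"]


# Hypothetical list of classes to search; the PR may discover them dynamically.
CANDIDATES = [
    AbstractBatchSystem,
    GridengineBatchSystem,
    LSFBatchSystem,
    ParasolBatchSystem,
    SingleMachineBatchSystem,
]


def findBatchSystemClass(name, candidates=CANDIDATES):
    """Return the first class whose display names include `name`, else None."""
    for cls in candidates:
        names = cls.getDisplayNames()
        # Classes that return None (the base-class default) are skipped.
        if names is not None and name in names:
            return cls
    return None


selected = findBatchSystemClass("singleMachine")
```

With a lookup of this shape, supporting a new scheduler only requires a subclass that reports its own names, with no edits to a central if/elif chain.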