Issues/1951 update aws spot documentation (#4310)
* Update Preemptibility documentation

* Add example of --defaultPreemptible to preemptability section

* Replace preemptable with preemptible

* Add compatibility for spelling preemptible preemptable

* Remove note in job.py referring to preemptable

* Change Preempability to Preemptibility

* Update documentation, add support for preemptable

* add backwards compatibility for preemptable keyword

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: Lon Blauvelt <[email protected]>
3 people authored Feb 2, 2023
1 parent 02b4aad commit eb21245
Showing 29 changed files with 558 additions and 535 deletions.
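
The commit message mentions keeping backwards compatibility for the old `preemptable` spelling. The following is a minimal sketch of one way such an alias could be handled; the helper name `resolve_preemptible`, the deprecation warning, and the conflict check are all assumptions made for illustration and are not taken from the Toil source.

```python
import warnings
from typing import Optional


def resolve_preemptible(
    preemptible: Optional[bool] = None,
    preemptable: Optional[bool] = None,
) -> Optional[bool]:
    """Hypothetical helper: merge the new and the legacy spelling of the flag.

    Returns the value to use, or None if neither spelling was given.
    """
    if preemptable is not None:
        # Assumption: the legacy spelling is still accepted, but flagged.
        warnings.warn(
            "'preemptable' is deprecated; use 'preemptible' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if preemptible is not None and preemptible != preemptable:
            # Both spellings were given and they contradict each other.
            raise ValueError("'preemptible' and 'preemptable' disagree")
        return preemptable
    return preemptible
```

A caller that accepts both keywords would pass them through this helper once and use only the returned value afterwards, so the rest of the code sees a single spelling.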
32 changes: 16 additions & 16 deletions docs/running/cliOptions.rst
@@ -248,9 +248,9 @@ autoscaled cluster, as well as parameters to control the level of provisioning.
--nodeTypes NODETYPES
Specifies a list of comma-separated node types, each of which is
composed of slash-separated instance types, and an optional spot
bid set off by a colon, making the node type preemptable. Instance
bid set off by a colon, making the node type preemptible. Instance
types may appear in multiple node types, and the same node type
may appear as both preemptable and non-preemptable.
may appear as both preemptible and non-preemptible.

Valid argument specifying two node types:
c5.4xlarge/c5a.4xlarge:0.42,t2.large
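
To make the `--nodeTypes` format described above concrete, here is a minimal parsing sketch. It follows only the textual description in the help text (comma-separated node types, slash-separated instance types, optional spot bid after a colon). The names `NodeType` and `parse_node_types` are hypothetical and this is not Toil's own parser.

```python
from typing import List, NamedTuple, Optional


class NodeType(NamedTuple):
    """Hypothetical container for one parsed node type."""
    instance_types: List[str]
    spot_bid: Optional[float]

    @property
    def preemptible(self) -> bool:
        # Per the help text, a spot bid makes the node type preemptible.
        return self.spot_bid is not None


def parse_node_types(spec: str) -> List[NodeType]:
    """Parse a string such as 'c5.4xlarge/c5a.4xlarge:0.42,t2.large'."""
    node_types = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Everything after the colon, if present, is the spot bid.
        types_part, sep, bid_part = entry.partition(":")
        spot_bid = float(bid_part) if sep else None
        instance_types = [t for t in types_part.split("/") if t]
        if not instance_types:
            raise ValueError(f"No instance types in node type {entry!r}")
        node_types.append(NodeType(instance_types, spot_bid))
    return node_types
```

Applied to the example argument above, this yields one preemptible node type covering two instance types with a bid of 0.42, and one non-preemptible `t2.large` node type.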
@@ -288,16 +288,16 @@ autoscaled cluster, as well as parameters to control the level of provisioning.
--scaleInterval SCALEINTERVAL
The interval (seconds) between assessing if the scale of
the cluster needs to change. (Default: 60)
--preemptableCompensation PREEMPTABLECOMPENSATION
--preemptibleCompensation PREEMPTIBLECOMPENSATION
The preference of the autoscaler to replace
preemptable nodes with non-preemptable nodes, when
preemptable nodes cannot be started for some reason.
preemptible nodes with non-preemptible nodes, when
preemptible nodes cannot be started for some reason.
Defaults to 0.0. This value must be between 0.0 and
1.0, inclusive. A value of 0.0 disables such
compensation, a value of 0.5 compensates two missing
preemptable nodes with a non-preemptable one. A value
preemptible nodes with a non-preemptible one. A value
of 1.0 replaces every missing pre-emptable node with a
non-preemptable one.
non-preemptible one.
--nodeStorage NODESTORAGE
Specify the size of the root volume of worker nodes
when they are launched in gigabytes. You may want to
@@ -321,10 +321,10 @@ keeping this limited we can avoid nodes occupied with services causing deadlocks
--maxServiceJobs MAXSERVICEJOBS
The maximum number of service jobs that can be run
concurrently, excluding service jobs running on
preemptable nodes. default=9223372036854775807
--maxPreemptableServiceJobs MAXPREEMPTABLESERVICEJOBS
preemptible nodes. default=9223372036854775807
--maxPreemptibleServiceJobs MAXPREEMPTIBLESERVICEJOBS
The maximum number of service jobs that can run
concurrently on preemptable nodes.
concurrently on preemptible nodes.
default=9223372036854775807
--deadlockWait DEADLOCKWAIT
Time, in seconds, to tolerate the workflow running only
@@ -371,8 +371,8 @@ from the batch system.
type and a count are used, they must be separated by a
colon. If multiple types of accelerators are used, the
specifications are separated by commas. Default is [].
--defaultPreemptable BOOL
Make all jobs able to run on preemptable (spot) nodes
--defaultPreemptible BOOL
Make all jobs able to run on preemptible (spot) nodes
by default.
--maxCores INT The maximum number of CPU cores to request from the
batch system at any one time. Standard suffixes like
@@ -391,8 +391,8 @@ systems have issues!).
--retryCount RETRYCOUNT
Number of times to retry a failing job before giving
up and labeling job failed. default=1
--enableUnlimitedPreemptableRetries
If set, preemptable failures (or any failure due to an
--enableUnlimitedPreemptibleRetries
If set, preemptible failures (or any failure due to an
instance getting unexpectedly terminated) will not count
towards job failures and -\\-retryCount.
--doubleMem If set, batch jobs which die due to reaching memory
@@ -514,8 +514,8 @@ to run both simultaneously. To cope with this situation Toil attempts to
schedule services and accessors intelligently, however to avoid a deadlock
with workflows running service jobs it is advisable to use the following parameters:

* ``--maxServiceJobs``: The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptable nodes.
* ``--maxPreemptableServiceJobs``: The maximum number of service jobs that can run concurrently on preemptable nodes.
* ``--maxServiceJobs``: The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptible nodes.
* ``--maxPreemptibleServiceJobs``: The maximum number of service jobs that can run concurrently on preemptible nodes.

Specifying these parameters so that at a maximum cluster size there will be
sufficient resources to run accessors in addition to services will ensure that
34 changes: 21 additions & 13 deletions docs/running/cloud/amazon.rst
@@ -324,32 +324,40 @@ For more information on other autoscaling (and other) options have a look at :re
Some important caveats about starting a toil run through an ssh session are
explained in the :ref:`sshCluster` section.

Preemptability
Preemptibility
^^^^^^^^^^^^^^

Toil can run on a heterogeneous cluster of both preemptable and non-preemptable nodes. Being preemptable node simply
Toil can run on a heterogeneous cluster of both preemptible and non-preemptible nodes. Being a preemptible node simply
means that the node may be shut down at any time, while jobs are running. These jobs can then be restarted later
somewhere else.

A node type can be specified as preemptable by adding a `spot bid`_ to its entry in the list of node types provided with
the ``--nodeTypes`` flag. If spot instance prices rise above your bid, the preemptable node whill be shut down.
A node type can be specified as preemptible by adding a `spot bid`_ to its entry in the list of node types provided with
the ``--nodeTypes`` flag. If spot instance prices rise above your bid, the preemptible node whill be shut down.

While individual jobs can each explicitly specify whether or not they should be run on preemptable nodes
via the boolean ``preemptable`` resource requirement, the ``--defaultPreemptable`` flag will allow jobs without a
``preemptable`` requirement to run on preemptable machines.
Individual jobs can explicitly specify whether they should be run on preemptible nodes via the boolean ``preemptible``
resource requirement, if this is not specified, the job will not run on preemptible nodes even if preemptible nodes
are available unless specified with the ``--defaultPreemptible`` flag. The ``--defaultPreemptible`` flag will allow
jobs without a ``preemptible`` requirement to run on preemptible machines. For example::

.. admonition:: Specify Preemptability Carefully
$ python /root/sort.py aws:us-west-2:<my-jobstore-name> \
--provisioner aws \
--nodeTypes c3.4xlarge:2.00 \
--maxNodes 2 \
--batchSystem mesos \
--defaultPreemptible

.. admonition:: Specify Preemptibility Carefully

Ensure that your choices for ``--nodeTypes`` and ``--maxNodes <>`` make
sense for your workflow and won't cause it to hang. You should make sure the
provisioner is able to create nodes large enough to run the largest job
in the workflow, and that non-preemptable node types are allowed if there are
non-preemptable jobs in the workflow.
in the workflow, and that non-preemptible node types are allowed if there are
non-preemptible jobs in the workflow.

Finally, the ``--preemptableCompensation`` flag can be used to handle cases where preemptable nodes may not be
Finally, the ``--preemptibleCompensation`` flag can be used to handle cases where preemptible nodes may not be
available but are required for your workflow. With this flag enabled, the autoscaler will attempt to compensate
for a shortage of preemptable nodes of a certain type by creating non-preemptable nodes of that type, if
non-preemptable nodes of that type were specified in ``--nodeTypes``.
for a shortage of preemptible nodes of a certain type by creating non-preemptible nodes of that type, if
non-preemptible nodes of that type were specified in ``--nodeTypes``.

.. _spot bid: https://aws.amazon.com/ec2/spot/pricing/
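
The effect of `--preemptibleCompensation` can be illustrated with a small arithmetic sketch. The documentation states that 0.0 disables compensation, 0.5 replaces two missing preemptible nodes with one non-preemptible node, and 1.0 replaces every missing node. The linear scaling and the rounding down below are assumptions chosen to match those three data points; the function name `compensating_nodes` is hypothetical and the autoscaler's actual rounding may differ.

```python
import math


def compensating_nodes(missing_preemptible: int, compensation: float) -> int:
    """Estimate how many non-preemptible nodes would stand in for missing ones.

    Assumption: the shortfall is scaled linearly by the compensation factor
    and rounded down.
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0.0 and 1.0, inclusive")
    if missing_preemptible < 0:
        raise ValueError("missing_preemptible must not be negative")
    return math.floor(missing_preemptible * compensation)
```

Under these assumptions, a shortfall of four preemptible nodes would lead to no replacements at 0.0, two at 0.5, and four at 1.0.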

8 changes: 4 additions & 4 deletions src/toil/batchSystems/abstractBatchSystem.py
@@ -46,7 +46,7 @@
class BatchJobExitReason(enum.IntEnum):
FINISHED: int = 1 # Successfully finished.
FAILED: int = 2 # Job finished, but failed.
LOST: int = 3 # Preemptable failure (job's executing host went away).
LOST: int = 3 # Preemptible failure (job's executing host went away).
KILLED: int = 4 # Job killed before finishing.
ERROR: int = 5 # Internal error.
MEMLIMIT: int = 6 # Job hit batch system imposed memory limit
@@ -476,12 +476,12 @@ class AbstractScalableBatchSystem(AbstractBatchSystem):
"""

@abstractmethod
def getNodes(self, preemptable: Optional[bool] = None, timeout: int = 600) -> Dict[str, NodeInfo]:
def getNodes(self, preemptible: Optional[bool] = None, timeout: int = 600) -> Dict[str, NodeInfo]:
"""
Returns a dictionary mapping node identifiers of preemptable or non-preemptable nodes to
Returns a dictionary mapping node identifiers of preemptible or non-preemptible nodes to
NodeInfo objects, one for each node.
:param preemptable: If True (False) only (non-)preemptable nodes will be returned.
:param preemptible: If True (False) only (non-)preemptible nodes will be returned.
If None, all nodes will be returned.
"""
raise NotImplementedError()
28 changes: 14 additions & 14 deletions src/toil/batchSystems/kubernetes.py
@@ -538,37 +538,37 @@ def __init__(self) -> None:
Taints which are allowed to be present (with these values).
"""

def set_preemptable(self, preemptable: bool) -> None:
def set_preemptible(self, preemptible: bool) -> None:
"""
Add constraints for a job being preemptible or not.
Preemptable jobs will be able to run on preemptable or non-preemptable
nodes, and will prefer preemptable nodes if available.
Preemptible jobs will be able to run on preemptible or non-preemptible
nodes, and will prefer preemptible nodes if available.
Non-preemptable jobs will not be allowed to run on nodes that are
marked as preemptable.
Non-preemptible jobs will not be allowed to run on nodes that are
marked as preemptible.
Understands the labeling scheme used by EKS, and the taint scheme used
by GCE. The Toil-managed Kubernetes setup will mimic at least one of
these.
"""

# We consider nodes preemptable if they have any of these label or taint values.
# We consider nodes preemptible if they have any of these label or taint values.
# We tolerate all effects of specified taints.
# Amazon just uses a label, while Google
# <https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms>
# uses a label and a taint.
PREEMPTABLE_SCHEMES = {'labels': [('eks.amazonaws.com/capacityType', ['SPOT']),
PREEMPTIBLE_SCHEMES = {'labels': [('eks.amazonaws.com/capacityType', ['SPOT']),
('cloud.google.com/gke-preemptible', ['true'])],
'taints': [('cloud.google.com/gke-preemptible', ['true'])]}

if preemptable:
# We want to seek preemptable labels and tolerate preemptable taints.
self.desired_labels += PREEMPTABLE_SCHEMES['labels']
self.tolerated_taints += PREEMPTABLE_SCHEMES['taints']
if preemptible:
# We want to seek preemptible labels and tolerate preemptible taints.
self.desired_labels += PREEMPTIBLE_SCHEMES['labels']
self.tolerated_taints += PREEMPTIBLE_SCHEMES['taints']
else:
# We want to prohibit preemptable labels
self.prohibited_labels += PREEMPTABLE_SCHEMES['labels']
# We want to prohibit preemptible labels
self.prohibited_labels += PREEMPTIBLE_SCHEMES['labels']


def apply(self, pod_spec: V1PodSpec) -> None:
@@ -695,7 +695,7 @@ def _create_pod_spec(

# Also start on the placement constraints
placement = KubernetesBatchSystem.Placement()
placement.set_preemptable(job_desc.preemptable)
placement.set_preemptible(job_desc.preemptible)

for accelerator in job_desc.accelerators:
# Add in requirements for accelerators (GPUs).