DRA - Define UX for using multiple multi-host resources #5488

@geetasg

Enhancement Description

  • One-line enhancement description (can be used as a release note): Provide a recommendation on how to configure a training job to utilize DRA via partitionable devices spanning multiple multi-host resources.
  • Kubernetes Enhancement Proposal:
  • Discussion Link:
  • PRs by stage and milestone:
    • Alpha - v1.xx
      • KEP (k/enhancements) update PR(s):
      • Code (k/k) update PR(s):
      • Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

Issue kubernetes-sigs/jobset#762 requests updates to workload constructs so that ResourceClaims can be instantiated at the right level. With that approach, every workload construct would need to be updated to understand this paradigm (ref: kubernetes-sigs/jobset#762 (comment)).
This issue is to discuss whether DRA can enable this for all common workload definitions without requiring updates to the workload definitions themselves.

Metadata

Assignees

No one assigned

    Labels

    sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
    wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.

    Projects

    Status

    📋 Backlog

    Status

    Needs Triage
