
Commit d4d9b65

Merge pull request #2100 from tkatila/gpu-rm-removal
Remove deprecated and unused functionality: GPU RM, QAT Kerneldrv, xpumanager-sidecar
2 parents 24eaf5c + a33d04a commit d4d9b65


59 files changed: +394, -6039 lines changed

.github/workflows/lib-build.yaml

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ jobs:
 - intel-dsa-plugin
 - intel-iaa-plugin
 - intel-idxd-config-initcontainer
-- intel-xpumanager-sidecar
 
 # # Demo images
 - crypto-perf

.github/workflows/lib-publish.yaml

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ jobs:
 - intel-dsa-plugin
 - intel-iaa-plugin
 - intel-idxd-config-initcontainer
-- intel-xpumanager-sidecar
 steps:
 - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
 - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5

.trivyignore.yaml

Lines changed: 0 additions & 4 deletions
@@ -19,17 +19,13 @@ misconfigurations:
 - id: AVD-KSV-0047
   statement: gpu plugin in kubelet mode requires "nodes/proxy" resource access
   paths:
-  - gpu_plugin/overlays/fractional_resources/gpu-manager-role.yaml
   - operator/rbac/gpu_manager_role.yaml
   - operator/rbac/role.yaml
 
 - id: AVD-KSV-0014
   statement: These are false detections for not setting "readOnlyFilesystem"
   paths:
   - fpga_plugin/overlays/region/mode-region.yaml
-  - gpu_plugin/overlays/fractional_resources/add-mounts.yaml
-  - gpu_plugin/overlays/fractional_resources/add-args.yaml
-  - gpu_plugin/overlays/fractional_resources/gpu-manager-role.yaml
   - gpu_plugin/overlays/monitoring_shared-dev_nfd/add-args.yaml
   - gpu_plugin/overlays/nfd_labeled_nodes/add-args.yaml
   - iaa_plugin/overlays/iaa_initcontainer/iaa_initcontainer.yaml

Makefile

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ endif
 
 dockerlib = build/docker/lib
 dockertemplates = build/docker/templates
-images = $(shell basename -s .Dockerfile.in -a $(dockertemplates)/*.Dockerfile.in | grep -v -e dlb -e fpga -e kerneldrv)
+images = $(shell basename -s .Dockerfile.in -a $(dockertemplates)/*.Dockerfile.in | grep -v -e dlb -e fpga -e xpumanager-sidecar)
 dockerfiles = $(shell basename -s .in -a $(dockertemplates)/*.Dockerfile.in | xargs -I"{}" echo build/docker/{})
 
 test-image-base-layer:
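The `images` variable above takes every `*.Dockerfile.in` template, strips the suffix with `basename -s`, and drops any name matching one of the `grep -v -e` patterns (substring match). As a sketch of that same filtering in Go, using hypothetical template paths for illustration:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// listImages mirrors the Makefile pipeline: strip the .Dockerfile.in suffix
// from each template name and drop any name containing a skip pattern,
// matching grep -v's substring semantics.
func listImages(templates []string, skip []string) []string {
	var images []string

	for _, t := range templates {
		name := strings.TrimSuffix(filepath.Base(t), ".Dockerfile.in")

		keep := true
		for _, s := range skip {
			if strings.Contains(name, s) {
				keep = false
				break
			}
		}

		if keep {
			images = append(images, name)
		}
	}

	return images
}

func main() {
	// Hypothetical template list; the real one comes from $(dockertemplates)/*.Dockerfile.in.
	templates := []string{
		"build/docker/templates/intel-gpu-plugin.Dockerfile.in",
		"build/docker/templates/intel-xpumanager-sidecar.Dockerfile.in",
		"build/docker/templates/intel-dlb-plugin.Dockerfile.in",
	}

	fmt.Println(listImages(templates, []string{"dlb", "fpga", "xpumanager-sidecar"})) // [intel-gpu-plugin]
}
```

After this commit, `xpumanager-sidecar` joins the skip list (its Dockerfile template is kept in the tree but no longer built), while `kerneldrv` no longer needs filtering because its templates were deleted outright.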

README.md

Lines changed: 0 additions & 6 deletions
@@ -196,12 +196,6 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat
 
 The [Device plugins Operator for OpenShift](https://github.com/intel/intel-technology-enabling-for-openshift) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).
 
-## XeLink XPU Manager Sidecar
-
-To support interconnected GPUs in Kubernetes, XeLink sidecar is needed.
-
-The [XeLink XPU Manager sidecar README](cmd/xpumanager_sidecar/README.md) gives information how the sidecar functions and how to use it.
-
 ## Intel GPU Level-Zero sidecar
 
 Sidecar uses Level-Zero API to provide additional GPU information for the GPU plugin that it cannot get through sysfs interfaces.

build/docker/intel-qat-plugin-kerneldrv.Dockerfile

Lines changed: 0 additions & 72 deletions
This file was deleted.

build/docker/templates/intel-qat-plugin-kerneldrv.Dockerfile.in

Lines changed: 0 additions & 43 deletions
This file was deleted.

cmd/gpu_plugin/README.md

Lines changed: 3 additions & 13 deletions
@@ -47,20 +47,17 @@ Intel GPU plugin may register four node resources to the Kubernetes cluster:
 | gpu.intel.com/xe | GPU instance running new `xe` KMD |
 | gpu.intel.com/xe_monitoring | Monitoring resource for the new `xe` KMD devices |
 
-While GPU plugin basic operations support nodes having both (`i915` and `xe`) KMDs on the same node, its resource management (=GAS) does not, for that node needs to have only one of the KMDs present.
-
 For workloads on different KMDs, see [KMD and UMD](#kmd-and-umd).
 
 ## Modes and Configuration Options
 
 | Flag | Argument | Default | Meaning |
 |:---- |:-------- |:------- |:------- |
 | -enable-monitoring | - | disabled | Enable '*_monitoring' resource that provides access to all Intel GPU devices on the node, [see use](./monitoring.md) |
-| -resource-manager | - | disabled | Deprecated. Enable fractional resource management, [see use](./fractional.md) |
 | -health-management | - | disabled | Enable health management by requesting data from oneAPI/Level-Zero interface. Requires [GPU Level-Zero](../gpu_levelzero/) sidecar. See [health management](#health-management) |
 | -wsl | - | disabled | Adapt plugin to run in the WSL environment. Requires [GPU Level-Zero](../gpu_levelzero/) sidecar. |
 | -shared-dev-num | int | 1 | Number of containers that can share the same GPU device |
-| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to next, and _none_ selects first available device from kubelet. Default is _none_. Allocation policy does not have an effect when resource manager is enabled. |
+| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to next, and _none_ selects first available device from kubelet. Default is _none_. |
 
 The plugin also accepts a number of other arguments (common to all plugins) related to logging.
 Please use the -h option to see the complete list of logging related options.
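The three `-allocation-policy` values described in the retained table row can be illustrated with a toy model. This is a hypothetical simplification for illustration, not the plugin's actual selection code: `usage` tracks how many containers already share each device, and `sharedDevNum` is the `-shared-dev-num` cap.

```go
package main

import "fmt"

// pickDevice sketches the -allocation-policy behaviors over a fixed device
// list: "balanced" spreads load, "packed" fills one GPU first, "none" takes
// the first device with free capacity.
func pickDevice(policy string, devices []string, usage map[string]int, sharedDevNum int) string {
	switch policy {
	case "balanced": // pick the least-used device with free capacity
		best, bestUse := "", sharedDevNum
		for _, d := range devices {
			if usage[d] < bestUse {
				best, bestUse = d, usage[d]
			}
		}
		return best
	case "packed": // pick the most-used device that still has room
		best, bestUse := "", -1
		for _, d := range devices {
			if usage[d] < sharedDevNum && usage[d] > bestUse {
				best, bestUse = d, usage[d]
			}
		}
		return best
	default: // "none": first device with free capacity
		for _, d := range devices {
			if usage[d] < sharedDevNum {
				return d
			}
		}
		return ""
	}
}

func main() {
	devices := []string{"card0", "card1"}
	usage := map[string]int{"card0": 1, "card1": 0} // card0 already shared once

	fmt.Println(pickDevice("balanced", devices, usage, 2)) // card1
	fmt.Println(pickDevice("packed", devices, usage, 2))   // card0
	fmt.Println(pickDevice("none", devices, usage, 2))     // card0
}
```

With the resource manager gone, the policy applies uniformly; the old caveat that it had no effect under resource management is dropped from the table.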
@@ -75,9 +72,6 @@ Intel GPU-plugin supports a few different operation modes. Depending on the work
 |:---- |:-------- |:------- |:------- |
 | shared-dev-num == 1 | No, 1 container per GPU | Workloads using all GPU capacity, e.g. AI training | Yes |
 | shared-dev-num > 1 | Yes, >1 containers per GPU | (Batch) workloads using only part of GPU resources, e.g. inference, media transcode/analytics, or CPU bound GPU workloads | No |
-| shared-dev-num > 1 && resource-management | Depends on resource requests | Any. For requirements and usage, see [fractional resource management](./fractional.md) | Yes. 1000 millicores = exclusive GPU usage. See note below. |
-
-> **Note**: Exclusive GPU usage with >=1000 millicores requires that also *all other GPU containers* specify (non-zero) millicores resource usage.
 
 ## Installing driver and firmware for Intel GPUs
 
@@ -122,10 +116,6 @@ $ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes
 
 GPU plugin can be installed with the Intel Device Plugin Operator. It allows configuring GPU plugin's parameters without kustomizing the deployment files. The general installation is described in the [install documentation](../operator/README.md#installation). For configuring the GPU Custom Resource (CR), see the [configuration options](#modes-and-configuration-options) and [operation modes](#operation-modes-for-different-workload-types).
 
-### Install alongside with GPU Aware Scheduling (deprecated)
-
-GPU plugin can be installed alongside with GPU Aware Scheduling (GAS). It allows scheduling Pods which e.g. request only partial use of a GPU. The installation is described in [fractional resources](./fractional.md) page.
-
 ### Verify Plugin Installation
 
 You can verify that the plugin has been installed on the expected nodes by searching for the relevant
@@ -212,9 +202,9 @@ Furthermore, the deployments `securityContext` must be configured with appropria
 
 More info: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/
 
-### Labels created by GPU plugin
+### Labels created for Intel GPUs via NFD
 
-If installed with NFD and started with resource-management, plugin will export a set of labels for the node. For detailed info, see [labeling documentation](./labels.md).
+When NFD's NodeFeatureRules for Intel GPUs are installed, nodes are labeled with a variety of GPU-specific labels. For detailed info, see [labeling documentation](./labels.md).
 
 ### SR-IOV use with the plugin
 

cmd/gpu_plugin/device_props.go

Lines changed: 1 addition & 36 deletions
@@ -15,35 +15,22 @@
 package main
 
 import (
-	"slices"
-
-	"github.com/intel/intel-device-plugins-for-kubernetes/cmd/internal/labeler"
 	"github.com/intel/intel-device-plugins-for-kubernetes/cmd/internal/pluginutils"
 	"k8s.io/klog/v2"
 )
 
 type DeviceProperties struct {
 	currentDriver string
-	drmDrivers    map[string]bool
-	tileCounts    []uint64
 	isPfWithVfs   bool
 }
 
-type invalidTileCountErr struct {
-	error
-}
-
 func newDeviceProperties() *DeviceProperties {
-	return &DeviceProperties{
-		drmDrivers: make(map[string]bool),
-	}
+	return &DeviceProperties{}
 }
 
 func (d *DeviceProperties) fetch(cardPath string) {
 	d.isPfWithVfs = pluginutils.IsSriovPFwithVFs(cardPath)
 
-	d.tileCounts = append(d.tileCounts, labeler.GetTileCount(cardPath))
-
 	driverName, err := pluginutils.ReadDeviceDriver(cardPath)
 	if err != nil {
 		klog.Warningf("card (%s) doesn't have driver, using default: %s", cardPath, deviceTypeDefault)
@@ -52,11 +39,6 @@ func (d *DeviceProperties) fetch(cardPath string) {
 	}
 
 	d.currentDriver = driverName
-	d.drmDrivers[d.currentDriver] = true
-}
-
-func (d *DeviceProperties) drmDriverCount() int {
-	return len(d.drmDrivers)
 }
 
 func (d *DeviceProperties) driver() string {
@@ -66,20 +48,3 @@ func (d *DeviceProperties) driver() string {
 func (d *DeviceProperties) monitorResource() string {
 	return d.currentDriver + monitorSuffix
 }
-
-func (d *DeviceProperties) maxTileCount() (uint64, error) {
-	if len(d.tileCounts) == 0 {
-		return 0, invalidTileCountErr{}
-	}
-
-	minCount := slices.Min(d.tileCounts)
-	maxCount := slices.Max(d.tileCounts)
-
-	if minCount != maxCount {
-		klog.Warningf("Node's GPUs are heterogenous (min: %d, max: %d tiles)", minCount, maxCount)
-
-		return 0, invalidTileCountErr{}
-	}
-
-	return maxCount, nil
-}
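The removed `maxTileCount` helper returned a tile count only when every GPU on the node reported the same value, using `slices.Min`/`slices.Max` to reject heterogeneous nodes. A standalone sketch of that pattern, with a hypothetical `uniformCount` name (illustrative only, not the plugin's code):

```go
package main

import (
	"fmt"
	"slices" // standard library since Go 1.21
)

// uniformCount returns the shared count only when every element is equal,
// mirroring the min == max homogeneity check from the removed maxTileCount.
func uniformCount(counts []uint64) (uint64, bool) {
	if len(counts) == 0 {
		return 0, false
	}

	if slices.Min(counts) != slices.Max(counts) {
		return 0, false // heterogeneous GPUs: no single valid tile count
	}

	return counts[0], true
}

func main() {
	fmt.Println(uniformCount([]uint64{2, 2, 2})) // 2 true
	fmt.Println(uniformCount([]uint64{1, 2}))    // 0 false
}
```

With GPU resource management removed, nothing consumed the per-card tile counts anymore, so the field, the error type, and this check could all be deleted, leaving `DeviceProperties` with just the current driver and SR-IOV PF state.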
