Docs: add mtu size configuration for SpiderMultusConfig
cyclinder committed Feb 12, 2025
1 parent 62d4e07 commit 8c206ba
Showing 4 changed files with 146 additions and 34 deletions.
25 changes: 25 additions & 0 deletions docs/usage/install/ai/get-started-macvlan-zh_CN.md
@@ -246,6 +246,31 @@
EOF
```
Some communication scenarios require a custom MTU for Pods to suit different packet sizes. You can set the Pod MTU as follows:
```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: gpu1-macvlan
  namespace: spiderpool
spec:
  cniType: macvlan
  rdmaResourceName: spidernet.io/shared_cx5_gpu1
  macvlan:
    master: ["enp11s0f0np0"]
    ippools:
      ipv4: ["gpu1-net11"]
  chainCNIJsonData:
  - |
    {
        "type": "tuning",
        "mtu": 1480
    }
```
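Note: The MTU value must not exceed the MTU of the macvlan master interface; otherwise the Pod cannot be created.
## Create a Test Application
1. Create a DaemonSet application on the specified nodes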
25 changes: 25 additions & 0 deletions docs/usage/install/ai/get-started-macvlan.md
@@ -246,6 +246,31 @@ The network planning for the cluster is as follows:
EOF
```
Some communication scenarios require a custom MTU for Pods. To set one, add a `tuning` plugin entry to `chainCNIJsonData` in the SpiderMultusConfig:
```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: gpu1-macvlan
  namespace: spiderpool
spec:
  cniType: macvlan
  rdmaResourceName: spidernet.io/shared_cx5_gpu1
  macvlan:
    master: ["enp11s0f0np0"]
    ippools:
      ipv4: ["gpu1-net11"]
  chainCNIJsonData:
  - |
    {
        "type": "tuning",
        "mtu": 1480
    }
```
Note: The MTU value must not exceed the MTU of the macvlan master interface; otherwise the Pod cannot be created.
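Before applying the configuration, it can help to compare the requested MTU with the MTU of the master interface on the node. The following is a minimal Python sketch and not part of Spiderpool: the helper names `read_interface_mtu`, `tuning_mtu`, and `mtu_fits_parent` are hypothetical, and the sketch assumes a Linux node that exposes the interface MTU at `/sys/class/net/<interface>/mtu`.

```python
import json
from pathlib import Path


def read_interface_mtu(ifname: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the MTU of a host interface from sysfs (Linux only)."""
    return int(Path(sysfs_root, ifname, "mtu").read_text().strip())


def tuning_mtu(chain_entry: str):
    """Return the "mtu" value of a tuning chain entry, or None if it sets none."""
    conf = json.loads(chain_entry)
    if conf.get("type") != "tuning":
        return None
    return conf.get("mtu")


def mtu_fits_parent(chain_entry: str, parent_mtu: int) -> bool:
    """True when the entry requests no MTU, or one not larger than the parent's."""
    mtu = tuning_mtu(chain_entry)
    return mtu is None or mtu <= parent_mtu
```

On a node, calling `mtu_fits_parent(entry, read_interface_mtu("enp11s0f0np0"))` with `entry` set to the JSON string from `chainCNIJsonData` would perform the comparison for the example master interface. The check is kept separate from the sysfs read so that it can also be run against a known MTU value.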
## Create a Test Application
1. Create a DaemonSet application on specified nodes.
47 changes: 39 additions & 8 deletions docs/usage/install/ai/get-started-sriov-zh_CN.md
@@ -20,6 +20,10 @@ Spiderpool uses [sriov-network-operator](https://github.com/k8snetworkplumb

2. In RoCE network scenarios, the [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-cni) is used to expose the host's RDMA network interface to Pods, thereby exposing RDMA resources. The [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) can additionally be used to isolate RDMA devices.

Note:

- Providing RDMA communication to containers through SR-IOV works only in bare-metal environments, not in virtual machine environments.

## Solution

This article uses the following typical AI cluster topology as an example to show how to set up Spiderpool
@@ -218,10 +222,10 @@ Spiderpool uses [sriov-network-operator](https://github.com/k8snetworkplumb
  priority: 99
  numVfs: 12
  nicSelector:
    deviceID: "1017"
    vendor: "15b3"
    rootDevices:
    - 0000:86:00.0
  linkType: ${LINK_TYPE}
  deviceType: netdevice
  isRdma: true
@@ -238,10 +242,10 @@ Spiderpool uses [sriov-network-operator](https://github.com/k8snetworkplumb
  priority: 99
  numVfs: 12
  nicSelector:
    deviceID: "1017"
    vendor: "15b3"
    rootDevices:
    - 0000:86:00.0
  linkType: ${LINK_TYPE}
  deviceType: netdevice
  isRdma: true
@@ -366,6 +370,33 @@ Spiderpool uses [sriov-network-operator](https://github.com/k8snetworkplumb
EOF
```
4. (Optional) Customize the MTU of the SR-IOV VF
Some communication scenarios require a custom MTU for Pods to suit different packet sizes. You can set the Pod MTU as follows (using Ethernet as an example):
```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: gpu1-sriov
  namespace: spiderpool
spec:
  cniType: sriov
  sriov:
    resourceName: spidernet.io/gpu1sriov
    enableRdma: true
    ippools:
      ipv4: ["gpu1-net11"]
  chainCNIJsonData:
  - |
    {
        "type": "tuning",
        "mtu": 1480
    }
```
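Note: The MTU value must not exceed the MTU of the SR-IOV PF.
## Create a Test Application
1. Create a DaemonSet application on the specified nodes to test the availability of the SR-IOV devices on those nodes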
83 changes: 57 additions & 26 deletions docs/usage/install/ai/get-started-sriov.md
@@ -19,6 +19,10 @@ Different CNIs are used for different network scenarios:

2. In RoCE network scenarios, the [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-cni) is used to expose the RDMA network interface on the host to the Pod, thereby exposing RDMA resources. Additionally, the [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) can be used to achieve RDMA device isolation.

Note:

- Providing RDMA communication to containers through SR-IOV works only in bare-metal environments, not in virtual machine environments.

## Solution

This article will introduce how to set up Spiderpool using the following typical AI cluster topology as an example.
@@ -213,39 +217,39 @@ The network planning for the cluster is as follows:
  name: gpu1-nic-policy
  namespace: spiderpool
spec:
  nodeSelector:
    kubernetes.io/os: "linux"
  resourceName: gpu1sriov
  priority: 99
  numVfs: 12
  nicSelector:
    deviceID: "1017"
    vendor: "15b3"
    rootDevices:
    - 0000:86:00.0
  linkType: ${LINK_TYPE}
  deviceType: netdevice
  isRdma: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: gpu2-nic-policy
  namespace: spiderpool
spec:
  nodeSelector:
    kubernetes.io/os: "linux"
  resourceName: gpu2sriov
  priority: 99
  numVfs: 12
  nicSelector:
    deviceID: "1017"
    vendor: "15b3"
    rootDevices:
    - 0000:86:00.0
  linkType: ${LINK_TYPE}
  deviceType: netdevice
  isRdma: true
EOF
```
@@ -367,6 +371,33 @@ The network planning for the cluster is as follows:
EOF
```
4. (Optional) Customize the MTU of the SR-IOV VF
Some communication scenarios require a custom MTU for Pods. To set one, add a `tuning` plugin entry to `chainCNIJsonData` in the SpiderMultusConfig (using Ethernet as an example):

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: gpu1-sriov
  namespace: spiderpool
spec:
  cniType: sriov
  sriov:
    resourceName: spidernet.io/gpu1sriov
    enableRdma: true
    ippools:
      ipv4: ["gpu1-net11"]
  chainCNIJsonData:
  - |
    {
        "type": "tuning",
        "mtu": 1480
    }
```

Note: The MTU value must not exceed the MTU of the SR-IOV PF.
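Each `chainCNIJsonData` item is a JSON document stored as a string, which is easy to get wrong when manifests are generated by a script. The following Python sketch is illustrative only: `build_sriov_multus_config` is a hypothetical helper, the field nesting is assumed from the example above, and the PF MTU of 1500 is an assumed value that has to be read from the actual PF. It builds the same manifest as a dictionary and refuses an MTU larger than the PF's.

```python
import json


def build_sriov_multus_config(name: str, resource_name: str, ipv4_pool: str,
                              mtu: int, pf_mtu: int) -> dict:
    """Build a SpiderMultusConfig manifest mirroring the example above."""
    if mtu > pf_mtu:
        raise ValueError(f"MTU {mtu} exceeds the PF MTU {pf_mtu}")
    # The tuning entry is serialized to a JSON string, not nested as a mapping.
    tuning = json.dumps({"type": "tuning", "mtu": mtu})
    return {
        "apiVersion": "spiderpool.spidernet.io/v2beta1",
        "kind": "SpiderMultusConfig",
        "metadata": {"name": name, "namespace": "spiderpool"},
        "spec": {
            "cniType": "sriov",
            "sriov": {
                "resourceName": resource_name,
                "enableRdma": True,
                "ippools": {"ipv4": [ipv4_pool]},
            },
            "chainCNIJsonData": [tuning],
        },
    }


# pf_mtu=1500 is an assumption for illustration; use the MTU of the real PF.
manifest = build_sriov_multus_config(
    "gpu1-sriov", "spidernet.io/gpu1sriov", "gpu1-net11", mtu=1480, pf_mtu=1500)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, which accepts JSON manifests as well as YAML.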

## Create a Test Application

1. Create a DaemonSet application on a specified node to test the availability of SR-IOV devices on that node.