From 8c206bab83a357d23ce6f31713d4305361afe0c5 Mon Sep 17 00:00:00 2001 From: Cyclinder Kuo Date: Wed, 12 Feb 2025 15:13:07 +0800 Subject: [PATCH] Docs: add mtu size configuration for SpiderMultusConfig --- .../install/ai/get-started-macvlan-zh_CN.md | 25 ++++++ docs/usage/install/ai/get-started-macvlan.md | 25 ++++++ .../install/ai/get-started-sriov-zh_CN.md | 47 +++++++++-- docs/usage/install/ai/get-started-sriov.md | 83 +++++++++++++------ 4 files changed, 146 insertions(+), 34 deletions(-) diff --git a/docs/usage/install/ai/get-started-macvlan-zh_CN.md b/docs/usage/install/ai/get-started-macvlan-zh_CN.md index dd9a062c4..11ceaa7f4 100644 --- a/docs/usage/install/ai/get-started-macvlan-zh_CN.md +++ b/docs/usage/install/ai/get-started-macvlan-zh_CN.md @@ -246,6 +246,31 @@ EOF ``` + 在一些特殊通信场景下,用户需要为 Pod 自定义 MTU 大小以满足不同数据报文通信需求。您可以通过以下方式自定义配置 Pod 的 MTU 大小: + + ```yaml + apiVersion: spiderpool.spidernet.io/v2beta1 + kind: SpiderMultusConfig + metadata: + name: gpu1-macvlan + namespace: spiderpool + spec: + cniType: macvlan + rdmaResourceName: spidernet.io/shared_cx5_gpu1 + macvlan: + master: ["enp11s0f0np0"] + ippools: + ipv4: ["gpu1-net11"] + chainCNIJsonData: + - | + { + "type": "tuning", + "mtu": 1480 + } + ``` + + 注意: MTU 的取值范围不应该大于 macvlan master 网卡的 MTU 值,否则无法创建 Pod。 + ## 创建测试应用 1. 在指定节点上创建一组 DaemonSet 应用 diff --git a/docs/usage/install/ai/get-started-macvlan.md b/docs/usage/install/ai/get-started-macvlan.md index a4c513aa9..0c95169bb 100644 --- a/docs/usage/install/ai/get-started-macvlan.md +++ b/docs/usage/install/ai/get-started-macvlan.md @@ -246,6 +246,31 @@ The network planning for the cluster is as follows: EOF ``` + In some special communication scenarios, users need to customize the MTU size for Pods to meet the communication needs of different data packets. You can customize the MTU size for Pods in the following way. 
+ + ```yaml + apiVersion: spiderpool.spidernet.io/v2beta1 + kind: SpiderMultusConfig + metadata: + name: gpu1-macvlan + namespace: spiderpool + spec: + cniType: macvlan + rdmaResourceName: spidernet.io/shared_cx5_gpu1 + macvlan: + master: ["enp11s0f0np0"] + ippools: + ipv4: ["gpu1-net11"] + chainCNIJsonData: + - | + { + "type": "tuning", + "mtu": 1480 + } + ``` + + Note: The MTU value should not exceed the MTU value of the macvlan master network interface, otherwise the Pod cannot be created. + ## Create a Test Application 1. Create a DaemonSet application on specified nodes. diff --git a/docs/usage/install/ai/get-started-sriov-zh_CN.md b/docs/usage/install/ai/get-started-sriov-zh_CN.md index 8d4a6ae28..4ce68ef76 100644 --- a/docs/usage/install/ai/get-started-sriov-zh_CN.md +++ b/docs/usage/install/ai/get-started-sriov-zh_CN.md @@ -20,6 +20,10 @@ Spiderpool 使用了 [sriov-network-operator](https://github.com/k8snetworkplumb 2. RoCE 网络场景下, 使用了 [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-cni) 来暴露宿主机上的 RDMA 网卡给 Pod 使用,暴露 RDMA 资源。可额外使用 [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) 来完成 RDMA 设备隔离。 +注意: + +- 基于 SR-IOV 技术给容器提供 RDMA 通信能力只适用于裸金属环境,不适用于虚拟机环境。 + ## 方案 本文将以如下典型的 AI 集群拓扑为例,介绍如何搭建 Spiderpool @@ -218,10 +222,10 @@ Spiderpool 使用了 [sriov-network-operator](https://github.com/k8snetworkplumb priority: 99 numVfs: 12 nicSelector: - deviceID: "1017" - vendor: "15b3" - rootDevices: - - 0000:86:00.0 + deviceID: "1017" + vendor: "15b3" + rootDevices: + - 0000:86:00.0 linkType: ${LINK_TYPE} deviceType: netdevice isRdma: true @@ -238,10 +242,10 @@ Spiderpool 使用了 [sriov-network-operator](https://github.com/k8snetworkplumb priority: 99 numVfs: 12 nicSelector: - deviceID: "1017" - vendor: "15b3" - rootDevices: - - 0000:86:00.0 + deviceID: "1017" + vendor: "15b3" + rootDevices: + - 0000:86:00.0 linkType: ${LINK_TYPE} deviceType: netdevice isRdma: true @@ -366,6 +370,33 @@ Spiderpool 使用了 [sriov-network-operator](https://github.com/k8snetworkplumb EOF ``` 
+4.(可选)自定义 SR-IOV VF 的 MTU + + 在一些特殊通信场景下,用户需要为 Pod 自定义 MTU 大小以满足不同数据报文通信需求。您可以通过以下方式自定义配置 Pod 的 MTU 大小(以 Ethernet 为例): + + ```yaml + apiVersion: spiderpool.spidernet.io/v2beta1 + kind: SpiderMultusConfig + metadata: + name: gpu1-sriov + namespace: spiderpool + spec: + cniType: sriov + sriov: + resourceName: spidernet.io/gpu1sriov + enableRdma: true + ippools: + ipv4: ["gpu1-net11"] + chainCNIJsonData: + - | + { + "type": "tuning", + "mtu": 1480 + } + ``` + + 注意:MTU 的取值范围不应该大于 sriov PF 的 MTU 值。 + ## 创建测试应用 1. 在指定节点上创建一组 DaemonSet 应用,测试指定节点上的 SR-IOV 设备的可用性 diff --git a/docs/usage/install/ai/get-started-sriov.md b/docs/usage/install/ai/get-started-sriov.md index 6fd7facff..f22c70926 100644 --- a/docs/usage/install/ai/get-started-sriov.md +++ b/docs/usage/install/ai/get-started-sriov.md @@ -19,6 +19,10 @@ Different CNIs are used for different network scenarios: 2. In RoCE network scenarios, the [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-cni) is used to expose the RDMA network interface on the host to the Pod, thereby exposing RDMA resources. Additionally, the [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) can be used to achieve RDMA device isolation. +Note: + +- Based on SR-IOV technology, the RDMA communication capability of containers is only applicable to bare metal environments, not to virtual machine environments. + ## Solution This article will introduce how to set up Spiderpool using the following typical AI cluster topology as an example. 
@@ -213,19 +217,19 @@ The network planning for the cluster is as follows: name: gpu1-nic-policy namespace: spiderpool spec: - nodeSelector: - kubernetes.io/os: "linux" - resourceName: gpu1sriov - priority: 99 - numVfs: 12 - nicSelector: - deviceID: "1017" - vendor: "15b3" - rootDevices: - - 0000:86:00.0 - linkType: ${LINK_TYPE} - deviceType: netdevice - isRdma: true + nodeSelector: + kubernetes.io/os: "linux" + resourceName: gpu1sriov + priority: 99 + numVfs: 12 + nicSelector: + deviceID: "1017" + vendor: "15b3" + rootDevices: + - 0000:86:00.0 + linkType: ${LINK_TYPE} + deviceType: netdevice + isRdma: true --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy @@ -233,19 +237,19 @@ The network planning for the cluster is as follows: name: gpu2-nic-policy namespace: spiderpool spec: - nodeSelector: - kubernetes.io/os: "linux" - resourceName: gpu2sriov - priority: 99 - numVfs: 12 - nicSelector: - deviceID: "1017" - vendor: "15b3" - rootDevices: - - 0000:86:00.0 - linkType: ${LINK_TYPE} - deviceType: netdevice - isRdma: true + nodeSelector: + kubernetes.io/os: "linux" + resourceName: gpu2sriov + priority: 99 + numVfs: 12 + nicSelector: + deviceID: "1017" + vendor: "15b3" + rootDevices: + - 0000:86:00.0 + linkType: ${LINK_TYPE} + deviceType: netdevice + isRdma: true EOF ``` @@ -367,6 +371,33 @@ The network planning for the cluster is as follows: EOF ``` +4. (Optional) Customize the MTU of SR-IOV VF + + In some special communication scenarios, users need to customize the MTU size for Pods to meet the communication requirements of different data packets. 
You can set the Pod's MTU as follows (using Ethernet as an example): + + ```yaml + apiVersion: spiderpool.spidernet.io/v2beta1 + kind: SpiderMultusConfig + metadata: + name: gpu1-sriov + namespace: spiderpool + spec: + cniType: sriov + sriov: + resourceName: spidernet.io/gpu1sriov + enableRdma: true + ippools: + ipv4: ["gpu1-net11"] + chainCNIJsonData: + - | + { + "type": "tuning", + "mtu": 1480 + } + ``` + + Note: The MTU value must not exceed the MTU of the SR-IOV PF. + ## Create a Test Application 1. Create a DaemonSet application on a specified node to test the availability of SR-IOV devices on that node.
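
The notes added in this patch say that the Pod MTU should not exceed the MTU of the macvlan master interface or of the SR-IOV PF. A minimal sketch of checking that on a node before applying the SpiderMultusConfig could look like the following. `mtu_fits` is a hypothetical helper, not part of Spiderpool, and the usage lines assume a Linux node where the interface MTU can be read from `/sys/class/net/<interface>/mtu`.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Spiderpool): returns success only
# when the requested Pod MTU does not exceed the parent interface MTU.
mtu_fits() {
    pod_mtu=$1
    parent_mtu=$2
    [ "$pod_mtu" -le "$parent_mtu" ]
}

# Usage sketch (assumes the master interface name used in the examples):
#   parent_mtu=$(cat /sys/class/net/enp11s0f0np0/mtu)
#   if mtu_fits 1480 "$parent_mtu"; then
#       echo "MTU 1480 fits within $parent_mtu"
#   else
#       echo "MTU 1480 exceeds $parent_mtu"
#   fi
```

The comparison is kept separate from the sysfs read so that the same check can be reused for a macvlan master or an SR-IOV PF by passing a different interface name.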