
Commit 707e4d5

Merge release v0.1.6

Release v0.1.6
2 parents 0030f36 + e2b3971 commit 707e4d5

6 files changed: +288 -8 lines

docs/guides/external-mgs/readme.md

Lines changed: 39 additions & 1 deletion
@@ -17,6 +17,7 @@ These three methods are not mutually exclusive on the system as a whole. Individ
## Configuration with an External MGT

### Storage Profile

An existing MGT external to the NNF cluster can be used to manage the Lustre file systems on the NNF nodes. An advantage of this configuration is that the MGT can be made highly available through multiple MGSs. A disadvantage is that there is only a single MGT. Sharing an MGT among more than a handful of Lustre file systems is not a common use case, so the Lustre code may prove less stable.

The following yaml provides an example of what the `NnfStorageProfile` should contain to use an MGT on an external server.
@@ -30,12 +31,49 @@ metadata:
```yaml
data:
  [...]
  lustreStorage:
    externalMgs: 1.2.3.4@eth0:1.2.3.5@eth0
    combinedMgtMdt: false
    standaloneMgtPoolName: ""
  [...]
```

### NnfLustreMGT

A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file system created through the NNF software requests an fsname from a `NnfLustreMGT` resource, so every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from the MGT disk for internally hosted MGTs.
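
The automatically created resources can be listed to confirm they exist. A sketch, assuming the CRD's plural name follows the usual Kubernetes lowercase convention:

```console
$ kubectl get nnflustremgts --all-namespaces
```
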
For an MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames are assigned in sequential order without any re-use. However, after an fsname is no longer in use by a file system, it is not erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.
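
For example, erasing a retired fsname on the node that hosts the MGS might look like the following sketch, where `aaaaaaab` stands in for whatever fsname is no longer in use:

```shell
# Run on the external MGS node. This permanently erases the Lustre
# configuration logs for the named file system, so be certain it is unused.
lctl erase_lcfg aaaaaaab
```
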
Below is an example `NnfLustreMGT` resource. The `NnfLustreMGT` resource for external MGSs must be created in the `nnf-system` namespace.
```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfLustreMGT
metadata:
  name: external-mgt
  namespace: nnf-system
spec:
  addresses:
  - "1.2.3.4@eth0:1.2.3.5@eth0"
  fsNameStart: "aaaaaaaa"
  fsNameBlackList:
  - "mylustre"
  fsNameStartReference:
    name: external-mgt
    namespace: default
    kind: ConfigMap
```
* `addresses` - This is a list of LNet addresses that could be used for this MGT. This should match any values that are used in the `externalMgs` field in the `NnfStorageProfiles`.
* `fsNameStart` - The first fsname to use. Subsequent fsnames are incremented from this starting fsname (e.g., `aaaaaaaa`, `aaaaaaab`, `aaaaaaac`); see the sketch after this list. fsnames use the lowercase letters `'a'`-`'z'`, and `fsNameStart` should be exactly 8 characters long.
* `fsNameBlackList` - This is a list of fsnames that should not be given to any NNF Lustre file systems. If the MGT is hosting any non-NNF Lustre file systems, their fsnames should be included in this blacklist.
* `fsNameStartReference` - This is an optional `ObjectReference` to a `ConfigMap` that holds a starting fsname. If this field is specified, it takes precedence over the `fsNameStart` field in the spec. The `ConfigMap` is updated to the next available fsname every time an fsname is assigned to a new Lustre file system.
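
The increment described above behaves like a base-26 counter over the lowercase letters. A minimal sketch of that behavior (illustrative only, not the actual nnf-sos code):

```go
package main

import "fmt"

// nextFSName treats the 8-character fsname as a base-26 counter over
// 'a'..'z': increment the last letter, carrying to the left past 'z'.
func nextFSName(name string) string {
	b := []byte(name)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 'z' {
			b[i]++
			return string(b)
		}
		b[i] = 'a' // carry into the next position to the left
	}
	return string(b) // wrapped around past "zzzzzzzz"
}

func main() {
	fmt.Println(nextFSName("aaaaaaaa")) // aaaaaaab
	fmt.Println(nextFSName("aaaaaaaz")) // aaaaaaba
}
```
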
### ConfigMap
For external MGTs, the `fsNameStartReference` should point to a `ConfigMap` in the `default` namespace. The `ConfigMap` should be left empty initially. It holds the value of the next available fsname, and it should not be deleted or modified while a `NnfLustreMGT` resource is referencing it. Removing the `ConfigMap` causes the Rabbit software to lose track of which fsnames have already been used on the MGT. This is undesirable unless the external MGT is no longer being used by Rabbit software or an admin has erased all previously used fsnames with the `lctl erase_lcfg [fsname]` command.
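
The initially empty `ConfigMap` that matches the `fsNameStartReference` in the example above could be created like this (a sketch reusing the same name and namespace):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: external-mgt
  namespace: default
```
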
When the `ConfigMap` is used, the nnf-sos software may be undeployed and redeployed without losing track of the next fsname value. During an undeploy, the `NnfLustreMGT` resource is removed. During a deploy, the `NnfLustreMGT` resource reads the fsname value from the `ConfigMap` if it is present. The value in the `ConfigMap` overrides the fsname in the `fsNameStart` field.
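
The stored value can be inspected at any time; the layout of the data inside the `ConfigMap` is internal to nnf-sos, so treat the keys as opaque:

```shell
kubectl get configmap external-mgt --namespace default -o yaml
```
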
## Configuration with Persistent Lustre

The MGT from a persistent Lustre file system hosted on the NNF nodes can also be used as the MGT for other NNF Lustre file systems. This configuration has the advantage of not relying on any hardware outside of the cluster. However, there is no high availability, and a single MGT is still shared between all Lustre file systems created on the cluster.

docs/guides/index.md

Lines changed: 2 additions & 1 deletion
@@ -16,12 +16,13 @@
* [Lustre External MGT](external-mgs/readme.md)
* [Global Lustre](global-lustre/readme.md)
* [Directive Breakdown](directive-breakdown/readme.md)
* [User Interactions](user-interactions/readme.md)

## NNF User Containers

* [User Containers](user-containers/readme.md)

## Node Management

* [Disable or Drain a Node](node-management/drain.md)
* [Debugging NVMe Namespaces](node-management/nvme-namespaces.md)

docs/guides/node-management/drain.md

Lines changed: 62 additions & 4 deletions
@@ -1,4 +1,40 @@
# Disable Or Drain A Node

## Disabling a node

A Rabbit node can be manually disabled, indicating to the WLM that it should not schedule more jobs on the node. Jobs currently on the node will be allowed to complete at the discretion of the WLM.

Disable a node by setting its Storage state to `Disabled`.

```shell
kubectl patch storage $NODE --type=json -p '[{"op":"replace", "path":"/spec/state", "value": "Disabled"}]'
```

When the Storage is queried by the WLM, it will show the disabled status.

```console
$ kubectl get storages
NAME           STATE      STATUS     MODE   AGE
kind-worker2   Enabled    Ready      Live   10m
kind-worker3   Disabled   Disabled   Live   10m
```

To re-enable a node, set its Storage state to `Enabled`.

```shell
kubectl patch storage $NODE --type=json -p '[{"op":"replace", "path":"/spec/state", "value": "Enabled"}]'
```

The Storage state will show that it is enabled.

```console
$ kubectl get storages
NAME           STATE     STATUS   MODE   AGE
kind-worker2   Enabled   Ready    Live   10m
kind-worker3   Enabled   Ready    Live   10m
```
## Draining a node

The NNF software consists of a collection of DaemonSets and Deployments. The pods
on the Rabbit nodes are usually from DaemonSets. Because of this, the `kubectl drain`
@@ -9,7 +45,11 @@ Given the limitations of DaemonSets, the NNF software will be drained by using t
as described in
[Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).

This taint would be used only after the WLM jobs have been removed from that Rabbit (preferably) and there is some reason to also remove the NNF software from it. It might be used, for example, before a Rabbit is powered off and pulled out of the cabinet, to avoid leaving pods in a "Terminating" state (harmless, but noisy).

If an admin applies this taint before power-off, no "Terminating" pods will be left lying around for that Rabbit. After a new (or the same) Rabbit is put back in its place, the NNF software will not return to it while the taint is present. The taint can be removed at any time, from immediately after the node is powered off up to some time after the replacement Rabbit is powered back on.

### Drain NNF pods from a Rabbit node

Drain the NNF software from a node by applying the `cray.nnf.node.drain` taint.
The CSI driver pods will remain on the node to satisfy any unmount requests from k8s
@@ -19,15 +59,33 @@ as it cleans up the NNF pods.
```shell
kubectl taint node $NODE cray.nnf.node.drain=true:NoSchedule cray.nnf.node.drain=true:NoExecute
```

This will cause the node's `Storage` resource to be drained:

```console
$ kubectl get storages
NAME           STATE     STATUS    MODE   AGE
kind-worker2   Enabled   Drained   Live   5m44s
kind-worker3   Enabled   Ready     Live   5m45s
```

The `Storage` resource will contain the following message indicating the reason it has been drained:

```console
$ kubectl get storages rabbit1 -o json | jq -rM .status.message
Kubernetes node is tainted with cray.nnf.node.drain
```

To restore the node to service, remove the `cray.nnf.node.drain` taint.

```shell
kubectl taint node $NODE cray.nnf.node.drain-
```
The `Storage` resource will revert to a `Ready` status.
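
As when re-enabling a node, the change can be confirmed by querying the Storage resources; output along the lines of the earlier examples would be expected:

```console
$ kubectl get storages
NAME           STATE     STATUS   MODE   AGE
kind-worker2   Enabled   Ready    Live   10m
kind-worker3   Enabled   Ready    Live   10m
```
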
### The CSI driver

While the CSI driver pods may be drained from a Rabbit node, it is inadvisable to do so.

**Warning** K8s relies on the CSI driver to unmount any filesystems that may have
been mounted into a pod's namespace. If it is not present when k8s is attempting
