-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathdevel.note
More file actions
39 lines (30 loc) · 1.85 KB
/
devel.note
File metadata and controls
39 lines (30 loc) · 1.85 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
What are the scattered jobs.config, lab.config and system.config for
o lab.config contains user-provided params of the cluster. User can decide on the SRIOV NIC or MTU etc.
o jobs.config contains user-defined job params
o system.config. For devel/internal use to store cluster's learned characteristics.
How do we generate USEABLE_CPUS and ALLOCATABLE_CPUS. See REG_ROOT/lab-analyzer
REG_ROOT/lab-analyzer pulls the worker to get num_cpus and substract 4 off. This leanred param is stored in GEN_LAB_ENV
How do we know cluster type i.e SNO, 3-node compact, standard
See REG_ROOT/lab-analyzer learns the cluster and stores info in GEN_LAB_ENV
Where do we set MCP name, "mcp-regulus-vf" for standard cluster. SNO and compact cluster use "master"
See ./SRIOV-config/config.env
In SNO and 3-node compact, we MUST use mcp "master" b/c we cannot move nodes out of mcp "master"
The bastion expand.sh generates the final mcp in setting.env, and ue it.
The crucible controller hacks system.config and override with "master" when cluster-type != STANDARD
Example errors during development:
==================================
1. "Failed to create pod client-6"
Reason: Pod annotations and/or resources are not met. Pod spec syntax error.
Debug: endpoint/k8s/create-pod-output-*
Fix: reg_expand.sh generated incorrect objects
2. "An IPV4 address could not be found for net1 in addr_info:"
Reason: netwwork-attachment-definition issue. There is syntax diff between before and after 4.14
Debug: look at the sample-1-fail-1/server/1/*server-stderrout.txt
Fix: somehow the SRIOV-config/install.sh chooses the wrong version
2. "Failed to verify pods are running "
Reason: 1 Insufficient cpu,
Debug: lab-analyzer ran and generated USEABLE_CPUS=ALLOCATABLE-4. However after PAO and SRIOV config, USEABLE is no true anymore.
Fix: "make clean-lab" , "make init-lab"
2.
Reaseon:
Debug: