I was working on an updated version of our Kubernetes config files to support k8s 1.32. In the process, I have an nnf container that is panicking over and over.
I didn't fully complete the install process, because I was mainly interested in getting k8s 1.32 working. I did everything up through connecting argocd, but I did not do the "Create a Flux RBAC credential for flux-coral2-dws service", "Create a Viewer RBAC credential", or "Rabbit-based Lustre MGT Server Pool" steps. I mention this in case any of it explains why the container is panicking.
2025-03-04T15:30:16.065565568-08:00 stderr F 2025-03-04T15:30:16.065-0800 INFO ec.nnf Storage Service Enabled {"eventId": "54", "eventMessage": "The fabric '%1' is ready", "eventArgs": ["Rabbit"], "health": "OK"}
2025-03-04T15:30:16.065626166-08:00 stderr F 2025-03-04T15:30:16.065-0800 DEBUG NnfNodeECData Command Run {"State": "Start", "command": "zpool import -a"}
2025-03-04T15:30:16.067672814-08:00 stderr F 2025-03-04T15:30:16.067-0800 INFO ec Starting HTTP Server {"address": ":50057"}
2025-03-04T15:30:16.164878328-08:00 stderr F 2025-03-04T15:30:16.164-0800 DEBUG controllers.NnfNode Command Run {"NnfNode": {"name":"nnf-nlc","namespace":"hetchy202"}, "command": "lctl list_nids"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800 INFO NnfNodeECData Imported all available zpools {"State": "Start"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800 INFO NnfNodeECData Allow others to start {"State": "Start"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800 INFO controllers.NnfNodeBlockStorage Ready to start {"State": "Start"}
2025-03-04T15:30:16.170142635-08:00 stderr F 2025-03-04T15:30:16.170-0800 INFO controllers.NnfNodeStorage Ready to start {"State": "Start"}
2025-03-04T15:30:16.170142635-08:00 stderr F 2025-03-04T15:30:16.170-0800 INFO controllers.NnfClientMount Ready to start {"State": "Start"}
2025-03-04T15:30:16.229579164-08:00 stderr F 2025-03-04T15:30:16.229-0800 DEBUG controllers.NnfNode Command Run {"NnfNode": {"name":"nnf-nlc","namespace":"hetchy202"}, "command": "lctl list_nids"}
2025-03-04T15:30:17.098438238-08:00 stderr F 2025-03-04T15:30:17.098-0800 INFO controllers.NnfNodeBlockStorage Reconciler is awake {"NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}}
2025-03-04T15:30:17.526724662-08:00 stderr F 2025-03-04T15:30:17.526-0800 DEBUG controllers.NnfNodeBlockStorage Command Run {"NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}, "command": "nvme list -v --output-format=json"}
2025-03-04T15:30:17.654893304-08:00 stderr F 2025-03-04T15:30:17.654-0800 INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "nnfnodeblockstorage", "controllerGroup": "nnf.cray.hpe.com", "controllerKind": "NnfNodeBlockStorage", "NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}, "namespace": "hetchy202", "name": "systemstorage-local-xfs-system-storage-0", "reconcileID": "2084fb2a-7221-4bab-ad34-33a5e272ec84"}
2025-03-04T15:30:17.657016400-08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2025-03-04T15:30:17.657016400-08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference
2025-03-04T15:30:17.657026650-08:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1700e2e]
2025-03-04T15:30:17.657026650-08:00 stderr F
2025-03-04T15:30:17.657026650-08:00 stderr F goroutine 719 [running]:
2025-03-04T15:30:17.657041529-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2025-03-04T15:30:17.657047992-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
2025-03-04T15:30:17.657054484-08:00 stderr F panic({0x1aaf740?, 0x2fd1d70?})
2025-03-04T15:30:17.657060927-08:00 stderr F /usr/local/go/src/runtime/panic.go:914 +0x21f
2025-03-04T15:30:17.657074162-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-server.(*Storage).GetStatus(...)
2025-03-04T15:30:17.657074162-08:00 stderr F /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-server/storage.go:68
2025-03-04T15:30:17.657081106-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageGroup).status(0xc00171f8b0?)
2025-03-04T15:30:17.657087909-08:00 stderr F /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/storage_group.go:68 +0x2e
2025-03-04T15:30:17.657094692-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageService).StorageServiceIdStorageGroupIdGet(0xc000819400, {0xc000aa5428?, 0x0?}, {0xc0007d59b0, 0x2c}, 0xc00151b200)
2025-03-04T15:30:17.657110112-08:00 stderr F /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/manager.go:1103 +0x585
2025-03-04T15:30:17.657119089-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*AerService).StorageServiceIdStorageGroupIdGet(0xc00091fed0, {0xc000aa5428?, 0x3?}, {0xc0007d59b0?, 0x2c?}, 0xc0007d59b0?)
2025-03-04T15:30:17.657135361-08:00 stderr F /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/aer.go:129 +0x30
2025-03-04T15:30:17.657135361-08:00 stderr F github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).getStorageGroup(0x1d4b482?, {0x2067468, 0xc00091fed0}, {0xc0007d59b0, 0x2c})
2025-03-04T15:30:17.657152083-08:00 stderr F /workspace/internal/controller/nnf_node_block_storage_controller.go:623 +0x79
2025-03-04T15:30:17.657160189-08:00 stderr F github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).createBlockDevice(0xc0006d4500, {0x2048138, 0xc000a142d0}, 0xc00067c000, 0x0)
2025-03-04T15:30:17.657168145-08:00 stderr F /workspace/internal/controller/nnf_node_block_storage_controller.go:408 +0x94d
2025-03-04T15:30:17.657197632-08:00 stderr F github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).Reconcile(0xc0006d4500, {0x2048138?, 0xc000a142d0}, {{{0xc000d10a96, 0x9}, {0xc000914c60, 0x28}}})
2025-03-04T15:30:17.657197632-08:00 stderr F /workspace/internal/controller/nnf_node_block_storage_controller.go:249 +0xdbc
2025-03-04T15:30:17.657209465-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x204b470?, {0x2048138?, 0xc000a142d0?}, {{{0xc000d10a96?, 0xb?}, {0xc000914c60?, 0x0?}}})
2025-03-04T15:30:17.657233942-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
2025-03-04T15:30:17.657233942-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00069f7c0, {0x2048170, 0xc0006d3040}, {0x1b510a0?, 0xc0007b2920?})
2025-03-04T15:30:17.657250554-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
2025-03-04T15:30:17.657258830-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00069f7c0, {0x2048170, 0xc0006d3040})
2025-03-04T15:30:17.657267227-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
2025-03-04T15:30:17.657267227-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2025-03-04T15:30:17.657276234-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
2025-03-04T15:30:17.657276234-08:00 stderr F created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 242
2025-03-04T15:30:17.657284550-08:00 stderr F /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565