panic in nnfnodeblockstorage #245

@morrone

I was updating our Kubernetes config files to support k8s 1.32. In the process, I found an nnf container that is panicking over and over.

FYI, I didn't fully complete the install process, because at the time I was more interested in making k8s 1.32 work. I did everything up through connecting argocd, but I did not do the "Create a Flux RBAC credential for flux-coral2-dws service", "Create a Viewer RBAC credential", or "Rabbit-based Lustre MGT Server Pool" steps. I mention it in case any of that helps explain why it is panicking.


2025-03-04T15:30:16.065565568-08:00 stderr F 2025-03-04T15:30:16.065-0800       INFO    ec.nnf  Storage Service Enabled {"eventId": "54", "eventMessage": "The fabric '%1' is ready", "eventArgs": ["Rabbit"], "health": "OK"}
2025-03-04T15:30:16.065626166-08:00 stderr F 2025-03-04T15:30:16.065-0800       DEBUG   NnfNodeECData   Command Run     {"State": "Start", "command": "zpool import -a"}
2025-03-04T15:30:16.067672814-08:00 stderr F 2025-03-04T15:30:16.067-0800       INFO    ec      Starting HTTP Server    {"address": ":50057"}
2025-03-04T15:30:16.164878328-08:00 stderr F 2025-03-04T15:30:16.164-0800       DEBUG   controllers.NnfNode     Command Run     {"NnfNode": {"name":"nnf-nlc","namespace":"hetchy202"}, "command": "lctl list_nids"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800       INFO    NnfNodeECData   Imported all available zpools   {"State": "Start"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800       INFO    NnfNodeECData   Allow others to start   {"State": "Start"}
2025-03-04T15:30:16.170122957-08:00 stderr F 2025-03-04T15:30:16.170-0800       INFO    controllers.NnfNodeBlockStorage Ready to start  {"State": "Start"}
2025-03-04T15:30:16.170142635-08:00 stderr F 2025-03-04T15:30:16.170-0800       INFO    controllers.NnfNodeStorage      Ready to start  {"State": "Start"}
2025-03-04T15:30:16.170142635-08:00 stderr F 2025-03-04T15:30:16.170-0800       INFO    controllers.NnfClientMount      Ready to start  {"State": "Start"}
2025-03-04T15:30:16.229579164-08:00 stderr F 2025-03-04T15:30:16.229-0800       DEBUG   controllers.NnfNode     Command Run     {"NnfNode": {"name":"nnf-nlc","namespace":"hetchy202"}, "command": "lctl list_nids"}
2025-03-04T15:30:17.098438238-08:00 stderr F 2025-03-04T15:30:17.098-0800       INFO    controllers.NnfNodeBlockStorage Reconciler is awake     {"NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}}
2025-03-04T15:30:17.526724662-08:00 stderr F 2025-03-04T15:30:17.526-0800       DEBUG   controllers.NnfNodeBlockStorage Command Run     {"NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}, "command": "nvme list -v --output-format=json"}
2025-03-04T15:30:17.654893304-08:00 stderr F 2025-03-04T15:30:17.654-0800       INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "nnfnodeblockstorage", "controllerGroup": "nnf.cray.hpe.com", "controllerKind": "NnfNodeBlockStorage",
 "NnfNodeBlockStorage": {"name":"systemstorage-local-xfs-system-storage-0","namespace":"hetchy202"}, "namespace": "hetchy202", "name": "systemstorage-local-xfs-system-storage-0", "reconcileID": "2084fb2a-7221-4bab-ad34-33a5e272ec84"}
2025-03-04T15:30:17.657016400-08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2025-03-04T15:30:17.657016400-08:00 stderr F    panic: runtime error: invalid memory address or nil pointer dereference
2025-03-04T15:30:17.657026650-08:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1700e2e]
2025-03-04T15:30:17.657026650-08:00 stderr F 
2025-03-04T15:30:17.657026650-08:00 stderr P goroutine 719 [
2025-03-04T15:30:17.657034956-08:00 stderr F running]:
2025-03-04T15:30:17.657041529-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2025-03-04T15:30:17.657047992-08:00 stderr F    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
2025-03-04T15:30:17.657054484-08:00 stderr P panic({0x1aaf740?, 0x2fd1d70
2025-03-04T15:30:17.657060927-08:00 stderr F ?})
2025-03-04T15:30:17.657060927-08:00 stderr P    /usr/local/go/src/runtime/panic.go:914 +0x21f
2025-03-04T15:30:17.657067720-08:00 stderr F 
2025-03-04T15:30:17.657074162-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-server.(*Storage).GetStatus(...)
2025-03-04T15:30:17.657074162-08:00 stderr P    /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-server/storage.go:
2025-03-04T15:30:17.657081106-08:00 stderr F 68
2025-03-04T15:30:17.657081106-08:00 stderr F github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageGroup).status(0xc00171f8b0?)
2025-03-04T15:30:17.657087909-08:00 stderr F    /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/storage_group.go:68 +0x2e
2025-03-04T15:30:17.657094692-08:00 stderr P github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*StorageService).StorageServiceIdStorageGroupIdGet(0xc000819400, {0xc000aa5428?, 
2025-03-04T15:30:17.657102006-08:00 stderr P 0x0?}, {0xc0007d59b0, 0x2c}, 
2025-03-04T15:30:17.657110112-08:00 stderr F 0xc00151b200)
2025-03-04T15:30:17.657110112-08:00 stderr F    /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/manager.go:1103 +0x585
2025-03-04T15:30:17.657119089-08:00 stderr P github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf.(*AerService).StorageServiceIdStorageGroupIdGet(0xc00091fed0, {0xc000aa5428?, 0x3?}, 
2025-03-04T15:30:17.657127035-08:00 stderr F {0xc0007d59b0?, 0x2c?}, 0xc0007d59b0?)
2025-03-04T15:30:17.657135361-08:00 stderr F    /workspace/vendor/github.com/NearNodeFlash/nnf-ec/pkg/manager-nnf/aer.go:129 +0x30
2025-03-04T15:30:17.657135361-08:00 stderr P github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).getStorageGroup(0x1d4b482
2025-03-04T15:30:17.657144008-08:00 stderr P ?, {0x2067468, 0xc00091fed0}, {0xc0007d59b0, 0x2c
2025-03-04T15:30:17.657152083-08:00 stderr F })
2025-03-04T15:30:17.657152083-08:00 stderr F    /workspace/internal/controller/nnf_node_block_storage_controller.go:623 +0x79
2025-03-04T15:30:17.657160189-08:00 stderr P github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).createBlockDevice(0xc0006d4500, {0x2048138, 0xc000a142d0}, 
2025-03-04T15:30:17.657168145-08:00 stderr F 0xc00067c000, 0x0)
2025-03-04T15:30:17.657168145-08:00 stderr F    /workspace/internal/controller/nnf_node_block_storage_controller.go:408 +0x94d
2025-03-04T15:30:17.657197632-08:00 stderr F github.com/NearNodeFlash/nnf-sos/internal/controller.(*NnfNodeBlockStorageReconciler).Reconcile(0xc0006d4500, {0x2048138?, 0xc000a142d0}, {{{0xc000d10a96, 0x9}, {0xc000914c60, 0x28}}})
2025-03-04T15:30:17.657197632-08:00 stderr P    /workspace/internal/controller/nnf_node_block_storage_controller.go
2025-03-04T15:30:17.657209465-08:00 stderr F :249 +0xdbc
2025-03-04T15:30:17.657209465-08:00 stderr P sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x204b470?, {0x2048138?, 0xc000a142d0
2025-03-04T15:30:17.657217921-08:00 stderr P ?}, {{{0xc000d10a96?, 0xb?}
2025-03-04T15:30:17.657225856-08:00 stderr F , {0xc000914c60?, 0x0?}}})
2025-03-04T15:30:17.657233942-08:00 stderr F    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
2025-03-04T15:30:17.657233942-08:00 stderr P sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(
2025-03-04T15:30:17.657242328-08:00 stderr P 0xc00069f7c0, {0x2048170, 0xc0006d3040}, {0x1b510a0?
2025-03-04T15:30:17.657250554-08:00 stderr F , 0xc0007b2920?})
2025-03-04T15:30:17.657250554-08:00 stderr P    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
2025-03-04T15:30:17.657258830-08:00 stderr F 
2025-03-04T15:30:17.657258830-08:00 stderr P sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00069f7c0, {0x2048170, 0xc0006d3040}
2025-03-04T15:30:17.657267227-08:00 stderr F )
2025-03-04T15:30:17.657267227-08:00 stderr F    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
2025-03-04T15:30:17.657267227-08:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2025-03-04T15:30:17.657276234-08:00 stderr F    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
2025-03-04T15:30:17.657276234-08:00 stderr P created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 242
2025-03-04T15:30:17.657284550-08:00 stderr F 
2025-03-04T15:30:17.657284550-08:00 stderr F    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565

