Hi all,
First of all: I have already built and run everything successfully in a bare-metal environment, so I am confident the source code itself is fine. My concern is that the provided prebuilt images (especially grc and srsgnb) may have been compiled too aggressively, e.g., heavily optimized for specific newer CPU architectures.
1. Environment
Image: ghcr.io/microsoft/jrtc-apps/grc:latest
GNU Radio Version: 3.10.1.1
Host OS: Ubuntu 22.04.5 LTS (Kernel 5.15.0-119-generic)
CPU: Intel(R) Xeon(R) Gold 6148 (Supports AVX-512)
2. The Core Issue
The GRC process hangs indefinitely after printing the following log:

```
[INFO] Starting flowgraph....
```

When running `volk_profile -v` inside the container, it immediately crashes:

```
command terminated with exit code 139 (Segmentation Fault)
```
**Our suspicion:** This points to a failure inside the VOLK library, likely during SIMD kernel selection/dispatch. We suspect a `-march=native` optimization issue during the image build: if the image was compiled on a different CPU generation (e.g., a newer Xeon), it may contain instructions that our 1st-gen Xeon Gold processor does not support, causing the crash.
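One way to test this suspicion is to compare the SIMD extensions the host CPU advertises against the instructions actually present in the shipped library. A rough sketch follows; the libvolk path is an assumption and should be adjusted to the actual file inside the container:

```shell
# 1. SIMD extensions the host CPU advertises
#    (a Xeon Gold 6148 should list avx512f, avx512cd, etc.):
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx' | sort -u

# 2. Count AVX-512 (zmm-register) instructions in the shipped library.
#    The path below is an assumption -- adjust to the container's libvolk location.
LIB=/usr/lib/x86_64-linux-gnu/libvolk.so
objdump -d "$LIB" 2>/dev/null | grep -c 'zmm' || echo "no zmm instructions found (or file missing)"
```

If step 2 reports zmm instructions from an extension missing in step 1's output (e.g., newer AVX-512 subsets such as VNNI or VBMI that Skylake-SP lacks), that would confirm an instruction-set mismatch baked in at build time rather than a runtime-dispatch bug.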
3. Related Binary Issues (gNB Case)
We encountered a similar compatibility issue with the gNB image provided in the Helm chart, where the original binary crashed upon startup.
We were only able to resolve the gNB issue by replacing the container's binary with one built locally on the same host.
This consistent failure across both gNB and GRC images strongly suggests a general binary incompatibility with our environment.
4. Verified Configurations & Troubleshooting
We have ruled out permission and networking issues by verifying the following:
- Applied the pod YAML with `privileged: true`, `hostIPC: true`, and `seccomp: Unconfined`.
- Verified ZMQ connectivity end-to-end (E2E).
- Confirmed the subscriber registered successfully in the Open5GS DB.
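For reference, the ZMQ connectivity check can be reproduced with a plain TCP probe, since ZMQ endpoints are ordinary TCP sockets. The host and port below are placeholders; substitute the actual endpoint from the flowgraph configuration:

```shell
# Placeholder endpoint -- substitute the real ZMQ address/port from the flowgraph.
ZMQ_HOST=127.0.0.1
ZMQ_PORT=2000

# /dev/tcp is a bash pseudo-device; a successful connect means the port is listening.
if timeout 2 bash -c "exec 3<>/dev/tcp/${ZMQ_HOST}/${ZMQ_PORT}" 2>/dev/null; then
  echo "ZMQ endpoint reachable"
else
  echo "ZMQ endpoint unreachable"
fi
```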
5. Our Situation and Concerns
We have transitioned to a Kubernetes-based setup following your recommendation to achieve stability. However, encountering these fundamental binary issues at the initial stage is quite unexpected.
While we could technically proceed by manually building and replacing all binaries locally, we are deeply concerned that such a workaround will lead to future compatibility risks and massive maintenance overhead. To ensure a stable and sustainable setup as intended by your project, we believe a fundamental fix at the image level is necessary.
6. Requests
- Could you verify whether these images were built with specific CPU optimizations (e.g., AVX-512 for newer architectures) that might be unstable on certain Xeon architectures?
- Could you provide a more generic image (e.g., compiled with AVX2 or generic targets), or advise on a definitive way to resolve these instruction-set mismatches?
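To make the second request concrete, this is a sketch of the kind of portable build we have in mind, assuming the images are built with CMake and GCC (the source directory and cache variables are placeholders, not the project's actual build invocation):

```shell
# Sketch only: replace -march=native with a portable microarchitecture baseline.
# x86-64-v2 is roughly SSE4.2-era; x86-64-v3 adds AVX2 and is still safe on
# a Xeon Gold 6148 while avoiding newer AVX-512 subsets it lacks.
cmake -S . -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_FLAGS="-march=x86-64-v3" \
  -DCMAKE_CXX_FLAGS="-march=x86-64-v3"
cmake --build build -j"$(nproc)"
```

Since VOLK normally selects its SIMD kernels at runtime, building the surrounding code against a generic baseline like this should cost little performance while keeping the image portable across Xeon generations.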
We really want to avoid relying on local manual builds for long-term maintenance. Looking forward to your thoughts on this.