Skip to content

n-acd: runtime eBPF support detection #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jvaclav-rh
Copy link

Currently, eBPF support is toggled by a compile-time flag based on whether we expect the system to support it or not.

This PR makes it so that n-acd always attempts to use eBPF, and gracefully handles failure if it isn't supported (e.g. due to lacking capabilities).

This change is desirable for programs that would like to use n-acd with eBPF as often as possible, but expect that eBPF setup may fail (and are OK with falling back).

@dvdhrm
Copy link
Member

dvdhrm commented Sep 10, 2024

Hi, thanks for the suggestion! Can you explain who would use it? Distros know their kernel versions, so they can easily require a kernel with eBPF. Who would build n-acd without knowing their kernel version?

@jvaclav-rh
Copy link
Author

jvaclav-rh commented Sep 11, 2024

Can you explain who would use it? Distros know their kernel versions, so they can easily require a kernel with eBPF. Who would build n-acd without knowing their kernel version?

It is not so much about whether the kernel supports eBPF but also about if the application has capabilities to use it at runtime. For example it would be very useful for NetworkManager if it is e.g. running in a container.

edit: I also just noticed an older pull request (#9) that had some discussion related to this.

@jvaclav-rh
Copy link
Author

@dvdhrm Is there anything I can explain or amend to help move this forward?

@bengal
Copy link
Contributor

bengal commented May 6, 2025

Can you explain who would use it? Distros know their kernel versions, so they can easily require a kernel with eBPF. Who would build n-acd without knowing their kernel version?

It is not so much about whether the kernel supports eBPF but also about if the application has capabilities to use it at runtime. For example it would be very useful for NetworkManager if it is e.g. running in a container.

edit: I also just noticed an older pull request (#9) that had some discussion related to this.

See also: #9 (comment) . If n-acd runs as root in a network namespace where the current unprivileged user is mapped as root, eBPF doesn't work unless /proc/sys/kernel/unprivileged_bpf_disabled is changed to 0. It would be better to automatically fall back to non-BPF operations in such cases.

@@ -22,6 +22,4 @@ dep_crbtree = sub_crbtree.get_variable('libcrbtree_dep')
dep_csiphash = sub_csiphash.get_variable('libcsiphash_dep')
dep_cstdaux = sub_cstdaux.get_variable('libcstdaux_dep')

use_ebpf = get_option('ebpf')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the option, README.md must be updated.

*/
if (n_acd_probe_is_unique(probe)) {
if (probe->acd->fd_bpf_map != -1 && n_acd_probe_is_unique(probe)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should be changed as suggested in #9 (comment)

Copy link
Author

@jvaclav-rh jvaclav-rh May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the original comment -- it talks about that if we manage to successfully create the BPF map, all of the IPs would be instantly added. But if we move this ... != 1 condition up, and return early, wouldn't that mean the IP is silently dropped, not added to the RBtree and thus not included in the later created bpf map? Would that be the desired behavior?

Copy link
Contributor

@bengal bengal May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suggestion was more to remove this check and let n_acd_bpf_map_add() return success when the map is negative. In this way, we keep tracking the probe->acd->n_bpf_map value correctly; then, if the map can be created later, it will be re-initialized with the right size in n_acd_ensure_bpf_map_space() . @dvdhrm ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@jvaclav-rh jvaclav-rh force-pushed the runtime-ebpf branch 3 times, most recently from fe48f3e to f8a8384 Compare May 12, 2025 08:47
@jvaclav-rh
Copy link
Author

@dvdhrm I am going to try and continue working on this, but before doing anything I wanted to ask which approach makes more sense to you. If we support switching to eBPF at runtime, we have two options:

  • Once we start using eBPF, we assume it's always going to be available. If it's not used, we can start using it later. If we encounter an error with eBPF after it's already set up, ACD fails.
  • We fully support "silently" switching between BPF<->eBPF, if setting up eBPF fails we fall back to BPF. This would be more complex, but more robust.

What do you think is better/makes more sense?

@dvdhrm
Copy link
Member

dvdhrm commented May 19, 2025

I would prefer to create the map once, and only ever re-create it if it already exists. Hence, we never end up using ebpf later on, if it failed initially.

However, I still think it is wise to keep any counters up to date, even if we do not end up using them. It is much less confusing to see correct counters in debug-dumps, than random values. And it makes it a lot less likely to trip if we use those counters for other purposes later on.

Currently, eBPF support is toggled by a compile-time flag
based on whether we expect the system to support it or not.

Make it so that n-acd always attempts to use eBPF, and gracefully
handles failure if it isn't supported (e.g. due to lacking capabilities).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants