Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure kernel sends GARPs to avoid communication failures #4649

Merged
merged 1 commit into from
Feb 14, 2025

Conversation

cyclinder
Copy link
Collaborator

@cyclinder cyclinder commented Feb 13, 2025

Thanks for contributing!

Notice:

What issue(s) does this PR fix:

Fixes #4650

Special notes for your reviewer:

正常情况下,IPAM 分配 IP 给 CNI, CNI 将 IP 地址配置到网卡之后,最后设置状态为 UP。当网卡的链路地址改变或状态改变时,内核会发送免费 ARP,宣告其地址。

#4588 为了修复检测失败时因为错误的 ARP 缓存表导致通信失败,在 IPAM 执行冲突检测。这会提前将网卡 UP,执行完冲突检测之后,返回 IP 地址给 CNI,CNI 将地址配置到网卡,最 后UP 网卡的时候,网卡已经是 UP 的状态,所以这不会触发内核发送免费 ARP 通告其地址。这会导致当 Pod 以相同的 IP 重启后,与其通信过的其他主机不会主动更新 ARP 缓存表导致通信失败。

此 PR 通过在 IPAM 做完冲突检测之后,将网卡的状态设置为 Down,以便 CNI 可以在 UP 网卡时触发免费ARP。

Normally, IPAM assigns IP to the CNI, and the CNI configures the IP address to the NIC before finally setting the status to UP. the kernel sends a free ARP announcing the address of the NIC when its link address is changed or the status is changed.

#4588 To fix a detection failure when communication fails due to an incorrect ARP cache table, conflict detection is performed at IPAM. This will bring the card up early, and after conflict detection, it will return the IP address to the CNI, which will configure the address to the card, and when the card is finally brought up, it will already be up, so this will not trigger the kernel to send a free ARP announcing its address. This can lead to communication failures when the Pod reboots with the same IP and other hosts it has communicated with do not actively update the ARP cache table.

This PR is accomplished by setting the status of the NIC to Down after IPAM has done conflict detection so that the CNI can trigger the free ARP notification when the NIC is UP.

@cyclinder cyclinder added release/bug cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. labels Feb 13, 2025
Copy link

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 60.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.24%. Comparing base (62d4e07) to head (e128d86).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
cmd/spiderpool/cmd/command_add.go 60.00% 4 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4649      +/-   ##
==========================================
- Coverage   75.28%   75.24%   -0.05%     
==========================================
  Files          56       56              
  Lines        6765     6778      +13     
==========================================
+ Hits         5093     5100       +7     
- Misses       1462     1466       +4     
- Partials      210      212       +2     
Flag Coverage Δ
unittests 75.24% <60.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
cmd/spiderpool/cmd/command_add.go 68.25% <60.00%> (-1.07%) ⬇️

@cyclinder cyclinder merged commit f8f971d into spidernet-io:main Feb 14, 2025
53 checks passed
@cyclinder cyclinder deleted the ipam/set_nic_down branch February 14, 2025 07:43
github-actions bot pushed a commit that referenced this pull request Feb 14, 2025
github-actions bot pushed a commit that referenced this pull request Feb 14, 2025
github-actions bot pushed a commit that referenced this pull request Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. release/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pod fail to communication when the pod recreated with same ip
2 participants