Implement ECMP for unsafe_routes #1332
Thanks for the contribution! Before we can merge this, we need @dioss-Machiel to sign the Salesforce Inc. Contributor License Agreement.
Hi @dioss-Machiel, this looks very interesting. One note:
It does look like this PR may offer effective load balancing between gateways, but I'm not sure if it provides redundancy. If a given gateway is down, will this code detect it and route via the other gateway instead?
This implementation does indeed provide redundancy in the following network setup:
How this technically works:
For example: with three gateways of weight "1", each of them handles 1/3 of the traffic. With only two gateways, each handles 1/2 of the traffic; if one goes down, the other gateway handles all of it.
This is the link to the relevant code in this PR: (inside.go, line 187)
A more advanced implementation could recalculate the buckets when nodes go up or down, so that traffic stays balanced across the remaining gateways, but that adds complexity and possibly a slight penalty to routing speed.
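To make the weight arithmetic above concrete, here is a minimal Go sketch of hash-threshold selection with the "first available gateway" fallback. It is not the code from this PR: the `gateway` type, `assignBounds`, `pick`, and the 31-bit hash range are all hypothetical names and assumptions chosen for illustration.

```go
package main

import "fmt"

// gateway is a hypothetical type for this sketch, not Nebula's own struct.
type gateway struct {
	name       string
	weight     int
	upperBound uint32 // exclusive upper bound of this gateway's hash region
	reachable  bool
}

// hashSpace is the assumed size of the hash range: hashes fall in [0, 2^31).
const hashSpace = uint64(1) << 31

// assignBounds splits the hash range into consecutive regions whose sizes
// are proportional to each gateway's weight.
func assignBounds(gws []gateway) {
	total := 0
	for _, g := range gws {
		total += g.weight
	}
	if total == 0 {
		return
	}
	cumulative := 0
	for i := range gws {
		cumulative += gws[i].weight
		gws[i].upperBound = uint32(hashSpace * uint64(cumulative) / uint64(total))
	}
}

// pick returns the gateway whose region contains hash. If that gateway is
// unreachable, it falls back to the first reachable gateway in list order.
func pick(gws []gateway, hash uint32) (string, bool) {
	for _, g := range gws {
		if hash < g.upperBound {
			if g.reachable {
				return g.name, true
			}
			break // preferred gateway is down, fall through to the fallback
		}
	}
	for _, g := range gws {
		if g.reachable {
			return g.name, true
		}
	}
	return "", false // no gateway is reachable
}

func main() {
	gws := []gateway{
		{name: "gw-a", weight: 1, reachable: true},
		{name: "gw-b", weight: 1, reachable: true},
		{name: "gw-c", weight: 1, reachable: true},
	}
	assignBounds(gws)

	name, _ := pick(gws, 1000000000) // falls in the middle third
	fmt.Println("all up:", name)

	gws[1].reachable = false
	name, _ = pick(gws, 1000000000) // same flow, preferred gateway down
	fmt.Println("gw-b down:", name)
}
```

With equal weights each gateway owns one third of the hash range. Once `gw-b` is marked unreachable, flows that hashed to it move to `gw-a`, so `gw-a` carries two thirds of the flows and `gw-c` one third: connectivity is kept, but the split is no longer even.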
@dioss-Machiel we merged a pretty large change to support IPv6 on the overlay, mind rebasing to clear up the conflicts?
Still always use the first route found; this should not change any routing behaviour in Nebula.
Prefer the first route found; if the gateway is unavailable, keep trying until all options are exhausted.
WIP Multipath is working but routing table updates are still broken
8e46695 to d987d75
@nbrownus Conflicts have been resolved
d987d75 to 0cc2f98
I fixed the testify errors
…teways are reachable
Did a quick benchmark,
This benchmark includes 2 other xxhash implementations but they have slightly worse distribution while being much faster than xxh3. I included As a reference point, the firewall takes ~40ns/op on my machine to pass a packet on a group or name match. Worst case performance in the firewall is ~95ns/op to evaluate and fail a full rule evaluation. If we went with
Which doesn't seem too terrible to me. We can take this a bit further as well by removing the local and remote addr. If we can drop down to just evaluating the local and remote ports, then we can use a much faster hash described here (taken from #768), which has really nice distribution (does not require modifying your tests) and incredible speed: 0.3117 ns/op on my machine.
I'm not married to the hashing algorithm. I did some quick research, and other L4 balancing implementations also hash only by source and destination port. Using the hash-prospector implementation looks good to me.
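As a rough illustration of the port-only approach discussed above, the sketch below packs the two 16-bit ports into one 32-bit word and runs it through an integer mixer. The exact function referenced from #768 is not shown in this thread, so this assumes the `lowbias32` mixer published by the hash-prospector project. The name `hashPorts` and the final 31-bit mask are hypothetical choices for this example.

```go
package main

import "fmt"

// hashPorts mixes the local and remote ports of a packet into a 31-bit value.
// Assumption: the mixing steps are hash-prospector's "lowbias32"; the hash
// actually used in the PR may differ.
func hashPorts(localPort, remotePort uint16) uint32 {
	// Pack both ports into a single 32-bit word.
	x := uint32(localPort)<<16 | uint32(remotePort)

	// lowbias32: alternate xor-shifts and multiplications to spread input bits.
	x ^= x >> 16
	x *= 0x7feb352d
	x ^= x >> 15
	x *= 0x846ca68b
	x ^= x >> 16

	// Drop the top bit so the result also fits a non-negative 32-bit integer.
	return x & 0x7fffffff
}

func main() {
	// Consecutive source ports towards the same destination port should land
	// in unrelated parts of the hash range.
	for port := uint16(50000); port < 50004; port++ {
		fmt.Printf("local=%d remote=443 hash=%d\n", port, hashPorts(port, 443))
	}
}
```

Because the function is only a handful of shifts and multiplies with no memory access, it is plausible that it runs in well under a nanosecond per call, in line with the figure quoted above. The trade-off is that all flows between the same pair of ports hash identically regardless of address.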
This algorithm has better performance and we can remove some dependencies
This PR implements ECMP support in Nebula. The implementation uses hash-threshold mapping (like in the Linux kernel) which allows you to define weights per gateway.
This can be used, for example, to aggregate multiple links and provide redundancy to an external location where Nebula cannot be installed.
The change is backwards compatible and should not impact normal operation.
ECMP routes can be defined in the config file (example below) and also via the use_system_route_table feature, which means that on Linux you can use a BGP daemon to add / remove multipath routes.
Example config:
Implementation note: if the gateway a packet should be routed to is not reachable, the first available gateway is chosen instead. When one of the gateways is down, traffic is therefore no longer evenly balanced, but connectivity is preserved.