in-kernel PM: use endpoints only to create subflows with received `ADD_ADDR` #503

matttbe · 2024-06-18T17:25:22Z

Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick.

It sounds interesting to pick local addresses from a selected list of endpoint, and use it only once.

Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). The client didn't use this address C(b) to establish a subflow with the server's primary address S(a).

 C        S
C(a) --- S(a)
C(b) --- S(b)

In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used.

Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical.

About the configuration, such endpoints could be marked with a new flag (not compatible with fullmesh (and subflow?)), using a good explicit name (TBD).

If at least one endpoint with such flag is used, and they have all been used to create subflows, what should we do in case of a reception of a new ADD_ADDR? Maybe good to fallback to IPADDRANY, and let the routing table finding a new local address: if the goal is to limit only to endpoints with such flag, then ip mptcp limits set add_addr_accepted X could be used, with X being the number of endpoints with such flag.

The text was updated successfully, but these errors were encountered:

Running generic/751 on the for-next branch often results in a hang like below. They are both stack by locking an extent. This suggests someone forget to unlock an extent. INFO: task kworker/u128:1:12 blocked for more than 323 seconds. Not tainted 6.13.0-BTRFS-ZNS+ #503 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u128:1 state:D stack:0 pid:12 tgid:12 ppid:2 flags:0x00004000 Workqueue: btrfs-fixup btrfs_work_helper [btrfs] Call Trace: <TASK> __schedule+0x534/0xdd0 schedule+0x39/0x140 __lock_extent+0x31b/0x380 [btrfs] ? __pfx_autoremove_wake_function+0x10/0x10 btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs] btrfs_work_helper+0xff/0x480 [btrfs] ? lock_release+0x178/0x2c0 process_one_work+0x1ee/0x570 ? srso_return_thunk+0x5/0x5f worker_thread+0x1d1/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10b/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> INFO: task kworker/u134:0:184 blocked for more than 323 seconds. Not tainted 6.13.0-BTRFS-ZNS+ #503 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u134:0 state:D stack:0 pid:184 tgid:184 ppid:2 flags:0x00004000 Workqueue: writeback wb_workfn (flush-btrfs-4) Call Trace: <TASK> __schedule+0x534/0xdd0 schedule+0x39/0x140 __lock_extent+0x31b/0x380 [btrfs] ? __pfx_autoremove_wake_function+0x10/0x10 find_lock_delalloc_range+0xdb/0x260 [btrfs] writepage_delalloc+0x12f/0x500 [btrfs] ? srso_return_thunk+0x5/0x5f extent_write_cache_pages+0x232/0x840 [btrfs] btrfs_writepages+0x72/0x130 [btrfs] do_writepages+0xe7/0x260 ? srso_return_thunk+0x5/0x5f ? lock_acquire+0xd2/0x300 ? srso_return_thunk+0x5/0x5f ? find_held_lock+0x2b/0x80 ? wbc_attach_and_unlock_inode.part.0+0x102/0x250 ? wbc_attach_and_unlock_inode.part.0+0x102/0x250 __writeback_single_inode+0x5c/0x4b0 writeback_sb_inodes+0x22d/0x550 __writeback_inodes_wb+0x4c/0xe0 wb_writeback+0x2f6/0x3f0 wb_workfn+0x32a/0x510 process_one_work+0x1ee/0x570 ? srso_return_thunk+0x5/0x5f worker_thread+0x1d1/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10b/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> This happens because we have another success path for the zoned mode. When there is no active zone available, btrfs_reserve_extent() returns -EAGAIN. In this case, we have two reactions. (1) If the given range is never allocated, we can only wait for someone to finish a zone, so wait on BTRFS_FS_NEED_ZONE_FINISH bit and retry afterward. (2) Or, if some allocations are already done, we must bail out and let the caller to send IOs for the allocation. This is because these IOs may be necessary to finish a zone. The commit 06f3642 ("btrfs: do proper folio cleanup when cow_file_range() failed") moved the unlock code from the inside of the loop to the outside. So, previously, the allocated extents are unlocked just after the allocation and so before returning from the function. However, they are no longer unlocked on the case (2) above. That caused the hang issue. Fix the issue by modifying the 'end' to the end of the allocated range. Then, we can exit the loop and the same unlock code can properly handle the case. Reported-by: Shin'ichiro Kawasaki <[email protected]> Tested-by: Johannes Thumshirn <[email protected]> Fixes: 06f3642 ("btrfs: do proper folio cleanup when cow_file_range() failed") CC: [email protected] Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>

matttbe added the pm path-manager label Jun 18, 2024

matttbe added this to MPTCP Upstream: Future Jun 18, 2024

github-project-automation bot moved this to Needs triage in MPTCP Upstream: Future Jun 18, 2024

matttbe added the enhancement label Jul 18, 2024

majek mentioned this issue Dec 12, 2024

Kernel built-in PM is unreliable #534

Closed

4 tasks

matttbe mentioned this issue Dec 13, 2024

PM: new subflow: avoid selecting initial endpoint upon ADD_ADDR reception if servers deny MPJ to initial address #536

Open

matttbe mentioned this issue Jan 30, 2025

in-kernel PM: limit to one subflow per network interface for each connection #542

Open

matttbe mentioned this issue Mar 22, 2025

Observing significant delays (>30 seconds) on "handover" to subflow #554

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

in-kernel PM: use endpoints only to create subflows with received `ADD_ADDR` #503

in-kernel PM: use endpoints only to create subflows with received `ADD_ADDR` #503

matttbe commented Jun 18, 2024

in-kernel PM: use endpoints only to create subflows with received ADD_ADDR #503

in-kernel PM: use endpoints only to create subflows with received ADD_ADDR #503

Comments

matttbe commented Jun 18, 2024

in-kernel PM: use endpoints only to create subflows with received `ADD_ADDR` #503

in-kernel PM: use endpoints only to create subflows with received `ADD_ADDR` #503