Skip to content

Commit f4a4295

Browse files
committed
Merge: test packetdrill/tcp/cubic/cubic-rack-reo-timeout-retrans-failed-incoming-data.pkt fails on aarch64 using default CONFIG_HZ=100
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6582 JIRA: https://issues.redhat.com/browse/RHEL-83546 ``` commit 1c2709c Author: Neal Cardwell <[email protected]> Date: Sun Oct 15 13:47:00 2023 -0400 tcp: fix excessive TLP and RACK timeouts from HZ rounding We discovered from packet traces of slow loss recovery on kernels with the default HZ=250 setting (and min_rtt < 1ms) that after reordering, when receiving a SACKed sequence range, the RACK reordering timer was firing after about 16ms rather than the desired value of roughly min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the exact same issue. This commit fixes the TLP transmit timer and RACK reordering timer floor calculation to more closely match the intended 2ms floor even on kernels with HZ=250. It does this by adding in a new TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies, instead of the current approach of converting to jiffies and then adding th TCP_TIMEOUT_MIN value of 2 jiffies. Our testing has verified that on kernels with HZ=1000, as expected, this does not produce significant changes in behavior, but on kernels with the default HZ=250 the latency improvement can be large. For example, our tests show that for HZ=250 kernels at low RTTs this fix roughly halves the latency for the RACK reorder timer: instead of mostly firing at 16ms it mostly fires at 8ms. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Fixes: bb4d991 ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>``` Signed-off-by: CKI Backport Bot <[email protected]> --- <small>Created 2025-03-14 13:21 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small> Approved-by: Davide Caratti <[email protected]> Approved-by: Alessandro Carminati <[email protected]> Approved-by: CKI KWF Bot <[email protected]> Merged-by: Augusto Caringi <[email protected]>
2 parents ba64e51 + e057f6c commit f4a4295

File tree

3 files changed

+9
-5
lines changed

3 files changed

+9
-5
lines changed

include/net/tcp.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -143,6 +143,9 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
143143
#define TCP_RTO_MAX ((unsigned)(120*HZ))
144144
#define TCP_RTO_MIN ((unsigned)(HZ/5))
145145
#define TCP_TIMEOUT_MIN (2U) /* Min timeout for TCP timers in jiffies */
146+
147+
#define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */
148+
146149
#define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* RFC6298 2.1 initial RTO value */
147150
#define TCP_TIMEOUT_FALLBACK ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value, now
148151
* used as a fallback RTO for the

net/ipv4/tcp_output.c

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2746,7 +2746,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
27462746
{
27472747
struct inet_connection_sock *icsk = inet_csk(sk);
27482748
struct tcp_sock *tp = tcp_sk(sk);
2749-
u32 timeout, rto_delta_us;
2749+
u32 timeout, timeout_us, rto_delta_us;
27502750
int early_retrans;
27512751

27522752
/* Don't do any loss probe on a Fast Open connection before 3WHS
@@ -2770,11 +2770,12 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
27702770
* sample is available then probe after TCP_TIMEOUT_INIT.
27712771
*/
27722772
if (tp->srtt_us) {
2773-
timeout = usecs_to_jiffies(tp->srtt_us >> 2);
2773+
timeout_us = tp->srtt_us >> 2;
27742774
if (tp->packets_out == 1)
2775-
timeout += TCP_RTO_MIN;
2775+
timeout_us += tcp_rto_min_us(sk);
27762776
else
2777-
timeout += TCP_TIMEOUT_MIN;
2777+
timeout_us += TCP_TIMEOUT_MIN_US;
2778+
timeout = usecs_to_jiffies(timeout_us);
27782779
} else {
27792780
timeout = TCP_TIMEOUT_INIT;
27802781
}

net/ipv4/tcp_recovery.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ bool tcp_rack_mark_lost(struct sock *sk)
109109
tp->rack.advanced = 0;
110110
tcp_rack_detect_loss(sk, &timeout);
111111
if (timeout) {
112-
timeout = usecs_to_jiffies(timeout) + TCP_TIMEOUT_MIN;
112+
timeout = usecs_to_jiffies(timeout + TCP_TIMEOUT_MIN_US);
113113
inet_csk_reset_xmit_timer(sk, ICSK_TIME_REO_TIMEOUT,
114114
timeout, inet_csk(sk)->icsk_rto);
115115
}

0 commit comments

Comments
 (0)