Cubic needs multiple stabilization steps after Hystart #1830

huitema · 2025-02-03T07:26:31Z

Cubic connections start with an initial 'slow start' phase running Hystart. In theory, Hystart stops after sensing delay increases, and then the connection should continue from a safe point. In practice, we see something else, as shown in this snapshot of the initial connection phase in the qlog traces of a cubic run:

We see three steps down, and we also see a series of packet losses continuing well after the first two steps. In this example, the Hystart process exits after the notification of delay increase, and immediately sets "ssthresh", "W_max" and "W_last_max" to that value. This proves overly optimistic, and we see an immediate need to further reduce the transmission window.

The delay increase is not caused by the current value of the CWIN, but rather by the value of CWIN when the last acknowledged packet was sent. The CWIN will increase as acknowledgements are received after this point, but those increases are still "in the pipeline". We should set the safe value to one of:

the number of bytes in flight after the delay-causing acknowledgement packet was sent,
or, the total number of bytes acknowledged since that packet was sent,
or, if we cannot keep track of that, half the current value of the CWIN.

Instead, we apply the coefficient "beta cubic", which may be adequate during congestion avoidance but is not enough to stabilize the connection. This is probably wrong.
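The three candidate values listed above can be sketched as a simple fallback chain. This is only an illustration of the proposal, not picoquic code: the function name and its parameters are hypothetical, and it assumes the caller passes `None` for any quantity it cannot track.

```python
from typing import Optional


def safe_cwin_at_hystart_exit(
    cwin: int,
    bytes_in_flight_at_send: Optional[int] = None,
    bytes_acked_since_send: Optional[int] = None,
) -> int:
    """Pick a conservative window when Hystart exits on a delay increase.

    All names here are hypothetical. The order of preference follows the
    list in the comment above.
    """
    # Option 1: bytes in flight right after the packet whose
    # acknowledgement revealed the delay increase was sent.
    if bytes_in_flight_at_send is not None:
        return bytes_in_flight_at_send
    # Option 2: total bytes acknowledged since that packet was sent.
    if bytes_acked_since_send is not None:
        return bytes_acked_since_send
    # Option 3: nothing was tracked, fall back to half the current CWIN.
    return cwin // 2
```

With nothing tracked, `safe_cwin_at_hystart_exit(100_000)` falls back to `50_000`, which is the "divide by 2" simplification.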

There is of course a risk that this is "too drastic", which means we should consider allowing a correction with HyStart++, see issue #1694 and comments from @hfstco.

huitema · 2025-02-03T08:28:47Z

Tried implementing the divide by 2 simplification. Almost all tests pass, but the satellite cubic test completes in 8 seconds instead of 6.1 -- not quite an acceptable performance loss.

huitema · 2025-02-03T20:43:31Z

The first attempt was too simple: we already had code that manages the start of long-delay connections and lowers CWIN to a manageable value. We should not divide the window by 2 if that code has already been applied. With that fix, all the tests pass, and we do not observe any performance regression. In fact, avoiding the "rebounds" also avoids packet losses, which improves performance overall. After the changes in PR #1831, we get the following:

The PR makes two changes:

use CWIN/2 as the target upon exit with HyStart
enter recovery upon exit, to avoid reacting to packet losses caused by excess CWIN.
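A minimal sketch of these two changes, combined with the earlier remark that the window should not be halved when the long-delay start code has already lowered it. This is not the code from PR #1831: the `CubicState` fields, the `long_delay_start_applied` flag, and the use of byte units for every field are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CubicState:
    """Hypothetical, simplified congestion state (all values in bytes)."""
    cwin: int
    ssthresh: int = 2**62
    w_max: int = 0
    w_last_max: int = 0
    in_recovery: bool = False
    # Assumed flag: set when the long-delay start code already lowered CWIN.
    long_delay_start_applied: bool = False


def exit_hystart(state: CubicState) -> CubicState:
    """Apply the two changes described above when Hystart exits."""
    # Change 1: use CWIN/2 as the target, unless CWIN was already lowered.
    if not state.long_delay_start_applied:
        state.cwin //= 2
    state.ssthresh = state.cwin
    state.w_max = state.cwin
    state.w_last_max = state.cwin
    # Change 2: enter recovery, so that losses caused by the earlier,
    # larger CWIN do not trigger further window reductions.
    state.in_recovery = True
    return state
```

For example, `exit_hystart(CubicState(cwin=200_000))` leaves `cwin` at `100_000` with `in_recovery` set, while a state with `long_delay_start_applied=True` keeps its window unchanged.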

We can see on the traces that the "bounces" are gone, and the connection proceeds normally.

hfstco · 2025-02-04T10:41:19Z

Ok, just want to share my thoughts.

Congestion feedback usually arrives 1 RTT later.

In the cubic_test test case, the buffer is set to 1 BDP (20 ms RTT + 20 ms buffer). The first drop depends heavily on the buffer size of the path.

> ...and immediately sets "ssthresh", "W_max" and "W_last_max" to that value. This proves overly optimistic...

In cubic_test HyStart triggers at 142ms. Maybe too late?

If we increase the buffer size of cubic_test in the master branch, these two steps will disappear without any changes.

Conversely, if we decrease the buffer size of cubic_test with the changes of #1831 included, these steps appear again.

So, in my opinion, these steps depend on the buffer size of the path, that is, on whether the buffer can absorb the overshoot or not.

> Instead, we apply the coefficient "beta cubic", which may be adequate during congestion avoidance but is not enough to stabilize the connection. This is probably wrong.

Currently, picoquic reduces the CWND by cubic_beta = 7/8 (W_max), right? RFC 9438 recommends 0.7 for CUBIC. NewReno sets the ssthresh to half the window (a factor of 0.5). However, even if we change cubic_beta, this would not avoid the first step. But it could maybe avoid the second one.
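To make the difference between these factors concrete, here is a small arithmetic sketch. The 7/8 value is the one asked about above and is not confirmed against the picoquic source; the 80,000-byte window and the helper name are made up for the example.

```python
from fractions import Fraction

# Multiplicative decrease factors mentioned in this comment.
FACTORS = {
    "cubic_beta as quoted above (unconfirmed)": Fraction(7, 8),
    "RFC 9438 CUBIC": Fraction(7, 10),
    "NewReno": Fraction(1, 2),
}


def reduced_window(cwnd: int, factor: Fraction) -> int:
    """Window after one multiplicative decrease, rounded down to bytes."""
    return int(cwnd * factor)


if __name__ == "__main__":
    cwnd = 80_000  # arbitrary example window, in bytes
    for name, factor in FACTORS.items():
        print(f"{name}: {cwnd} -> {reduced_window(cwnd, factor)}")
```

For an 80,000-byte window the three factors give 70,000, 56,000 and 40,000 bytes respectively, so a 7/8 reduction removes only a quarter of what halving does.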

When I compare the two versions of cubic_test (master vs. fix-cubic-start-bounce), we lengthen the gap before more data is sent after HyStart, which allows our buffer to drain. This gap depends on the reduction of CWND and on the ACKs received (until bytes_in_flight <= CWND).

huitema mentioned this issue Feb 3, 2025

Fix cubic start bounce #1831

Merged
