netstacklat: Exclude TCP reads for HOL blocked segments
The 'tcp-socket-read' currently reports the latency for the skb
containing the last TCP segment read from the socket. However, this
segment might have been head of line (HOL) blocked by a previous
segment missing. In this case, netstacklat's reported latency will
include HOL-blocking periods that depend on external factors (such as
network packet loss and network latency, which impact retransmission
time). As netstacklat is primarily intended to identify issues within
the local host (in the network stack or receiving applications), by
default filter out any socket reads where the last read SKB might have
experienced HOL-blocking.
Add the new -y/--include-tcp-hol-delay option to retain the old
behavior of reporting latency for all reads, including those that are
HOL-blocked. This may be useful in some scenarios when you still want
to be aware of latency issues caused by HOL-blocking, even though it
is caused by external components. For example, in a data center
context where you have full control over the network, it may still be
relevant to monitor HOL-blocking caused by the network.
To exclude HOL-blocked reads, detect if any new ooo-segments have
arrived by checking for differences in the number of ooo-packets in
tcp_sock->rcv_ooopack. If any new ooo-segments have arrived, exclude
the latency sample from the current read and set a sequence-number
limit that later reads must pass before their samples are considered
safe, i.e. a point beyond the current ooo-packets, after which
segments can no longer be HOL-blocked by them. If there are
skbs in the ooo-queue, set the limit to the end of the
ooo-queue. Otherwise, set the limit to the current rcv_nxt (if the
ooo-queue is empty, the detected ooo-segments must already have been
merged into the receive queue and rcv_nxt must have advanced past
them). If the read is past the safe sequence limit and no new
ooo-segments have arrived, it's safe to start including the latency
samples again.
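The decision logic described above can be sketched as a plain user-space
C model. This is an illustration only, not the BPF code in this commit:
struct ooo_state, struct sock_view, sample_is_safe() and their members
are hypothetical names, and sock_view is a simplified stand-in for the
tcp_sock members mentioned above. Treating "past the limit" as strictly
after it is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket filter state (what socket storage might hold). */
struct ooo_state {
	uint32_t prev_ooopack; /* rcv_ooopack seen at the previous read */
	uint32_t limit_seq;    /* reads must pass this before samples count */
	bool active;           /* true while an ooo-range is being tracked */
};

/* Simplified, hypothetical snapshot of the socket members involved. */
struct sock_view {
	uint32_t rcv_ooopack;
	uint32_t rcv_nxt;
	uint32_t copied_seq;
	bool ooo_queue_empty;
	uint32_t ooo_queue_end_seq; /* only meaningful if queue is non-empty */
};

/* Wrap-safe "a is after b" for 32-bit TCP sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Returns true if the latency sample for this read should be recorded. */
static bool sample_is_safe(struct ooo_state *st, const struct sock_view *sk)
{
	assert(st && sk);

	if (sk->rcv_ooopack != st->prev_ooopack) {
		/* New ooo-segments arrived: drop sample, (re)set the limit. */
		st->prev_ooopack = sk->rcv_ooopack;
		st->limit_seq = sk->ooo_queue_empty ? sk->rcv_nxt
						    : sk->ooo_queue_end_seq;
		st->active = true;
		return false;
	}

	if (st->active) {
		/* Assumption: "past" means strictly after the limit. */
		if (!seq_after(sk->copied_seq, st->limit_seq))
			return false;
		st->active = false;
	}

	return true;
}
```

In this model, a read that ends exactly at the limit is still excluded,
since the last SKB it returned could be one of the ooo-segments.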
For sockets where some ooo-segments have been observed, keep the
ooo-range state in socket storage (BPF_MAP_TYPE_SK_STORAGE). Skip
protecting this state with a spin-lock, as it should only be
concurrently accessed if there are concurrent reads on the same TCP
socket, which is assumed to be very rare as applications attempting
that cannot know which part of the data each of their concurrent reads
will get.
There are some scenarios that may cause this ooo-filtering to fail.
- If multiple reads are done to the socket concurrently, we may not
correctly track the last read byte. The kernel does not keep a lock
on the TCP socket at the time our hooked function
tcp_recv_timestamp() runs. If two reads are done in parallel, it's
therefore possible that for both reads we will check the last read
byte (tcp_sock.copied_seq) after the second read has updated it. We
may then incorrectly conclude that the first read was ahead of the
ooo-range when it was not, and record its latency when we should
have excluded it. In practice I believe this issue should be quite
rare, as most applications will probably not attempt multiple
concurrent reads on a single connected TCP socket (as then you
cannot know which part of the payload each read will return).
- As tcp_recv_timestamp() runs outside of the socket lock, the various
state members we access may concurrently be updated as we're
attempting to read them. An especially problematic one is
tcp_sock.ooo_last_skb, which keeps a pointer to an SKB that is only
valid while the ooo-queue is non-empty. It is possible that between
our check that the ooo-queue is non-empty and following the
ooo_last_skb pointer, the ooo-queue is cleared and the ooo_last_skb
pointer ends up pointing to a freed SKB. If the socket
members we access are updated before or while we read them, it can
break the filtering in numerous ways, e.g. by including
samples that should have been excluded (due to e.g. copied_seq being
updated before our read) or excluding a large number of valid
samples (due to e.g. setting a sequence limit based on garbage in a
freed SKB).
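The first scenario (two parallel reads both observing copied_seq after
the second read has updated it) can be illustrated with a small
deterministic user-space C model. This is a sketch under assumptions,
not kernel or BPF code: first_read_recorded() and its parameters are
hypothetical names, and the race is simulated with a flag rather than
with real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for 32-bit TCP sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical model of two reads on one socket. Read 1 ends at
 * first_end, read 2 at second_end. If check_runs_late is true, the
 * check for read 1 samples copied_seq only after read 2 has advanced it.
 * Returns true if read 1's latency sample would be recorded.
 */
static bool first_read_recorded(uint32_t first_end, uint32_t second_end,
				uint32_t limit_seq, bool check_runs_late)
{
	uint32_t copied_seq;
	uint32_t observed;

	assert(seq_after(second_end, first_end));

	copied_seq = first_end;  /* read 1 completes */
	observed = copied_seq;   /* what a timely check would see */

	copied_seq = second_end; /* read 2 completes */
	if (check_runs_late)
		observed = copied_seq; /* late check sees read 2's value */

	return seq_after(observed, limit_seq);
}
```

With an ooo-range limit of 3000, a first read ending at 2000 is excluded
when its check runs in time, but is recorded if the check only runs
after a second read has moved copied_seq to 4000.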
Signed-off-by: Simon Sundberg <[email protected]>