Enable traffic monitoring to debug flaky selftests #168

ameryhung · 2025-02-03T16:30:46Z

Traffic monitoring was add to the selftests/bpf[0] to debug some flaky test cases. Enable this in CI by building selftests with libpcap, installing libpcap in the test environment and finally turn it on when running the test.

[0] https://lore.kernel.org/bpf/[email protected]/

theihor · 2025-02-04T22:37:48Z

@ameryhung If I understand correctly, the traffic monitoring logs are printed to files.

In github UI only stdout is going to be visible, the logs stored in files will be lost by default. So on CI you may want to either dump the logs to stdout, or collect them into a separate artifact.

ameryhung · 2025-02-04T23:08:44Z

@theihor

I will change to upload the log file as well. Originally, I thought "-m" also prints to stdout. However, it does not include the payload. Having it should be more helpful for debugging.

theihor · 2025-02-04T23:32:52Z

You can ignore GCC BPF tests failure. They are disabled at kernel-patches/bpf, I'll disable them here too shortly.

theihor · 2025-02-05T18:51:34Z

I just noticed there is an older similar PR already: #130

There is a piece of yaml uploading logs as artifacts there.

Re stdout vs artifacts, consider the verbosity of the monitoring log. If it's lots of lines per test, then I'd go with the artifacts.

ameryhung · 2025-02-05T19:56:47Z

I think uploading tmon logs as artifacts makes sense. I will update the PR. Thanks!

run-vmtest/run-bpf-selftests.sh

theihor · 2025-02-05T20:47:52Z

run-vmtest/action.yml

+    - uses: actions/upload-artifact@v4
+      with:
+        name: tmon-logs-${{ inputs.arch }}-${{ inputs.toolchain_full }}${{ inputs.kernel-test }}
+        if-no-files-found: error
+        path: "/tmp/tmon_pcap/*"


Managing artifacts is really a job-level task IMO. And we should avoid new action inputs if possible.

I think in this case, it makes sense to add an always() (link) step in kernel-test workflow after run-vmtest call step. toolchain_full string is directly available there, since the build artifacts are identified by it. Also, with always() condition, logs will be uploaded on failure, which is what we want.

Later we may want to add other log files to the step (but don't worry about that in this PR).

theihor

Great PR, thanks!

theihor · 2025-02-07T17:52:03Z

run-vmtest/run-bpf-selftests.sh

@@ -43,7 +43,7 @@ test_progs_helper() {
  # "&& true" does not change the return code (it is not executed
  # if the Python script fails), but it prevents exiting on a
  # failure due to the "set -e".
-  ./${selftest} ${args} --json-summary "${json_file}" && true
+  ./${selftest} ${args} -m '*' --json-summary "${json_file}" && true


nit: adding -m to args was ok too

Do you mean what I had before (i.e., "args+=" -m ''")? That didn't work because the single quote inside the double quote was not evaluated and removed so test_progs would see '' instead of *. I can replace the args with arrays.

local args=("$2")

args+=("-m")
args+=("*")

./test_progs "${args[@]}"

That didn't work because the single quote inside the double quote was not evaluated and removed so test_progs would see '' instead of *.

Hmm... that should work.

Consider this:

args+=" ${w:+-w$w}" args+=" -m '*'" echo ./foo "$args" bar

$ ./x.sh ./foo -m '*' bar $ w=123 ./x.sh ./foo -w123 -m '*' bar

Anyways, this is a nit thing, don't spend too much time on it.

danobi · 2025-02-07T22:06:44Z

IMHO uploading the artifact is not wrong and is generally useful. But it's probably a bit more useful to get the output from stdout/stderr. We already have infra in KPD that slurps failed selftest output and emails it out to contributors. Would be nice to leverage that.

If it's just a matter of improving the selftest infra in kernel tree then maybe @mlau2 can help.

theihor · 2025-02-07T23:09:09Z

IMHO uploading the artifact is not wrong and is generally useful. But it's probably a bit more useful to get the output from stdout/stderr. We already have infra in KPD that slurps failed selftest output and emails it out to contributors. Would be nice to leverage that.

@danobi dumping everything to stdout/stderr is ok up to a certain point. An extreme bad example is selftests/sched_ext which prints megabytes of log to stdout. This makes github ui to freeze, and is generally not convenient to work with.

The users of these logs should consider its verbosity, format and situations when it is actually needed. I assumed traffic monitoring might be pretty verbose, so uploading as separate artifacts made sense to me.

I just downloaded an artifact from a successfull test run, and it appears that the "logs" are actually some binary data (@ameryhung I don't really understand what it is, so if that's not right lmk):

$ cat packets-105-94-select_reuseport__sockhash_IPv4_TCP_LOOPBACK_test_err_sk_select-select_reuseport.log
�ò�oN�g��	XED��@@K���؅��� ����8��
�IH� oN�g��	<E(@@<�؅�����)P�h

$ xxd packets-105-94-select_reuseport__sockhash_IPv4_TCP_LOOPBACK_test_err_sk_select-select_reuseport.log
00000000: d4c3 b2a1 0200 0400 0000 0000 0000 0000  ................
00000010: 0004 0000 1401 0000 6f4e a667 8cd4 0900  ........oN.g....
00000020: 5800 0000 5800 0000 0800 0000 0000 0001  X...X...........
00000030: 0304 0006 0000 0000 0000 0000 4500 0044  ............E..D
00000040: f0c0 4000 4006 4bf1 7f00 0001 7f00 0001  ..@[email protected].........
00000050: 86c9 d885 dfec df20 0000 0000 a002 ffd7  ....... ........
00000060: fe38 0000 0204 ffd7 0402 080a 9249 48e6  .8...........IH.
00000070: 0000 0000 0103 0307 2000 0000 0000 0000  ........ .......
00000080: 6f4e a667 b0d4 0900 3c00 0000 3c00 0000  oN.g....<...<...
00000090: 0800 0000 0000 0001 0304 0006 0000 0000  ................
000000a0: 0000 0000 4500 0028 0000 4000 4006 3cce  ....E..(..@.@.<.
000000b0: 7f00 0001 7f00 0001 d885 86c9 0000 0000  ................
000000c0: dfec df29 5014 0000 9368 0000            ...)P....h..

If this is expected, then there is just no point in printing it to stdout: it's unreadable.

ameryhung · 2025-02-07T23:19:10Z

@theihor The logs indeed are binary data and can be inspected using tcpdump. E.g., tcpdump -X -r <packet_log> will show packets with their payloads

theihor · 2025-02-07T23:30:20Z

@theihor The logs indeed are binary data and can be inspected using tcpdump. E.g., tcpdump -X -r <packet_log> will show packets with their payloads

@ameryhung thank you, makes sense. Well it's 1.1Mb of the logs in total right now, and tcpdump-ing them is gonna be lots of lines. So let's use the artifacts.

We only need to make sure that relevant people are aware that this data is available.

Convert args to array so that later we can pass "-m *" to test_progs through args. Currently, with args being string, it is not possible. Doing args+="-m *" will result in the wildcard being expanded. Doing args+="-m '*'" will result in single quoted wildcard being passed to test_progs. Signed-off-by: Amery Hung <[email protected]>

Traffic monitoring was add to the selftests/bpf[0] to debug some flaky test cases. Enable this in CI by building selftests with libpcap, installing libpcap in the test environment and finally turn it on when running the test. [0] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Amery Hung <[email protected]>

After enabling traffic monitor in bpf selftests, upload the log files as artifacts. Signed-off-by: Amery Hung <[email protected]>

iamkafai · 2025-02-08T00:23:00Z

I just know I have a mlau2 account :)

In reply to @danobi about stdout/stderr, it does log to stdout/stderr (can't remember which one). I believe it was unnoticed because the test has passed. There is no details output by default for successful test.

Using "test_progs -v" should show it. Like:

#> ./test_progs -v -m '*' -t tc_redirect/tc_redirect_dtime
test_inet_dtime:PASS:setns src 0 nsec
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
src_fwd In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst_fwd Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst Out IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
src In IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
dst_fwd In IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
test_inet_dtime:PASS:connect_to_fd 0 nsec
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
src_fwd Out IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
test_inet_dtime:PASS:accept 0 nsec
src_fwd In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 68, ACK
test_inet_dtime:PASS:send to server 0 nsec
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 68, ACK
dst_fwd Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
__rcv_tstamp:PASS:recvmsg 0 nsec

danobi · 2025-02-08T22:54:24Z

@danobi dumping everything to stdout/stderr is ok up to a certain point. An extreme bad example is selftests/sched_ext which prints megabytes of log to stdout. This makes github ui to freeze, and is generally not convenient to work with.

The users of these logs should consider its verbosity, format and situations when it is actually needed. I assumed traffic monitoring might be pretty verbose, so uploading as separate artifacts made sense to me.

Fair enough. In those cases the artifacts will be useful. And as far as I know, it's free on GHA.

theihor self-requested a review February 5, 2025 20:56

theihor requested changes Feb 5, 2025

View reviewed changes

ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from 8ec43a2 to aa63759 Compare February 7, 2025 00:52

theihor approved these changes Feb 7, 2025

View reviewed changes

theihor mentioned this pull request Feb 7, 2025

ci: Install libpcap for upcoming traffic monitor feature. #130

Closed

ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from aa63759 to 773e073 Compare February 7, 2025 20:32

ameryhung marked this pull request as draft February 7, 2025 20:41

ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from 773e073 to a37ec9e Compare February 7, 2025 22:12

ameryhung marked this pull request as ready for review February 7, 2025 22:41

ameryhung added 3 commits February 7, 2025 15:36

workflows: upload traffic monitoring log as artifacts

505d233

After enabling traffic monitor in bpf selftests, upload the log files as artifacts. Signed-off-by: Amery Hung <[email protected]>

ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from a37ec9e to 505d233 Compare February 7, 2025 23:44

danobi approved these changes Feb 8, 2025

View reviewed changes

theihor merged commit 2b211eb into libbpf:main Feb 11, 2025
16 checks passed

theihor added a commit to kernel-patches/vmtest that referenced this pull request Feb 11, 2025

Test libbpf/ci#168

8974ebb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable traffic monitoring to debug flaky selftests #168

Enable traffic monitoring to debug flaky selftests #168

ameryhung commented Feb 3, 2025

theihor commented Feb 4, 2025

ameryhung commented Feb 4, 2025

theihor commented Feb 4, 2025

theihor commented Feb 5, 2025 •

edited

Loading

ameryhung commented Feb 5, 2025

theihor Feb 5, 2025

theihor left a comment

theihor Feb 7, 2025

ameryhung Feb 7, 2025 •

edited

Loading

theihor Feb 7, 2025

danobi commented Feb 7, 2025

theihor commented Feb 7, 2025

ameryhung commented Feb 7, 2025 •

edited

Loading

theihor commented Feb 7, 2025

iamkafai commented Feb 8, 2025 •

edited

Loading

danobi commented Feb 8, 2025

Enable traffic monitoring to debug flaky selftests #168

Enable traffic monitoring to debug flaky selftests #168

Conversation

ameryhung commented Feb 3, 2025

theihor commented Feb 4, 2025

ameryhung commented Feb 4, 2025

theihor commented Feb 4, 2025

theihor commented Feb 5, 2025 • edited Loading

ameryhung commented Feb 5, 2025

theihor Feb 5, 2025

Choose a reason for hiding this comment

theihor left a comment

Choose a reason for hiding this comment

theihor Feb 7, 2025

Choose a reason for hiding this comment

ameryhung Feb 7, 2025 • edited Loading

Choose a reason for hiding this comment

theihor Feb 7, 2025

Choose a reason for hiding this comment

danobi commented Feb 7, 2025

theihor commented Feb 7, 2025

ameryhung commented Feb 7, 2025 • edited Loading

theihor commented Feb 7, 2025

iamkafai commented Feb 8, 2025 • edited Loading

danobi commented Feb 8, 2025

theihor commented Feb 5, 2025 •

edited

Loading

ameryhung Feb 7, 2025 •

edited

Loading

ameryhung commented Feb 7, 2025 •

edited

Loading

iamkafai commented Feb 8, 2025 •

edited

Loading