Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable traffic monitoring to debug flaky selftests #168

Merged
merged 3 commits into from
Feb 11, 2025

Conversation

ameryhung
Copy link
Contributor

Traffic monitoring was add to the selftests/bpf[0] to debug some flaky test cases. Enable this in CI by building selftests with libpcap, installing libpcap in the test environment and finally turn it on when running the test.

[0] https://lore.kernel.org/bpf/[email protected]/

@theihor
Copy link
Contributor

theihor commented Feb 4, 2025

@ameryhung If I understand correctly, the traffic monitoring logs are printed to files.

In github UI only stdout is going to be visible, the logs stored in files will be lost by default. So on CI you may want to either dump the logs to stdout, or collect them into a separate artifact.

@ameryhung
Copy link
Contributor Author

@theihor

I will change to upload the log file as well. Originally, I thought "-m" also prints to stdout. However, it does not include the payload. Having it should be more helpful for debugging.

@theihor
Copy link
Contributor

theihor commented Feb 4, 2025

You can ignore GCC BPF tests failure. They are disabled at kernel-patches/bpf, I'll disable them here too shortly.

@theihor
Copy link
Contributor

theihor commented Feb 5, 2025

I just noticed there is an older similar PR already: #130

There is a piece of yaml uploading logs as artifacts there.

Re stdout vs artifacts, consider the verbosity of the monitoring log. If it's lots of lines per test, then I'd go with the artifacts.

@ameryhung
Copy link
Contributor Author

I think uploading tmon logs as artifacts makes sense. I will update the PR. Thanks!

@theihor theihor self-requested a review February 5, 2025 20:56
run-vmtest/run-bpf-selftests.sh Outdated Show resolved Hide resolved
Comment on lines 83 to 87
- uses: actions/upload-artifact@v4
with:
name: tmon-logs-${{ inputs.arch }}-${{ inputs.toolchain_full }}${{ inputs.kernel-test }}
if-no-files-found: error
path: "/tmp/tmon_pcap/*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managing artifacts is really a job-level task IMO. And we should avoid new action inputs if possible.

I think in this case, it makes sense to add an always() (link) step in kernel-test workflow after run-vmtest call step. toolchain_full string is directly available there, since the build artifacts are identified by it. Also, with always() condition, logs will be uploaded on failure, which is what we want.

Later we may want to add other log files to the step (but don't worry about that in this PR).

@ameryhung ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from 8ec43a2 to aa63759 Compare February 7, 2025 00:52
Copy link
Contributor

@theihor theihor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, thanks!

@@ -43,7 +43,7 @@ test_progs_helper() {
# "&& true" does not change the return code (it is not executed
# if the Python script fails), but it prevents exiting on a
# failure due to the "set -e".
./${selftest} ${args} --json-summary "${json_file}" && true
./${selftest} ${args} -m '*' --json-summary "${json_file}" && true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: adding -m to args was ok too

Copy link
Contributor Author

@ameryhung ameryhung Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean what I had before (i.e., "args+=" -m ''")? That didn't work because the single quote inside the double quote was not evaluated and removed so test_progs would see '' instead of *. I can replace the args with arrays.

local args=("$2")

args+=("-m")
args+=("*")

./test_progs "${args[@]}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't work because the single quote inside the double quote was not evaluated and removed so test_progs would see '' instead of *.

Hmm... that should work.

Consider this:

args+=" ${w:+-w$w}"
args+=" -m '*'"
echo ./foo "$args" bar
$ ./x.sh 
./foo   -m '*' bar
$ w=123 ./x.sh 
./foo  -w123 -m '*' bar

Anyways, this is a nit thing, don't spend too much time on it.

@ameryhung ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from aa63759 to 773e073 Compare February 7, 2025 20:32
@ameryhung ameryhung marked this pull request as draft February 7, 2025 20:41
@danobi
Copy link
Member

danobi commented Feb 7, 2025

IMHO uploading the artifact is not wrong and is generally useful. But it's probably a bit more useful to get the output from stdout/stderr. We already have infra in KPD that slurps failed selftest output and emails it out to contributors. Would be nice to leverage that.

If it's just a matter of improving the selftest infra in kernel tree then maybe @mlau2 can help.

@ameryhung ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from 773e073 to a37ec9e Compare February 7, 2025 22:12
@ameryhung ameryhung marked this pull request as ready for review February 7, 2025 22:41
@theihor
Copy link
Contributor

theihor commented Feb 7, 2025

IMHO uploading the artifact is not wrong and is generally useful. But it's probably a bit more useful to get the output from stdout/stderr. We already have infra in KPD that slurps failed selftest output and emails it out to contributors. Would be nice to leverage that.

@danobi dumping everything to stdout/stderr is ok up to a certain point. An extreme bad example is selftests/sched_ext which prints megabytes of log to stdout. This makes github ui to freeze, and is generally not convenient to work with.

The users of these logs should consider its verbosity, format and situations when it is actually needed. I assumed traffic monitoring might be pretty verbose, so uploading as separate artifacts made sense to me.

I just downloaded an artifact from a successfull test run, and it appears that the "logs" are actually some binary data (@ameryhung I don't really understand what it is, so if that's not right lmk):

$ cat packets-105-94-select_reuseport__sockhash_IPv4_TCP_LOOPBACK_test_err_sk_select-select_reuseport.log
�ò�oN�g��	XED��@@K���؅��� ����8��
�IH� oN�g��	<E(@@<�؅�����)P�h

$ xxd packets-105-94-select_reuseport__sockhash_IPv4_TCP_LOOPBACK_test_err_sk_select-select_reuseport.log
00000000: d4c3 b2a1 0200 0400 0000 0000 0000 0000  ................
00000010: 0004 0000 1401 0000 6f4e a667 8cd4 0900  ........oN.g....
00000020: 5800 0000 5800 0000 0800 0000 0000 0001  X...X...........
00000030: 0304 0006 0000 0000 0000 0000 4500 0044  ............E..D
00000040: f0c0 4000 4006 4bf1 7f00 0001 7f00 0001  ..@[email protected].........
00000050: 86c9 d885 dfec df20 0000 0000 a002 ffd7  ....... ........
00000060: fe38 0000 0204 ffd7 0402 080a 9249 48e6  .8...........IH.
00000070: 0000 0000 0103 0307 2000 0000 0000 0000  ........ .......
00000080: 6f4e a667 b0d4 0900 3c00 0000 3c00 0000  oN.g....<...<...
00000090: 0800 0000 0000 0001 0304 0006 0000 0000  ................
000000a0: 0000 0000 4500 0028 0000 4000 4006 3cce  ....E..(..@.@.<.
000000b0: 7f00 0001 7f00 0001 d885 86c9 0000 0000  ................
000000c0: dfec df29 5014 0000 9368 0000            ...)P....h..

If this is expected, then there is just no point in printing it to stdout: it's unreadable.

@ameryhung
Copy link
Contributor Author

ameryhung commented Feb 7, 2025

@theihor The logs indeed are binary data and can be inspected using tcpdump. E.g., tcpdump -X -r <packet_log> will show packets with their payloads

@theihor
Copy link
Contributor

theihor commented Feb 7, 2025

@theihor The logs indeed are binary data and can be inspected using tcpdump. E.g., tcpdump -X -r <packet_log> will show packets with their payloads

@ameryhung thank you, makes sense. Well it's 1.1Mb of the logs in total right now, and tcpdump-ing them is gonna be lots of lines. So let's use the artifacts.

We only need to make sure that relevant people are aware that this data is available.

Convert args to array so that later we can pass "-m *" to test_progs
through args. Currently, with args being string, it is not possible.
Doing args+="-m *" will result in the wildcard being expanded.
Doing args+="-m '*'" will result in single quoted wildcard being passed
to test_progs.

Signed-off-by: Amery Hung <[email protected]>
Traffic monitoring was add to the selftests/bpf[0] to debug some flaky
test cases. Enable this in CI by building selftests with libpcap,
installing libpcap in the test environment and finally turn it on when
running the test.

[0] https://lore.kernel.org/bpf/[email protected]/

Signed-off-by: Amery Hung <[email protected]>
After enabling traffic monitor in bpf selftests, upload the log files
as artifacts.

Signed-off-by: Amery Hung <[email protected]>
@ameryhung ameryhung force-pushed the enable_bpf_selftest_traffic_monitoring branch from a37ec9e to 505d233 Compare February 7, 2025 23:44
@iamkafai
Copy link

iamkafai commented Feb 8, 2025

I just know I have a mlau2 account :)

In reply to @danobi about stdout/stderr, it does log to stdout/stderr (can't remember which one). I believe it was unnoticed because the test has passed. There is no details output by default for successful test.

Using "test_progs -v" should show it. Like:

#> ./test_progs -v -m '*' -t tc_redirect/tc_redirect_dtime
test_inet_dtime:PASS:setns src 0 nsec
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
src_fwd In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst_fwd Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 60, SYN
dst Out IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
src In IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
dst_fwd In IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
test_inet_dtime:PASS:connect_to_fd 0 nsec
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
src_fwd Out IPv4 172.16.2.100:50001 > 172.16.1.100:48914: TCP, length 60, SYN, ACK
test_inet_dtime:PASS:accept 0 nsec
src_fwd In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
src Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 68, ACK
test_inet_dtime:PASS:send to server 0 nsec
dst In IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 68, ACK
dst_fwd Out IPv4 172.16.1.100:48914 > 172.16.2.100:50001: TCP, length 52, ACK
__rcv_tstamp:PASS:recvmsg 0 nsec

@danobi
Copy link
Member

danobi commented Feb 8, 2025

@danobi dumping everything to stdout/stderr is ok up to a certain point. An extreme bad example is selftests/sched_ext which prints megabytes of log to stdout. This makes github ui to freeze, and is generally not convenient to work with.

The users of these logs should consider its verbosity, format and situations when it is actually needed. I assumed traffic monitoring might be pretty verbose, so uploading as separate artifacts made sense to me.

Fair enough. In those cases the artifacts will be useful. And as far as I know, it's free on GHA.

@theihor theihor merged commit 2b211eb into libbpf:main Feb 11, 2025
16 checks passed
theihor added a commit to kernel-patches/vmtest that referenced this pull request Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants