OCSP request with --phone-out and with supplied binary when using systemd for host resolution with my*entries segfaults #2516

Closed
multiflexi opened this issue Jun 25, 2024 · 25 comments
Labels
3.0 old branch 3.2 upcoming release bug:to be reproduced ... from maintainers

Comments

@multiflexi
Contributor

multiflexi commented Jun 25, 2024

The error is:
testssl.sh/testssl.sh: line 2031: 3055367 Segmentation fault $OPENSSL ocsp -no_nonce ${host_header} -url "$uri" -issuer $TEMPDIR/hostcert_issuer.pem -verify_other $TEMPDIR/intermediatecerts.pem -CAfile <(cat $ADDTL_CA_FILES "$GOOD_CA_BUNDLE") -cert $HOSTCERT -text &> "$tmpfile"

This happens with --phone-out with supplied openssl (the bad version), but when the openssl is compiled from the source, the error does not occur. Also it does not occur with the system provided openssl.

Using the latest 3.2 version
Tested distros: Fedora 40, Ubuntu 22.04 and Slackware 15

@drwetter
Collaborator

drwetter commented Jul 1, 2024

Hi @multiflexi ,
thanks for reporting. Smells like a DNS thing we had before.

  • does it only happen using --phone-out?
  • would you mind running strace on that?

@multiflexi
Contributor Author

Yes, only with --phone-out.
strace.txt

@drwetter
Collaborator

drwetter commented Jul 5, 2024

Ok, thanks! I meant just the command which segfaulted. I guess I'll find the segfault in the haystack later ;-)

@drwetter
Collaborator

drwetter commented Sep 8, 2024

The plan is to compile the binaries on a newer platform while tackling #2356.

@drwetter drwetter added 3.2 upcoming release 3.0 old branch labels Sep 8, 2024
@drwetter
Collaborator

drwetter commented Jan 4, 2025

Hi @multiflexi: can you please try this binary: https://testssl.sh/openssl-1.0.2k-bad/openssl.Linux.x86_64.static and let me know whether it works?

@multiflexi
Contributor Author

Hi, sorry for the delay. It still outputs Segmentation fault:
OCSP URI http://GEANT.ocsp.sectigo.com./testssl.sh: line 2044: 72580 Segmentation fault (core dumped) $OPENSSL ocsp -no_nonce ${host_header} -url "$uri" -issuer $TEMPDIR/hostcert_issuer.pem -verify_other $TEMPDIR/intermediatecerts.pem -CAfile <(cat $ADDTL_CA_FILES "$GOOD_CA_BUNDLE") -cert $HOSTCERT -text &> "$tmpfile"

@drwetter
Collaborator

Sigh. OK, thanks. That was on Fedora 40 only and not on Ubuntu 22.04?

In the above strace I may have found something fishy, but I can't really tell.
It would ease matters if exactly the command above could be straced, either by copying and pasting the exact command onto the command line and putting strace -f -o file.txt before the openssl command, or by inserting strace -f -o file.txt into that openssl command in testssl.sh.

Does that happen when checking a specific host or any host?

@multiflexi
Contributor Author

This was on current Manjaro. Today I also tested Fedora 41 with the same error and Ubuntu 22.04 where it worked fine. It happens when checking any host.
I did strace -f -o filename.txt ./testssl.sh -S --phone-out cesnet.cz
Result is in the 7z archive which you have to rename because GitHub does not allow *.7z files.

strace.7z.txt

@drwetter
Collaborator

Thanks, but it seems I can't correlate your line 2044 (here: line 2091) with the strace output.

As said, if it really segfaults there, it would help if you could strace only that line.

@multiflexi
Contributor Author

How can I do that?

@drwetter
Collaborator

  • (vi|emacs|...) testssl.sh
  • go to the line which causes the segfault ($OPENSSL ocsp -no_nonce ${host_header} -url "$uri" -issuer $TEMPDIR/hostcert_issuer.pem -verify_other $TEMPDIR/intermediatecerts.pem -CAfile <(cat $ADDTL_CA_FILES "$GOOD_CA_BUNDLE") -cert $HOSTCERT -text &> "$tmpfile")
  • prepend strace -f -o filename.txt, like: strace -f -o filename.txt $OPENSSL ocsp -no_nonce ${host_header} ...
  • run testssl.sh and send me filename.txt
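
The steps above could also be wrapped in a small helper so the script does not have to be edited twice. This is only a sketch: `run_maybe_traced` and the `TRACE_FILE` variable are hypothetical names, not part of testssl.sh, and it assumes strace is installed when tracing is requested.

```sh
# Hypothetical helper, not part of testssl.sh.
# Runs a command unchanged, or under strace when TRACE_FILE is set
# and strace is available.
run_maybe_traced() {
    if [ -n "${TRACE_FILE:-}" ] && command -v strace >/dev/null 2>&1; then
        # -f follows child processes, -o writes the trace to the given file
        strace -f -o "$TRACE_FILE" "$@"
    else
        "$@"
    fi
}

# Example: trace only the failing call instead of the whole scan
# TRACE_FILE=filename.txt run_maybe_traced "$OPENSSL" ocsp -no_nonce ...
```

Tracing only the one call keeps the output small enough to correlate with the segfault.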

@multiflexi
Contributor Author

I should have thought of that 😄 Here you go.
strace.txt

@drwetter
Collaborator

Thanks!

Did you supply the -f flag? It doesn't give me as strong a hint as I had hoped.

It looks more like openssl triggered the problem but is not the problem.

Before the segfault, the loader was mapped into memory. Then some of that memory was made read-only. After that, SEGV_MAPERR indicated that memory was accessed at an address that is not mapped, i.e. the pointer was wrong. The address is likely not 0x1e83c0.

Wild guess: do the distros where it segfaults share the same /etc/nsswitch.conf, and does the one which is fine use a different one?

@multiflexi
Contributor Author

Yes I did use -f:

Image

Manjaro where it segfaults:

Image

Fedora where it segfaults:

Image

Ubuntu where it works fine:

Image

@drwetter
Collaborator

drwetter commented Jan 30, 2025

For testing's sake, can you try setting the hosts entry on the first two (Manjaro and Fedora) to hosts: files dns and check whether it still segfaults?

PS: I had never heard of myhostname or mymachines before, but a look at an Alma Linux test machine here says it's using systemd. Oh well...

@multiflexi
Contributor Author

So I tested it on Fedora: if dns is at the end of the hosts line, it segfaults; if I move it to second place, it works fine. I am able to reproduce it every time.
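
To compare machines quickly, the position of dns on the hosts line can be read with a few lines of shell. This is a sketch only: `hosts_sources` and `dns_position` are hypothetical helper names, and the parsing assumes the usual one-line `hosts:` entry, skipping bracketed actions such as [NOTFOUND=return].

```sh
# Print the sources on the "hosts:" line of an nsswitch-style file, one per line.
# Tokens containing "=" (bracketed actions) are skipped; a "#" token ends the line.
hosts_sources() {
    awk '$1 == "hosts:" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break
            if (index($i, "=") == 0) print $i
        }
        exit
    }' "$1"
}

# Print the 1-based position of "dns" among those sources (nothing if absent).
dns_position() {
    hosts_sources "$1" | awk '$0 == "dns" { print NR; exit }'
}
```

Running `dns_position /etc/nsswitch.conf` on a failing and a working machine shows whether dns comes after the other resolver modules.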

@drwetter changed the title from "Segmentation fault at line 2031" to "OCSP request with --phone-out and with supplied binary when using systemd for host resolution with my*entries segfaults" on Feb 12, 2025
@drwetter
Collaborator

drwetter commented Feb 12, 2025

Awesome.

Thanks for helping to clarify though! At the moment I'd rather leave the cause as it is, since I am clueless and this seems to be either systemd's problem or somewhere in the middle between systemd and the openssl used. I changed the title. What I could do as a mitigation is try to catch the segfault and issue a warning. I assume "your" segfault didn't stop the whole scan?
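
A minimal sketch of such a mitigation, assuming a bash- or dash-like shell on Linux where a child killed by SIGSEGV (signal 11) is reported as exit status 128 + 11 = 139. `run_warn_on_segv` is a hypothetical name and the warning text is illustrative, not what testssl.sh prints.

```sh
# Hypothetical wrapper: run a command and warn on stderr if it segfaulted.
# The command's exit status is passed through unchanged.
run_warn_on_segv() {
    "$@"
    cmd_rc=$?
    if [ "$cmd_rc" -eq 139 ]; then
        echo "warning: '$1' segfaulted, consider using the system openssl" >&2
    fi
    return "$cmd_rc"
}
```

Because the status is passed through, the caller can still decide whether to continue the scan or fall back to another binary.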

If you like, maybe you can try removing other entries and/or moving dns around in nsswitch.

@multiflexi
Contributor Author

No, the segfault does not stop the scan. I will try to play with nsswitch in my free time.

@drwetter
Collaborator

NOTFOUND=return looks strange, but maybe I have to rtfm before guessing 😃

@drwetter
Collaborator

drwetter commented Mar 3, 2025

Hi @multiflexi, can you have a short look at what dmesg shows when the segfault occurs?

@drwetter
Collaborator

drwetter commented Mar 8, 2025

Hi @multiflexi: can you please check which error number dmesg shows when using the new and, if possible, the old binary? Something like:

openssl.Linux.x[171004]: segfault at 270e0 ip 00000000000270e0 sp 00007ffc29913d68 error 14 in openssl.Linux.x86_64[400000+3b9000] likely on CPU 1 (core 0, socket 1)
Code: Unable to access opcode bytes at 0x270b6.

At a certain point I'd like to provide new binaries and if possible I'd like to get this issue fixed. I am not that tempted to clog this repo with another set of new binaries later.

@multiflexi
Contributor Author

With the standard version on Manjaro:

[25563.596788] openssl.Linux.x[113335]: segfault at 271c0 ip 00000000000271c0 sp 00007ffd5c2393e8 error 14 likely on CPU 7 (core 1, socket 0)
[25563.596797] Code: Unable to access opcode bytes at 0x27196.

With the new version:

[25700.896799] openssl.Linux.x[121439]: segfault at 1e93c0 ip 00007f43ce5670d6 sp 00007ffdced4a248 error 4 in libc.so.6[360d6,7f43ce555000+171000] likely on CPU 3 (core 3, socket 0)
[25700.896810] Code: 48 03 04 25 00 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 8b 05 2d 1e 1b 00 48 8b 0d 06 1c 1b 00 64 48 8b 00 <48> 8b 00 48 8b 70 38 48 8d 96 00 01 00 00 64 48 89 11 48 8b 78 40
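
For reading the "error" field in these dmesg lines: on x86 the kernel prints the page-fault error code in hexadecimal, with bit 0 set for a protection violation (clear for a not-present page), bit 1 for a write, bit 2 for user mode and bit 4 for an instruction fetch. The decoder below is a sketch with a hypothetical function name, and it only looks at those four bits.

```sh
# Decode the hex "error" value from an x86 "segfault at ..." dmesg line.
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then state="protection-violation"; else state="not-present"; fi
    if [ $((code & 16)) -ne 0 ]; then
        access="instruction-fetch"
    elif [ $((code & 2)) -ne 0 ]; then
        access="write"
    else
        access="read"
    fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$state $access $mode"
}

decode_pf_error 14   # prints "not-present instruction-fetch user"
decode_pf_error 4    # prints "not-present read user"
```

Read this way, error 14 with ip equal to the fault address looks like a jump to an unmapped address, while error 4 in libc.so.6 looks like a read through a bad pointer.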

@drwetter
Collaborator

drwetter commented Mar 13, 2025

I am a little bit further but this is a tough one and not yet finally resolved....

For the record, the following, using the new binary:

Under Fedora 41, which I installed, there were entries in the audit log as the binary did a connectto call which SELinux prevented:

root@localhost-live:~# ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -i   | grep scontext | column -t
type=AVC  msg=audit(03/13/2025  15:37:45.924:113)  :  avc:  denied  {  read       }  for  pid=841  comm=systemd-homed    name=home                                 dev="vda3"                                      ino=197968       scontext=system_u:system_r:systemd_homed_t:s0  tcontext=system_u:object_r:var_t:s0  tclass=dir  permissive=0
type=AVC  msg=audit(03/13/2025  15:42:53.617:309)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:42:53.618:310)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:43:22.612:320)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:43:22.612:321)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:47:03.861:350)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:47:03.862:351)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:53:54.113:383)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0
type=AVC  msg=audit(03/13/2025  15:53:54.115:384)  :  avc:  denied  {  connectto  }  for  pid=899  comm=abrt-dump-journ  path=/run/systemd/userdb/io.systemd.Home  scontext=system_u:system_r:abrt_dump_oops_t:s0  tcontext=system_u:system_r:systemd_homed_t:s0  tclass=unix_stream_socket                      permissive=0

That seemed to be part of the problem (though I can't tell why a user shouldn't be allowed to connect somewhere with any binary). Still, after I built a SELinux module and loaded it into the kernel, it didn't help and it still segfaults. aureport -a didn't show any further violations then. The log file only contains an ANOM_ABEND entry caused by the segfault (which is used for an IDS).

journalctl -f showed then:

Image

Which kind of confirms my assumption that nss with myhostname is the culprit here.

As of now I see only three possible solutions:

  • leave it like it is and issue a warning to rather use /usr/bin/openssl
  • automatically switch to /usr/bin/openssl if available for openssl ocsp calls
  • look for any changes in the openssl 1.x tree which might fix this.

As said, there's also a docker issue which looks similar, see #2667. There nss + myhostname isn't involved but the possible fixes would be the same.

@drwetter
Collaborator

  • look for any changes in the openssl 1.x tree which might fix this.

TL;DR: I believe the problem is static linking. However using an executable with dynamic linking under a variety of Linux systems is a pain in the <...> -- at least according to what I learned the hard way years ago. So for the time being I'll prepare an automagic switch to /usr/bin/openssl, if available, for openssl ocsp calls.
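
Such a switch could look roughly like this. It is a hypothetical sketch, not the code from the referenced commits: `pick_ocsp_openssl` is an invented name, and the second argument only exists so the system path can be overridden.

```sh
# Hypothetical sketch: print the openssl binary to use for OCSP requests.
# $1 = bundled binary, $2 = system binary (defaults to /usr/bin/openssl).
pick_ocsp_openssl() {
    bundled=$1
    system=${2:-/usr/bin/openssl}
    if [ -x "$system" ]; then
        # the system binary is dynamically linked against the local glibc
        echo "$system"
    else
        echo "$bundled"
    fi
}

# Example: ocsp_bin=$(pick_ocsp_openssl "$OPENSSL")
```

Only the OCSP call would use the returned binary; everything else keeps using the bundled one.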

I thought a problem could have been the old code base of "our" openssl-bad. So I compiled the latest versions of 1.1.1, 1.1.0 and also 1.0.2: Same segfault.

For the 1.1.x branches when running the openssl supplied test suites there were even tons of warning messages like warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking. Same message regarding dlopen. Thread e.g.: https://stackoverflow.com/questions/57476533/why-is-statically-linking-glibc-discouraged/57478728#57478728

PS: musl libc or diet libc doesn't seem to be an option to me. Don't know how go binaries do that...

@drwetter
Collaborator

PPS: Also the dynamically built binary does not work under Fedora.

Image

drwetter added a commit that referenced this issue Mar 14, 2025
…ne-out

As `--phone-out` sometimes doesn't work with our binary we switch transparently/automagically
to the vendor support openssl binary -- if available.

This fixes at least #2516 where the issue has been explained/debugged in detail.
See also #2667 and #1275.
drwetter added a commit that referenced this issue Mar 14, 2025
…ne-out (3.0)

As `--phone-out` sometimes doesn't work with our binary we switch transparently/automagically
to the vendor support openssl binary -- if available. This is the PR for 3.0, for 3.2 see #2695 .

This fixes at least #2516 where the issue has been explained/debugged in detail.
See also #2667 and #1275.
drwetter added a commit that referenced this issue Mar 15, 2025
One positive, one negative

This should detect failures in the future like in #2667, #2516
and #1275 .