Skip to content

Glibc SIGSYS patch for sandboxed localtime_r() #2527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

AE1020
Copy link
Contributor

@AE1020 AE1020 commented Apr 27, 2021

Reason for This PR

This permits the alpine demo from the quick start to appear to
run with glibc instead of musl...when some warnings are being
emitted by the logger that include timestamps. Previously the
warning emission in a glibc build would cause SIGSYS.

Description of Changes

The LocalTime::now() function is used by the logger for some
messages (such as warnings). If the first case of calling
a log message that included the time happened in a sandboxed
thread, glibc would potentially read from the host filesystem
on that thread in response to the log message.

This effect arises from the implementation of how localtime_r()
obtains time zone information if the "TZ" environment variable
is not set. A function called __tzfile_read() is called:

https://sources.debian.org/src/glibc/2.28-10/time/tzset.c/#L584

Which triggers an fopen():

https://sources.debian.org/src/glibc/2.28-10/time/tzfile.c/#L165

Because this information is cached, it appears possible to work
around the problem by making an early call to LocalTime::now()
during initialization. Then if calls pertaining to logging
come later on the seccomp filtered thread the caching is done
already and problems don't occur.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • [N/A] Any newly added unsafe code is properly documented.
  • [N/A] Any API changes are reflected in firecracker/swagger.yaml.
  • [N/A] Any user-facing changes are mentioned in CHANGELOG.md.

RE: "[ ] All added/changed functionality is tested", I am running this in VirtualBox as I do not have a non-virtual Linux machine at present to apply. The musl-based version does not pass all the tests with that either. This change improves the mentioned issue, without seeming to make anything worse.

The LocalTime::now() function is used by the logger for some
messages (such as warnings).  If the first case of calling
a log message that included the time happened in a sandboxed
thread, glibc would potentially read from the host filesystem
on that thread in response to the log message.

This effect arises from the implementation of how localtime_r()
obtains time zone information if the "TZ" environment variable
is not set.  A function called __tzfile_read() is called:

  https://sources.debian.org/src/glibc/2.28-10/time/tzset.c/#L584

Which triggers an fopen():

  https://sources.debian.org/src/glibc/2.28-10/time/tzfile.c/#L165

Because this information is cached, it appears possible to work
around the problem by making an early call to LocalTime::now()
during initialization.  Then if calls pertaining to logging
come later on the seccomp filtered thread the caching is done
already and problems don't occur.

This permits the alpine demo from the quick start to appear to
run with glibc instead of musl, and built with the standard Rust
on an Ubuntu 20.04 installation (inside of VirtualBox 6.1.20).

Signed-off-by: Cerul Alain <[email protected]>
@serban300
Copy link
Contributor

@AE1020 Great dive deep and thank you for the detailed explanation ! Personally I'm a bit reluctant about including such workarounds in the code. How long would the timezone be cached ? Would it be possible for it to be evicted from the cache and for the VM to crash unpredictably later ?

@alindima
Copy link
Contributor

Hi @AE1020, there is a similar issue with musl as well, and we chose was to allow the required syscall in the seccomp filter:

// Used for reading the timezone in LocalTime::now()

For glibc, can you please tell us what the syscall that triggers the sigsys is? Firecracker usually logs the number of the offending syscall.
I was not able to reproduce your scenario, though I was running with glibc 2.26-44.amzn2.

Firecracker's glibc support is considered only experimental, and the recommended, production libc for Firecracker is musl, as mentioned in the documentation: https://github.com/firecracker-microvm/firecracker/blob/a367796e66eeac42d9ce1294c0fbbca6191e9cf3/docs/getting-started.md#alternative-building-firecracker-using-glibc.

As a development workaround, you may run with seccomp disabled, (--seccomp-level 0). For production purposes, please use musl.

Moreover, we're currently working on a feature that will enable users to provide custom file-based seccomp filters to Firecracker at runtime. Here is the feature branch where this is being developed: https://github.com/firecracker-microvm/firecracker/tree/feature/file_based_seccomp.
Once this is released, glibc users will be able to maintain their own seccomp filters, tailored to the glibc version. (Though this is not trivial and comes with risks )

I'll be closing this PR for now, for the above reasons

@alindima alindima closed this Apr 28, 2021
@AE1020
Copy link
Contributor Author

AE1020 commented Apr 28, 2021

I was not able to reproduce your scenario, though I was running with glibc 2.26-44.amzn2.

I got the problem running guest Ubuntu 20.04 (glibc 2.31) in a Windows 10 VirtualBox 6.1.20 host. /etc/timezone exists and TZ environment variable not set.

It was a fresh install created expressly to try Firecracker with, and try running Valgrind on ("good first issue").

For glibc, can you please tell us what the syscall that triggers the sigsys is?

I pointed out the specific fopen() call. The path requested was "/". The SIGSYS is generated by openat:

https://sources.debian.org/src/glibc/2.31-11/sysdeps/unix/sysv/linux/openat64_nocancel.c/#L43

UPDATE: Looking at it with a debug build of glibc, the "/" may have been a weakness in the vscode + rust debugger showing the leading character only of a longer path. Because the debug build of glibc shows the file path as "/dbg/etc/localtime"

Firecracker usually logs the number of the offending syscall.

Perhaps it is different as the issue itself occurs during logging. The musl build successfully ran the demo, and the log had these lines in it:

(...more...)
[    1.369701] iscsi: registered transport (tcp)
[    1.374943] tun: Universal TUN/TAP device driver, 1.6
2021-04-27T04:24:30.677076131 [anonymous-instance:fc_vcpu 0:WARN:src/devices/src/legacy/i8042.rs:126] Failed to trigger 
i8042 kbd interrupt (disabled by guest OS)
[    1.408186] i8042: Failed to disable AUX port, but continuing anyway... Is this a SiS?
(...more...)

That was where the glibc build output gave:

(...more...)
[    1.553549] iscsi: registered transport (tcp)
[    1.561403] tun: Universal TUN/TAP device driver, 1.6
Bad system call

Then it terminated.

Once this is released, glibc users will be able to maintain their own seccomp filters, tailored to the glibc version.

This may work (presuming the "/" request for file was a red herring based on broken debugging).

Though... it may be that as a cloud-focused offering, just reducing variability by using UTC for all times makes the most sense...or only using time zones if TZ set in the environment or passed as a config option, etc.

Doing UTC unless a time zone is in the config may make the most sense, as if you want any time zone when reading logs, it's the client's.

Firecracker's glibc support is considered only experimental, and the recommended, production libc for Firecracker is musl, as mentioned in the documentation: https://github.com/firecracker-microvm/firecracker/blob/a367796e66eeac42d9ce1294c0fbbca6191e9cf3/docs/getting-started.md#alternative-building-firecracker-using-glibc.

Yes (and design-wise I absolutely prefer musl, always good to see people use it).

But building with mainstream defaults can offer advantages for instrumentation and tooling not tailored to musl builds. And realistically speaking, if one is forced to be running glibc for other reasons it is saves space overall and is faster. Plus multiple implementations mean it can draw attention to gaps in understandings. I think in the area of these seccomp features specifically one wouldn't want to disable it in a variant implementation, to get any insights.

Anyway, PRs for glibc support were accepted and it seemed close to working. This change actually did allow it to work in my case. A PR with data and sourcing strikes me as more informative than a bug report. 🤷‍♂️

@alindima
Copy link
Contributor

alindima commented Apr 28, 2021

With the machine I tried, with glibc 2.26, this is the process for reading the local timezone (I ran using strace):

open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 25
fstat(25, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
fstat(25, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
read(25, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127
lseek(25, -71, SEEK_CUR)                = 56
read(25, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71
close(25)                               = 0

This is a perfect example of why introducing glibc-specific changes does not scale well. They may vary across versions.

I also don't see what the concern would be with open("/"). Any open call will fail if the user is trying to access a file that is restricted (i.e. if the user does not have the right permissions). Firecracker's design accounts for this, with the jailer. It chroot's into a very constrained environment, and runs the Firecracker process as a user with very limited privileges.
Firecracker's default seccomp filter allow any open() call, so I wouldn't worry about this.

Anyway, PRs for glibc support were accepted and it seemed close to working. This change actually did allow it to work in my case. A PR with data and sourcing strikes me as more informative than a bug report. 🤷‍♂️

There are examples of PRs for glibc support similar to this one that were closed specifically because they were addressing a seccomp violation. We added the experimental flag to glibc later on and developed the said seccomp file-based feature specifically so that we can handle such cases gracefully. We appreciate your effort and indeed you made a very informative PR.

However, as glibc support is experimental, and you are trying to run Firecracker with glibc for development/testing purposes, is running with --seccomp-level 0 acceptable for you in this case?

@alindima
Copy link
Contributor

I'm gonna reopen this PR for now, to offer clear exposure of the discussion

@alindima alindima reopened this Apr 28, 2021
@AE1020
Copy link
Contributor Author

AE1020 commented Apr 28, 2021

This is a perfect example of why introducing glibc-specific changes does not scale well. They may vary across versions.

Note that all of this is new to me (Firecracker, seccomp, Rust...), so there's a lot I don't understand. But I'd have thought the difference would be that you are not having the call trigger at the same time--not that glibc has changed much.

Are you sure your comparison case has that fopen() being called inside of StateMachine::run() called by Vcpu::run(), while protections are applied?

My specific problem happens when trigger_keyboard_interrupt() occurs under that call stack and it raises a warning...and thus makes its first call to localtime_r(). If you aren't getting that warning, you might try forcing it?

We appreciate your effort and indeed you made a very informative PR.

Thanks! I was just reacting a bit to hearing it as "Hey, it's experimental, can't you read?", plus being closed while I was mid-response to the other questions. I can read! :-)

So I appreciate you reopening it. But I don't have any specific needs... really it's just a learning experience for me. I was looking to get an idea of Rust from projects that are up my alley of interest rather than slog through a book. Hmmm, maybe I can't read. :-P

@alindima
Copy link
Contributor

Thanks! I was just reacting a bit to hearing it as "Hey, it's experimental, can't you read?", plus being closed while I was mid-response to the other questions. I can read! :-)

It really wasn't meant like that :) It's just helpful to link documentation whenever you're referring it.

Note that all of this is new to me (Firecracker, seccomp, Rust...), so there's a lot I don't understand. But I'd have thought the difference would be that you are not having the call trigger at the same time--not that glibc has changed much.
Are you sure your comparison case has that fopen() being called inside of StateMachine::run() called by Vcpu::run(), while protections are applied?
My specific problem happens when trigger_keyboard_interrupt() occurs under that call stack and it raises a warning...and thus makes its first call to localtime_r(). If you aren't getting that warning, you might try forcing it?

Right, so, in the scenario I've tested, I too get a warning in the logs about the keyboard interrupt, so this is not the problem. Whether the call is made at the same time in this case doesn't matter because I ran using strace, so any system call was recorded, and the code I've pasted in the previous comment is what gets triggered for the specific glibc version I've used.

The rust code for Localtime::now() calls a libc function, localtime_r(). This libc function is then fully responsible to return the localtime. The system calls used for doing so may vary across versions and libc implementations, since there are multiple ways to skin a cat, as they say.

So I appreciate you reopening it. But I don't have any specific needs... really it's just a learning experience for me. I was looking to get an idea of Rust from projects that are up my alley of interest rather than slog through a book. Hmmm, maybe I can't read. :-P

And we appreciate your effort. We're looking forward to your next contributions. As far as I understand, your progress on other issues is unblocked by running with --seccomp-level 0, and I think that for this case, this is the right thing to do.

Thanks

@alindima alindima closed this Apr 29, 2021
@AE1020
Copy link
Contributor Author

AE1020 commented Apr 29, 2021

Whether the call is made at the same time in this case doesn't matter

I'm not clear on what you mean by this (e.g. it clearly mattered in my case)

If something about your situation caused a warning earlier--such as closer to startup--it would have had the same effect as my workaround. The problem is very specific to when the fopen() happened--not that it happened at all.

I won't argue that it could be due to a glibc difference (e.g. yours could cache the time zone on library startup). I'm just saying I do not understand what about your environment would be different if it's permitting the access to "/etc/localtime" in the protected thread, when mine did not allow that access.

I've named a possible cause: e.g. you have some other condition (another warning or configuration option) that causes a time request to be made earlier than that moment. Then this would not be related to a difference in the glibc--and it would be a matter of the timing of calls into glibc instead. I'd still think something like this would be more likely than a glibc change, and might be worth pinning down for the sake of comprehension.

@alindima
Copy link
Contributor

alindima commented Apr 29, 2021

The thing that matters in the context of seccomp is whether a faulty syscall has happened before the prctl call that installed the filter, or after.
Since you receive a bad syscall, which is a SIGSYS signal, your code is triggering a syscall that is not allowed by the filter, so the filter was installed prior to the bad syscall.
In my case, based on strace output, the system calls that I pasted earlier happen after the filter is enabled, so, at the same time as yours, after the filter was installed.

In this sense, the timing of when exactly this was called does not matter, because in terms of seccomp, the only thing that matters is whether the filter was installed or not yet.

The glibc I'm using isn't caching the timezone at process startup time. Having run with strace, I can see exactly what system calls my code is running, so I'm sure of this.

To get a very clear understanding of what is happening on your side, you could run Firecracker under strace, like so:
strace -ff -o strace.out ./firecracker ....

The thing that must differ from my reproduction scenario vs yours is the libc version. fopen is not a system call, it's a library function. And different libc's implement it differently.

If your glibc version was calling openat under the hood, this is why it was failing. This system call is not listed in the allowlist for Firecracker on x86. Seeing the strace output will clarify things

I can also paste the strace output for the version of glibc I'm using, if you'd like

@AE1020
Copy link
Contributor Author

AE1020 commented Apr 29, 2021

If your glibc version was calling openat under the hood, this is why it was failing. This system call is not listed in the allowlist for Firecracker on x86. Seeing the strace output will clarify things

Okay--yes it's calling openat. My confusion was that I'd thought that regardless of how fopen() would be implemented, you would be wanting to block it from fopen()-ing a new file. That would match how Wikipedia described the new-to-me term seccomp:

seccomp allows a process to make a one-way transition into a "secure"
state where it cannot make any system calls except exit(), sigreturn(),
read() and write() to already-open file descriptors. Should it attempt
any other system calls, the kernel will terminate the process with SIGKILL
or SIGSYS.[1][2]

But I now gather that for the specific feature of host filesystem protection, you are foregoing seccomp features...in order to use your own more configurable "jailer" solution. Hence seccomp is protecting against other non-filesystem oriented syscalls (so attackers who manage to trick your Rust code can't make it execute new processes, etc).

So if one just deletes line 108 in filters.rs default_filter(), that will enable openat on x86_64 instead of just on aarch64...and I've verified my configuration works when I do so:

#[cfg(target_arch = "x86_64")]
allow_syscall(libc::SYS_open),
#[cfg(target_arch = "aarch64")]
allow_syscall(libc::SYS_openat),

But I don't know why your glibc 2.26 wouldn't need that change too. Because articles dating back to 2017--speaking about the impacts on seccomp specifically--say that glibc 2.26 was when it happened. ❓

Other than that point, it makes sense now. But perhaps you see why it would be confusing. The seccomp feature is explained as having the purpose of blocking access to anything other than already-open file descriptors, unless you are using a non-standard filtering extension. So I thought rejecting fopen() was the right thing to do, and not the bug.

@alindima
Copy link
Contributor

alindima commented May 4, 2021

Linux has two types of seccomp filtering. It has a default seccomp filter (SECCOMP_SET_MODE_STRICT ), that, as you said, only allows a couple of system calls, like read, exit, write. And it also has seccomp-bpf (SECCOMP_SET_MODE_FILTER) which uses BPF programs for filtering the allowed system calls.
Firecracker uses the latter, since it's more configurable and Firecracker would not function with the strict mode.

Have a look here for more details: https://man7.org/linux/man-pages/man2/seccomp.2.html

I can see where your confusion was coming from, no worries. Hope I could help a bit.

@AE1020
Copy link
Contributor Author

AE1020 commented May 4, 2021

Do you plan to add openat to the list for x86_64?

One might add it with a comment saying it was needed by GLIBC 2.26 and above, because that's when the internet says Glibc changed, But your Glibc 2.26 worked, suggesting it did not (...or perhaps you were running on aarch64?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants