Commit 3e780c6

committed
Major redraft of environment variable sandboxing
Much simplified, removing regexs
1 parent 5963091 commit 3e780c6

1 file changed

+231
-0
lines changed

text/0000-sandbox-environment.md

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
1+
- Feature Name: sandbox-environment
2+
- Start Date: 2019-10-26
3+
- RFC PR: [rust-lang/rfcs#2794](https://github.com/rust-lang/rfcs/pull/2794)
4+
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
This proposes a mechanism to precisely control what environment variables are
10+
available to Rust programs at compilation time.
11+
12+
# Motivation
13+
[motivation]: #motivation
14+
15+
Rust supports the `env!` and `option_env!` macros which allow Rust programs to
16+
query arbitrary process environment variables at compilation time. This is a
17+
very flexible mechanism to pass compile-time information to the program.
18+
19+
However, in many cases it is too flexible. It poses several problems:
20+
1. Environment variables are generally not tracked by build systems, so changing
21+
a variable is not taken into account. Cargo has an ad-hoc mechanism for doing
22+
this in build scripts, but there's nothing to make this guaranteed correct
23+
(i.e. that all variables accessed are tracked).
24+
2. There's no easy way to audit which environment variables a crate accesses.
25+
This not only exacerbates the problem above, but it also means that
26+
potentially sensitive information in an environment variable can be
27+
incorporated into the compiled code.
28+
3. There's no way to override variables if they're needed by the build process
29+
itself. For example, the `PATH` variable likely needs to be set so that the
30+
compiler can execute its various components, but there's no way to override
31+
this so that `env!("PATH")` returns something else. This would be necessary
32+
where the compilation environment differs from the deployment environment
33+
(such as when cross-compiling).
34+
35+
This RFC proposes a way to precisely control the environment visible to the
36+
compile-time macros, while defaulting to the current behaviour of making the
37+
entire environment available.
38+
39+
# Guide-level explanation
40+
[guide-level-explanation]: #guide-level-explanation
41+
42+
Rust implements the `env!()` and `option_env!()` macros to access the process
43+
environment variables at compilation time. `rustc` supports a number of
44+
command-line options to control the environment visible to the compiling code.
45+
46+
By default all environment variables are available with their value taken from
47+
the process environment. However there are several command-line options to
48+
control this environment:
49+
- `--env-clear` - remove all process environment variables from the logical
50+
environment, leaving it completely empty.
51+
- `--env-remove VARIABLE` - Remove a specific variable from the logical
52+
environment. This is an exact match, and it does nothing if that variable is
53+
not set.
54+
- `--env-pass VARIABLE` - pass a variable from the process environment to the
55+
logical one, even if it had previously been removed. This lets specific
56+
variables be allow-listed without having to explicitly set their value. The
57+
variable is ignored if it is not set or not utf-8 encoded.
58+
- `--env-set VARIABLE=VALUE` - Set a variable in the logical environment. This will
59+
either create a new variable, or override an existing value.
60+
61+
The options are processed in the order listed above (i.e. clear, then remove,
62+
then pass, then set). Multiple `--env-set` options affecting the same variable are
63+
processed in command-line order, so later ones override earlier ones.
64+
65+
# Reference-level explanation
66+
[reference-level-explanation]: #reference-level-explanation
67+
68+
The implementation of this RFC introduces the notion of:
69+
- the process environment which is inherited by the `rustc` process from its
70+
invoker, and
71+
- a logical environment which is accessed by the `env!`/`option_env!` macros
72+
73+
The logical environment is initialized from the complete process environment,
74+
excluding only environment variables which are not utf-8 encoded (name or
75+
value).
76+
77+
Once initialized, the logical environment may be manipulated via the `--env-`
78+
command-line options described below.
79+
80+
These environments are fundamentally key-value mappings, which is how they're
81+
represented within the session state - a map from `String` to `String`.
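
As a rough illustration of the initialization rule above, the following sketch builds a `String`-to-`String` map from the process environment and drops any entry whose name or value is not valid UTF-8. The function names `utf8_only` and `initial_logical_env` are hypothetical and are not part of `rustc`; only `std::env::vars_os` and `OsString::into_string` are real standard library APIs.

```rust
use std::collections::HashMap;
use std::ffi::OsString;

/// Keep only the pairs whose name and value are both valid UTF-8.
/// (Hypothetical helper, named for this sketch only.)
pub fn utf8_only<I>(vars: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (OsString, OsString)>,
{
    vars.into_iter()
        .filter_map(|(name, value)| {
            // `into_string` fails for non-UTF-8 data; such entries are skipped.
            Some((name.into_string().ok()?, value.into_string().ok()?))
        })
        .collect()
}

/// Initial logical environment: the process environment minus non-UTF-8 entries.
pub fn initial_logical_env() -> HashMap<String, String> {
    utf8_only(std::env::vars_os())
}
```

Using `vars_os` rather than `std::env::vars` matters here, because iterating `vars` panics when it encounters a non-UTF-8 entry instead of skipping it.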
82+
83+
## Processing of the options
84+
85+
These options are processed in order:
86+
1. `--env-clear` - remove all variables from the logical environment.
87+
1. `--env-remove VAR` - remove a specific variable from the logical environment.
88+
May be specified multiple times.
89+
1. `--env-pass VAR` - set a variable in the logical environment from the process
90+
environment. Ignored if the variable is not set, or is not utf-8 encoded.
91+
1. `--env-set VAR=VALUE` - set a variable in the logical environment. Multiple `--env-set` options affecting the same variable are
92+
processed in command-line order, so later ones override earlier ones.
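
The processing order above can be sketched as a small function over two maps. This is only an illustration under stated assumptions: the `EnvOptions` struct and `build_logical_env` function are hypothetical names invented for this example, and the process environment passed in is assumed to have already been filtered to UTF-8 entries.

```rust
use std::collections::BTreeMap;

/// Hypothetical container for the parsed `--env-*` options.
#[derive(Default)]
pub struct EnvOptions {
    pub clear: bool,                // --env-clear
    pub remove: Vec<String>,        // --env-remove VAR (repeatable)
    pub pass: Vec<String>,          // --env-pass VAR (repeatable)
    pub set: Vec<(String, String)>, // --env-set VAR=VALUE, in command-line order
}

/// Apply the options in the fixed order: clear, remove, pass, set.
pub fn build_logical_env(
    process_env: &BTreeMap<String, String>,
    opts: &EnvOptions,
) -> BTreeMap<String, String> {
    // 1. Start from the process environment unless --env-clear was given.
    let mut logical = if opts.clear {
        BTreeMap::new()
    } else {
        process_env.clone()
    };
    // 2. Removing a variable that is not present does nothing.
    for name in &opts.remove {
        logical.remove(name);
    }
    // 3. Pass-through copies from the process environment; variables that are
    //    unset (or were dropped as non-UTF-8) are silently ignored.
    for name in &opts.pass {
        if let Some(value) = process_env.get(name) {
            logical.insert(name.clone(), value.clone());
        }
    }
    // 4. Later --env-set options override earlier ones for the same name.
    for (name, value) in &opts.set {
        logical.insert(name.clone(), value.clone());
    }
    logical
}
```

Because the categories are applied in a fixed order, `--env-clear --env-pass HOME` and `--env-pass HOME --env-clear` would produce the same result in this model: `HOME` survives in both cases.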
93+
94+
`rustc` will only accept UTF-8 encoded command-line options, which affects all
95+
these options. This implies that all environment variables must have UTF-8
96+
encoded names and values.
97+
98+
## Compile-time behaviour
99+
100+
The `env!()` and `option_env!()` macros only inspect the logical environment
101+
with no reference to the process environment.
102+
103+
Note that this can't affect other environment accesses. For example, if a
104+
procedural macro uses the `std::env::var()` function as part of its
105+
implementation it will access the process environment. Any process that happens
106+
to be invoked by `rustc` would still see the original process environment, not
107+
the logical environment.
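
To make the distinction concrete, here is a minimal sketch contrasting the two kinds of access. The variable name `SANDBOX_RFC_DEMO_VAR` is a placeholder chosen for this example and is assumed to be unset when the code is built.

```rust
/// Expanded by the compiler while building this code. Under this proposal the
/// lookup would consult only the logical environment.
pub fn compile_time_value() -> Option<&'static str> {
    option_env!("SANDBOX_RFC_DEMO_VAR")
}

/// An ordinary function call that reads the environment of whichever process
/// executes it. Inside a procedural macro that process is `rustc`, so this
/// would still observe the original process environment.
pub fn run_time_value() -> Option<String> {
    std::env::var("SANDBOX_RFC_DEMO_VAR").ok()
}
```

With `--env-set SANDBOX_RFC_DEMO_VAR=demo`, only the first function would be expected to change its result, since the second never goes through the compiler's macro expansion.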
108+
109+
## Cargo
110+
111+
This has no direct effect on Cargo - it can completely ignore these options and
112+
the overall behaviour would be unchanged. However, it's easy to imagine a
113+
corresponding RFC for Cargo where it does more explicitly control the logical
114+
environment. For example, it could constrain the accessible variables to:
115+
1. ones that Cargo itself sets
116+
2. ones that the build script sets via `rustc-env`
117+
3. ones that the build script notes as `rerun-if-env-changed`
118+
4. explicitly listed in the `Cargo.toml`
119+
120+
# Drawbacks
121+
[drawbacks]: #drawbacks
122+
123+
The primary cost is additional complexity in invoking `rustc` itself, and
124+
additional complexity in documenting `env!`/`option_env!`. Procedural macros
125+
would need to be changed to access the logical environment, either by adding new
126+
environment access APIs, or overriding the implementation of `std::env::var`
127+
(etc) for procmacros.
128+
129+
# Rationale and alternatives
130+
[rationale-and-alternatives]: #rationale-and-alternatives
131+
132+
One alternative would be to simply do nothing, and leave things as-is.
133+
134+
In a Unix/Linux-like system, the environment can be controlled either with the
135+
shell, or the `env` command. However this requires `rustc` to be invoked via a
136+
shell or the `env` command, which may not be convenient for a given build
137+
system. Alternatively, the build system itself could be modified to suitably
138+
configure the environment. However, it would still be strictly less capable, as
139+
it would not be able to override or remove variables needed by:
140+
- `rustc` itself to run - such as `HOME`, `LD_PRELOAD` or `LD_LIBRARY_PATH`
141+
- `rustc` to invoke the linker, such as `PATH`
142+
- the linker for its own operation (`PATH`, and so on)
143+
144+
This can be particularly awkward when it isn't clear which variables are needed by the toolchain - for example,
145+
invoking `rustc` via `rustup` uses a wider range of variables than directly invoking the `rustc` binary without
146+
an intermediary.
147+
148+
This proposal gives maximal control when needed, without changing the default behaviours at all.
149+
150+
When `rustc` is embedded or long-running, such as in `rls` or `rust-analyzer`, then it's necessary to explicitly
151+
set the logical environment for each crate, rather than just inheriting the process environment.
152+
153+
# Prior art
154+
[prior-art]: #prior-art
155+
156+
C/C++ compilers typically have the ability to set preprocessor macros via the
157+
command-line. They can be set to arbitrary values via the `-D` option. This is
158+
logically equivalent to both of Rust's mechanisms:
159+
- preprocessor macros can be used as predicates in compile-time conditional
160+
compilation tests, and
161+
- they can be expanded into the text of the program itself
162+
163+
These macros are explicit on the command-line, so they're easy to take into
164+
account as an input to the compilation process. And the tools driving the C
165+
compiler don't need any additional way to control the process environment.
166+
167+
Rust has a couple of mechanisms for compile-time configuration:
168+
- It has the `--cfg` options which set flags which can be tested with
169+
compile-time predicates. These are strictly binary choices, which allow for
170+
conditional compilation.
171+
- It has the process environment which can be queried at compile-time with
172+
`env!()` which evaluates to an arbitrary compile-time `&'static str` constant,
173+
which cannot be directly used for conditional compilation. The environment is
174+
not directly set via command-line options, but via another mechanism.
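
The difference between the two mechanisms can be seen in a short sketch. The variable name `SANDBOX_RFC_BUILD_LABEL` is a placeholder invented for this example and is assumed to be unset at build time; `debug_assertions` is a standard built-in configuration flag.

```rust
/// A `cfg` predicate is a strictly binary, compile-time choice.
pub fn build_flavour() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// An environment lookup yields an arbitrary string constant (or nothing).
/// It can be matched on as a value, but it cannot appear in a `#[cfg(...)]`
/// attribute to include or exclude items.
pub const BUILD_LABEL: &str = match option_env!("SANDBOX_RFC_BUILD_LABEL") {
    Some(label) => label,
    None => "unlabelled",
};
```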
175+
176+
This proposal doesn't change the semantics of either mechanism, but it does
177+
make environment variables a little more like C preprocessor macros - they can be
178+
precisely set on the command line, and if desired, only via the command line.
179+
180+
In general build systems need to have a precise knowledge of all inputs used to
181+
build a particular artifact. This is especially important when trying to
182+
implement fully reproducible builds, either for auditability reasons or just to
183+
get good hit rates from a build cache. Build systems don't take the environment
184+
into account because most of it isn't relevant to builds. Indeed, Rust is the
185+
only compiled language I know of which allows direct access to environment
186+
variables, so it's not a thing that build systems *need* to take into account.
187+
Even Cargo - purpose built for building Rust programs - can't explicitly track
188+
what environment variables a piece of Rust code will access, and currently only
189+
has limited tools for tracking this.
190+
191+
# Unresolved questions
192+
[unresolved-questions]: #unresolved-questions
193+
194+
There are two unresolved questions in my mind:
195+
196+
One is how to handle non-UTF-8 environment variable names and values? The
197+
standard library has `std::env::var_os` to fetch the environment in OS-encoded
198+
form, but there are no corresponding macros for compile-time. So I think just
199+
restricting the logical environment to pure UTF-8 for names and values is fine.
200+
201+
The second is API extensions for procedural macros to make the logical
202+
environment available to them. This could either be done by adding new APIs, or
203+
perhaps some way to override the implementation of `std::env::var` in the
204+
procmacro.
205+
206+
# Future possibilities
207+
[future-possibilities]: #future-possibilities
208+
209+
This proposal is intended to be a minimum set of functionality to control the
210+
environment. It requires very explicit control - aside from `--env-clear`, all
211+
variables must be explicitly listed to remove or set their values.
212+
213+
A possible extension would be to allow regular expressions to select which
214+
variable names should be removed from the logical environment or passed through
215+
from the process environment.
216+
217+
Environment variables are frequently used for paths - a common pattern is:
218+
```
219+
include!(env!("SOME_PATH"))
220+
```
221+
If the variable `SOME_PATH` expands to a relative path, it is interpreted
222+
relative to the source file which is doing the include. This requires the
223+
variable to be set with an explicit knowledge of the source file layout. If it
224+
is being used from multiple files, then no one relative path can be made to
225+
work. In practice the only alternative is to always use absolute paths.
226+
227+
To address this, a possible extension would be `--env-set-path VAR=PATH` (and
228+
`--env-pass-path VAR`) where the value is interpreted as a path relative to the
229+
`rustc` current working directory - in other words, it could be set as a
230+
relative path, but it would be interpreted as if it were an absolute path.
231+
Absolute paths would always be treated as absolute.
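
One way the path resolution described above might work is sketched here. The function name `resolve_env_path` is hypothetical, and the sketch assumes the value is simply joined onto the `rustc` working directory with no further normalisation (so `..` components are left as written).

```rust
use std::path::{Path, PathBuf};

/// Resolve the value of a hypothetical `--env-set-path VAR=PATH` option
/// against the compiler's current working directory.
pub fn resolve_env_path(cwd: &Path, value: &str) -> PathBuf {
    // `Path::join` appends a relative path to `cwd`, and returns the argument
    // unchanged when it is already absolute, which matches the rule above.
    cwd.join(value)
}
```

With a working directory of `/work/crate`, the relative value `gen/out.rs` would resolve to `/work/crate/gen/out.rs`, so `include!(env!("SOME_PATH"))` would see the same absolute path from every source file.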
