- Feature Name: sandbox-environment
- Start Date: 2019-10-26
- RFC PR: [rust-lang/rfcs#2794](https://github.com/rust-lang/rfcs/pull/2794)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This proposes a mechanism to precisely control what environment variables are
available to Rust programs at compilation time.

# Motivation
[motivation]: #motivation

Rust supports the `env!` and `option_env!` macros, which allow Rust programs to
query arbitrary process environment variables at compilation time. This is a
very flexible mechanism for passing compile-time information to the program.

However, in many cases it is too flexible. It poses several problems:
1. Environment variables are generally not tracked by build systems, so changing
   a variable does not trigger a rebuild. Cargo has an ad-hoc mechanism for doing
   this in build scripts, but nothing guarantees that it is correct
   (i.e. that all variables accessed are tracked).
2. There's no easy way to audit which environment variables a crate accesses.
   This not only exacerbates the problem above, but also means that
   potentially sensitive information in an environment variable can be
   incorporated into the compiled code.
3. There's no way to override variables that are needed by the build process
   itself. For example, the `PATH` variable likely needs to be set so that the
   compiler can execute its various components, but there's no way to override
   it so that `env!("PATH")` returns something else. This would be necessary
   where the compilation environment differs from the deployment environment
   (such as when cross-compiling).

This RFC proposes a way to precisely control the environment visible to the
compile-time macros, while defaulting to the current behaviour of making the
entire environment available.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Rust implements the `env!()` and `option_env!()` macros to access the process
environment variables at compilation time. `rustc` supports a number of
command-line options to control the environment visible to the compiling code.

By default all environment variables are available, with their values taken from
the process environment. However, there are several command-line options to
control this environment:
- `--env-clear` - remove all process environment variables from the logical
  environment, leaving it completely empty.
- `--env-remove VARIABLE` - remove a specific variable from the logical
  environment. This is an exact match, and it does nothing if that variable is
  not set.
- `--env-pass VARIABLE` - pass a variable from the process environment to the
  logical one, even if it had previously been removed. This lets specific
  variables be allow-listed without having to explicitly set their values. The
  variable is ignored if it is not set or is not UTF-8 encoded.
- `--env-set VARIABLE=VALUE` - set a variable in the logical environment. This
  either creates a new variable or overrides an existing value.

The options are processed in the order listed above (i.e. clear, then remove,
then pass, then set), regardless of where they appear on the command line.
Multiple `--env-set` options affecting the same variable are processed in
command-line order, so later ones override earlier ones.

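The ordering rule can be sketched as follows. This is a minimal illustration, not `rustc`'s actual option parser: `EnvStep` and `ordered_env_steps` are hypothetical names, and treating an `--env-set` value without `=` as an error is an assumption this RFC does not spell out. A stable sort by phase puts the options into clear, remove, pass, set order while keeping options of the same kind in command-line order.

```rust
/// One `--env-*` option as it appeared on the command line (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum EnvStep {
    Clear,
    Remove(String),
    Pass(String),
    Set(String, String),
}

impl EnvStep {
    /// Processing phase: clear, then remove, then pass, then set.
    fn phase(&self) -> u8 {
        match self {
            EnvStep::Clear => 0,
            EnvStep::Remove(_) => 1,
            EnvStep::Pass(_) => 2,
            EnvStep::Set(_, _) => 3,
        }
    }
}

/// Collects the `--env-*` options from `args` (ignoring everything else) and
/// returns them in processing order. `sort_by_key` is a stable sort, so options
/// of the same kind, such as repeated `--env-set`s, keep their command-line order.
fn ordered_env_steps(args: &[&str]) -> Result<Vec<EnvStep>, String> {
    let mut steps = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        // Every option except `--env-clear` consumes the following argument.
        let mut value = || {
            iter.next()
                .map(|v| v.to_string())
                .ok_or_else(|| format!("{arg} requires a value"))
        };
        match arg {
            "--env-clear" => steps.push(EnvStep::Clear),
            "--env-remove" => steps.push(EnvStep::Remove(value()?)),
            "--env-pass" => steps.push(EnvStep::Pass(value()?)),
            "--env-set" => {
                let v = value()?;
                // Split on the first `=`, so the value itself may contain `=`.
                let (name, val) = v
                    .split_once('=')
                    .ok_or_else(|| format!("--env-set expects VAR=VALUE, got `{v}`"))?;
                steps.push(EnvStep::Set(name.to_string(), val.to_string()));
            }
            _ => {} // not an environment option
        }
    }
    steps.sort_by_key(|step| step.phase());
    Ok(steps)
}
```

Under this reading, `--env-set A=1 --env-clear` still ends with `A=1` in the logical environment, because the clear is processed first.
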
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The implementation of this RFC introduces the notion of:
- the process environment, which is inherited by the `rustc` process from its
  invoker, and
- a logical environment, which is accessed by the `env!`/`option_env!` macros.

The logical environment is initialized from the complete process environment,
excluding only environment variables whose name or value is not UTF-8 encoded.

Once initialized, the logical environment may be manipulated via the `--env-`
command-line options described below.

These environments are fundamentally key-value mappings, which is how they're
represented within the session state: a map from `String` to `String`.

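As a rough sketch of the initialization step, assuming the session state uses a `HashMap<String, String>` (the text above only says "a map"), the logical environment can be built from `std::env::vars_os()` by discarding any entry whose name or value fails to convert to `String`. The function name `initial_logical_env` is hypothetical.

```rust
use std::collections::HashMap;
use std::ffi::OsString;

/// Builds the initial logical environment from `(name, value)` pairs of the
/// process environment, dropping any variable whose name or value is not
/// valid UTF-8. In `rustc` the input would be `std::env::vars_os()`.
fn initial_logical_env<I>(vars: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (OsString, OsString)>,
{
    vars.into_iter()
        .filter_map(|(name, value)| {
            // `OsString::into_string` fails if the contents are not valid Unicode.
            let name = name.into_string().ok()?;
            let value = value.into_string().ok()?;
            Some((name, value))
        })
        .collect()
}
```

Starting from `vars_os()` rather than `std::env::vars()` matters here: `vars()` panics during iteration if any name or value is not valid Unicode, whereas the rule above is to skip such variables.
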
## Processing of the options

These options are processed in order:
1. `--env-clear` - remove all variables from the logical environment.
1. `--env-remove VAR` - remove a specific variable from the logical environment.
   May be specified multiple times.
1. `--env-pass VAR` - set a variable in the logical environment from the process
   environment. Ignored if the variable is not set, or is not UTF-8 encoded.
1. `--env-set VAR=VALUE` - set a variable in the logical environment, creating it
   or overriding an existing value. Multiple `--env-set` options affecting the
   same variable are processed in command-line order, so later ones override
   earlier ones.

`rustc` will only accept UTF-8 encoded command-line options, which affects all
these options. This implies that every variable named or set via these options
must have a UTF-8 encoded name and value.

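A minimal sketch of the four processing steps, assuming the logical environment is a `HashMap<String, String>` and that the options have already been grouped by kind, with the `--env-set` pairs kept in command-line order. `apply_env_options` is a hypothetical helper; its `process_var` parameter stands in for the process environment so the sketch can be exercised without touching the real one. In `rustc` it could be `|name| std::env::var(name).ok()`, since `std::env::var` returns an error both when a variable is unset and when it is not valid Unicode.

```rust
use std::collections::HashMap;

/// Applies the `--env-*` options to `logical` in the specified order:
/// clear, then remove, then pass, then set.
fn apply_env_options(
    logical: &mut HashMap<String, String>,
    clear: bool,
    remove: &[&str],
    pass: &[&str],
    set: &[(&str, &str)],
    process_var: impl Fn(&str) -> Option<String>,
) {
    if clear {
        logical.clear();
    }
    for &name in remove {
        // Exact match; removing a variable that is not set is a no-op.
        logical.remove(name);
    }
    for &name in pass {
        // Re-admit the process value, even if the variable was removed above.
        // Unset or non-UTF-8 variables (`None`) are ignored.
        if let Some(value) = process_var(name) {
            logical.insert(name.to_string(), value);
        }
    }
    for &(name, value) in set {
        // Later `--env-set`s for the same name override earlier ones.
        logical.insert(name.to_string(), value.to_string());
    }
}
```
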
## Compile-time behaviour

The `env!()` and `option_env!()` macros only inspect the logical environment,
with no reference to the process environment.

Note that this does not affect other environment accesses. For example, if a
procedural macro uses the `std::env::var()` function as part of its
implementation, it will access the process environment. Any process that happens
to be invoked by `rustc` would still see the original process environment, not
the logical environment.

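The lookup rule can be modelled in a few lines. This only illustrates the semantics and is not how `rustc` implements macro expansion: `expand_option_env` and `expand_env` are hypothetical names, and the error text is made up for the sketch. The point is that neither function consults the process environment, whereas a proc macro calling `std::env::var` still would, as noted above.

```rust
use std::collections::HashMap;

/// Models `option_env!("NAME")`: the value comes from the logical environment
/// only, so a variable absent from it is `None` even if the process has it.
fn expand_option_env(logical: &HashMap<String, String>, name: &str) -> Option<String> {
    logical.get(name).cloned()
}

/// Models `env!("NAME")`: the same lookup, but a missing variable is a
/// compile-time error. (The message text is illustrative only.)
fn expand_env(logical: &HashMap<String, String>, name: &str) -> Result<String, String> {
    expand_option_env(logical, name)
        .ok_or_else(|| format!("environment variable `{name}` is not in the logical environment"))
}
```
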
## Cargo

This has no direct effect on Cargo: it can completely ignore these options and
its overall behaviour would be unchanged. However, it's easy to imagine a
corresponding RFC for Cargo where it more explicitly controls the logical
environment. For example, it could constrain the accessible variables to:
1. ones that Cargo itself sets
2. ones that the build script sets via `rustc-env`
3. ones that the build script notes as `rerun-if-env-changed`
4. ones explicitly listed in the `Cargo.toml`

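To illustrate how such a Cargo might drive `rustc`, here is a hypothetical sketch that turns the four sources above into arguments starting from an empty logical environment. Nothing here exists in Cargo today; `cargo_env_args`, its parameters, and the variable names in the usage are invented for illustration. Variables with known values (1 and 2) become `--env-set`, while variables Cargo only needs to track (3 and 4) become `--env-pass`, so their values still come from the process environment.

```rust
/// Hypothetical: builds `rustc` arguments that start from an empty logical
/// environment and admit only the four kinds of variables listed above.
fn cargo_env_args(
    cargo_vars: &[(&str, &str)],       // 1. variables Cargo itself sets
    build_script_env: &[(&str, &str)], // 2. `rustc-env` from the build script
    rerun_if_env_changed: &[&str],     // 3. tracked by the build script
    manifest_allow_list: &[&str],      // 4. listed in Cargo.toml (invented)
) -> Vec<String> {
    let mut args = vec!["--env-clear".to_string()];
    // Tracked-only variables keep their value from the process environment.
    for name in rerun_if_env_changed.iter().chain(manifest_allow_list) {
        args.push("--env-pass".to_string());
        args.push(name.to_string());
    }
    // Known values are set explicitly. Later `--env-set`s win, so a build
    // script value overrides a Cargo-set variable of the same name.
    for (name, value) in cargo_vars.iter().chain(build_script_env) {
        args.push("--env-set".to_string());
        args.push(format!("{name}={value}"));
    }
    args
}
```

Emitting the build script's values after Cargo's lets a `rustc-env` value win over a Cargo-set variable of the same name; whether Cargo should allow that is a policy question this sketch does not settle.
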
# Drawbacks
[drawbacks]: #drawbacks

The primary cost is additional complexity in invoking `rustc` itself, and
additional complexity in documenting `env!`/`option_env!`. Procedural macros
would need to be changed to access the logical environment, either by adding new
environment access APIs, or by overriding the implementation of `std::env::var`
(etc.) for proc macros.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

One alternative would be to simply do nothing, and leave things as-is.

In a Unix/Linux-like system, the environment can be controlled either with the
shell, or the `env` command. However, this requires `rustc` to be invoked via a
shell or the `env` command, which may not be convenient for a given build
system. Alternatively, the build system itself could be modified to suitably
configure the environment. However, it would still be strictly less capable, as
it would not be able to override or remove variables needed by:
- `rustc` itself to run, such as `HOME`, `LD_PRELOAD` or `LD_LIBRARY_PATH`
- `rustc` to invoke the linker, such as `PATH`
- the linker for its own operation (`PATH`, and so on)

This can be particularly awkward when it isn't clear which variables are needed
by the toolchain. For example, invoking `rustc` via `rustup` uses a wider range
of variables than directly invoking the `rustc` binary without an intermediary.

This proposal gives maximal control when needed, without changing the default
behaviours at all.

When `rustc` is embedded or long-running, such as in `rls` or `rust-analyzer`,
it's necessary to explicitly set the logical environment for each crate, rather
than just inheriting the process environment.

# Prior art
[prior-art]: #prior-art

C/C++ compilers typically have the ability to set preprocessor macros via the
command line. They can be set to arbitrary values via the `-D` option. This is
logically equivalent to both of Rust's mechanisms:
- preprocessor macros can be used as predicates in compile-time conditional
  compilation tests, and
- they can be expanded into the text of the program itself.

These macros are explicit on the command line, so they're easy to take into
account as an input to the compilation process. And the tools driving the C
compiler don't need any additional way to control the process environment.

Rust has a couple of mechanisms for compile-time configuration:
- It has the `--cfg` option, which sets flags that can be tested with
  compile-time predicates. These are strictly binary choices, which allow for
  conditional compilation.
- It has the process environment, which can be queried at compile time with
  `env!()`. This evaluates to an arbitrary compile-time `&'static str` constant,
  which cannot be directly used for conditional compilation. The environment is
  not directly set via command-line options, but via another mechanism.

This proposal doesn't change the semantics of either mechanism, but it does
make the environment a little more like C preprocessor macros: variables can be
precisely set on the command line, and if desired, only via the command line.

In general, build systems need precise knowledge of all inputs used to
build a particular artifact. This is especially important when trying to
implement fully reproducible builds, either for auditability reasons or just to
get good hit rates from a build cache. Build systems don't take the environment
into account because most of it isn't relevant to builds. Indeed, Rust is the
only compiled language I know of that allows direct access to environment
variables, so it's not something build systems *need* to take into account.
Even Cargo, purpose-built for building Rust programs, can't explicitly track
which environment variables a piece of Rust code will access, and currently only
has limited tools for tracking this.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

There are two unresolved questions in my mind.

The first is how to handle non-UTF-8 environment variable names and values. The
standard library has `std::env::var_os` to fetch the environment in OS-encoded
form, but there are no corresponding macros for compile time. So I think just
restricting the logical environment to pure UTF-8 names and values is fine.

The second is API extensions for procedural macros to make the logical
environment available to them. This could either be done by adding new APIs, or
perhaps by some way to override the implementation of `std::env::var` in the
proc macro.

# Future possibilities
[future-possibilities]: #future-possibilities

This proposal is intended to be a minimum set of functionality to control the
environment. It requires very explicit control: aside from `--env-clear`, every
variable must be listed explicitly in order to remove, pass, or set it.

A possible extension would be to allow regular expressions to select which
variable names should be removed from the logical environment or passed through
from the process environment.

Environment variables are frequently used for paths. A common pattern is:
```rust
include!(env!("SOME_PATH"));
```
If the variable `SOME_PATH` expands to a relative path, it is interpreted
relative to the source file which is doing the include. This requires the
variable to be set with explicit knowledge of the source file layout. If it
is being used from multiple files, then no single relative path can be made to
work. In practice the only alternative is to always use absolute paths.

To address this, a possible extension would be `--env-set-path VAR=PATH` (and
`--env-pass-path VAR`), where the value is interpreted as a path relative to the
`rustc` current working directory. In other words, it could be set as a
relative path, but the macros would see the equivalent absolute path.
Absolute paths would always be treated as absolute.
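
A sketch of the resolution step for the hypothetical `--env-set-path`, assuming Unix-style paths. `resolve_env_path` is an invented name; in `rustc` the `cwd` argument would come from `std::env::current_dir()`. `Path::join` already implements the "absolute paths stay absolute" rule, since joining an absolute path replaces the base. Note that it does not normalize `..` components.

```rust
use std::path::{Path, PathBuf};

/// Resolves a hypothetical `--env-set-path` value against `rustc`'s current
/// working directory. Relative values are appended to `cwd`; absolute values
/// replace it, because `Path::join` discards the base for absolute paths.
fn resolve_env_path(cwd: &Path, value: &str) -> PathBuf {
    cwd.join(value)
}
```

With `--env-set-path SOME_PATH=gen/out.rs` run from `/work/project`, every `include!(env!("SOME_PATH"))` would then see `/work/project/gen/out.rs`, whichever source file it appears in.
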