-
Notifications
You must be signed in to change notification settings - Fork 402
Commit 8b5556a
authored
Rollup merge of #147932 - thaliaarchi:utf8-osstring, r=tgross35
Create UTF-8 version of `OsStr`/`OsString`
Implement a UTF-8 version of `OsStr`/`OsString`, in addition to the existing bytes and WTF-8 platform-dependent encodings.
This is applicable for several platforms, but I've currently only implemented it for Motor OS:
- WASI uses Unicode paths, but currently reexports the Unix bytes-assuming `OsStrExt`/`OsStringExt` traits.
- [wasi:filesystem](https://wa.dev/wasi:filesystem) APIs:
> Paths are passed as interface-type `strings`, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API.
- In [wasi-filesystem#17](WebAssembly/wasi-filesystem#17 (comment)), it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8.
> The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings.
>
> This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it.
- As of Redox OS [0.7.0](https://www.redox-os.org/news/release-0.7.0/), "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit [d331f72f](https://gitlab.redox-os.org/redox-os/kernel/-/commit/d331f72f2a51fa577072f24bc2587829fd87368b) (Use UTF-8 for all paths, 2021-02-14). Redox does not have `OsStrExt`/`OsStringExt`.
- Motor OS guarantees that its OS strings are UTF-8 in its [current `OsStrExt`/`OsStringExt` traits](https://github.com/moturus/rust/blob/a828ffcf5f04be5cdd91b5fad608102eabc17ec7/library/std/src/os/motor/ffi.rs), but they're still internally bytes like Unix.
This is an alternate approach to rust-lang/rust#147797, which reuses the existing bytes `OsString` and relies on the safety properties of `from_encoded_bytes_unchecked`. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity.
Note that Motor OS currently does not build until rust-lang/rust#147930 merges.
cc `@tgross35` (for earlier review)
cc `@alexcrichton,` `@rylev,` `@loganek` (for WASI)
cc `@lasiotus` (for Motor OS)
cc `@jackpot51` (for Redox OS)File tree
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changedOpen diff view settings
Filter options
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changedOpen diff view settings
0 commit comments