Skip to content

Commit 8b5556a

Browse files
authored
Rollup merge of #147932 - thaliaarchi:utf8-osstring, r=tgross35
Create UTF-8 version of `OsStr`/`OsString` Implement a UTF-8 version of `OsStr`/`OsString`, in addition to the existing bytes and WTF-8 platform-dependent encodings. This is applicable for several platforms, but I've currently only implemented it for Motor OS: - WASI uses Unicode paths, but currently reexports the Unix bytes-assuming `OsStrExt`/`OsStringExt` traits. - [wasi:filesystem](https://wa.dev/wasi:filesystem) APIs: > Paths are passed as interface-type `strings`, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API. - In [wasi-filesystem#17](WebAssembly/wasi-filesystem#17 (comment)), it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8. > The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings. > > This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it. - As of Redox OS [0.7.0](https://www.redox-os.org/news/release-0.7.0/), "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit [d331f72f](https://gitlab.redox-os.org/redox-os/kernel/-/commit/d331f72f2a51fa577072f24bc2587829fd87368b) (Use UTF-8 for all paths, 2021-02-14). Redox does not have `OsStrExt`/`OsStringExt`. - Motor OS guarantees that its OS strings are UTF-8 in its [current `OsStrExt`/`OsStringExt` traits](https://github.com/moturus/rust/blob/a828ffcf5f04be5cdd91b5fad608102eabc17ec7/library/std/src/os/motor/ffi.rs), but they're still internally bytes like Unix. This is an alternate approach to rust-lang/rust#147797, which reuses the existing bytes `OsString` and relies on the safety properties of `from_encoded_bytes_unchecked`. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity. Note that Motor OS currently does not build until rust-lang/rust#147930 merges. cc `@tgross35` (for earlier review) cc `@alexcrichton,` `@rylev,` `@loganek` (for WASI) cc `@lasiotus` (for Motor OS) cc `@jackpot51` (for Redox OS)
2 parents 26fc9d8 + 376a755 commit 8b5556a

File tree

0 file changed

+0
-0
lines changed

    0 file changed

    +0
    -0
    lines changed

    0 commit comments

    Comments
     (0)