Implement strings in adapter modules (#4623) · jlb6740/wasmtime@650979a

Implement strings in adapter modules (bytecodealliance#4623)

* Implement strings in adapter modules

This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally, each lift/lower operation can
choose its own string encoding, meaning that Wasmtime, the host, may
have to convert between any ordered pair of string encodings.

The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:

* Fused adapters are first identified to determine src/dst string
  encodings. This statically fixes what transcoding operation is being
  performed.

* The generated adapter will be responsible for managing calls to
  `realloc` and performing bounds checks. The adapter itself does not
  perform memory copies or validation of string contents, however.
  Instead each transcoding operation is modeled as an imported function
  into the adapter module. This means that the adapter module
  determines on demand, at compile time, which string transcoders are
  needed. Note that an imported transcoder is parameterized not only
  over the transcoding operation but also over which memory is the
  source and which is the destination.

* The imported core wasm functions are modeled as a new
  `CoreDef::Transcoder` structure. These transcoders end up being small
  Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
  load the actual base pointer of memory and add it to the relative
  pointers passed as function arguments. This trampoline then calls a
  transcoder "libcall" which enters Rust-defined functions for actual
  transcoding operations.

* Each possible transcoding operation is implemented in Rust with a
  unique name and a unique signature depending on the needs of the
  transcoder. I've tried to document inline what each transcoder does.
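
As a rough illustration of the last two bullets, a Rust-side transcoder could look like the sketch below. The name `latin1_to_utf8` and its `(read, written)` return value are hypothetical and are not taken from Wasmtime's actual libcalls. The sketch assumes the trampoline has already turned the relative pointers into slices of the source and destination memories.

```rust
/// Hypothetical transcoder: copies Latin-1 bytes from `src` into `dst` as
/// UTF-8, stopping early if `dst` runs out of room.
///
/// Returns `(bytes_read, bytes_written)` so that a caller could grow the
/// destination (for example via `realloc`) and resume where it left off.
fn latin1_to_utf8(src: &[u8], dst: &mut [u8]) -> (usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for &byte in src {
        // Every Latin-1 byte is the Unicode code point of the same value.
        let ch = char::from(byte);
        // 1 byte for ASCII, 2 bytes for 0x80..=0xFF.
        let needed = ch.len_utf8();
        if written + needed > dst.len() {
            break;
        }
        ch.encode_utf8(&mut dst[written..written + needed]);
        read += 1;
        written += needed;
    }
    (read, written)
}
```

Returning both counts is one possible design: it lets the adapter decide when to call `realloc` without the transcoder needing to know about allocation at all.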

This means that `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason is the bookkeeping
around calling the imported transcoder functions while validating
string pointers/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on.
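
To make the pointer/length validation concrete, here is a minimal sketch of the kind of check an adapter needs before handing a string to a transcoder. `StringError` and `validate_string` are hypothetical names, and the sketch assumes the rule is "aligned start, and `ptr + byte_len` must not exceed the memory size"; it is not a copy of the generated adapter code. `unit_size` is assumed to be 1 or 2.

```rust
/// Hypothetical error type standing in for a wasm trap.
#[derive(Debug, PartialEq)]
enum StringError {
    Unaligned,
    OutOfBounds,
}

/// Checks that a string of `code_units` units, each `unit_size` bytes wide
/// and starting at `ptr`, lies entirely inside a memory of `memory_len` bytes.
fn validate_string(
    ptr: u32,
    code_units: u32,
    unit_size: u32,
    memory_len: usize,
) -> Result<std::ops::Range<usize>, StringError> {
    // Two-byte code units must start at a two-byte-aligned address.
    if ptr % unit_size != 0 {
        return Err(StringError::Unaligned);
    }
    // Do the arithmetic in u64 so `ptr + code_units * unit_size` cannot wrap.
    let byte_len = u64::from(code_units) * u64::from(unit_size);
    let end = u64::from(ptr) + byte_len;
    if end > memory_len as u64 {
        return Err(StringError::OutOfBounds);
    }
    Ok(ptr as usize..end as usize)
}
```

Under this assumed rule a zero-length string whose pointer lies past the end of memory is still rejected, for example `validate_string(100, 0, 1, 64)` yields `Err(StringError::OutOfBounds)`.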

Also in this PR is a full implementation in the host for the
`latin1+utf16` encoding, which means that both lifting and lowering host
strings now work with this encoding.
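
For readers unfamiliar with `latin1+utf16`, the sketch below shows how a host might lift such a string. It relies on my reading of `CanonicalABI.md`: the high bit of the length word (`UTF16_TAG`) selects UTF-16, and clearing it gives the number of two-byte code units. The function name `lift_compact` is hypothetical, and little-endian code units are assumed.

```rust
/// Tag bit in the length word: set means UTF-16, clear means Latin-1.
const UTF16_TAG: u32 = 1 << 31;

/// Hypothetical host-side lift of a `latin1+utf16` string stored in `data`.
/// Returns `None` if `data` is too short or the UTF-16 is malformed.
fn lift_compact(data: &[u8], tagged_len: u32) -> Option<String> {
    if (tagged_len & UTF16_TAG) == 0 {
        // Latin-1: one byte per code unit, each byte is its own code point.
        let bytes = data.get(..tagged_len as usize)?;
        Some(bytes.iter().map(|&b| char::from(b)).collect())
    } else {
        // UTF-16: clear the tag to get the code-unit count, two bytes each.
        let code_units = (tagged_len ^ UTF16_TAG) as usize;
        let bytes = data.get(..code_units.checked_mul(2)?)?;
        let units: Vec<u16> = bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        // Rejects unpaired surrogates.
        String::from_utf16(&units).ok()
    }
}
```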

Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If and when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.
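
As an example of the "lean on the standard library" approach, here is a minimal sketch of a UTF-8 to UTF-16 transcoder that stays correct on big-endian hosts by writing each code unit explicitly as little-endian bytes. The name `utf8_to_utf16le` and its signature are hypothetical, not the functions added in this commit.

```rust
/// Validates `src` as UTF-8 and writes it into `dst` as little-endian
/// UTF-16. Returns the number of 16-bit code units written, or `None` if
/// `src` is not valid UTF-8 or `dst` is too small.
fn utf8_to_utf16le(src: &[u8], dst: &mut [u8]) -> Option<usize> {
    // The standard library performs the UTF-8 validity check.
    let s = std::str::from_utf8(src).ok()?;
    let mut units = 0;
    for unit in s.encode_utf16() {
        // `get_mut` returns `None` when the destination is exhausted.
        let out = dst.get_mut(units * 2..units * 2 + 2)?;
        // Explicit little-endian bytes, independent of host endianness.
        out.copy_from_slice(&unit.to_le_bytes());
        units += 1;
    }
    Some(units)
}
```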

Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally, a suite of "negative" tests has been added to ensure
that the validity of each encoding is actually checked.
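
The shape of such an all-pairs test can be sketched as follows. The `Encoding` enum, the `STRINGS` list, and `for_each_case` are hypothetical; in particular the strings are my own picks of "interesting" inputs (empty, ASCII, Latin-1 only, beyond Latin-1, surrogate pair), not the ones used in the commit's test.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16,
    CompactUtf16, // `latin1+utf16`
}

const ENCODINGS: [Encoding; 3] =
    [Encoding::Utf8, Encoding::Utf16, Encoding::CompactUtf16];

/// Illustrative inputs: empty, ASCII, Latin-1 only, non-Latin-1 BMP,
/// and a character that needs a UTF-16 surrogate pair.
const STRINGS: &[&str] = &["", "a", "hello, world", "caf\u{e9}", "\u{2603}", "\u{1f600}"];

/// Calls `check` once for every (source, destination, string) combination.
fn for_each_case(strings: &[&str], mut check: impl FnMut(Encoding, Encoding, &str)) {
    for src in ENCODINGS {
        for dst in ENCODINGS {
            for s in strings {
                check(src, dst, s);
            }
        }
    }
}
```

With three encodings this visits nine ordered pairs per string; the `check` callback is where a real test would build a component for that pair and assert that the string round-trips.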

* Fix a temporarily commented out case

* Fix wasmtime-runtime tests

* Update deny.toml configuration

* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses

* Add an exemption for `encoding_rs` for now

* Split up the `translate_string` method

Move out all the closures and package up captured state into smaller
lists of arguments.

* Test out-of-bounds for zero-length strings

alexcrichton authored Aug 8, 2022

1 parent e6d339b commit 650979a

Cargo.lock

crates/component-util/src/lib.rs

    @@ -88,6 +88,8 @@ pub const REALLOC_AND_FREE: &str = r#"
            (param $new_size i32)
            (result i32)
            (local $ret i32)
            ;; Test if the old pointer is non-null
            local.get $old_ptr
            if

    @@ -101,8 +103,8 @@ pub const REALLOC_AND_FREE: &str = r#"
                    return
                end
                ;; ... otherwise this is unimplemented
                unreachable
                ;; otherwise fall through to allocate a new chunk which will later
                ;; copy data over
            end
            ;; align up `$last`

    @@ -121,6 +123,7 @@ pub const REALLOC_AND_FREE: &str = r#"
            ;; save the current value of `$last` as the return value
            global.get $last
            local.tee $ret
            ;; ensure anything necessary is set to valid data by spraying a bit
            ;; pattern that is invalid

    @@ -129,6 +132,16 @@ pub const REALLOC_AND_FREE: &str = r#"
            local.get $new_size
            memory.fill
            ;; If the old pointer is present then that means this was a reallocation
            ;; of an existing chunk which means the existing data must be copied.
            local.get $old_ptr
            if
                local.get $ret          ;; destination
                local.get $old_ptr      ;; source
                local.get $old_size     ;; size
                memory.copy
            end
            ;; bump our pointer
            (global.set $last
                (i32.add