Sufficiently advanced engines tend to have a specialized internal string representation for ASCII-only strings (V8 certainly does; I'm pretty sure other existing engines do too) [1]. There are also some common use cases where a string is being created that's known to be in ASCII range, such as number-to-string conversions. Of course it is possible to use `string.new_utf8[_array]` or `string.new_wtf16[_array]` in these situations, but they both require an engine to scan the memory/array for non-ASCII elements before deciding which kind of string to allocate and copy the characters into, which has significant overhead [2]. We could avoid this by adding instructions to create strings from ASCII data.
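To make the overhead concrete, here is a minimal C++ sketch of the two paths, under assumed names (`NewFromWtf16Array`, `NewFromAsciiArray`, and the representation types are illustrative, not V8's actual classes): today's `string.new_wtf16_array` semantics force a pre-pass to pick a representation, whereas an ASCII-typed input would let the engine allocate a one-byte string directly.

```cpp
#include <cstdint>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical engine-internal representations; names are illustrative only.
using OneByteString = std::vector<uint8_t>;   // 1 byte per character (ASCII/Latin1 range)
using TwoByteString = std::vector<uint16_t>;  // 2 bytes per character (full WTF-16)
using String = std::variant<OneByteString, TwoByteString>;

// What string.new_wtf16_array requires today: an O(n) pre-pass to choose a
// representation, followed by the copy.
String NewFromWtf16Array(const uint16_t* data, size_t len) {
  bool one_byte = true;
  for (size_t i = 0; i < len; i++) {
    if (data[i] > 0x7F) { one_byte = false; break; }  // found a non-ASCII code unit
  }
  if (one_byte) {
    OneByteString s(len);
    for (size_t i = 0; i < len; i++) s[i] = static_cast<uint8_t>(data[i]);
    return s;
  }
  return TwoByteString(data, data + len);  // 2 bytes per char, no narrowing possible
}

// What a hypothetical string.new_ascii_array would allow: the input type already
// guarantees the one-byte representation, so the copy is the whole job.
String NewFromAsciiArray(const uint8_t* data, size_t len) {
  // (Out-of-range bytes could trap or be checked cheaply; there is no
  // representation decision to make.)
  return OneByteString(data, data + len);
}
```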
There's partial, but only partial, overlap between this suggestion and #51, insofar as number-to-string conversions are a use case that could benefit from either instruction set addition but is unlikely to benefit from both. That said, if we e.g. decide that integer-to-string is sufficiently common and standard (in the sense that everyone does it the same way) to warrant its own instruction, whereas float-to-string is sufficiently uncommon and/or language-specific that we'll leave it up to languages to ship their own implementation for it, then the latter would still benefit from a `string.new_ascii_array` instruction. Also, there might well be common use cases aside from number conversion that know on the producer side that they're creating ASCII strings.
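As an illustration of that producer-side knowledge, here is a sketch of an integer-to-string routine, written in C++ for readability rather than the Wasm a toolchain would actually emit (`Int32ToAsciiDigits` is a made-up name): every byte it produces is in the ASCII range by construction, so the module knows statically that the resulting buffer qualifies for an ASCII-creation instruction with no runtime check.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Producer-side sketch: the compiled Wasm equivalent would fill an (array i8)
// and finish with the proposed string.new_ascii_array instruction.
std::vector<uint8_t> Int32ToAsciiDigits(int32_t value) {
  std::vector<uint8_t> digits;
  uint32_t magnitude =
      value < 0 ? 0u - static_cast<uint32_t>(value) : static_cast<uint32_t>(value);
  do {
    digits.push_back(static_cast<uint8_t>('0' + magnitude % 10));  // 0x30..0x39: ASCII
    magnitude /= 10;
  } while (magnitude != 0);
  if (value < 0) digits.push_back('-');  // 0x2D: also ASCII
  // Reverse into most-significant-digit-first order.
  for (size_t i = 0, j = digits.size() - 1; i < j; i++, j--) {
    std::swap(digits[i], digits[j]);
  }
  return digits;  // every byte is provably <= 0x7F
}
```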
I wouldn't mind adding such instructions to the MVP; I'm also fine with postponing them to a post-MVP follow-up.
[1] Strictly speaking, any form of "one-byte" string representation is relevant here, e.g. "Latin1"; ASCII is the lowest common denominator of these. In fact, in V8, our "one-byte" strings actually support the Latin1 range, yet I'm suggesting ASCII (i.e. character codes 0 through 127) for standardization here, because I believe that's the subset that maximizes the intersection of usefulness to applications and freedom of implementation choice to engines.
[2] To illustrate with specific numbers: on a particular microbenchmark I'm looking at, which converts 32-bit integers to strings, the score is 20 when I check for ASCII-only characters, and 27 (+35%) when I blindly copy i16 array elements to 2-byte string characters, which wastes memory. There may be potential for (minor?) improvements using SIMD instructions or similar for faster checking, but why bet on engine magic/heroics when it's so trivial to add a Wasm-level primitive that makes it easy and reliable to get high performance?
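For reference, the kind of batched check that "engine magic/heroics" alludes to looks roughly like the following (a sketch, not V8's actual code): testing four WTF-16 code units per 64-bit load instead of one at a time. Even with this, or with real SIMD, the extra pass over the input doesn't go away.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Word-at-a-time ASCII check over WTF-16 code units: each 16-bit lane is ASCII
// iff no bits above bit 6 are set, i.e. (lane & 0xFF80) == 0.
bool IsAsciiOnly(const uint16_t* data, size_t len) {
  size_t i = 0;
  for (; i + 4 <= len; i += 4) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));      // 4 x 16-bit code units
    if (word & 0xFF80FF80FF80FF80ULL) return false;  // some lane is > 0x7F
  }
  for (; i < len; i++) {
    if (data[i] > 0x7F) return false;                // scalar tail
  }
  return true;
}
```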