It should be possible to get a string from a stringview #44

@wingo

Right now you can get a stringview from a string, but not a string from a stringview. We should change the proposal to ensure that you can go back to string from view.

The motivation is that, depending on the Opinions™ that a source language has about what a string consists of, you might want to make views the primary representation for a string.

For example Java/C#/JS/Dart/Kotlin, which consider a string to be any sequence of 16-bit code units, probably want to live in the WTF-16 world. When a string comes in from outside, you'll generally eagerly convert it to a stringview_wtf16, and then operate on it like that.

However, there are some operations that are common across strings and that don't logically relate to the view, for example string.concat or string.eq or even string.new_wtf8 (which specifies an external encoding without necessarily caring about internal encoding). Right now, if you only hold a stringview_wtf16, you don't have access to these. We need a way to go from view back to string.

There are three options that I see:

  1. Hang all the view functionality off of string. I.e. string.get_wtf16_codeunit, string.advance_wtf8, and so on.
  2. A string view is-a string. string.concat implicitly works on views. See "Subtyping relationship between stringref and stringviews" (#3) and friends.
  3. A string view has-a string. There's an instruction for each kind of view that can get the string. (Whether the string would actually be held by reference or not is related to "Is stringref a subtype of eqref?" (#20).)
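
To make the three shapes concrete, here is a rough TypeScript model of them. It is only a sketch: a JS string stands in for stringref, only the WTF-16 view is modelled, and every class and method name below is hypothetical rather than taken from the proposal.

```typescript
// Hypothetical model: a JS string stands in for stringref, and every class
// and method name below is illustrative rather than taken from the proposal.

// Option 1: view operations hang directly off the string.
class Str1 {
  readonly value: string;
  constructor(value: string) { this.value = value; }
  getWtf16CodeUnit(i: number): number { return this.value.charCodeAt(i); }
  concat(other: Str1): Str1 { return new Str1(this.value + other.value); }
}

// Option 2: a view is-a string, so concat accepts views through subtyping.
class Str2 {
  readonly value: string;
  constructor(value: string) { this.value = value; }
  concat(other: Str2): Str2 { return new Str2(this.value + other.value); }
}
class Wtf16View2 extends Str2 {
  getCodeUnit(i: number): number { return this.value.charCodeAt(i); }
}

// Option 3: a view has-a string; an explicit operation recovers it.
class Str3 {
  readonly value: string;
  constructor(value: string) { this.value = value; }
  concat(other: Str3): Str3 { return new Str3(this.value + other.value); }
}
class Wtf16View3 {
  private readonly source: Str3;
  constructor(source: Str3) { this.source = source; }
  getCodeUnit(i: number): number { return this.source.value.charCodeAt(i); }
  asString(): Str3 { return this.source; }
}
```

In this model the option 2 view can reuse the string's fields only because both share one representation. Option 3 leaves the view free to store whatever it likes next to the reference it hands back.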

I think that (1) would be fine if we were in a WTF-16-only situation -- both for source languages and implementations. If we keep the goal of allowing WTF-8 implementations and codepoint/WTF-8 access for source languages, having views does have the good property of making conversion costs explicit. Basically what @jakobkummerow said here: #12 (comment)

For (2) I am less up-to-date on what GC people are thinking -- is it assumed that upcasting always keeps the same value representation? I.e. casting from view to string is just a type question and doesn't generate any code? If this is the case then I think that rules out (2) in practice. No browser JS implementation has a native WTF-8 string representation. It would also rule out any stateful iterator view (for better or for worse; I am not married to that choice).

I think (3) is doable. FWIW currently the V8 implementation I had doesn't include a link from WTF-8 view to string, but it would be no big deal to add. Therefore I would propose to add:

(stringview_wtf8.as_string view:stringview_wtf8)
  -> str:stringref
(stringview_wtf16.as_string view:stringview_wtf16)
  -> str:stringref
(stringview_iter.as_string view:stringview_iter)
  -> str:stringref

Of course names could change; see #12.
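
As a sketch of how a view-first language might use this, the TypeScript model below gives the view its own code-unit storage plus a back-link to the string it came from. The names `StringRef`, `StringViewWtf16`, `stringConcat`, `stringEq` and `concatViews` are hypothetical stand-ins, and copying into a `Uint16Array` is just an assumption made to show a view whose storage differs from the string's.

```typescript
// Hypothetical model of the has-a design. StringRef stands in for stringref;
// the view keeps its own WTF-16 code units plus a back-link to the string.
class StringRef {
  readonly value: string;
  constructor(value: string) { this.value = value; }
}

class StringViewWtf16 {
  private readonly units: Uint16Array; // storage owned by the view
  private readonly source: StringRef;  // the back-link that as_string returns

  constructor(source: StringRef) {
    this.source = source;
    this.units = new Uint16Array(source.value.length);
    for (let i = 0; i < source.value.length; i++) {
      this.units[i] = source.value.charCodeAt(i);
    }
  }

  length(): number { return this.units.length; }

  getCodeUnit(i: number): number {
    const unit = this.units[i];
    if (unit === undefined) throw new RangeError("code unit index out of range");
    return unit;
  }

  // Models stringview_wtf16.as_string: hands back the held string
  // instead of decoding the view's storage again.
  asString(): StringRef { return this.source; }
}

// Stand-ins for the string-level operations string.concat and string.eq.
function stringConcat(a: StringRef, b: StringRef): StringRef {
  return new StringRef(a.value + b.value);
}
function stringEq(a: StringRef, b: StringRef): boolean {
  return a.value === b.value;
}

// A language that keeps views as its primary string representation can still
// concatenate by stepping back to strings and re-viewing the result.
function concatViews(a: StringViewWtf16, b: StringViewWtf16): StringViewWtf16 {
  return new StringViewWtf16(stringConcat(a.asString(), b.asString()));
}
```

The only per-view cost in this model is one extra reference field, which matches the expectation that adding the link would be no big deal.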

Thoughts?
