Consider specifying WTF-8 variant when creating WTF-8 string views #38

@wingo

Currently you can make a WTF-8 view on a string with string.as_wtf8 and read the string's contents with stringview_wtf8.encode $wtf8_policy, or indeed with stringview_wtf8.slice (which doesn't take a policy). The intention is that you can process the WTF-8 contents of a string in a streaming way with a fixed-size buffer. However, might it make sense to pass the policy argument to string.as_wtf8 instead? Or, in the spirit of #35, perhaps the names would be string.as_utf8, string.as_wtf8, and string.as_lossy_utf8, all resulting in the stringview_wtf8 type.

I think the essential thing this allows is moving any trap/assertion, for the strict UTF-8 variant, to the point where you create the view. An encode would then never trap unless the memory is out of range.

For an implementation that doesn't use WTF-8 internally and which eagerly transcodes strings (or substrings) to WTF-8 when creating a stringview_wtf8, having the policy up front would allow it to be applied when the view is created, and stringview_wtf8.encode becomes a simple memcpy. But this might not be important. I don't know how viable this "MVP" kind of implementation will be in the long term; perhaps breadcrumbs will be a comprehensively better solution.
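To make the idea concrete, here is a rough Python sketch of the eager-transcoding model, not the proposal's actual semantics. The names Wtf8View, Trap, encode and stream are hypothetical stand-ins; as_utf8, as_wtf8 and as_lossy_utf8 mirror the instruction names suggested above. It assumes the input contains only isolated surrogates (a Python str has no surrogate pairs to worry about), and this encode copies raw bytes without respecting code point boundaries, which a real stringview_wtf8.encode would presumably have to do.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap (hypothetical)."""


class Wtf8View:
    """Hypothetical eagerly transcoded view: just the encoded bytes."""

    def __init__(self, data: bytes):
        self.data = data


def _is_surrogate(ch: str) -> bool:
    return 0xD800 <= ord(ch) <= 0xDFFF


def as_wtf8(s: str) -> Wtf8View:
    # "surrogatepass" emits the 3-byte generalized UTF-8 form
    # for an isolated surrogate, so this never traps.
    return Wtf8View(s.encode("utf-8", "surrogatepass"))


def as_utf8(s: str) -> Wtf8View:
    # Strict policy: the trap happens here, at view creation.
    try:
        return Wtf8View(s.encode("utf-8"))
    except UnicodeEncodeError as exc:
        raise Trap("isolated surrogate in strict UTF-8 view") from exc


def as_lossy_utf8(s: str) -> Wtf8View:
    # Lossy policy: isolated surrogates become U+FFFD up front.
    cleaned = "".join("\ufffd" if _is_surrogate(c) else c for c in s)
    return Wtf8View(cleaned.encode("utf-8"))


def encode(view: Wtf8View, pos: int, buf: bytearray) -> int:
    """Copy up to len(buf) bytes starting at pos; return the count copied.

    No policy is consulted here: this is the "simple memcpy" case.
    """
    chunk = view.data[pos:pos + len(buf)]
    buf[:len(chunk)] = chunk
    return len(chunk)


def stream(view: Wtf8View, buf_size: int) -> bytes:
    """Drain a view through a fixed-size buffer, as a consumer might."""
    buf = bytearray(buf_size)
    out = bytearray()
    pos = 0
    while True:
        n = encode(view, pos, buf)
        if n == 0:
            return bytes(out)
        out += buf[:n]
        pos += n
```

With this shape, stream(as_wtf8(s), 2) and stream(as_lossy_utf8(s), 2) never raise, while as_utf8(s) raises Trap before any bytes are copied if s contains an isolated surrogate.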
