Consider changing entry points for string creations, return types and their names #43
Comments
I agree that the views are odd, as of this comment, and would prefer a structure akin to this suggestion as well. Perhaps, could this be modeled as
so that a |
If I understand correctly, in Java you really want to be treating strings as WTF-16 all the time and anything else is a distraction. But there are other languages: Scheme, ML, or Python, for example, won't care what encoding is used internally as long as there is by-codepoint access, and C and Rust will want WTF-8, but usually that is just a stopping point on the way to linear memory, and so on. So what good would splitting to

In general I think that given any specific source language, there are parts of stringrefs that are not interesting. But there is common functionality, too. Should that common functionality be duplicated for each view? Probably not, right?

All that said, I think @gkdn you highlight an issue that is well-described by @dcodeIO here also: #12 (comment) -- that either you decide to use e.g.
To clarify, @gkdn and @dcodeIO, is this mostly an aesthetic/stylistic issue ("I find it odd"), or do you have specific concerns about efficiency of the current design? For the record:
Yes, it is mostly stylistic and about understanding the model (i.e. I find it odd). There will be some extra instructions, though they probably won't add much to code size in the end. The underlying problem is that the "view" concept leaks deeply into the resulting API, where it seems neither necessary nor very natural to the consumer.

Re. a wtf16-based app dealing with utf8 from an external source: IIUC, there are 2 approaches to this problem:
The optimization around both scenarios looks similarly complicated to me, so it is not clear to me how the currently proposed modeling helps the engine or compilers in any way (like the wtf-16 heavy app consuming utf8).

In an ideal world (I'm coming from an OO perspective), there would be an inheritance-like relation between the 2 types: the factory methods would go to the relevant subtypes and the common methods would go to the parent type. I proposed the
To me it seems that the

Ultimately, it doesn't really matter to me how things are named, views or not, but I'd prefer if we could avoid such asymmetries that lead to more back and forth between refs and views than necessary, as I expect the final result to be less efficient and less compact than what we'd have achieved if we had thought in terms of specialized string types from the start.
@gkdn: To give another shot at explaining the model, follow this train of thought:
Further illustrations/examples on these points:
I'm not saying that things couldn't be designed differently, especially when looking at only one of these points at a time. But it does seem difficult to come up with a concept that doesn't regress at least one of the benefits that the current design provides. As a (counter-) example: the idea to have |
To make sure we are on the same page and I have the correct assumptions: if the module uses a stringview to represent its strings, it doesn't matter whether there is an explicit "view" concept or whether they are called views, right? If the encoding matches, it will work directly; if it does not, it needs to delay the conversion (or other optimization) until it really needs to.

If the module uses stringref to represent its strings, then it gives the engine a potential hint to make the conversion (or create breadcrumbs etc.). And I think you have the latter scenario in mind. Given that both kinds of modules can exist (i.e. modules with stringrefs or eager stringviews), I'm not sure the engine is left with many optimization options except delaying the conversion as late as it can, until an API that requires the conversion is actually called.

Anyway, this is mostly speculation based on my limited perspective on the topic. If you think that the engines get meaningful opportunities, you are the expert and I trust your judgement :)
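The "delay the conversion until an operation actually needs it" strategy speculated about above can be sketched in a few lines of Python. This is purely illustrative and makes no claim about how any engine works: `LazyString`, its fields, and the `conversions` counter are hypothetical names invented for this example, and Python's `surrogatepass` error handler is used as a rough stand-in for WTF-8/WTF-16 handling.

```python
from typing import List, Optional


class LazyString:
    """Hypothetical model: stores 8-bit data, builds 16-bit units on first use."""

    def __init__(self, utf8_bytes: bytes) -> None:
        self._utf8 = utf8_bytes
        self._wtf16: Optional[List[int]] = None  # cached 16-bit code units
        self.conversions = 0  # how many times a real transcoding ran

    def _ensure_wtf16(self) -> List[int]:
        # Transcode only once, and only when a 16-bit operation asks for it.
        if self._wtf16 is None:
            text = self._utf8.decode("utf-8", "surrogatepass")
            raw = text.encode("utf-16-le", "surrogatepass")
            self._wtf16 = [
                int.from_bytes(raw[i:i + 2], "little")
                for i in range(0, len(raw), 2)
            ]
            self.conversions += 1
        return self._wtf16

    def byte_length(self) -> int:
        # Works on the stored representation; no conversion needed.
        return len(self._utf8)

    def get_codeunit(self, index: int) -> int:
        # Needs the 16-bit form, so this may trigger the one-time conversion.
        return self._ensure_wtf16()[index]


s = LazyString("h\u00e9llo".encode("utf-8"))
n = s.byte_length()                 # served from the 8-bit data
conversions_before = s.conversions  # still 0 at this point
unit_1 = s.get_codeunit(1)          # first 16-bit access converts and caches
unit_4 = s.get_codeunit(4)          # second access reuses the cache
```

In this model the cost of conversion is paid at most once per string, and never for strings that are only passed around or measured in their original encoding.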
Yes, the benefits I described mostly apply when module producers choose to represent strings as

But that's just a guess: data to be gained from experimentation may prove it right or wrong. In particular, I don't know what fraction of strings in a typical application can avoid conversions. Also, as mentioned before, in V8 (and probably other browser engines too),

This is one aspect of the general theme that the proposed stringref design aims to support much more than the Java-in-the-browser use case, and if we only cared about that case we could indeed simplify the design by a lot.

An incremental approach would be very interesting, but in particular something like the ref/view split is difficult to retrofit if we wanted to have an MVP version where it doesn't exist yet. We could conceivably drop the
Currently, `stringref` is the main entry point for all string creation and related high-level methods. On the other side, there is a bunch of specialized functions in the views. Once you have a view instance you cannot go back to `stringref`, so one needs to keep `stringref` as the main reference to carry around. As a result, every specialized operation requires calling a view creation function to get the view and then calling the function on it, like `get_codeunit`.

I found this setup quite odd and asked why we don't simply have 2 types, `string_wtf8` and `string_wtf16`, with conversion functions between them. Jakob pointed out that it is designed this way so that we can keep/encourage a portable representation around strings.

Even after changing my mental model to think of `stringref` as the portable abstraction, I still find the current style odd. One calls `new_wtf16` or `new_wtf16_array` etc. and ends up with a `stringref`, for which you then need to create a view for wtf16. This is not very intuitive, and the portable representation leaks into a lot of places where one doesn't care about it.

It feels a lot more natural to me to have `string_wtf8` and `string_wtf16` as the entry points with all necessary APIs (e.g. `string_wtf16.new`, `string_wtf16.encode`, `string_wtf16.eq` etc.) and to have an instruction like `string_wtf16.as_portable` that returns a `string_portable` when you need to cross the module boundary. And it seems to me that in this model, the engine could still perform the delayed encoding/decoding and other similar optimizations that can be done in the original design.

This can also help with avoiding other naming inconsistencies around these types.
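To make the comparison concrete, here is a rough Python model of the two API shapes described in this issue. It is only a sketch under stated assumptions: the classes `StringRef`, `Wtf16View`, `StringWtf16`, and `StringPortable` are hypothetical stand-ins for the types discussed here, `as_wtf16` is an assumed name for the view creation function, and plain Python lists of 16-bit integers stand in for whatever storage an engine would use.

```python
from dataclasses import dataclass
from typing import List


def to_wtf16_units(text: str) -> List[int]:
    """Encode a Python str as 16-bit code units (lone surrogates allowed)."""
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def from_wtf16_units(units: List[int]) -> str:
    """Decode 16-bit code units back into a Python str."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", "surrogatepass")


# --- Shape 1: portable ref plus view (model of the current design) ---------
@dataclass(frozen=True)
class Wtf16View:
    units: List[int]

    def get_codeunit(self, index: int) -> int:
        return self.units[index]


@dataclass(frozen=True)
class StringRef:
    text: str  # encoding-agnostic payload in this model

    @staticmethod
    def new_wtf16(units: List[int]) -> "StringRef":
        return StringRef(from_wtf16_units(units))

    def as_wtf16(self) -> Wtf16View:  # assumed name for the view creation step
        return Wtf16View(to_wtf16_units(self.text))


# --- Shape 2: encoding-specific type as entry point (model of the proposal) -
@dataclass(frozen=True)
class StringPortable:
    text: str


@dataclass(frozen=True)
class StringWtf16:
    units: List[int]

    @staticmethod
    def new(units: List[int]) -> "StringWtf16":
        return StringWtf16(list(units))

    def get_codeunit(self, index: int) -> int:
        return self.units[index]

    def eq(self, other: "StringWtf16") -> bool:
        return self.units == other.units

    def as_portable(self) -> StringPortable:
        # Only needed when the string crosses a module boundary.
        return StringPortable(from_wtf16_units(self.units))


units = to_wtf16_units("hi\U0001F600")

# Shape 1: create the ref, obtain a view, then read a code unit.
ref = StringRef.new_wtf16(units)
first_via_view = ref.as_wtf16().get_codeunit(0)

# Shape 2: create the wtf16 string and read a code unit directly.
s = StringWtf16.new(units)
first_direct = s.get_codeunit(0)
portable = s.as_portable()
```

Both shapes expose the same operations. The difference is whether the portable type or the encoding-specific type is the value a module carries around, which is the stylistic point this issue raises.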
Anyway I wanted to move the discussion here and hear your thoughts.